Saturday, June 20, 2015

Using badblocks to prepare an offsite USB backup drive

Part of my backup strategy is to write my backups to external USB drives that are protected by LUKS encryption.  However, before I put a mechanical drive into service, I like to test it heavily for a few days to see whether it will hold up to the wear-and-tear of being a portable drive.

(There's little or no point in doing this on an SSD.)

Currently, my preferred method is to use "badblocks" in destructive write-testing mode to test the drive.  For example:

# badblocks -p 3 -wsv -t random /dev/disk/by-id/usb-SAMSUNG_HM502JX_C######-0\:0

The "-p 3" option tells badblocks to keep repeating the test until three consecutive passes have found no new bad blocks.  Most modern mechanical hard drives have spare sectors that the firmware can substitute when a bad spot is found on the surface.  By repeatedly writing to bad or dying areas of the surface, we can force the drive's firmware to remap those failing areas to the spare sectors.
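To make the "-p" stopping rule concrete, here is a small bash simulation of it.  It is only a sketch based on the badblocks man page's description of "-p" (stop once that many consecutive passes find no new bad blocks); it does not run badblocks, and both the function name passes_run and the per-pass bad-block counts fed to it are hypothetical.

```shell
#!/usr/bin/env bash
# Simulation of the stopping rule behind badblocks' "-p N" option:
# keep scanning until N *consecutive* passes find no new bad blocks.
# Arguments: N, then the (hypothetical) number of new bad blocks each
# successive pass would find.
# Prints how many passes would run; returns 1 if the sequence runs out first.
passes_run() {
    local required=$1; shift
    local clean=0 run=0 found
    for found in "$@"; do
        run=$((run + 1))
        if [ "$found" -eq 0 ]; then
            clean=$((clean + 1))
        else
            clean=0    # any new bad block resets the clean-pass streak
        fi
        if [ "$clean" -ge "$required" ]; then
            echo "$run"
            return 0
        fi
    done
    echo "still running after $run passes" >&2
    return 1
}

passes_run 3 0 0 0        # prints 3: a healthy drive stops after three clean passes
passes_run 3 5 1 0 0 0    # prints 5: two passes with new bad blocks, then three clean ones
```

The second example shows why "-p 3" can take much longer than three passes on a marginal drive: every pass that turns up a new bad block starts the count over.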

The downside of "-p 3" is that it increases the time needed to test the drive before placing it into service, because badblocks makes at least three full passes.  As a rough estimate, a 1TB drive over USB 2.0 will need 3-4 days of testing with "-p 3".  If you are using a USB 3.0 drive connected to a USB 3.0 port, it might only take 20-30 hours.
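The time estimate is simple arithmetic, sketched here as a bash function.  The assumptions are mine, not measurements: with a single "-t random" pattern, each pass writes the whole drive and then reads it back (two full sweeps), a healthy drive stops after exactly three passes, and the sustained throughput figures (about 30 MB/s for USB 2.0 and 100 MB/s for USB 3.0) are guesses you should replace with numbers from your own hardware.  The function name estimate_hours is hypothetical.

```shell
#!/usr/bin/env bash
# Rough wall-clock estimate for "badblocks -w -t random -p N".
# Assumptions (not measured):
#   - one "-t random" pattern => each pass = one full write + one full read-back
#   - N is the minimum pass count (a healthy drive stops after N clean passes)
#   - mb_per_s is a sustained throughput figure you supply for your drive/port
estimate_hours() {
    local size_gb=$1 mb_per_s=$2 passes=$3
    local sweeps=$((passes * 2))                          # write + read-back per pass
    local seconds=$((size_gb * 1000 * sweeps / mb_per_s)) # 1 GB = 1000 MB here
    echo $((seconds / 3600))                              # whole hours, rounded down
}

estimate_hours 1000 30 3    # 1 TB at ~30 MB/s (USB 2.0-ish guess): prints 55
estimate_hours 1000 100 3   # 1 TB at ~100 MB/s (USB 3.0-ish guess): prints 16
```

These come out somewhat below the figures above.  Real USB 2.0 throughput is often lower than 30 MB/s, and a mechanical drive's sequential speed drops toward the inner tracks, so treat the result as an optimistic floor rather than a prediction.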

The "-wsv" flags tell badblocks to do a write-mode test (which destroys all data on the drive), to show its progress, and to be verbose about what it is doing.

The "-t random" specifies that we want to use a random pattern for the test.  Please note that this is not a suitable replacement for "shred" when wiping a disk or preparing it for LUKS.  You should still run "shred" on the drive prior to using it (or giving it away to someone else).
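For the "shred" step, something like the following should work with GNU coreutils.  It is a sketch rather than a tested recipe: the function name wipe_target is hypothetical, the device path in the comment is the same partially elided one used earlier, and it relies on shred's default pass count (3 in current coreutils) unless you add "-n".  It is destructive, so check the device name twice.  The same function also works on an ordinary file, which is a harmless way to try it first.

```shell
#!/usr/bin/env bash
# Overwrite a target with GNU shred before setting it up with LUKS.
# DESTRUCTIVE: everything on the target is lost.
# wipe_target is an illustrative name; pass the device (or file) path.
wipe_target() {
    local target=$1
    # -v: report progress. With no -n, shred uses its default pass count.
    # Add e.g. "-n 1" to trade thoroughness for time on a large drive.
    shred -v "$target"
}

# Example (double-check the device name first!):
# wipe_target /dev/disk/by-id/usb-SAMSUNG_HM502JX_C######-0\:0
```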

Drives that have started to fail will often sound like they are seeking, with long pauses, during the badblocks write pass.  If you see long pauses in reads or writes during testing, the disk may be damaged or about to fail permanently.  You will have to use your best judgement about whether to trust the disk for backups.

(If I think a drive is failing, then it gets a second or third run of badblocks with the "-p 3" option.  It will usually die during the second or third run, or else all of the bad spots will have been remapped and later runs will go quickly.)
