Hard drive data recovery


Here are notes from a recent project to recover data from a failing hard drive. The exact recovery procedure will depend upon the type, severity, and location of the error. These notes may offer some hints toward an alternative to the high fees charged by the local repair store. Proceed at your own risk. Only software recovery is discussed here, but hardware recovery is also possible.

Important message

Computers store far too much personal data. Data recovery and data forensics are related in techniques, but may deviate in goal. The job of data recovery ends with the actual recovery. When doing data recovery for a client, be respectful of their trust and privacy.

Image creation

When a hard drive is failing, any additional accesses may degrade the drive further. Therefore, the first step is to make an image of the drive. Then, make a copy of the drive image. Always operate as if you will have only a single chance to make a drive image. Even if you are not able to complete the recovery, if an image is made successfully, the data is likely there, and you may want to save the image and return to it some day to reattempt the recovery.

If performing a recovery of a USB drive, consider removing it from its chassis. The drive likely uses standard SATA connectors and can be directly attached to a SATA port. With a direct connection, you eliminate the USB adaptor as a potential source of problems and obtain the maximum performance of the drive.

All tools mentioned from this point forward are Linux oriented.

Image creation: dd

To create the disk image, one could use something like dd if=/dev/sdb of=image bs=512 conv=sync,noerror iflag=direct, which copies the device /dev/sdb to a file called image. The rest of the options are the interesting part. By default, dd stops copying at the first read error; conv=noerror tells it to continue. However, with only that option, a partial block could be returned, which would cause subsequent data to be offset. To write complete blocks in the case of a read error we add conv=sync, which pads incomplete blocks with zeros.

The blocks discussed so far are dd's own blocks, whose size can be controlled. For performance, dd would normally read blocks that are a multiple of the device's block size. With the above options, if dd hits an error on one device block while filling its larger block, it pads the remainder, skipping device blocks that could possibly have been read successfully. To avoid this and capture as much information as possible, we force dd to use a block size equal to the device's with bs=512. Likewise, the OS is likely to read ahead as an optimization. The read-ahead is not desirable here, as the OS will read enough to fill its own buffer, which is larger than a device block. To disable the read-ahead we enable direct I/O on reads with iflag=direct.
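
The padding behavior of conv=sync is easy to see on a scratch file before touching a failing drive. The sketch below is only an illustration: the file names are made up, a 1000-byte regular file stands in for the device, and iflag=direct is left out because some filesystems used for scratch space reject direct I/O.

```shell
#!/bin/sh
# Illustration only: a regular file stands in for /dev/sdb.
workdir=$(mktemp -d)
src="$workdir/short-source"     # hypothetical stand-in for the device
dst="$workdir/image"

# 1000 bytes is one full 512-byte block plus a partial 488-byte block.
head -c 1000 /dev/urandom > "$src"

# Same conversion options as the command above, minus iflag=direct.
dd if="$src" of="$dst" bs=512 conv=sync,noerror 2>/dev/null

src_bytes=$(wc -c < "$src")
dst_bytes=$(wc -c < "$dst")
echo "source: $src_bytes bytes, image: $dst_bytes bytes"

# On a real device, the logical sector size can be checked first with:
#   blockdev --getss /dev/sdb
rm -rf "$workdir"
```

The image comes out at 1024 bytes: the trailing partial block has been padded with zeros to a full 512 bytes, which is what keeps later data at the right offset when a read comes back short.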

Image creation: Beyond dd

The dd approach should work to create an image, but while the above options are conservative, they also result in slow image creation. For performance, we would rather read larger blocks from the device, falling back to small block sizes only when read errors are encountered. This is one of the optimizations available in dd_rescue and [gnu]ddrescue. Yes, the only difference in their names is the underscore, and to be more confusing, some Linux distributions may have renamed them. Both of these utilities were receiving active development in 2014, and if still true, both should be considered.

An interesting example in the dd_rescue manual shows how to rewrite each block, encouraging the hard drive to reallocate failing sectors. While not best for disaster recovery, perhaps it would be useful as a preventive step, similar to RAID scrubbing.

GNU ddrescue is called as simply as ddrescue /dev/sdb image image.log. The use of an optional log file allows for additional improvements to the image, and one should review the manual. Below is an example of the output of ddrescue and its log file. The structure of the log file is described in the manual, and there is a separate viewer, though it may be difficult to install. The read rate of ddrescue is much faster than that of the error-tolerant dd command.


[root@linux-1 kent]# ddrescue /dev/sdb image image.log


GNU ddrescue 1.17
Press Ctrl-C to interrupt
rescued:    10472 MB,  errsize:   32768 B,  current rate:   48234 kB/s
ipos:    10472 MB,   errors:       1,    average rate:   18903 kB/s
opos:    10472 MB,    time since last successful read:       0 s
Copying non-tried blocks...



[root@linux-1 kent]# cat image.log
# Rescue Logfile. Created by GNU ddrescue version 1.17
# Command line: ddrescue /dev/sdb image image.log
# current_pos  current_status
0x2DF530000     ?
#      pos        size  status
0x00000000  0x167D08000  +
0x167D08000  0x00008000  *
0x167D10000  0x177830000  +
0x2DF540000  0x5A47CD6000  ?
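
The log file is plain text, so the totals can be recomputed by hand. The following is a rough sketch, not part of ddrescue: summarize_log is a made-up helper name, and it assumes the three-column pos/size/status layout shown above, where + marks rescued areas, ? marks areas not yet tried, and the remaining status characters mark areas that failed or still need work. It relies on bash arithmetic to read the hexadecimal sizes.

```shell
#!/bin/bash
# Sketch: sum the block sizes in a ddrescue log file by status character.
# summarize_log is a hypothetical helper, not something shipped with ddrescue.
summarize_log() {
    local pos size status rescued=0 untried=0 errsize=0
    while read -r pos size status; do
        # Skip comments, blank lines, and the two-field current_pos line.
        case "$pos" in '#'*|'') continue ;; esac
        [ -n "$status" ] || continue
        case "$status" in
            '+') rescued=$(( rescued + size )) ;;   # successfully read
            '?') untried=$(( untried + size )) ;;   # not tried yet
            *)   errsize=$(( errsize + size )) ;;   # failed or pending retry
        esac
    done < "$1"
    echo "rescued=$rescued untried=$untried errsize=$errsize"
}

# Try it on a copy of the block list shown above.
log=$(mktemp)
cat > "$log" <<'EOF'
# Rescue Logfile. Created by GNU ddrescue version 1.17
# current_pos  current_status
0x2DF530000     ?
#      pos        size  status
0x00000000  0x167D08000  +
0x167D08000  0x00008000  *
0x167D10000  0x177830000  +
0x2DF540000  0x5A47CD6000  ?
EOF
summary=$(summarize_log "$log")
echo "$summary"
rm -f "$log"
```

For the log above, the errsize total comes to 32768 bytes, the same figure ddrescue printed on its status screen.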


A more interesting ddrescue example came after the drive being imaged stopped responding and had to be reattached to be visible. This example shows the use of the log file to continue the image creation of a 400GB drive following the interruption halfway to completion. Because the interruption occurred halfway, the remainder of the drive was marked as bad in the log file, hence the large errsize field. As ddrescue continues to run and retries blocks that were previously unreadable, the errsize field will hopefully decrease, though the error count may increase. The re-read process can be repeated as many times as desired, simply by restarting ddrescue with the same arguments, including the log file.


[root@linux-1 kent]# ddrescue /dev/sdb image image.log


GNU ddrescue 1.17
Press Ctrl-C to interrupt
Initial status (read from logfile)
rescued:   168227 MB,  errsize:    231 GB,  errors:      50
Current status
rescued:   171142 MB,  errsize:    228 GB,  current rate:   63994 kB/s
ipos:   171279 MB,   errors:      62,    average rate:   35986 kB/s
opos:   171279 MB,    time since last successful read:       0 s
Splitting failed blocks...


Because ddrescue may use sparse files, the image length from ls may be misleading. Instead use du as in the following example.


[root@linux-1 kent]# ls -l image
-rw-r--r--. 1 root root 400088456704 Aug 25 02:01 image
[root@linux-1 kent]# du --apparent-size image
390711384 image
[root@linux-1 kent]# du image
199009632 image
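
To see why the two numbers differ, one can build a deliberately sparse file and compare its nominal length with the space actually allocated. This is just a small demonstration on a scratch file; the file name and the 10 MiB size are arbitrary, and it assumes GNU coreutils (truncate, du --apparent-size) on a filesystem that supports sparse files.

```shell
#!/bin/sh
# Demonstration on a scratch file; no drive or image is touched.
sparse=$(mktemp)

# Extend the empty file to 10 MiB without writing any data blocks.
truncate -s 10M "$sparse"

# Nominal length in KiB (what ls -l reports, divided by 1024).
apparent_kib=$(du -k --apparent-size "$sparse" | cut -f1)
# Space actually allocated on disk in KiB.
allocated_kib=$(du -k "$sparse" | cut -f1)

echo "apparent: ${apparent_kib} KiB, allocated: ${allocated_kib} KiB"
rm -f "$sparse"
```

The apparent size is the full 10240 KiB, while the allocated size stays far smaller because no data was ever written. The drive image behaves the same way for regions ddrescue has not yet filled in.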


Recovery

Once the image is made, make a copy of it.

Any recovery is going to take time, a lot of time. Depending on the severity of the error, there may be several attempts that simply take the wrong path. With each attempt you should recopy the original disk image. The copy process is also slow, and its pain is proportional to the drive size. We all love big drives... until they crash and burn.
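
Recopying the image for each attempt can be scripted. The sketch below assumes GNU cp and uses made-up file names; fresh_copy is a hypothetical helper. It keeps holes in a sparse image from being expanded during the copy, then confirms the working copy is byte-identical before any tool is pointed at it.

```shell
#!/bin/sh
# fresh_copy MASTER WORK: replace WORK with a verified copy of MASTER.
fresh_copy() {
    master=$1
    work=$2
    # --sparse=always keeps runs of zeros as holes, saving space and time.
    cp --sparse=always "$master" "$work" || return 1
    # cmp exits non-zero if the two files differ anywhere.
    cmp -s "$master" "$work" || return 1
}

# Example on a small scratch file standing in for the real image.
scratch=$(mktemp -d)
head -c 65536 /dev/urandom > "$scratch/image"
if fresh_copy "$scratch/image" "$scratch/image.work"; then
    copy_status=ok
else
    copy_status=failed
fi
echo "working copy: $copy_status"
rm -rf "$scratch"
```

On the real image this would be called as fresh_copy image image.work before each new attempt, with all recovery tools run against image.work only.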

One should review the partition table with fdisk image to learn important offsets and partition types, and to check whether the partition table is in good shape. If needed, the partition table can be updated and written. To enable fdisk to create partitions at sectors before 2048, add the -c=dos option.

One could attempt to mount the image read-only, such as with mount image /mnt -o ro,loop,offset=32256. In this case the image held an older FAT32 layout in which the first partition starts at sector 63, and 63 sectors of 512 bytes gives the offset of 32256. Since we are mounting a partition of a drive image, not a partition image, we need the offset option.
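
The offset is simply the partition's starting sector multiplied by the sector size. A small helper makes the arithmetic explicit; partition_offset is a made-up name, and the 512-byte default is an assumption that matches this drive but should be checked against the fdisk output for any other.

```shell
#!/bin/sh
# partition_offset START_SECTOR [SECTOR_SIZE]: byte offset of a partition.
partition_offset() {
    start_sector=$1
    sector_size=${2:-512}    # assumed default; confirm with fdisk -l image
    echo $(( start_sector * sector_size ))
}

offset=$(partition_offset 63)
echo "mount image /mnt -o ro,loop,offset=$offset"
```

This prints the mount command used above, with offset=32256. For a partition starting at sector 2048, the same helper gives 1048576.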

But the easiest thing may be to use TestDisk to automate sophisticated recovery techniques. Also, from the same folks is PhotoRec for file-specific recoveries. See their documentation for much more detailed information.

What you are hoping to see in TestDisk is either a correction of an easy problem, such as a bad main boot sector with a good backup, or, for a more complicated problem and with some finesse, a list of files and directories that can be traversed and copied out of the image. In this particular case, I enabled the expert mode, and when rebuilding the boot sector, I was presented with several possibilities for the root clusters, each with a score. After selecting the 1st and 2nd root clusters, the directory list appeared and the situation appeared hopeful.


TestDisk 6.14, Data Recovery Utility, July 2013
Christophe GRENIER 
http://www.cgsecurity.org
1 P FAT32 LBA                0   1  1 48640 254 63  781417602
Directory /
Copying, please wait... 64440 ok, 9 failed
drwxr-xr-x     0     0         0 15-May-2014 17:06 New folder
drwxr-xr-x     0     0         0 15-May-2014 17:06 _G_2014
-rwxr-xr-x     0     0      1132 10-Mar-2012 15:40 Wende.csv
-rwxr-xr-x     0     0      3072 21-Sep-2012 17:21 DG1__DS_VOL_HDR
drwxr-xr-x     0     0         0  6-Mar-2009 07:18 HeadNode
drwxr-xr-x     0     0         0  6-Mar-2009 07:18 System Volume Information
drwxr-xr-x     0     0         0  6-Mar-2009 15:17 Pio Oth Stuff
>drwxr-xr-x     0     0         0  9-Mar-2009 11:15 Old Dell
drwxr-xr-x     0     0         0  9-Mar-2009 15:47 Recycled
-rwxr-xr-x     0     0      3072 21-Sep-2012 17:21 DG1__DS_DIR_HDR
-rwxr-xr-x     0     0        36 29-Dec-2012 12:17 SYNCGUID.DAT
drwxr-xr-x     0     0     32768 28-Feb-2008 04:04 _OUND.000
-rwxr-xr-x     0     0      1489 28-Mar-2013 07:44 Bad_sectors_3_27_13.txt
drwxr-xr-x     0     0     32768 19-Sep-2009 11:41 MSI28cdb.tmp
drwxr-xr-x     0     0     32768 12-Jan-2010 14:19 MSIfa9b9.tmp
Next
Use Right to change directory, h to hide deleted files
q to quit, : to select the current file, a to select all files
C to copy the selected files, c to copy the current file


In all, 365GB of the 400GB drive was recovered. TestDisk recovered deleted files as well, so some of the recovered data is junk, but once the files are recovered, sifting through them is the easiest part of the process.

Recovery desperation

You may have heard the folklore of placing a failing hard drive in the freezer to resurrect it. Partway through the imaging, the drive stopped responding. Reattaching the drive did not help, and it seemed the drive was likely dead. There were lots of errors such as those shown in the first panel below. With nothing to lose, I put the drive into a plastic bag and into the freezer for a couple of hours. Out of the freezer and reattached, but still in the plastic bag to protect it from condensation, the drive started to respond, with the messages in the second panel. I mention this almost anecdotally, as an example of what someone might do in desperation. It appeared to help, but I have no explanation. In general, the freezer treatment is probably a bad idea, but if the drive is dead, perhaps a leap of faith is needed.

Before the freezer treatment.

[ 8042.312138] sd 2:0:0:0: [sdb] CDB:
[ 8042.312139] Read(10): 28 00 00 0f 1f 59 00 00 01 00
[ 8042.312172] sd 2:0:0:0: [sdb] Unhandled error code
[ 8042.312173] sd 2:0:0:0: [sdb]
[ 8042.312174] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK


After the freezer treatment.

[26115.734779] ata3: exception Emask 0x10 SAct 0x0 SErr 0x4050002 action 0xe frozen
[26115.734785] ata3: irq_stat 0x00400040, connection status changed
[26115.734789] ata3: SError: { RecovComm PHYRdyChg CommWake DevExch }
[26115.734796] ata3: hard resetting link
[26125.746169] ata3: softreset failed (device not ready)
[26125.746175] ata3: hard resetting link
[26135.757726] ata3: softreset failed (device not ready)
[26135.757741] ata3: hard resetting link
[26146.738639] ata3: link is slow to respond, please be patient (ready=0)
[26158.219869] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[26158.235424] ata3.00: ATA-8: WDC WD4000BEVT-11ZAT0, 01.01A01, max UDMA/133
[26158.235430] ata3.00: 781422768 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[26158.238797] ata3.00: configured for UDMA/133
[26158.238807] ata3: EH complete
[26158.238927] scsi 2:0:0:0: Direct-Access     ATA      WDC WD4000BEVT-1 01.0 PQ: 0 ANSI: 5
[26158.239290] sd 2:0:0:0: [sdb] 781422768 512-byte logical blocks: (400 GB/372 GiB)
[26158.239422] sd 2:0:0:0: [sdb] Write Protect is off
[26158.239427] sd 2:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[26158.239456] sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[26158.239580] sd 2:0:0:0: Attached scsi generic sg1 type 0
[26158.242667]  sdb: sdb1
[26158.243192] sd 2:0:0:0: [sdb] Attached SCSI disk