Heads up: this post is exceedingly boring and more than a little technical. But at least the data loss was not as bad as it sounds. I recently had one large RAID 5 NAS (network-attached storage) array lose its partition table and wind up in a pre-format, factory default situation. With sufficient redundancy built into the system, it would have been easy enough to reinitialize the RAID, reformat the drives, and re-copy the data. But that wouldn’t be much of a learning experience, really…
So instead I decided, for future reference, to see how much data could be recovered from the situation, and how one might undertake such an adventure. It turns out that it was actually easy, if time-consuming. Now, I’m a confident *nix user, but I’d never created a RAID manually or recovered data from one, so I decided to start by asking some experts directly before Googling for suggestions. As it turns out, QNAP may keep odd customer service hours from their home base in Hong Kong, but if you wait for a bit after work and catch them at the start of their day (around 7:30 PM Central), they’re more than happy to help via Skype or MSN. Their preferred approach to my particular problem was for me to forward some necessary ports on our router to the NAS device itself so that they could use our static IP to log directly into it. That wouldn’t have given me much of a window into what was going on, though. It was also clear that, while the rep was eager to help, the language barrier might have been a bit much to overcome for asking a lot of complicated questions via chat. With both those options off the table, when the QNAP customer service person offered to use TeamViewer, a free remote desktop sharing application, to take control of my laptop and, with me watching, access the NAS box via the local network, it seemed ideal.
I downloaded and installed TeamViewer and, within a few minutes, the service rep was tooling around with the PuTTY terminal on my laptop, which was connected via SSH to the device. He had me turn off the device, remove the drives, turn it back on, and then insert the drives one by one in their original order (to facilitate adding them to the array in the correct order). He then used fdisk -l to see the drives on the device and discover their device names: sda, sdb, etc. This should be pretty familiar to any Linux user. He then demonstrated for me the use of the mdadm command to assemble an array from the command line. Not too hard:
mdadm --assemble /dev/md0 /dev/sda3 /dev/sdb3 …
The mdadm “assemble” option rebuilds a previously created array, so no writing to the drives’ partition tables happens. This is crucial for preserving as many drive sectors in their previous state as possible and avoiding further data loss. “/dev/md0” is the device node for the assembled array, and the following arguments, “/dev/sda3,” etc., point to the partitions on the drives, in order, that need to be added to the array. Recovery would most likely be more successful if the individual drives were devoid of partition tables so that all of their sectors could be considered, but short of that, the largest partition on each drive is the one I selected. With this accomplished, the RAID array should be available at /dev/md0.
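For anyone retracing these steps, the whole sequence looks roughly like this. Assume a four-bay array whose large data partitions sit at sda3 through sdd3; the device names, partition numbers, and bay count here are illustrative, so check your own with fdisk -l first:

```shell
# List the attached drives and their partitions; on this QNAP box
# the large data partition was the third one on each disk.
fdisk -l

# Reassemble the previously created RAID 5 array. --assemble only
# reads the existing RAID superblocks; it does not (re)create the
# array or touch partition tables. --readonly adds extra caution
# by preventing any writes to the member disks.
mdadm --assemble --readonly /dev/md0 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3

# Confirm the array came up (and isn't resyncing) before going further.
cat /proc/mdstat
```

The --readonly flag wasn’t part of the rep’s one-liner, but it’s a sensible belt-and-suspenders addition when the goal is to avoid any further writes.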
Unfortunately for me, because the partition tables of the array had been hosed, the RAID array itself wouldn’t mount as a drive on the system. It wasn’t going to be that simple to get the files. Fortunately, there are tools for just such a situation, and the QNAP rep, before taking his leave, recommended one to me: TestDisk. Now, I’m always leery of software that doesn’t have an exceedingly legit website (and sometimes even of software that does), but the rep (and forum posts and websites) recommended it. Ready to download and try TestDisk, I made another crucial data consideration and went in search of some external storage onto which to drop the executable. I couldn’t mount the array anyway, but even if I could, it would still be wiser to use external storage so that no data loss would occur by overwriting sectors on the drives. I ended up connecting my HTC Evo to my laptop, downloading TestDisk onto it, and then connecting it to the USB port on the NAS box. Unfortunately, it turns out that the Evo’s storage doesn’t use a Unix-style file system like Linux’s ext2/3/4, so its file system couldn’t store permission bits and I wasn’t able to chmod the TestDisk files and, therefore, couldn’t execute them. I ended up borrowing an external drive from Mike, which I formatted to ext3. I copied TestDisk to it and was able to use chmod to ready the files for execution.
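The external-drive detour boils down to putting the TestDisk binaries on a file system that supports Unix permission bits. A rough sketch, with /dev/sdx1 and the directory names standing in for whatever your own drive and TestDisk download are called:

```shell
# WARNING: mkfs destroys everything on the target partition.
# Format the borrowed external drive as ext3 (done from a Linux machine).
mkfs.ext3 /dev/sdx1

# Mount it and copy the unpacked TestDisk download onto it.
mount /dev/sdx1 /mnt/usb
cp -r testdisk /mnt/usb/

# Mark the binaries executable -- the chmod step that FAT-formatted
# phone storage couldn't support, because FAT has no permission bits.
chmod +x /mnt/usb/testdisk/testdisk_static /mnt/usb/testdisk/photorec_static
```

The static Linux builds of TestDisk ship binaries named along the lines of testdisk_static and photorec_static; adjust the paths to match what your download actually contains.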
TestDisk comes with two pieces of software: TestDisk and PhotoRec. Both can be used for file recovery, but operate in slightly different ways. Since the rep had recommended trying to repair the array’s partition table, I started with TestDisk. The software defaults to the choices it thinks are correct, and, as expected, it defaulted to “unpartitioned space.” This meant asking it to search through all of the sectors on the drive—the drive is a representation of the RAID array’s combined drives—looking for a way to rebuild the array’s table. This took a very, very long time given the size of the array and, as you guessed, upon completion didn’t successfully identify the partition table. It did identify some files, but TestDisk is only viable for recovering small numbers of files. PhotoRec, on the other hand, is less of a fishing pole for data and more of a broadly-cast net. So I fired that up.
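Running TestDisk itself is a one-liner; everything after launch is interactive menus. One detail worth knowing: the /log option writes a testdisk.log transcript into the current working directory, which is handy for reviewing a multi-day search afterward:

```shell
# Point TestDisk at the assembled array, not at any single member disk.
# /log records everything it finds to testdisk.log in the working dir.
./testdisk_static /log /dev/md0
```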
PhotoRec searches whatever partition you indicate (or an entire unpartitioned space) for all of the file headers with which it’s familiar, which is a pretty reasonable list. It also has a very handy feature: you can individually select the file types for which you want it to search. This is useful because some headers are more common than others and PhotoRec will blindly assume that any header it encounters is a file until it encounters a file’s close. This can lead to situations where PhotoRec will scan sectors until it encounters what it falsely believes is a .mov or .pct header (both relatively common false hits) and will assume that all the following sectors are part of that false file until a file-end is encountered. The result is that sectors with headers for other file types can end up being absorbed into garbage files with false headers. Though drastically more time-consuming, searching for a single header type per pass results in much more reliable data recovery by generally eschewing this problem.
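A sketch of the PhotoRec invocation, sending recovered files to the external drive rather than back onto the array (paths are illustrative). The per-type selection happens in PhotoRec's interactive File Opt menu after launch:

```shell
# /log writes photorec.log; /d sets the directory that recovered
# files (recup_dir.1, recup_dir.2, ...) are written into -- keep it
# on the external drive so nothing is written back to the array.
./photorec_static /log /d /mnt/usb/recup /dev/md0

# In the menus that follow: pick the device, then [File Opt],
# disable everything, enable a single file type (e.g. jpg),
# and start the search. Repeat per file type for cleaner results.
```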
PhotoRec is also not without speed bumps, however. Because it will recover garbage files along with real ones, it can easily output more data than you anticipate in terms of storage space. Likewise, if you’re recovering a very large number of files (i.e., greater than several hundred thousand), it can hit an arbitrary limit on the number of folders it will create for storing those recovered files. So while, with large amounts of storage to be searched, you’re likely to face waiting times in excess of several days, it will need regular checks to ensure that it’s not hanging on a technicality. Fortunately, previous sessions can be restarted.
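Restarting works because PhotoRec tracks its progress in a photorec.ses session file in the directory it was launched from. Re-running the same command from that same directory offers to resume rather than start over:

```shell
# photorec.ses lives in the working directory, so launch from the
# same place each time to be offered the resume option.
cd /mnt/usb
./photorec_static /log /d /mnt/usb/recup /dev/md0
```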
In the end, given enough time, I was able to recover nearly all of the data. I was extremely grateful, though, that I didn’t do so out of any necessity: the process was somewhat arduous and would have been extremely nerve-wracking without the knowledge that the data was safe elsewhere.