File System Corruption and Recovery

So this is the second time I have run into a problem like this. My FAT32 file system on the external USB HDD suddenly started reporting:

00:47:32 rrs@champaran:/tmp$ sudo dosfsck /dev/sdb1
dosfsck 3.0.9, 31 Jan 2010, FAT32, LFN
Logical sector size is zero.

I had been taking a lot of care to ensure that I don’t run into a situation like this. Nobody likes losing data. The good part is that I’ve been lucky enough that, even without backups (now who’s gonna back up a backup disk), I have recovered all my data. All thanks to Christophe GRENIER for Testdisk.

So what caused the problem?

I don’t know for sure. I do remember the last thing I did, which must have triggered the problem. I started 5 copy operations from my laptop HDD to the external HDD (the FAT32 one that got corrupted) using the file manager, effectively generating a random-write workload for the I/O scheduler.

And at the very same time, I was also running Handbrake to try to re-encode a corrupted MP4 video from my camera - a CPU-intensive task.

Well, nothing RTOS or mission critical, but unfortunately the Linux kernel couldn’t take much. The moment it ran out of free memory, it started paging. And it looks like paging is the ugliest state for the Linux kernel, because the moment it starts paging heavily, you have a very high probability of hitting an OOM. And that’s what happened in my case.

I wish Linux actually froze processes instead of trying to give every process a fair share, and thus ending up in an OOM situation. But anyway, having become good at predicting Linux’s behavior, I decided not to touch the laptop at all. I left it as it was overnight, thinking it would eventually trigger the OOM killer and that the prime candidate would be Handbrake. And once Handbrake was killed, everything would recover.
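A guess like this can be checked before going to bed. On Linux, each process exposes its current OOM "badness" under /proc/<pid>/oom_score, and a higher score means the kernel is more likely to pick that process. The following is only a minimal sketch: the function name and the top-5 cut-off are made up for illustration, and it assumes a kernel new enough to provide /proc/<pid>/comm.

```python
import os


def rank_oom_candidates(proc_root: str = "/proc", top: int = 5) -> list:
    """Return (oom_score, pid, name) tuples, highest score first.

    rank_oom_candidates is a hypothetical helper, not part of any tool.
    """
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "comm")) as f:
                name = f.read().strip()
        except (OSError, ValueError):
            continue  # the process exited or is not readable
        rows.append((score, int(entry), name))
    rows.sort(reverse=True)
    return rows[:top]


if __name__ == "__main__":
    for score, pid, name in rank_oom_candidates():
        print(f"{score:6d}  {pid:7d}  {name}")
```

Note that the score is driven mainly by how much memory a process holds, not by how much CPU it burns, so a heavy encoder tends to rank high because of its memory footprint.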

In the morning, everything was back to normal. The HDD was idle and showed no more signs of the paging abuse the kernel had inflicted the night before. The only evidence was syslog, which confirmed my prediction of Linux’s OOM behavior. The kernel did trigger the OOM killer and killed just the most resource-hungry process, Handbrake, and everything else had recovered to normal.

Well. All good. I did not have to reboot my laptop. So I just hibernated it and headed off to work.

Why FAT32? Is that the best?

My beautiful PlayStation 3, with which I like to share some of the files, does not understand anything other than FAT32.

So back home, I plugged in the external HDD and... sigh! It was not detected.

Plugged it into my laptop... No KDE automount.

Something was wrong.

00:47:32 rrs@champaran:/tmp$ sudo dosfsck /dev/sdb1
dosfsck 3.0.9, 31 Jan 2010, FAT32, LFN
Logical sector size is zero.
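For context on that message: a FAT boot sector stores the logical sector size as a 16-bit little-endian value at byte offset 11, so a zero there usually means the first sector of the partition has been wiped or overwritten. Here is a rough sketch of decoding those fields by hand. The helper names are invented for this post, and it assumes the boot sector is read as a single 512-byte block.

```python
import struct

SECTOR = 512  # assumption: the boot sector is read as one 512-byte block


def parse_fat32_boot_sector(raw: bytes) -> dict:
    """Decode a few BIOS Parameter Block fields (hypothetical helper)."""
    if len(raw) < SECTOR:
        raise ValueError("need at least 512 bytes")
    # Offsets follow the standard FAT on-disk layout, little-endian.
    bytes_per_sector, sectors_per_cluster, reserved, fat_count = (
        struct.unpack_from("<HBHB", raw, 11)
    )
    (backup_boot_sector,) = struct.unpack_from("<H", raw, 50)
    return {
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster": sectors_per_cluster,
        "reserved_sectors": reserved,
        "fat_count": fat_count,
        "backup_boot_sector": backup_boot_sector,
        "signature_ok": raw[510:512] == b"\x55\xaa",
    }


def find_problems(fields: dict) -> list:
    """List the obvious inconsistencies in the decoded fields."""
    problems = []
    if fields["bytes_per_sector"] == 0:
        problems.append("logical sector size is zero")
    elif fields["bytes_per_sector"] not in (512, 1024, 2048, 4096):
        problems.append("logical sector size is not a valid power of two")
    if fields["sectors_per_cluster"] == 0:
        problems.append("sectors per cluster is zero")
    if not fields["signature_ok"]:
        problems.append("missing 0x55AA boot signature")
    return problems
```

To try it on a real disk, one would pass in the first 512 bytes of the partition (here /dev/sdb1), opened read-only in binary mode as root. Nothing in the sketch writes to the device.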

I wonder why a file system has to get corrupted just because of heavy I/O on it.

The Recovery

What’s done is done. Having run into similar problems before, I went back to Testdisk.

It started off with a disappointment, stating that the file system was damaged.

Luckily, an advanced-mode lookup showed some hope.

A further listing showed that the boot sector was available.

Once the boot sector was rebuilt, I had access to the partition again.
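One reason a boot sector repair can work on FAT32 is that the format keeps a backup copy of the boot sector, conventionally at sector 6 of the partition. I can't say exactly which path Testdisk took internally here, so treat the following only as a sketch of the idea: it checks whether the primary and the backup copy in a raw image look sane. The names are hypothetical, and it assumes 512-byte sectors and the conventional backup location.

```python
SECTOR = 512       # assumption: 512-byte logical sectors
BACKUP_SECTOR = 6  # conventional FAT32 backup location, not guaranteed


def looks_like_boot_sector(raw: bytes) -> bool:
    """Cheap plausibility check, far weaker than what a real tool does."""
    if len(raw) < SECTOR:
        return False
    bytes_per_sector = int.from_bytes(raw[11:13], "little")
    has_signature = raw[510:512] == b"\x55\xaa"
    return has_signature and bytes_per_sector in (512, 1024, 2048, 4096)


def find_boot_sector_copies(image: bytes) -> dict:
    """Report which of the two expected copies look usable."""
    result = {}
    for name, index in (("primary", 0), ("backup", BACKUP_SECTOR)):
        start = index * SECTOR
        result[name] = looks_like_boot_sector(image[start:start + SECTOR])
    return result
```

A result of primary False and backup True would match the situation where the first sector is trashed but the spare copy survived. The image here is meant to be the first few KB of the partition read into memory, never the device opened for writing.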

For some reason, the [undelete] option listed nothing and reported that no data was available.

Selecting the [Boot] option listed all my files, which I quickly copied over to my other external USB HDD with a btrfs file system ;-)

Testdisk has now twice proved to be my favorite tool for recovering data from b0rken file systems.

