
When LVM Goes Bad



Hi folks -

A story about LVM. I believe LVM is now the default in Fedora's partitioning; at least, I never liked it enough to have selected it myself, and yet it is on all my boxes.

LVM can make a lot of sense for large storage, binding together multiple devices or RAIDs into a single logical storage device; in fact I use it for that too. However, LVM makes less sense on, say, a laptop which has, and will only ever have, a single 2.5" HDD that stays with the laptop permanently.

Now it doesn't matter too much when everything is working, because LVM is a fairly lightweight additional layer AFAIK. However, on a box here the sole SATA drive went bad without warning: some dozens of sectors were goneski after a recent period of high temperature. The resulting symptom was that the partition was no longer recognized as containing a volume group or a logical volume, and pvscan did not find it either, although pvdisplay could still see that it was a physical volume when pointed directly at the partition.

Recovery from LVM metadata corruption is not overburdened with tools to help out; in fact I couldn't find anything useful. By probing with dd I found that the damaged region started 33214 512-byte blocks into the partition and ended 33336 blocks in, so it trashed 122 blocks, or about 61 Kbytes. Touching this region spewed IO errors to the console. Whether this damage explained the loss of LVMness, or whether some subsequent logical brain damage elsewhere did it, I don't know.
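For anyone retracing this, here is a minimal sketch of the arithmetic and of a block-by-block dd probe. The block numbers are the ones quoted above; everything else (the probe function, the scratch file standing in for the partition) is hypothetical and only there so the snippet is safe to run as-is.

```sh
#!/bin/sh
# Sketch only: DEV is a healthy scratch file, not a real partition.
WORK=$(mktemp -d)
DEV="$WORK/fake-partition"   # hypothetical stand-in for the damaged partition
dd if=/dev/zero of="$DEV" bs=512 count=64 2>/dev/null

FIRST_BAD=33214    # first unreadable 512-byte block (from the post)
FIRST_GOOD=33336   # first readable block after the damage (from the post)
BAD_BLOCKS=$((FIRST_GOOD - FIRST_BAD))
BAD_BYTES=$((BAD_BLOCKS * 512))
echo "damaged span: $BAD_BLOCKS blocks, $BAD_BYTES bytes"

# Print the number of every block in [start, end) that dd fails to read.
probe() {
    dev=$1; n=$2; end=$3
    while [ "$n" -lt "$end" ]; do
        dd if="$dev" of=/dev/null bs=512 skip="$n" count=1 2>/dev/null || echo "$n"
        n=$((n + 1))
    done
}

UNREADABLE=$(probe "$DEV" 0 64)
echo "unreadable blocks in scratch file: ${UNREADABLE:-none}"
```

On the scratch file nothing is reported as unreadable; against a failing drive the same loop would be expected to print the block numbers that return IO errors, very slowly.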

What I did was to add a new HDD, install FC5 on it and boot into it, with the old HDD attached as /dev/sdb. I then used dd to copy the first 33214 512-byte blocks of the damaged partition to a file on the new drive, appended 122 512-byte blocks from /dev/zero to that file, and finally used dd with bs=512 skip=33336 to append the remainder of the damaged partition. After this I had a copy of the partition as a file on the new HDD, with everything at its original offset and the damaged area zeroed out.
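The three dd steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the exact commands that were run: SRC is a small scratch file filled with 'x' bytes standing in for the damaged partition, and the file names are made up. Only the block counts come from the description above.

```sh
#!/bin/sh
# Sketch only: SRC is a scratch file, not the real damaged partition.
WORK=$(mktemp -d)
SRC="$WORK/fake-partition"   # hypothetical stand-in for the old partition
IMG="$WORK/partition.img"    # hypothetical name for the rebuilt copy

GOOD=33214      # readable blocks before the damage (from the post)
BAD_END=33336   # first readable block after the damage (from the post)

# Build a 33400-block stand-in filled with 'x' so the zeroed hole is visible.
head -c $((33400 * 512)) /dev/zero | tr '\0' 'x' > "$SRC"

# 1. Everything before the damaged region.
dd if="$SRC" of="$IMG" bs=512 count="$GOOD" 2>/dev/null
# 2. Zeros in place of the unreadable blocks, so later offsets stay put.
dd if=/dev/zero bs=512 count=$((BAD_END - GOOD)) 2>/dev/null >> "$IMG"
# 3. Everything after the damaged region.
dd if="$SRC" bs=512 skip="$BAD_END" 2>/dev/null >> "$IMG"

SRC_SIZE=$(wc -c < "$SRC")
IMG_SIZE=$(wc -c < "$IMG")
echo "source bytes: $SRC_SIZE, image bytes: $IMG_SIZE"
```

The point of step 2 is that the image ends up exactly as long as the source, so every surviving byte sits at the same offset it had on disk.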

Naturally this file will not loop-mount: with the LVM layer in front of the filesystem it is not a valid ext3 image. I googled around some more, then went on the LVM IRC channel and explained my problem. No help, in fact no response. There don't seem to be any tools or readily findable advice for recovering from this situation.

I created a new 10MB file with dd, ran mkfs.ext3 on it, and examined the start of it using hexdump. With the help of Google I found that the ext3 magic is present at offset +0x438, and I noticed that the first 1 Kbyte of the image is zeroed. I then used hexdump and grep to search for the same pattern in the copied LVM partition file, and found it at offset 0x30438, which puts the start of the filesystem at 0x30000.
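A rough sketch of that search, assuming the filesystem starts on a 512-byte boundary. It checks only for the two magic bytes (53 ef on disk, i.e. 0xEF53 little-endian) at +0x438 from each candidate start, which is a weaker test than also checking the zeroed first Kbyte, so a false hit is possible on real data. The image here is synthetic and the function name is made up.

```sh
#!/bin/sh
# Sketch only: IMG is a synthetic image with a fake superblock magic planted in it.
WORK=$(mktemp -d)
IMG="$WORK/partition.img"    # hypothetical stand-in for the copied partition

# 0x30000 bytes of 'x' filler, then 4 KB of zeros carrying the magic at +0x438.
head -c 196608 /dev/zero | tr '\0' 'x' > "$IMG"
head -c 4096 /dev/zero >> "$IMG"
printf '\123\357' | dd of="$IMG" bs=1 seek=$((196608 + 1080)) conv=notrunc 2>/dev/null

# Try each 512-byte sector as a candidate filesystem start and look for
# the bytes 53 ef at candidate + 1080 (0x438).
find_ext_start() {
    img=$1; max_sectors=$2; s=0
    while [ "$s" -lt "$max_sectors" ]; do
        magic=$(od -An -tx1 -j $((s * 512 + 1080)) -N 2 "$img" | tr -d ' \n')
        if [ "$magic" = "53ef" ]; then
            echo $((s * 512))
            return 0
        fi
        s=$((s + 1))
    done
    return 1
}

FS_START=$(find_ext_start "$IMG" 512)
printf 'filesystem appears to start at 0x%x\n' "$FS_START"
```

On the synthetic image this prints 0x30000, matching the offset found in the real copy.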

I decided to remove the first 0x30000 bytes of my copied partition image. That took a while because the partition was 60GB; in fact the whole process was agonizingly slow.
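One possible way to do the strip, sketched on a synthetic image: since 0x30000 is exactly 3 x 65536, dd can skip the header with a 64 KB block size rather than 512-byte blocks, which should move the data in far fewer reads and writes. The file names are hypothetical, and this is not necessarily how the copy above was actually made.

```sh
#!/bin/sh
# Sketch only: IMG is a synthetic image, FS is the stripped result.
WORK=$(mktemp -d)
IMG="$WORK/partition.img"    # hypothetical: image with 0x30000 bytes in front
FS="$WORK/ext3.img"          # hypothetical: filesystem-only image

# Same synthetic layout as a real copy: 0x30000 bytes of filler, then the
# "filesystem" with the ext3 magic at +0x438.
head -c 196608 /dev/zero | tr '\0' 'x' > "$IMG"
head -c 4096 /dev/zero >> "$IMG"
printf '\123\357' | dd of="$IMG" bs=1 seek=$((196608 + 1080)) conv=notrunc 2>/dev/null

# 0x30000 = 196608 = 3 * 65536, so skip three 64 KB blocks.
dd if="$IMG" of="$FS" bs=65536 skip=3 2>/dev/null

FS_SIZE=$(wc -c < "$FS")
FS_MAGIC=$(od -An -tx1 -j 1080 -N 2 "$FS" | tr -d ' \n')
echo "stripped image: $FS_SIZE bytes, magic at 0x438: $FS_MAGIC"
```

Where the mount in use supports the loop option offset=, passing offset=196608 when mounting the unstripped image may avoid the copy altogether.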

After this, I was able to mount the resulting file with -t ext3 -o loop and I recovered my data. The zeroed region had trashed a small part of two directories whose contents were noncritical. This story is offered in the hope that future Googlers will have better luck than I did.

I wouldn't conclude from this that LVM is evil, but I would suggest that you simply turn it off when partitioning a machine where you know there will be no expansion, because the only thing it will ever do for you in that case is stress you out when you least need it.

-Andy


