When LVM Goes Bad
Paul Howarth
paul at city-fan.org
Tue Jun 20 12:41:03 UTC 2006
Andy Green wrote:
> A story about LVM. I believe LVM is now the default in Fedora's
> partitioning; I certainly didn't love it enough to have selected it
> myself, and yet it is on all my boxes now.
>
> LVM can make a lot of sense for large storage binding together multiple
> devices or raids into a single logical storage device, in fact I use it
> for that too. However LVM makes less sense on, say, a laptop which has
> and will only ever have a single 2.5" HDD for storage that is
> permanently available with the laptop.
>
> Now it doesn't matter too much when everything is working, because LVM
> is a fairly lightweight additional layer AFAIK. However on a box here
> its sole SATA drive went bad without warning; basically some dozens of
> sectors were goneski after a recent period of high temperature here. The
> resulting symptom was that the partition contents were no longer
> recognized as containing a logical volume or a volume group, not even by
> pvscan, although pvdisplay could see it was a physical volume if pointed
> directly at the partition.
>
> Recovery from LVM metadata corruption is not something that is
> overburdened by tools to help out; in fact I couldn't find anything
> useful. By using dd I probed the damaged region and found that it
> started 33214 512-byte blocks into the partition and ended 33336
> 512-byte blocks in, so it trashed 122 blocks, something like 60Kbytes.
> Touching this region spewed IO errors to the console. Whether this
> damage explains the loss of LVMness, or some subsequent logical brain
> damage elsewhere did it, I don't know.
>
> What I did was to add a new HDD and install FC5 on it and boot into it,
> with the old HDD on as /dev/sdb. I then used dd to copy the first 33214
> 512-byte blocks to a file on the new drive, dd'ed 122 512-byte blocks
> from /dev/zero and appended that on the end of the first file, and then
> used dd with bs=512 skip=33336 to copy the remainder of the damaged
> partition to this file also. So after this I had a copy of the
> partition as a file on the new HDD with everything in the right place
> and the damaged area zeroed out.
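
For future Googlers, the three dd steps above can be sketched as a small
shell function. This is only a sketch under assumptions: copy_with_hole
and its argument names are made up for illustration, it assumes GNU dd,
and it writes all three pieces into one output file at their original
offsets rather than appending, which should give the same result.

```shell
#!/bin/sh
# Sketch: copy SRC to DST, replacing 512-byte blocks
# [BAD_START, BAD_END) with zeros so the bad sectors are never read.
# copy_with_hole is a hypothetical helper, not part of any package.
copy_with_hole() {
    src=$1 dst=$2 bad_start=$3 bad_end=$4
    bad_len=$((bad_end - bad_start))

    # 1. Good data before the damaged region (creates/truncates DST).
    dd if="$src" of="$dst" bs=512 count="$bad_start" &&
    # 2. Zeros in place of the unreadable blocks.
    dd if=/dev/zero of="$dst" bs=512 seek="$bad_start" \
       count="$bad_len" conv=notrunc &&
    # 3. Everything after the damaged region, at the same offset.
    dd if="$src" of="$dst" bs=512 skip="$bad_end" seek="$bad_end" \
       conv=notrunc
}
```

With the numbers from the post this would be called as
copy_with_hole /dev/sdbN part.img 33214 33336, where the partition
number and the output name are placeholders. The zeroed span is
33336 - 33214 = 122 blocks, i.e. 62464 bytes.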
>
> Now naturally this file will not loop-mount because of the LVM; it's not
> a valid ext3 image. I googled around some more and went on the LVM IRC
> channel and explained my problem. No help, in fact no response. There
> don't seem to be any tools or readily findable advice for recovering
> from this situation.
>
> I created a new 10MB file with dd and used mkfs.ext3 on it, and examined
> the first part of it using hexdump. With the help of Google I found
> that the ext3 magic (0xEF53) is present at offset +0x438, and I noticed
> that the first 1Kbyte of the image is zeroed. I then used hexdump and
> grep to search for this pattern in the copied LVM partition file, and
> found it with the magic at offset 0x30438, so the filesystem itself
> starts at 0x30000.
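
A rough way to script that search, instead of eyeballing hexdump
output, is sketched below. The function name find_ext_magic, the search
limit and the 512-byte step are assumptions for illustration. It only
checks for the two magic bytes 53 ef (0xEF53 stored little-endian) at
candidate offset + 0x438, so a hit is a hint, not proof, that a
filesystem starts there.

```shell
#!/bin/sh
# Sketch: scan IMG in STEP-byte steps (default 512) up to LIMIT bytes
# and print the first candidate filesystem start whose ext2/ext3 magic
# is in place. find_ext_magic is a hypothetical helper.
find_ext_magic() {
    img=$1 limit=$2 step=${3:-512}
    off=0
    while [ "$off" -lt "$limit" ]; do
        # The superblock starts 1024 bytes in and the magic sits 0x38
        # bytes into it, hence 0x438 = 1080.
        magic=$(dd if="$img" bs=1 skip=$((off + 1080)) count=2 \
                    2>/dev/null | od -An -tx1 | tr -d ' \n')
        if [ "$magic" = "53ef" ]; then
            printf '0x%x\n' "$off"
            return 0
        fi
        off=$((off + step))
    done
    return 1
}
```

Called as find_ext_magic part.img 1048576 (file name and limit are
placeholders), it scans the first 1MB of the image. The magic alone
can match by chance, so it is worth confirming a hit against a known
good image in hexdump, as described above.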
>
> I decided to remove the first 0x30000 bytes of my copied partition
> image, which took a while because the partition was 60GB; in fact the
> whole process was agonizingly slow.
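
Stripping the leading bytes can be done with a single dd using a larger
block size, which should be a good deal faster than 512-byte blocks. A
minimal sketch follows; strip_header is a made-up name, and it assumes
the offset is a multiple of 4096, which 0x30000 (196608) is. Note that
cutting off a fixed prefix can only work when the logical volume is one
contiguous run starting at the first extent of the physical volume.

```shell
#!/bin/sh
# Sketch: write IMG minus its first OFFSET bytes to OUT.
# strip_header is a hypothetical helper, not part of any package.
strip_header() {
    img=$1 out=$2 offset=$3
    bs=4096
    # dd skips whole input blocks, so insist on an aligned offset.
    if [ $((offset % bs)) -ne 0 ]; then
        echo "offset must be a multiple of $bs" >&2
        return 1
    fi
    dd if="$img" of="$out" bs="$bs" skip=$((offset / bs))
}
```

For the image in the post this would be
strip_header part.img fs.img 196608, with both file names being
placeholders. It may also be possible to avoid the copy altogether:
mount accepts an offset= option for loop devices, so something like
mount -t ext3 -o ro,loop,offset=196608 on the zero-patched image should
mount the filesystem in place. Whether the FC5-era tools behave that way
is not something I have checked.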
>
> After this, I was able to mount the resulting file with -t ext3 -o loop
> successfully and I recovered my data. The zeroed/damaged region trashed
> a small part of two directories whose contents were noncritical. This
> story is offered in the hope that future Googlers will have better luck
> than I did.
>
> I wouldn't say that LVM is evil from this, but I would suggest that you
> simply turn it off for partitioning actions where you know there will be
> no expansion, because the only thing it will ever do for you in that
> case is to stress you out when you least need it.
Had a similar issue last week actually. It's not put me off LVM but it
made me glad I do regular backups.
Paul.