When LVM Goes Bad

Paul Howarth paul at city-fan.org
Tue Jun 20 12:41:03 UTC 2006


Andy Green wrote:
> A story about LVM.  I believe LVM is the default in Fedora's 
> partitioning now; I certainly didn't love it enough to have selected 
> it myself, and yet it is on all my boxes now.
> 
> LVM can make a lot of sense for large storage binding together multiple 
> devices or raids into a single logical storage device, in fact I use it 
> for that too.  However LVM makes less sense on, say, a laptop which has 
> and will only ever have a single 2.5" HDD for storage that is 
> permanently available with the laptop.
> 
> Now it doesn't matter too much when everything is working, because LVM 
> is a fairly lightweight additional layer AFAIK.  However on a box here 
> its sole SATA drive went bad without warning: some dozens of sectors 
> were goneski after a recent period of high temperature here.  The 
> resulting symptom was that the partition contents were no longer 
> recognized as containing a logical volume or a volume group, and 
> pvscan found nothing, although pvdisplay could see it was a physical 
> volume if pointed directly at the partition.
> 
> Recovery from LVM metadata corruption is not something that is 
> overburdened by tools to help out, in fact I couldn't find anything 
> useful.  By using dd I probed the damaged region and found that it 
> started 33214 512-byte blocks into the partition and ended 33336 
> 512-byte blocks in, so it trashed something like 60Kbytes.  Touching this 
> region spewed IO errors to the console.  Whether this explained the loss 
> of LVMness or a subsequent logical brain damage that happened elsewhere 
> did it I don't know.
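
A quick check of the arithmetic above, as a minimal sketch. It assumes
that 33214 is the first unreadable 512-byte block and 33336 is the first
readable block after the damage, which is how the two numbers are used
with dd later in the story. The variable names are only illustrative.

```python
SECTOR = 512        # block size used with dd in the story
BAD_START = 33214   # assumed: first unreadable 512-byte block
BAD_END = 33336     # assumed: first readable block after the damage

bad_sectors = BAD_END - BAD_START
bad_bytes = bad_sectors * SECTOR

print(bad_sectors)        # 122
print(bad_bytes)          # 62464
print(bad_bytes / 1024)   # 61.0
```

So the hole is 122 blocks, or 61 KiB, which matches the "something like
60Kbytes" estimate and the 122 blocks of zeros used in the copy.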
> 
> What I did was to add a new HDD and install FC5 on it and boot into it, 
> with the old HDD on as /dev/sdb.  I then used dd to copy the first 33214 
> 512-byte blocks to a file on the new drive, dd'ed 122 512-byte blocks 
> from /dev/zero and appended that on the end of the first file, and then 
> used dd with bs=512 skip=33336 to copy the remainder of the damaged 
> partition to this file also.  So after this I had a copy of the 
> partition as a file on the new HDD with everything in the right place 
> and the damaged area zeroed out.
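
The three-step copy described above can be sketched in Python. This is
not the dd invocation that was actually run, only a stand-in that shows
the same idea: copy up to the damage, write zeros in place of the bad
blocks without ever reading them, then copy the rest. The function name
and the small synthetic "partition" are made up for the demonstration.

```python
import os
import tempfile

SECTOR = 512

def copy_with_zeroed_hole(src_path, dst_path, bad_start, bad_end,
                          chunk_sectors=2048):
    """Copy src to dst, writing zeros for sectors [bad_start, bad_end)."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # 1. Copy the readable sectors in front of the damage.
        remaining = bad_start * SECTOR
        while remaining > 0:
            data = src.read(min(remaining, chunk_sectors * SECTOR))
            if not data:
                raise ValueError("source ends before the damaged region")
            dst.write(data)
            remaining -= len(data)
        # 2. Never read the bad sectors: write zeros and seek past them.
        dst.write(b"\0" * ((bad_end - bad_start) * SECTOR))
        src.seek(bad_end * SECTOR)
        # 3. Copy everything after the damage.
        while True:
            data = src.read(chunk_sectors * SECTOR)
            if not data:
                break
            dst.write(data)

# Demo on a synthetic 8-sector image, treating sectors 3 and 4 as bad.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "partition.img")
dst_path = os.path.join(workdir, "rescued.img")
with open(src_path, "wb") as f:
    for i in range(8):
        f.write(bytes([i + 1]) * SECTOR)

copy_with_zeroed_hole(src_path, dst_path, bad_start=3, bad_end=5)

with open(src_path, "rb") as f:
    original = f.read()
with open(dst_path, "rb") as f:
    rescued = f.read()
print(len(rescued) == len(original))   # True
```

Every byte keeps its original offset in the copy, which is what makes
the later search for the filesystem start meaningful.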
> 
> Now naturally this file will not loop-mount because of the LVM; it is 
> not a valid ext3 image.  I googled around some more and went on the LVM 
> IRC channel and explained my problem.  No help, in fact no response.  
> There don't seem to be any tools or readily findable advice for 
> recovering from this situation.
> 
> I created a new 10MB file with dd and used mkfs.ext3 on it, and examined 
> the first part of it using hexdump.  With the help of Google I found 
> that the ext3 magic is present at offset +0x438, and I noticed that the 
> first 1Kbyte of the image is zeroed.  I then used hexdump and grep to 
> search for the same pattern in the copied LVM partition file, and found 
> it at offset 0x30438.
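
The search described above was done with hexdump and grep; the sketch
below does the same kind of search in Python. The ext2/ext3 superblock
starts 1024 bytes into the filesystem and its magic number 0xEF53 sits
0x38 bytes into the superblock, stored little-endian, hence the bytes
53 EF at +0x438. Requiring the first 1024 bytes to be zero is only the
heuristic noticed above, not a guarantee. The function name, the
512-byte step and the synthetic test image are assumptions made for the
example.

```python
import os
import tempfile

EXT_MAGIC = b"\x53\xef"   # 0xEF53 stored little-endian
MAGIC_OFFSET = 0x438      # 1024-byte boot area + 0x38 into the superblock

def find_ext_start(path, step=512, limit=16 * 1024 * 1024):
    """Return the first offset (a multiple of step) that looks like the
    start of an ext2/ext3 filesystem, or None if nothing matches."""
    with open(path, "rb") as f:
        for start in range(0, limit, step):
            f.seek(start)
            head = f.read(MAGIC_OFFSET + 2)
            if len(head) < MAGIC_OFFSET + 2:
                return None   # ran off the end of the image
            # Heuristic: zeroed first 1 KiB plus the magic at +0x438.
            if head[:1024] == bytes(1024) and head[MAGIC_OFFSET:] == EXT_MAGIC:
                return start
    return None

# Synthetic image: 0x30000 filler bytes standing in for whatever
# precedes the filesystem, then a fake filesystem start.
workdir = tempfile.mkdtemp()
image = os.path.join(workdir, "lvm_copy.img")
fake_fs = bytearray(4096)
fake_fs[MAGIC_OFFSET:MAGIC_OFFSET + 2] = EXT_MAGIC
with open(image, "wb") as f:
    f.write(b"\xff" * 0x30000)
    f.write(fake_fs)

found = find_ext_start(image)
print(hex(found))   # 0x30000
```

In the synthetic image the magic lands at 0x30438, the same offset as
in the story, so the filesystem is taken to start at 0x30000.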
> 
> I decided to remove the first 0x30000 bytes of my copied partition 
> image, which took a while because the partition was 60GB; in fact the 
> whole process was agonizingly slow.
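
Dropping the first 0x30000 bytes amounts to copying the image from that
offset onward. A minimal Python sketch of that step follows; the
function name and the tiny demo file are invented for illustration, and
this is not the command that was actually used.

```python
import os
import tempfile

def strip_prefix(src_path, dst_path, offset, chunk=1024 * 1024):
    """Copy src to dst, skipping the first `offset` bytes."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(offset)
        while True:
            data = src.read(chunk)
            if not data:
                break
            dst.write(data)

OFFSET = 0x30000   # filesystem start found from the magic at 0x30438

# Demo on a small synthetic image: filler, then a recognizable payload.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "lvm_copy.img")
dst_path = os.path.join(workdir, "ext3.img")
payload = b"filesystem bytes" * 64
with open(src_path, "wb") as f:
    f.write(b"\xff" * OFFSET)
    f.write(payload)

strip_prefix(src_path, dst_path, OFFSET)

with open(dst_path, "rb") as f:
    stripped = f.read()
print(stripped == payload)   # True
print(OFFSET // 512)         # 384
```

It may be possible to avoid the second full copy by attaching the image
to a loop device at an offset of 196608 bytes (losetup has an offset
option for this), though that was not tried here.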
> 
> After this, I was able to mount the resulting file with -t ext3 -o loop 
> successfully and I recovered my data.  The zeroed/damaged region trashed 
> a small part of two directories whose contents were noncritical.  This 
> story is offered in the hope that future Googlers will have better luck 
> than I did.
> 
> I wouldn't say that LVM is evil from this, but I would suggest that you 
> simply turn it off for partitioning actions where you know there will be 
> no expansion, because the only thing it will ever do for you in that 
> case is to stress you out when you least need it.

Had a similar issue last week actually. It's not put me off LVM but it 
made me glad I do regular backups.

Paul.




More information about the fedora-list mailing list