[linux-lvm] Disk space reporting inconsistencies - lvm half baked

Jed Donnelley jed at nersc.gov
Thu Jan 19 19:50:57 UTC 2006


At 03:31 AM 1/18/2006, Chris bolton wrote:
>Hi,
>
>just added a new PV to my VG but to me there seems to be an 
>inconsistency between what lvm says is the disk space and what df says.
>
>pvscan
>  PV /dev/sda2   VG VolGroup00   lvm2 [68.38 GB / 0    free]
>  PV /dev/sdb    VG VolGroup00   lvm2 [68.50 GB / 0    free]
>  PV /dev/hdc2   VG VolGroup00   lvm2 [37.12 GB / 0    free]
>  PV /dev/hdb1   VG VolGroup00   lvm2 [37.22 GB / 0    free]
>  PV /dev/hdd1   VG VolGroup00   lvm2 [37.22 GB / 0    free]
>  PV /dev/hde1   VG VolGroup00   lvm2 [37.25 GB / 0    free]
>  Total: 6 [285.69 GB] / in use: 6 [285.69 GB] / in no VG: 0 [0   ]
>
>df -h
>/dev/mapper/VolGroup00-LogVol00   265G  227G   28G  90% /
>
>I tried resizing the filesystem but it just says this..
>
>ext2online  /dev/VolGroup00/LogVol00
>    ext2online v1.1.18 - 2001/03/18 for EXT2FS 0.5b
>    ext2online: ext2_ioctl: No space left on device
>
>ext2online: unable to resize /dev/mapper/VolGroup00-LogVol00
>
>Am I missing something obvious here? or have I ballsed it up along the way?
>
>Cheers,
>Chris.

Chris,

I believe the above is similar to the second of the two situations I ran
into when I concluded that LVM was "half baked" and not (yet?) suitable
for production use on server systems (see below).
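
Before concluding it is the same failure, though, it may be worth
comparing what LVM thinks the logical volume is with what ext3 thinks
the file system is.  A rough sketch of what I would run (the paths
assume the default VolGroup00/LogVol00 layout from your df output, and
any other logical volumes in the group, e.g. swap, could account for
part of the difference):

   lvdisplay /dev/VolGroup00/LogVol00    # LV size as LVM sees it
   vgdisplay VolGroup00                  # total/allocated/free extents
   tune2fs -l /dev/VolGroup00/LogVol00 | grep -i block
                                         # block count * block size = fs size

If the LV really is larger than the file system and ext2online still
refuses, an offline resize2fs (after unmounting and running fsck) would
be my untested fallback.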

I repeat my message below for your information, and in case anything
has changed since then that would allow recovery from the problems I
ran into.  I will be interested to see whether you find any way to
recover from yours:
________________________________________________________
Date: Wed, 12 Oct 2005 18:30:18 -0700
To: linux-lvm at redhat.com, rhn-users at redhat.com
From: Jed Donnelley <jed at nersc.gov>
Subject: Linux LVM - half baked?

Redhat LVM users,

Since I mentioned a minor bug in Redhat/LVM (9/28 LVM(2) bug in RH ES 
4.1 /etc/rc.d/sysinit.rc, RAID-1+0) I've done quite a number of 
additional installs using LVM.  I've now had my second system that 
got into an essentially unrecoverable state.  That's enough for me 
and LVM.  I very much like the facilities that LVM provides, but if 
I'm going to lose production file systems with it - well, I will have to wait.

Below are descriptions of the two problems I've run into.  I have run
linux rescue from a CD on both systems.  The difficulty, of course, is
that since the problem seems to be in the LVM layer there are no file
systems to work on (e.g. with fsck).  Perhaps there are tools I'm not
yet familiar with for recovering logical volumes?  These are
test/development systems, but if anybody has any thoughts on how to
recover their file systems I'd be quite interested to hear them - just
for the experience and perhaps to regain some confidence in LVM.
Thanks!

<I've since recycled the disks from these systems and the problems 
might now be difficult to recreate, though if there are suggestions 
on how to recover from them that seem workable I'd be willing to give it a try>
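
For reference, the sequence that is supposed to make logical volumes
visible from a rescue shell - a sketch only, assuming the volume group
metadata on the disks is still intact (I have not verified it on these
particular systems):

   lvm vgscan                     # scan the disks for volume groups
   lvm vgchange -ay VolGroup00    # activate the logical volumes in the group
   lvm lvs                        # list what was activated
   fsck.ext3 -f /dev/VolGroup00/LogVol00    # then check the file systems directly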

On one system (x86_64), after doing nothing more than an up2date and
rebooting, I see:
...
4 logical volume(s) in volume group "VolGroup00" now active
ERROR: failed in exec of defaults
ERROR: failed in exec of ext3
mount: error 2 mounting none
switchroot: mount failed: 23
ERROR: ext3 exited abnormally! (pid 284)
...  <three more similar to the above>
kernel panic - not syncing: Attempted to kill init!

When I look at the above disks (this is a 6 disk system,
one RAID-1 pair for /boot - not LVM - and a 4 disk RAID-10
system for /data) the partitions all look fine.  I'm not sure
what else to look for.
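
If the failure is in the initrd that up2date built - which is only a
guess on my part - then rebuilding it from the rescue environment would
look roughly like the following, where <version> stands for whichever
kernel up2date installed:

   chroot /mnt/sysimage            # rescue mode normally mounts the installed system here
   mkinitrd -f /boot/initrd-<version>.img <version>
   exit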
______________________

On the other system (x86) I had a disk failure in the software RAID-1
set holding the system file systems (/boot and /).  I replaced the disk
and resynced it, apparently successfully.  However, after a short time
the replacement disk also failed (it wouldn't spin up on boot).  I
removed that second disk and restarted the system.  Here is how that
went:
...
Your System appears to have shut down uncleanly
fsck.ext3 -a /dev/VolGroup00/LogVol02 contains a file system with 
errors, check forced
/dev/VolGroup00/LogVol02 Inodes that were part of a corrupted orphan 
linked list found.
/dev/VolGroup00/LogVol02 UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY 
(i.e. without -a or -p options)
[FAILED]
*** An error occurred during the file system check.
*** Dropping you to a shell;  The system will reboot when you leave the shell.

Give root password for maintenance (or type Control-D to continue)

---------------------

This is all familiar to anyone who has worked on corrupted file
systems.  However, in this case typing Control-D or entering the root
password just sends the system through a sequence like:

unmounting ...
automatic reboot

and reboots.  This starts the problem all over again.  As with the
first system above, if I use a rescue disk there is no file system to
run fsck on.
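
(If the vgchange sequence sketched earlier does activate VolGroup00
from the rescue disk, then the manual check the boot messages ask for
would presumably be

   lvm vgchange -ay VolGroup00
   fsck.ext3 -f /dev/VolGroup00/LogVol02

but that is untested here.)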

At this point, despite the value I see in LVM, I plan to back off on 
production deployment.
I'd be interested to hear the experiences of others.
_____________________________________________________________________

I did back off from LVM.  We don't use LVM on any of our many (50+,
though not so many of them Linux) production server systems; we use
RAID on all of them.  I still don't trust LVM for production use, and
I'd be quite interested to hear any defense of LVM for use on
production servers.

In my current opinion, anybody running LVM over RAID on production
servers (at least for the /boot and / partitions) is walking on shaky
ground.  I'd be quite interested to be shown wrong in that opinion.

--Jed http://www.webstart.com/jed/ 



