
[linux-lvm] Linux LVM - half baked?



Redhat LVM users,

Since I mentioned a minor bug in Redhat/LVM (9/28 LVM(2) bug in RH ES 4.1 /etc/rc.d/sysinit.rc, RAID-1+0), I've done quite a number of additional installs using LVM. I have now had a second system get into an essentially unrecoverable state. That's enough for me and LVM. I very much like the facilities that LVM provides, but if I am going to lose production file systems with it, I will have to wait.

Below are descriptions of the two problems I've run into. I have run linux rescue from a CD on both systems. The difficulty, of course, is that since the problem seems to be in the LVM layer, there are no file systems to work on (e.g. with fsck). Perhaps there are tools I'm not yet familiar with for recovering logical volumes? These are test/development systems, but if anybody has thoughts on how to recover their file systems, I'd be quite interested to hear them, both for the experience and to regain some confidence in LVM. Thanks!

On one system (x86_64), after doing nothing more than an up2date and rebooting, I see:
...
4 logical volume(s) in volume group "VolGroup00" now active
ERROR: failed in exec of defaults
ERROR: failed in exec of ext3
mount: error 2 mounting none
switchroot: mount failed: 23
ERROR: ext3 exited abnormally! (pid 284)
...  <three more similar to the above>
kernel panic - not syncing: Attempted to kill init!

When I look at the disks in this system (it has 6 disks:
one RAID-1 pair for /boot, not LVM, and a 4-disk RAID-10
set for /data), the partitions all look fine.  I'm not sure
what else to look for.
______________________

In the other system (x86), I had a disk failure in the software RAID-1
pair holding the system file systems (/boot and /).  I replaced the
disk and resynced it, apparently successfully.  However, after
a short time the replacement disk apparently failed as well (it wouldn't
spin up on boot).  I removed the second disk and restarted
the system.  Here is how that went:
...
Your System appears to have shut down uncleanly
fsck.ext3 -a /dev/VolGroup00/LogVol02
/dev/VolGroup00/LogVol02 contains a file system with errors, check forced
/dev/VolGroup00/LogVol02 Inodes that were part of a corrupted orphan linked list found.
/dev/VolGroup00/LogVol02 UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY (i.e. without -a or -p options)
[FAILED]
*** An error occurred during the file system check.
*** Dropping you to a shell;  The system will reboot when you leave the shell.

Give root password for maintenance (or type Control-D to continue)

---------------------

This is all very familiar to anyone who has worked on corrupted file systems. However, in this case, whether I type Control-D or enter the root password, the system goes through a sequence like:

unmounting ...
automatic reboot

and reboots. This starts the problem all over again. As with the first system above,
if I use a rescue disk there is no file system to run fsck on.

At this point, despite the value I see in LVM, I plan to back off on production deployment.
I'd be interested to hear the experiences of others.

--Jed http://www.nersc.gov/~jed/

