[linux-lvm] progress, but... - re. fixing LVM/md snafu

Miles Fidelman mfidelman at traversetechnologies.com
Sun Apr 5 21:12:10 UTC 2009


Jayson,

This is VERY helpful. Thanks!

Miles

Jayson Vantuyl wrote:
> Miles,
>
> It seems like what's probably happened is that LVM detected the raw 
> device instead of the MD device at some point early in the boot 
> process. This may be because the MD detection happened after LVM 
> setup. I'm unsure if it's possible for LVM to "steal" the device from MD.
>
> Depending on your distribution, this may require different things to 
> fix. Stop worrying about downtime. If the data is important, just 
> don't worry about downtime. If downtime is really important, build a 
> second machine, get it working right, and transfer the data. Being in 
> a hurry and attempting to "optimize" the recovery process is a really 
> good way to lose the data.
>
> Assuming that you're going to try to fix this setup, I'd start out 
> with a backup. This is critical. Everybody always says to do a backup. 
> Nobody ever does it. Really, do one. Get an S3 account, use an S3 
> backup utility. There's just no excuse these days. Your data is 
> one-MD-mistake away from oblivion.
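>
> (For example, a rough sketch only -- the paths and bucket name here are 
> hypothetical, and any off-site copy you trust is just as good:
>
>     tar czf /tmp/pre-md-fix.tar.gz /etc /home /srv
>     s3cmd put /tmp/pre-md-fix.tar.gz s3://your-backup-bucket/
>
> The point is simply to have a copy that lives somewhere other than these 
> disks.)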
>
> So, right now MD should have sda/sdb but only has sda. sdb is now 
> newer than sda and may have important data if this server stores 
> anything like that. The challenge is that, according to MD, sda is 
> newer. Since MD isn't handling writes to sdb, it won't be updating its 
> metadata to record that sdb is newer. There are three options that I can 
> think of, all of them ugly. Pick one of:
>
> 1. Destroy the MD. Create a new one with the same UUID, using sdb3 as the 
> source. (You listed that UUID earlier; it's the part that can trip you up. 
> A rough command sketch follows this list.)
> 2. Sync the updated data from sdb3 onto md2. Wipe sdb3. Add it back 
> into md2. (Might be less downtime depending on data size; doesn't nuke the MD.)
> 3. Build another machine. Get it working right. Transfer the data with 
> rsync. (Least downtime, most expensive.)
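>
> As a sketch of what option 1 looks like (I can't verify this against your 
> box: it assumes sdb3 really does hold the freshest copy, that the array is 
> the three-way RAID1 you described, and <old-array-uuid> is just a 
> placeholder for the UUID you listed; do it from a rescue environment, 
> after the backup):
>
>     mdadm --stop /dev/md2
>     mdadm --create /dev/md2 --level=1 --raid-devices=3 \
>           --uuid=<old-array-uuid> /dev/sdb3 missing missing
>     # once LVM sees the PV on md2 and the data checks out,
>     # re-add the stale members and let them resync:
>     mdadm --add /dev/md2 /dev/sda3
>     mdadm --add /dev/md2 /dev/sdc3
>
> mdadm will likely warn that sdb3 already looks like part of an array and 
> ask you to confirm; that's expected here.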
>
> In the first two cases, this only sets you up for it to break again. 
> The core problem is figuring out what happened during boot. In a 
> perfect world, you would just tell LVM to only consider MD devices. 
> That's not hard, but it's complicated by the fact that you have LVM on 
> /. This means that the configuration that's used is likely not the 
> version on / but a copy of it that is made when you set up your boot 
> ramdisk (a.k.a. initrd, or possibly an initramfs). Even if we get LVM 
> locked down to use just MDs and get that config used at boot time, 
> there's the possibility that the MD won't get assembled (since it 
> already may not have been when LVM was first activated) and the system 
> won't boot. Again, fraught with peril.
>
> If you want to fix the MD, the first step will be to boot a rescue LiveCD 
> and do all of this from there. With that LiveCD, you can also adjust 
> the LVM configuration and update the initrd (or whatever is used for 
> boot). You may need to chroot into the system and/or trick the initrd 
> into seeing the right devices. I don't really think I can walk you 
> through this step by step over e-mail, but the rough shape is below.
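>
> Just as a sketch (your VG and LV names will differ, and I'm assuming the 
> LiveCD ships mdadm and LVM2):
>
>     mdadm --assemble --scan          # or assemble /dev/md2 by hand
>     vgchange -ay                     # activate the VG once the PV is visible
>     mount /dev/<your-vg>/<root-lv> /mnt
>     mount --bind /dev  /mnt/dev
>     mount --bind /proc /mnt/proc
>     mount --bind /sys  /mnt/sys
>     chroot /mnt /bin/bash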
>
> The LVM part is pretty easy. Just set a filter line (you only get one, 
> so disable any other filter lines) in <root of system>/etc/lvm/lvm.conf to:
>
>> filter = [ "a|^/dev/md.*$|", "r/.*/" ]
>
> That will prevent you from using anything but the MD.
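>
> A quick way to sanity-check that the filter does what you expect (just an 
> illustration; the exact output will differ):
>
>     pvscan
>     pvs -o pv_name,vg_name
>     # both should now report the PV only on /dev/md2, and the
>     # "Found duplicate PV" warning should be gone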
>
> How to update the initrd with this information depends on the distro (and 
> distro version). It's usually either some invocation of "mkinitrd" or 
> some script that wraps it. That will make the LVM configuration available 
> at boot time. This *MIGHT* sort out the MD problem. It might not. If 
> it doesn't, I'm not sure where to tell you to start. If mdadm is being 
> used by your initrd, you'll need to tweak its configuration. If it's 
> relying on MD autodetection, you might have turned that off in your 
> kernel. If you have an IDE controller that takes too long to 
> initialize, that can also cause this sort of thing (although that's 
> REALLY unlikely these days).
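>
> For illustration only -- which invocation applies depends on the distro, 
> and I can't confirm which one your system uses:
>
>     # Debian/Ubuntu style:
>     update-initramfs -u
>     # Red Hat style of that era:
>     mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)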
>
> I hope that some of this helps, although it will be hard for anyone 
> to give you really solid advice without a little more insight into why 
> the MD isn't getting assembled prior to LVM's scan.
>
> On Apr 5, 2009, at 10:05 AM, Miles Fidelman wrote:
>
>> Hello again Folks,
>>
>> So.. I'm getting closer to fixing this messed up machine.
>>
>> Where things stand:
>>
>> I have root defined as an LVM2 LV, which should use /dev/md2 as its PV.
>> /dev/md2 in turn is a RAID1 array built from /dev/sda3 /dev/sdb3 and 
>> /dev/sdc3
>>
>> Instead, LVM is reporting: "Found duplicate PV 
>> 2ppSS2q0kO3t0tuf8t6S19qY3ypWBOxF: using /dev/sdb3 not /dev/sda3"
>> and /dev/md2 is reporting itself as inactive (cat /proc/mdstat) 
>> and active,degraded (mdadm --detail)
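>>
>> For reference, a quick way to see which device the VG is actually sitting 
>> on (just an illustration; the output here will obviously differ):
>>
>>     pvs -o pv_name,vg_name,pv_uuid
>>     cat /proc/mdstat
>>     mdadm --detail /dev/md2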
>>
>> ---
>> I'm guessing that, during boot:
>>
>> - the raid array failed to start
>> - LVM found both copies of the PV, and picked one (/dev/sdb3)
>> - everything then came up and my server is humming away
>>
>> but: the md array can't rebuild because the most current device in it 
>> is already in use
>>
>> so... I'm looking for the right sequence of events, with the minimum 
>> downtime to:
>>
>> 1. stop changes to /dev/sdb3 (actually, to / - which complicates things)
>> 2. rebuild the RAID1 array, making sure to use /dev/sdb3 as the 
>> starting point for current data
>> 3. restart in such a way that LVM finds /dev/md2 as the right PV 
>> instead of one of its components
>>
>> Each of these is just tricky enough that I'm sure there are lots of 
>> gotchas to watch out for.
>>
>> So.. any suggestions?
>>
>> Thanks very much,
>>
>> Miles Fidelman
>>
>>
>>
>>
>
> -- 
> Jayson Vantuyl
> Founder and Architect
> Engine Yard <http://www.engineyard.com>
> jvantuyl at engineyard.com
> 1 866 518 9275 ext 204
> IRC (freenode): kagato
>
> _______________________________________________
> linux-lvm mailing list
> linux-lvm at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


-- 
Miles R. Fidelman, Director of Government Programs
Traverse Technologies 
145 Tremont Street, 3rd Floor
Boston, MA  02111
mfidelman at traversetechnologies.com
857-362-8314
www.traversetechnologies.com



