[linux-lvm] progress, but... - re. fixing LVM/md snafu
Miles Fidelman
mfidelman at traversetechnologies.com
Sun Apr 5 21:12:10 UTC 2009
Jayson,
This is VERY helpful. Thanks!
Miles
Jayson Vantuyl wrote:
> Miles,
>
> It seems like what's probably happened is that LVM detected the raw
> device instead of the MD device at some point early in the boot
> process. This may be because the MD detection happened after LVM
> setup. I'm unsure if it's possible for LVM to "steal" the device from MD.
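
One read-only way to check this diagnosis before changing anything is to see which arrays the kernel reports as inactive and which device LVM picked for the PV. The sketch below is only an illustration: the helper name `inactive_md_arrays` is hypothetical, and it assumes the usual `/proc/mdstat` layout where an array line reads like `md2 : inactive sda3[0](S)`.

```shell
#!/bin/sh
# Hypothetical helper: print the names of MD arrays that are reported
# as inactive. Reads mdstat-formatted text on stdin and changes nothing.
inactive_md_arrays() {
    # Array lines look like: "md2 : inactive sda3[0](S)"
    awk '$2 == ":" && $3 == "inactive" { print $1 }'
}

# Typical read-only use on the affected machine:
#   inactive_md_arrays < /proc/mdstat
#   pvs -o pv_name,pv_uuid,vg_name   # which device LVM is using for the PV
```

If `md2` shows up as inactive while `pvs` lists a raw partition such as /dev/sdb3, that matches the scenario described above.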
>
> Depending on your distribution, this may require different things to
> fix. If the data is important, stop worrying about downtime. If
> downtime is really important, build a
> second machine, get it working right, and transfer the data. Being in
> a hurry and attempting to "optimize" the recovery process is a really
> good way to lose the data.
>
> Assuming that you're going to try to fix this setup, I'd start out
> with a backup. This is critical. Everybody always says to do a backup.
> Nobody ever does it. Really, do one. Get an S3 account, use an S3
> backup utility. There's just not an excuse these days. Your data is
> one-MD-mistake away from oblivion.
>
> So, right now MD should have sda/sdb but only has sda. sdb is now
> newer than sda and may have important data if this server stores
> anything like that. The challenge is that, according to MD, sda is
> newer. Since MD isn't handling writes to sdb, it won't be updating its
> metadata to know that it's newer. There are three options that I can
> think of, none of them pretty. Pick one of:
>
> 1. Destroy the MD. Create a new one with the same UUID and sdb3 as the
> source. (which you listed, the UUID part can trip you up)
> 2. Sync the updated data from sdb3 onto md2. Wipe sdb3. Add it back
> into md2. (might be less downtime depending on data size, doesn't nuke MD)
> 3. Build another machine. Get it working right. Transfer data with
> Rsync. (least downtime, most expensive)
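
The last two steps of option 2 (wipe sdb3, add it back into md2) could look roughly like the dry-run sketch below. It only prints commands and runs nothing. The helper name `plan_readd` is hypothetical, and the sketch assumes you are in a rescue environment, that md2 is already running degraded, and that the newer data has already been copied off sdb3. The copy step itself is not shown because it depends on how the volumes are laid out.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command sequence, executes nothing.
# Usage: plan_readd /dev/md2 /dev/sdb3
plan_readd() {
    md=$1
    dev=$2
    # Clear the stale MD superblock so the member is treated as a fresh device.
    echo "mdadm --zero-superblock $dev"
    # Add the device back; MD then resyncs it from the running members.
    echo "mdadm $md --add $dev"
    # Watch the rebuild progress.
    echo "cat /proc/mdstat"
}
```

Reviewing the printed plan against the real device names before typing anything is the point of keeping this as a dry run; with a backup in hand, a mistake here is recoverable.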
>
> In the first two cases, this only sets you up for it to break again.
> The core problem is figuring out what happened during boot. In a
> perfect world, you would just tell LVM to only consider MD devices.
> That's not hard, but it's complicated by the fact that you have LVM on
> /. This means that the configuration that's used is likely not the
> version on / but a copy of it that is made when you set up your boot
> ramdisk (a.k.a. initrd, or possibly an initramfs). Even if we get LVM
> locked down to use just MDs and get that config used at boot time,
> there's the possibility that the MD won't get assembled (since it
> already may not have been when LVM was first activated) and the system
> won't boot. Again, fraught with peril.
>
> If you want to fix the MD, first steps will be using a rescue LiveCD
> to boot up and do all of this. With that LiveCD, you can also adjust
> the LVM configuration and update the initrd (or whatever is used for
> boot). You may need to chroot into the system and/or trick the initrd
> into seeing the right devices. I don't really think I can walk you
> through this via an e-mail.
>
> The LVM part is pretty easy. Just set a filter line (you only get one,
> so disable any other filter lines) in <root of system>/etc/lvm/lvm.conf to:
>
>> filter = [ "a|^/dev/md.*$|", "r/.*/" ]
>
> That will prevent you from using anything but the MD.
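
To sanity-check which device names the accept pattern in that filter would let through, a small sketch like the following can help. The helper name `filter_accepts` is hypothetical, and it approximates LVM's matching with `grep -E`, which should behave the same for a pattern this simple but is not LVM's own regex engine.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the device path matches the accept
# pattern "^/dev/md.*$", and 1 if it would fall through to the reject-all.
filter_accepts() {
    printf '%s\n' "$1" | grep -Eq '^/dev/md.*$'
}

# Example use:
#   for dev in /dev/md2 /dev/sda3 /dev/sdb3; do
#       if filter_accepts "$dev"; then echo "$dev accepted"; else echo "$dev rejected"; fi
#   done
```

After editing the real configuration, running `pvs` is the more direct check: the PV should be listed on /dev/md2 rather than on a raw partition.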
>
> To update the initrd with this information depends on distro (and
> distro version). It's usually either some invocation of "mkinitrd" or
> some script that wraps it. It will get the LVM configuration available
> at boot-time. This *MIGHT* sort out the MD problem. It might not. If
> it doesn't, I'm not sure where to tell you to start. If mdadm is being
> used by your initrd, you'll need to tweak its configuration. If it's
> relying on MD autodetection, you might have turned that off in your
> kernel. If you have an IDE controller that takes too long to
> initialize, that can also cause this sort of thing (although that's
> REALLY unlikely these days).
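
As a rough illustration of the "depends on distro" point, the sketch below prints (but does not run) a likely rebuild command based on which tool is installed. The helper name `initrd_rebuild_hint` is hypothetical, and the two command lines are common Debian-style and Red Hat-style invocations, assumed here rather than taken from this thread. Check your distribution's documentation before running either.

```shell
#!/bin/sh
# Hypothetical helper: print a likely initrd rebuild command for this
# system. It only prints a suggestion and executes nothing.
initrd_rebuild_hint() {
    if command -v update-initramfs >/dev/null 2>&1; then
        # Debian/Ubuntu style wrapper
        echo 'update-initramfs -u'
    elif command -v mkinitrd >/dev/null 2>&1; then
        # Red Hat style; other distros ship a mkinitrd with different options
        echo 'mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)'
    else
        echo 'unknown: check your distribution documentation'
    fi
}
```

If the initrd uses mdadm, `mdadm --detail --scan` prints ARRAY lines that can be compared against the mdadm configuration file the initrd was built with.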
>
> I hope that some of this helps. Although, it will be hard for anyone
> to give you really solid advice without a little more insight into why
> the MD isn't getting assembled prior to LVM's scan.
>
> On Apr 5, 2009, at 10:05 AM, Miles Fidelman wrote:
>
>> Hello again Folks,
>>
>> So.. I'm getting closer to fixing this messed up machine.
>>
>> Where things stand:
>>
>> I have root defined as an LVM2 LV, that should use /dev/md2 as its PV.
>> /dev/md2 in turn is a RAID1 array built from /dev/sda3 /dev/sdb3 and
>> /dev/sdc3
>>
>> Instead, LVM is reporting: "Found duplicate PV
>> 2ppSS2q0kO3t0tuf8t6S19qY3ypWBOxF: using /dev/sdb3 not /dev/sda3"
>> and the /dev/md2 is reporting itself as inactive (cat /proc/mdstat)
>> and active,degraded (mdadm --detail)
>>
>> ---
>> I'm guessing that, during boot:
>>
>> - the raid array failed to start
>> - LVM found both copies of the PV, and picked one (/dev/sdb3)
>> - everything then came up and my server is humming away
>>
>> but: the md array can't rebuild because the most current device in it
>> is already in use
>>
>> so... I'm looking for the right sequence of events, with the minimum
>> downtime to:
>>
>> 1. stop changes to /dev/sdb3 (actually, to / - which complicates things)
>> 2. rebuild the RAID1 array, making sure to use /dev/sdb3 as the
>> starting point for current data
>> 3. restart in such a way that LVM finds /dev/md2 as the right PV
>> instead of one of its components
>>
>> Each of these is just tricky enough that I'm sure there are lots of
>> gotchas to watch out for.
>>
>> So.. any suggestions?
>>
>> Thanks very much,
>>
>> Miles Fidelman
>>
>>
>>
>>
>> _______________________________________________
>> linux-lvm mailing list
>> linux-lvm at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-lvm
>> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
>
> --
> Jayson Vantuyl
> Founder and Architect
> *Engine Yard <http://www.engineyard.com>*
> jvantuyl at engineyard.com
> 1 866 518 9275 ext 204
> IRC (freenode): kagato
>
--
Miles R. Fidelman, Director of Government Programs
Traverse Technologies
145 Tremont Street, 3rd Floor
Boston, MA 02111
mfidelman at traversetechnologies.com
857-362-8314
www.traversetechnologies.com