
Re: [linux-lvm] progress, but... - re. fixing LVM/md snafu



Hi Jayson,

Thanks for all the detailed information yesterday. I've done some more digging into my system, and I wonder if you'd be willing to comment on what I found, and the recovery procedure I'm considering.

Quick summary of situation:
- the machine comes up, but LVM builds / on top of /dev/sdb3 instead of /dev/md2 (of which /dev/sdb3 is a member). It looks like md2 isn't starting, so I need to fix the array (presumably offline, from a LiveCD), then reboot and get LVM to use the mirror device

What's confusing is that the RAID isn't starting at boot time, and it reports a different status depending on which tool I use. So first I have to get the RAID working again and make sure it has the up-to-date data.

Here are some more details, broken into four sections: RAID, LVM, boot process, and recovery procedure. The RAID section has a summary up front followed by command listings; the other sections are much shorter :-)

Comments on the recovery procedure, please!

---------- re. the RAID array ----------

summary:
- /proc/mdstat thinks the array is inactive, containing sdb3 and sdd3

- mdadm thinks it's active, degraded, also containing sdb3 and sdd3 (mdadm -D /dev/md2)

- looking at superblocks, mdadm seems to think it's active, degraded (mdadm -E /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3)
-- containing sda3, only (mdadm -E /dev/sda3)
-- containing sda3, with sdb3 spare (mdadm -E /dev/sdb3)
-- containing sda3 and sdb3, with sdc3 spare (mdadm -E /dev/sdc3) - with the same Magic #, different UUID from above
-- no superblock on /dev/sdd3 (mdadm -E /dev/sdd3)

details:
more /proc/mdstat:
md2 : inactive sdd3[0] sdb3[2]
    195318016 blocks

<looking at RAID>
mdadm -D /dev/md2:
/dev/md2:
      Version : 00.90.01
Creation Time : Thu Jul 20 06:15:18 2006
   Raid Level : raid1
  Device Size : 97659008 (93.13 GiB 100.00 GB)
 Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
  Persistence : Superblock is persistent

  Update Time : Fri Apr  3 10:06:41 2009
        State : active, degraded
Active Devices : 0
Working Devices : 2
Failed Devices : 0
Spare Devices : 2

  Number   Major   Minor   RaidDevice State
     0       8       51        0      spare rebuilding   /dev/sdd3
     1       0        0        -      removed

     2       8       19        -      spare   /dev/sdb3

<looking at component devices>
server1:/etc/lvm# mdadm -E  /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
/dev/sda3:
        Magic : a92b4efc
      Version : 00.90.00
         UUID : 3a32acee:8a132ab9:545792a8:0df49d99
Creation Time : Thu Jul 20 06:15:18 2006
   Raid Level : raid1
 Raid Devices : 2
Total Devices : 1
Preferred Minor : 2

  Update Time : Fri Apr  3 22:40:39 2009
        State : clean
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
     Checksum : 71d21f34 - correct
       Events : 0.114704240


    Number   Major   Minor   RaidDevice State
this     0       8        3        0      active sync   /dev/sda3

 0     0       8        3        0      active sync   /dev/sda3
 1     1       0        0        1      faulty removed
/dev/sdb3:
        Magic : a92b4efc
      Version : 00.90.00
         UUID : 3a32acee:8a132ab9:545792a8:0df49d99
Creation Time : Thu Jul 20 06:15:18 2006
   Raid Level : raid1
 Raid Devices : 2
Total Devices : 2
Preferred Minor : 2

  Update Time : Fri Apr  3 10:06:41 2009
        State : clean
Active Devices : 1
Working Devices : 2
Failed Devices : 1
Spare Devices : 1
     Checksum : 71d1d1fa - correct
       Events : 0.114716950


    Number   Major   Minor   RaidDevice State
this     2       8       19        2      spare   /dev/sdb3

 0     0       8        3        0      active sync   /dev/sda3
 1     1       0        0        1      faulty removed
 2     2       8       19        2      spare   /dev/sdb3
/dev/sdc3:
        Magic : a92b4efc
      Version : 00.90.00
         UUID : 635fb32e:6a83a5be:12735af4:74016e66
Creation Time : Wed Jul  2 12:48:36 2008
   Raid Level : raid1
 Raid Devices : 2
Total Devices : 3
Preferred Minor : 2

  Update Time : Fri Apr  3 06:42:50 2009
        State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 0
Spare Devices : 1
     Checksum : 95973481 - correct
       Events : 0.26


    Number   Major   Minor   RaidDevice State
this     2       8       35        2      spare   /dev/sdc3

 0     0       8        3        0      active sync   /dev/sda3
 1     1       8       19        1      active sync   /dev/sdb3
 2     2       8       35        2      spare   /dev/sdc3
mdadm: No super block found on /dev/sdd3 (Expected magic a92b4efc, got 00000000)

<looking at devices with --scan>
server1:/etc/lvm# mdadm -E  --scan /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=635fb32e:6a83a5be:12735af4:74016e66
 devices=/dev/sdc3
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=3a32acee:8a132ab9:545792a8:0df49d99
 devices=/dev/sda3,/dev/sdb3
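One thing worth flagging from the listings above: sda3 and sdb3 disagree on both Events (0.114704240 vs 0.114716950) and Update Time (22:40 vs 10:06), and the two fields point in opposite directions, so it seems worth pinning down which member really has the newest data before picking a resync source. Here's a quick sketch for comparing event counters, run over saved -E output rather than the live devices (the .txt filenames are just for illustration):

```shell
# Saved Events lines from the `mdadm -E` output above, one file per device:
printf 'Events : 0.114704240\n' > sda3.txt   # from mdadm -E /dev/sda3
printf 'Events : 0.114716950\n' > sdb3.txt   # from mdadm -E /dev/sdb3

# Print "<device> <events>" for each, then keep the highest counter,
# i.e. the member that saw the most recent array updates.
for f in sda3 sdb3; do
  printf '%s %s\n' "$f" "$(awk '/Events/ {print $3; exit}' "$f.txt")"
done | sort -k2 -V | tail -n1    # highest event count wins
```

Normally the higher Events count wins (sdb3 here), but the later Update Time on sda3 is odd enough that I'd want an explanation for the mismatch before trusting either disk.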

-------- re. LVM ---------

/etc/lvm/lvm.conf contains the line:
md_component_detection = 0

I expect that setting it to 1 would tell LVM to check each device for an md superblock and skip the ones that are RAID components, so it would see the PV on /dev/md2 rather than on /dev/sdb3 directly.
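i.e. something like this in the devices section (a sketch only — the key name is as I read it in the lvm.conf man page, so please correct me if I've got it wrong):

```
# /etc/lvm/lvm.conf, devices section
devices {
    # 1 = check each device for an md superblock and skip it if found,
    # so LVM sees the PV on /dev/md2 instead of on /dev/sdb3 directly
    md_component_detection = 1
}
```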

Also, /etc/lvm/backup/rootvolume contains:
pv0 {
          id = "2ppSS2-q0kO-3t0t-uf8t-6S19-qY3y-pWBOxF"
          device = "/dev/md2"    # Hint only

which suggests that if the RAID is running, LVM will do the right thing.

---------- re. boot process ------------
looks like detailed events are:

- MBR loads grub

- grub knows about md and lvm, mounts read-only
-- kernel /vmlinuz-2.6.8-3-686 root=/dev/mapper/rootvolume-rootlv ro mem=4

- during main boot md comes up first, then lvm
-- from rcS.d/S25mdadm-raid: if not already running ... mdadm -A -s -a
---- I'm guessing this fails for /dev/md2

-- from rcS.d/S26lvm:
-- creates lvm device
-- creates dm device
-- does a vgscan
---- which is where this happens:
Found duplicate PV 2ppSS2q0kO3t0tuf8t6S19qY3ypWBOxF: using /dev/sdb3 not /dev/sda3
Found volume group "backupvolume" using metadata type lvm2
Found volume group "rootvolume" using metadata type lvm2
-- does a vgchange -a y
---- which looks like it's picking up on sdb3

-- I'm guessing that if the mirror were active and contained /dev/sdb3, LVM would pick up /dev/md2 as the volume group's PV instead
** is this where setting md_component_detection = 1 would be helpful?
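For checking which device LVM actually bound the PV to at any point, I assume something like this would show it:

```
pvdisplay               # lists each PV and the device it currently sits on
lvs -o +devices         # shows which PV/device backs each logical volume
```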

------------ recovery procedure ------------

here's what I'm thinking of doing - comments please!

1. turn logging on in lvm.conf, reboot, examine logs to confirm above guesses (or find out what's really happening)
-- based on the logging, maybe set md_component_detection = 1 in lvm.conf
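For the logging in step 1, I'm picturing roughly this in lvm.conf (a sketch — key names as I read them in the lvm.conf man page, so please correct me):

```
# /etc/lvm/lvm.conf, log section
log {
    verbose = 3                    # most verbose console output
    file = "/var/log/lvm2.log"     # also log to a file
    level = 7                      # max debug level for the file log
}
```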

2. shutdown, boot from LiveCD (I'm using systemrescuecd - great tool by the way)

3. backup /dev/sdb3 using partimage (just in case!)

4. try to fix /dev/md2

if it's not running:
- start it with only /dev/sdb3: mdadm -A /dev/md2 --add /dev/sdb3 --run (**is this the right way to do this?**)
- add the other devices back (mdadm -a /dev/sda3; mdadm -a /dev/sdd3)
- grow to 3 active devices: mdadm --grow -n 3 /dev/md2

if it's running:
- fail all except /dev/sdb3 (mdadm -f /dev/sda3; mdadm -f /dev/sdd3)
- remove all except /dev/sdb3 (mdadm -r /dev/sda3; mdadm -r /dev/sdd3)
- add each device back (mdadm -a /dev/sda3; mdadm -a /dev/sdd3)
- grow to 3 active devices: mdadm --grow -n 3 /dev/md2
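So for the not-running case, the whole of step 4 would look something like this (just a sketch of what I understand the commands to be — I'd double-check each against the man page before running anything, and using /dev/sdb3 as the source assumes it really has the newest data):

```
# from the LiveCD, with /dev/md2 not assembled:
mdadm --assemble --run --force /dev/md2 /dev/sdb3   # bring it up degraded
mdadm /dev/md2 --add /dev/sda3                      # resyncs from sdb3
mdadm /dev/md2 --add /dev/sdd3
mdadm --grow /dev/md2 -n 3                          # grow to 3 active devices
cat /proc/mdstat                                    # watch the resync
```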

question: do I need to update mdadm.conf?
question: do I need to do anything to get rid of the superblock containing a different UUID (on /dev/sdc3)?
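My tentative answers to my own questions, for the record (again a sketch — the mdadm.conf path is Debian's, and please say if zeroing is the wrong tool here):

```
# regenerate the ARRAY lines once the array is healthy:
mdadm --examine --scan                              # eyeball the output first
mdadm --examine --scan >> /etc/mdadm/mdadm.conf     # append, prune stale lines by hand

# get rid of the superblock with the other UUID (on /dev/sdc3):
mdadm --zero-superblock /dev/sdc3
```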

5. reboot the system

- it may just come up

- if it comes up and lvm is still operating off a single partition, repeat the above, but first add a filter to lvm.conf (wash, rinse, repeat as necessary)
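The filter I have in mind would be roughly this (sketch — regex syntax as I read it in the lvm.conf man page, untested):

```
# /etc/lvm/lvm.conf, devices section: accept md devices, reject the
# raw component partitions, accept anything else
filter = [ "a|/dev/md.*|", "r|/dev/sd[abcd]3|", "a|.*|" ]
```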

*** does this seem like a reasonable game plan? ***

Thanks again for your help!

Miles



