Software RAID problem

Ron McKeever rmckeever at earthlink.net
Fri Jul 14 18:12:47 UTC 2006


This might help:

Basics of Linux Software RAID
The status of a running software RAID in Linux can be obtained from /proc/mdstat. Here is a sample:
md1 : active raid1 sdb1[1] sda1[0]
      1999936 blocks [2/2] [UU]

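This is a software RAID (Meta-Disk) device named /dev/md1, which is made up of the /dev/sda1 and /dev/sdb1 devices in a RAID-1 (mirroring) setup. The [2/2] means that both of the two configured devices are working, and [UU] shows each of them as up. The Dell 1U server machines will have all their disks in software RAID-1 arrays.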
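If you would rather check this from a script than by eye, here is a rough Python sketch that parses entries of the form shown above. It is not part of raidtools; the names parse_mdstat and is_degraded are made up for illustration, and the regular expressions only assume the layout seen in the samples in this message.

```python
import re

SAMPLE = """\
md1 : active raid1 sdb1[1] sda1[0]
      1999936 blocks [2/2] [UU]
"""


def parse_mdstat(text):
    """Return {array_name: info} for each 'mdN : ...' entry in mdstat text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\d+)\s*:\s*(\w+)\s+(\w+)\s+(.*)$", line)
        if header:
            name, state, level, rest = header.groups()
            # Each member looks like sdb1[1], optionally followed by (F).
            members = re.findall(r"(\w+)\[(\d+)\](\(F\))?", rest)
            current = {
                "state": state,
                "level": level,
                "members": [(dev, int(idx), bool(failed))
                            for dev, idx, failed in members],
            }
            arrays[name] = current
            continue
        # The status line carries [configured/working] and a [UU]-style map.
        counts = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if counts and current is not None:
            current["configured"] = int(counts.group(1))
            current["working"] = int(counts.group(2))
            current["slots"] = counts.group(3)
    return arrays


def is_degraded(info):
    """True when fewer devices are working than the array is configured for."""
    return info.get("working", 0) < info.get("configured", 0)


if __name__ == "__main__":
    for name, info in parse_mdstat(SAMPLE).items():
        print(name, info["level"], "degraded" if is_degraded(info) else "ok")
```

On a live system you would pass in the contents of /proc/mdstat instead of SAMPLE.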
Once a disk is installed and recognised by Linux, you can add its partitions to degraded RAID arrays at any time with the raidhotadd command. Here is what /proc/mdstat shows for a degraded RAID array:

md1 : active raid1 sdb1[1](F) sda1[0]
      1999936 blocks [2/1] [U_]

Device /dev/sdb1 has failed (to generate this error I unplugged the disk /dev/sdb). Now I have just swapped the hard drive (see below) and want to put the new drive back in the array. First I must remove the record of the disk in the failed (F) state with the command raidhotremove /dev/md1 /dev/sdb1, which gives the following state in /proc/mdstat:
md1 : active raid1 sda1[0]
      1999936 blocks [2/1] [U_]

Once this has been done for every partition that was in a RAID array, Linux regards the drive as unused, which allows it to be repartitioned or unregistered (see below for details of hardware recognition).
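To confirm that nothing still refers to the drive before you repartition or pull it, something like the following Python sketch could be used. The function name arrays_using_disk is hypothetical, and it assumes partitions are named as the disk name followed by a number (sdb1, sdb2, ...), as in the examples here.

```python
import re

BEFORE_REMOVE = """\
md1 : active raid1 sdb1[1](F) sda1[0]
      1999936 blocks [2/1] [U_]
"""

AFTER_REMOVE = """\
md1 : active raid1 sda1[0]
      1999936 blocks [2/1] [U_]
"""


def arrays_using_disk(mdstat_text, disk):
    """Map each array name to the partitions of `disk` that it still lists."""
    in_use = {}
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if not header:
            continue
        name, rest = header.groups()
        # Member devices appear as name[index], e.g. sdb1[1].
        members = re.findall(r"(\w+)\[\d+\]", rest)
        parts = [m for m in members
                 if re.fullmatch(re.escape(disk) + r"\d+", m)]
        if parts:
            in_use[name] = parts
    return in_use


if __name__ == "__main__":
    print(arrays_using_disk(BEFORE_REMOVE, "sdb"))  # md1 still lists sdb1
    print(arrays_using_disk(AFTER_REMOVE, "sdb"))   # empty: disk is free
```

An empty result means no array in the given text lists a partition of that disk any more.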
If you want to instruct the software RAID driver to stop using a partition on a functional disk, use the raidsetfaulty command, e.g. raidsetfaulty /dev/md1 /dev/sdb1, to put the partition in the failed state so that you can then remove it with raidhotremove.
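The order of the commands matters, so here is a small Python sketch that only builds the command lines for review and does not run anything. The helper name removal_commands is made up for this example; the commands and their ARRAY DEVICE argument order are the ones described above.

```python
def removal_commands(array, partition, already_failed):
    """Return the raidtools commands, in order, to take a partition out.

    Nothing is executed here; the caller decides whether to run them.
    """
    steps = []
    if not already_failed:
        # A working partition must be marked faulty before it can be removed.
        steps.append(["raidsetfaulty", array, partition])
    steps.append(["raidhotremove", array, partition])
    return steps


if __name__ == "__main__":
    # Partition on a functional disk: mark it faulty first, then remove it.
    for cmd in removal_commands("/dev/md1", "/dev/sdb1", already_failed=False):
        print(" ".join(cmd))
```

For a partition that already shows (F) in /proc/mdstat, pass already_failed=True and only the raidhotremove step is returned.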

When you have a new partition you want to add to a RAID set, you can use the command raidhotadd ARRAY DEVICE to add it, e.g. raidhotadd /dev/md1 /dev/sdb1, which results in the following data in /proc/mdstat:

md1 : active raid1 sdb1[2] sda1[0]
      1999936 blocks [2/1] [U_]
      [=======>.............]  recovery = 37.8% (755976/1999936) finish=0.4min speed=48732K/sec

Note that in this two-device array, a device name followed by [2] (an index beyond the two active slots, 0 and 1) is in a reconstruction state.
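If you want to watch the rebuild from a script, the recovery line can be picked apart with a sketch like the one below. The function names are hypothetical, the pattern only assumes the layout of the sample line above, and the time estimate assumes that the block counts are in 1K units so that they can be divided by the K/sec figure.

```python
import re

RECOVERY_LINE = ("      [=======>.............]  recovery = 37.8% "
                 "(755976/1999936) finish=0.4min speed=48732K/sec")


def parse_recovery(line):
    """Extract the progress fields from an mdstat recovery line, or None."""
    m = re.search(
        r"recovery\s*=\s*([\d.]+)%\s*\((\d+)/(\d+)\)\s*"
        r"finish=([\d.]+)min\s*speed=(\d+)K/sec",
        line,
    )
    if m is None:
        return None
    return {
        "percent": float(m.group(1)),
        "done_blocks": int(m.group(2)),
        "total_blocks": int(m.group(3)),
        "finish_minutes": float(m.group(4)),
        "speed_k_per_sec": int(m.group(5)),
    }


def minutes_remaining(progress):
    """Estimate time left, assuming 1K blocks and a steady rebuild speed."""
    remaining = progress["total_blocks"] - progress["done_blocks"]
    return remaining / progress["speed_k_per_sec"] / 60.0


if __name__ == "__main__":
    progress = parse_recovery(RECOVERY_LINE)
    print(progress["percent"], round(minutes_remaining(progress), 1))
```

For the sample line the estimate works out to roughly the same 0.4 minutes that the kernel reports.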
When running raidhotadd commands there is no need to wait for one command to finish before running the next, because the kernel maintains a queue of devices to reconstruct. You can schedule several RAID partitions to reconstruct and then go for a coffee break (or a lunch break, depending on the speed of the drives).

Ron

-----Original Message-----
>From: redhat at buglecreek.com
>Sent: Jul 14, 2006 11:00 AM
>To: Getting started with Red Hat Linux <redhat-install-list at redhat.com>
>Subject: Re: Software RAID problem
>
>
>On Thu, 13 Jul 2006 15:06:03 -0700, "Rick Stevens"
><rstevens at vitalstream.com> said:
>> On Thu, 2006-07-13 at 14:07 -0600, redhat at buglecreek.com wrote:
>> > We have a critical system that has Redhat 8.0 installed.  The system
>> > uses the older raidtools not mdadm. We are in the process of rebuilding
>> > a new box, but in the meantime we have a software raid issue.  The
>> > system had to be rebooted and we ended up with the following raid
>> > problem: 
>> > cat /proc/mdstat shows: 
>> > 
>> > Personalities : [raid0] [raid1]
>> > read_ahead 1024 sectors
>> > md1 : active raid1 hda2[0]
>> >       119684160 blocks [2/1] [U_]
>> > 
>> > md2 : active raid0 hda3[0] hdb2[1]
>> >       208640 blocks 64k chunks
>> > 
>> > md0 : active raid1 hda1[0] hdb1[1]
>> >       264960 blocks [2/2] [UU]
>> > 
>> > Looks like we have a problem with md1 device which is the / partition.
>> > lsraid -A -a /dev/md1 shows:
>> > 
>> > [dev   9,   1] /dev/md1         C27DAE7E.7C02AF01.5143DCC8.62FD07C3
>> > online
>> > [dev   3,   2] /dev/hda2        C27DAE7E.7C02AF01.5143DCC8.62FD07C3 good
>> > [dev   ?,   ?] (unknown)        00000000.00000000.00000000.00000000
>> > missing
>> > 
>> > The applicable section of /etc/raidtab is:
>> > 
>> > raiddev             /dev/md1
>> > raid-level                  1
>> > nr-raid-disks               2
>> > chunk-size                  64k
>> > persistent-superblock       1
>> > nr-spare-disks              0
>> >     device          /dev/hda2
>> >     raid-disk     0
>> >     device          /dev/hdb3
>> >     raid-disk     
>> > 
>> > It seems that /dev/hdb3 has issues.  Is there a way to get /dev/hdb3
>> > back online?  Can you do something with raidhotadd:
>> > raidhotadd /dev/md1 /dev/hdb3
>> > 
>> > This is a very critical system and I want to make sure we don't do
>> > anything that would totally bring the system down, at least until we can
>> > build a new system.  Any help would be appreciated.
>> 
>> The FIRST thing you do is back up /dev/md1 (or what's left of it) in
>> case the remediation doesn't work or does something evil (it shouldn't).
>> And you can continue to run in the degraded state.
>> 
>> You can use raidhotadd to try to bring the drive back into the fold, but
>> it may not join if the drive is indeed defective.  Try the raidhotadd,
>> then check /proc/mdstat again.  If you see a "(F)" following the
>> "hdb3[1]" bit, the drive failed.  That doesn't mean the drive is fried,
>> but SOMETHING is wrong.
>> 
>> Try to raidhotremove the drive from the RAID, then run badblocks on the
>> partition in question (/dev/hdb3).  When it completes, try the
>> raidhotadd again and see if it joins and starts the resync.
>> 
>> Probably none of my business, but why is such a critical machine still
>> running RH8?  RH8.0 is farking ancient and, IMHO, the absolute worst
>> release of RH ever...which is why RH9 came out so quickly after it.
>> 
>> ----------------------------------------------------------------------
>> - Rick Stevens, Senior Systems Engineer     rstevens at vitalstream.com -
>> - VitalStream, Inc.                       http://www.vitalstream.com -
>> -                                                                    -
>> -      A day for firm decisions!!!   Well, then again, maybe not!    -
>> ----------------------------------------------------------------------
>> 
>> _______________________________________________
>> Redhat-install-list mailing list
>> Redhat-install-list at redhat.com
>> https://www.redhat.com/mailman/listinfo/redhat-install-list
>> To Unsubscribe Go To ABOVE URL or send a message to:
>> redhat-install-list-request at redhat.com
>> Subject: unsubscribe
>
>Thanks Rick,
>
>I knew that using RH8.0 would raise a few eyebrows, but due to personnel
>changes etc it slipped through the cracks.  Anyway, when you use the
>raidhotremove command can you execute it on a partition like this:
>raidhotremove /dev/md1 /dev/hdb3 ?  Just like raidhotadd?  For the
>badblocks command, simply run badblocks /dev/hdb3?
>
>Thanks
>