RAID failure already!!!!!

Sat Nov 18 04:28:28 UTC 2006

On Thu, Nov 16, 2006 at 09:59:24 -0500,
  James Pifer <jep at obrien-pifer.com> wrote:
> [root at storage ~]# mdadm --query --examine /dev/hdb

Note:
You can query the array itself (mdadm --detail /dev/md0) rather than
examining individual members. That's probably safer: if the members have
conflicting information, you see what the running array is actually using.

>       Number   Major   Minor   RaidDevice State
> this     0       3       64        0      active sync   /dev/hdb
> 
>    0     0       3       64        0      active sync   /dev/hdb
>    1     1      22        0        1      active sync   /dev/hdc
>    2     2       0        0        2      faulty removed
> 
> Is there any way to force it to try and reload the array even with the
> failed device? I'm not getting drive errors on the device any longer. Is
> the failed device the "dev/hdd" where I missed the leading "/"? Or, is
> the failed device /dev/hdb?
> 
> What else can I look at? What other commands should I run? 

It looks like the array is running in degraded mode. If hdb is not acting up
at the moment, you might be able to add /dev/hdd in and then, once the
rebuild onto it has finished, fail /dev/hdb out for testing.
Even if /dev/hdb has some bad sectors, you might still get most of your data
copied safely onto the good disks.
I believe the command you want is:
mdadm /dev/md0 -a /dev/hdd
I think that is unlikely to make things worse, as long as you don't care
about what is currently on /dev/hdd, since the rebuild will overwrite it.
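A minimal sketch of that whole sequence as a shell script, assuming the array is /dev/md0, /dev/hdd is the disk being added and /dev/hdb is the suspect member (the names used in this thread). The helper names run, wait_for_rebuild and replace_member are hypothetical, made up for illustration. By default the script only prints the commands it would run; set DRY_RUN=0 to actually execute them. Polling /proc/mdstat for "recovery" or "resync" is a crude check that looks at every md array on the machine, not just /dev/md0.

```sh
#!/bin/sh
# Sketch only: device names come from this thread; adjust before use.
# Commands are only printed unless DRY_RUN=0 is set in the environment.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: run a command, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = 0 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Hypothetical helper: block until /proc/mdstat no longer reports a
# recovery or resync. Note that this checks every md array on the box.
wait_for_rebuild() {
    if [ "$DRY_RUN" = 0 ]; then
        while grep -Eq 'recovery|resync' /proc/mdstat; do
            sleep 60
        done
    else
        echo "# wait until /proc/mdstat shows no recovery/resync"
    fi
}

# Hypothetical helper: replace_member ARRAY NEW_DISK SUSPECT_DISK
replace_member() {
    array=$1 new=$2 suspect=$3
    run mdadm --detail "$array"             # state of the running array
    run mdadm "$array" -a "$new"            # add the new disk; rebuild starts
    wait_for_rebuild                        # don't touch hdb until this is done
    run mdadm "$array" --fail "$suspect"    # mark the suspect disk faulty...
    run mdadm "$array" --remove "$suspect"  # ...then take it out of the array
}

replace_member /dev/md0 /dev/hdd /dev/hdb
```

Failing /dev/hdb out only after the rebuild matters: until /dev/hdd is fully in sync, /dev/hdb is still holding part of the only complete copy of the data.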

The only thing that seems odd about the output above is that /dev/hdb is
listed twice as a working device and seems to be counted twice in the
total number of devices.