RAID failure already!!!!!

Thu Nov 16 14:59:24 UTC 2006

On Thu, 2006-11-16 at 14:17 +0000, Andy Green wrote:
> James Pifer wrote:
> 
> >>> mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 /dev/hdb /dev/hdc dev/hdd
> 
> I don't suppose it's possible your cwd was / at that time?  Just 
> wondering why mdadm didn't complain.
> 
> -Andy
> 

Looking back through the history it's the only time I ran it. I guess at
this point it's a hard lesson learned... 

The worst thing is that I thought everything was fine because I was
reading from and writing to the array yesterday. Because I needed the
space that the copy of the data was using on the other machine, I
deleted that copy late last night.

I wish I had kept it until I had a better understanding of RAID and the
commands, such as looking at /proc/mdstat. I apparently had a false
sense of security, not realizing I had screwed up the setup of the RAID
array.
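
A minimal sketch of the kind of /proc/mdstat check that would have
flagged the problem early. It assumes the usual status format, in which
each array's block line ends in a field such as [UU_], with "_" marking
a missing member. check_mdstat is a hypothetical helper name, not part
of mdadm.

```shell
# Hypothetical helper: report each md array as ok or DEGRADED.
# $1 is an mdstat-format file; it defaults to /proc/mdstat.
check_mdstat() {
    file="${1:-/proc/mdstat}"
    awk '
        # Array header line, e.g. "md0 : active raid5 hdc[1] hdb[0]"
        /^md[0-9]+ :/ { name = $1 }
        # Status line ending in a member map such as [UUU] or [UU_]
        /\[[U_]+\]/ {
            status = $NF
            if (status ~ /_/) { print name ": DEGRADED " status; bad = 1 }
            else              { print name ": ok " status }
        }
        # Exit status 1 if any array had a missing member
        END { exit bad }
    ' "$file"
}
```

Called with no argument, check_mdstat reads /proc/mdstat. A non-zero
return status means at least one array is running with a missing
member.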

Is any of this helpful?

[root@storage ~]# mdadm --query /dev/hdb
/dev/hdb: is not an md array
/dev/hdb: device 0 in 3 device undetected raid5 /dev/md0.  Use mdadm --examine for more detail.
[root@storage ~]# mdadm --query /dev/hdc
/dev/hdc: is not an md array
/dev/hdc: device 1 in 3 device undetected raid5 /dev/md0.  Use mdadm --examine for more detail.
[root@storage ~]# mdadm --query /dev/hdd
/dev/hdd: is not an md array
/dev/hdd: device 3 in 3 device undetected raid5 /dev/md0.  Use mdadm --examine for more detail.
[root@storage ~]# mdadm --query --examine /dev/hdb
/dev/hdb:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 44cfc75c:25ae3e32:0fbb5311:50edfa8a
  Creation Time : Wed Nov 15 09:42:22 2006
     Raid Level : raid5
    Device Size : 156290816 (149.05 GiB 160.04 GB)
     Array Size : 312581632 (298.10 GiB 320.08 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0

    Update Time : Thu Nov 16 02:49:29 2006
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 85c9306 - correct
         Events : 0.57595

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       3       64        0      active sync   /dev/hdb

   0     0       3       64        0      active sync   /dev/hdb
   1     1      22        0        1      active sync   /dev/hdc
   2     2       0        0        2      faulty removed

Is there any way to force it to try to reload the array even with the
failed device? I'm no longer getting drive errors on the device. Is
the failed device the "dev/hdd" where I missed the leading "/"? Or is
the failed device /dev/hdb?

What else can I look at? What other commands should I run? 

Can I force it to rebuild md0 with hdb and hdc? Right now I get:
[root@storage ~]# mdadm -v --run --force /dev/md0
mdadm: failed to run array /dev/md0: Invalid argument

Right now it looks like md0 does not exist.
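
One commonly suggested approach for a RAID5 set that has lost a single
member is to assemble it degraded from the remaining devices, rather
than trying to run an array that has not been assembled. A minimal
sketch, assuming /dev/hdb and /dev/hdc carry consistent superblocks
(worth confirming with mdadm --examine on each one first).
assemble_degraded and DO_RUN are hypothetical names used only for
illustration. By default the helper only prints the command.

```shell
# Hypothetical helper: build an "mdadm --assemble" command for a
# degraded array. It prints the command unless DO_RUN=1 is set.
assemble_degraded() {
    if [ "$#" -lt 2 ]; then
        echo "usage: assemble_degraded MD_DEVICE MEMBER..." >&2
        return 2
    fi
    md="$1"
    shift
    # --force accepts members whose event counts are slightly stale;
    # --run starts the array even though a member is missing.
    set -- mdadm --assemble --force --run "$md" "$@"
    if [ "${DO_RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}
```

For example, "assemble_degraded /dev/md0 /dev/hdb /dev/hdc" prints the
command for review. Whether the array comes up cleanly still depends on
the state of the two remaining members.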

James
(sorry for the top post earlier...)