
Re: IBM xSeries / Linux / RAID / Failure of whole system



Mark requested additional information; here is what we have. Any ideas 
gratefully received!
Many thanks
Mandy Shaw

******

We have 2 logical drives, both of which are RAID 5 arrays.

It is the 2nd logical drive which has had disk failures causing the 
machine to crash.

The RAID dumplog is appended. Note the presence of a hot spare, which surely 
should have made the array even more resilient.
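
As a sanity check on the dumplog figures, here is a minimal sketch (Python, 
not part of the original logs) of the usual RAID 5 arithmetic: usable size 
is (members - 1) x member size, and the array tolerates only one failed 
member at a time. The function name is purely illustrative; the numbers are 
copied from the dumplog below.

```python
def raid5_usable_mb(chunks, drive_mb):
    """Usable capacity of a RAID 5 set: one member's worth goes to parity."""
    return (chunks - 1) * drive_mb


DRIVE_MB = 34715  # "Size (in MB)" of every physical disk in the dumplog

# logical drive number -> (number of chunks, reported size in MB)
reported = {1: (3, 69430), 2: (11, 347150)}

for ld, (chunks, size_mb) in reported.items():
    expected = raid5_usable_mb(chunks, DRIVE_MB)
    print(f"logical drive {ld}: expected {expected} MB, "
          f"reported {size_mb} MB, match={expected == size_mb}")
```

Both reported sizes agree with this formula, so each logical drive appears 
to span its whole array. With 11 members in array B, a second drive 
dropping out before a rebuild onto the hot spare has completed would be 
enough to take logical drive 2 offline.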

/ and /boot are on separate partitions but both are on logical drive 1.

Only the /home directory is on logical drive 2. Output from df is...

Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/sda7               294367    253083     26084  91% /
/dev/sda1                30985     30009         0 100% /boot
/dev/sdb1            349896904 272680752  59442336  83% /home
/dev/sda8             47417172   1451600  43556876   4% /home2
none                   1158300         0   1158300   0% /dev/shm
/dev/sda2             10072676   1206136   8354880  13% /usr
/dev/sda5             10080332     97668   9470604   2% /var

The failure appears to be that a single disk fails, but then the whole 
RAID set (logical drive 2) is marked as down. All that appears in 
/var/log/messages is:

Apr  3 17:03:42 tin kernel: SCSI disk error : host 2 channel 0 id 1 lun 0 return code = 70000
Apr  3 17:03:42 tin kernel:  I/O error: dev 08:11, sector 40894664
Apr  3 17:13:28 tin kernel: (ips0) Resetting controller.

The full log, of which this is an extract, is available if it would help.
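
For anyone reading the kernel messages above, here is a rough sketch 
(Python, illustrative only) of how the two numeric fields can be decoded. 
It assumes the usual 2.4 kernel conventions: "dev 08:11" is a hexadecimal 
major:minor pair, SCSI disk major 8 uses 16 minors per disk, and the 
return code is a hex value packed as driver/host/message/status bytes. The 
helper names are made up for this example.

```python
# Host-byte values as defined by the Linux SCSI midlayer.
HOST_BYTE_NAMES = {
    0x00: "DID_OK", 0x01: "DID_NO_CONNECT", 0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT", 0x04: "DID_BAD_TARGET", 0x05: "DID_ABORT",
    0x06: "DID_PARITY", 0x07: "DID_ERROR", 0x08: "DID_RESET",
}


def decode_dev(dev):
    """Map a hex 'major:minor' string such as '08:11' to an sd device name."""
    major, minor = (int(part, 16) for part in dev.split(":"))
    if major != 8:
        return None  # only the first SCSI disk major is handled here
    disk, partition = divmod(minor, 16)  # 16 minors per disk
    name = "sd" + chr(ord("a") + disk)
    return name + (str(partition) if partition else "")


def decode_result(code_hex):
    """Split a SCSI result word (printed in hex) into its four bytes."""
    result = int(code_hex, 16)
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }


print(decode_dev("08:11"))
fields = decode_result("70000")
print(fields, HOST_BYTE_NAMES.get(fields["host"], "unknown"))
```

Under those assumptions, 08:11 is /dev/sdb1, which df shows mounted on 
/home (logical drive 2), and the host byte of 0x07 is the generic 
DID_ERROR, so the message itself does not say which physical disk was at 
fault.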

******

RAID dumplog:

Serveraid Log Collection Utility for RedHat Linux systems Version v1.0
Date Logs Taken        : Tue Mar 11 11:40:13 GMT 2003
Nodename of this system: tin
Model Type             : 86695RX
Serial Number          : 551983W
Operating system       : Linux
Kernel Version         : 2.4.9-e.3smp
Raid Manager Version   : package RaidMan is not installed
Driver                 : -rw-r--r--    1 root     root        51740 Feb 20 14:07 /lib/modules/2.4.9-e.3smp/kernel/drivers/scsi/ips.o
Number of Serveraid adapters found in this machine: 1
************************************
***IPSSEND CONFIG AL LOG************
************************************

Found 1 IBM ServeRAID controller(s).
Read configuration has been initiated for controller 1...
-------------------------------------------------------------------------------
Controller information
-------------------------------------------------------------------------------
   Controller type                : ServeRAID-4Mx
   BIOS version                   : 5.11.05
   Firmware version               : 5.11.05
   Boot block version             : 5.11.05
   Device driver version          : 5.11.05 
   Controller slot information    : 4
   Controller Name                : Main
   SCSI channel description       : 2 parallel SCSI wide
   Initiator IDs (Channel/SCSI ID): 1/7 2/7
   Maximum physical devices       : 30
   Defunct disk drive count       : 0
   Logical drives/Offline/Critical: 2/0/0
   Read ahead                     : Adaptive
   Stripe-unit size               : 16 KB
   Rebuild rate (Low/Medium/High) : High
   Hot-swap rebuild               : Enabled
   Data scrubbing                 : Enabled
   Part of cluster (Yes/No)       : Yes
   Unattended mode (Yes/No)       : Yes
   Concurrent commands supported  : 96
   Configuration update count     : 295
-------------------------------------------------------------------------------
Logical drive information
-------------------------------------------------------------------------------
 Logical drive number 1
   Status of logical drive        : Okay (OKY)
   RAID level                     : 5 
   Size (in MB)                   : 69430
   Write cache status             : Write back (WB)
   Number of chunks               : 3 
   Stripe-unit size               : 16 KB
   Access blocked                 : No 
   Part of array                  : A 
   Part of merge group            : 207 
 Logical drive number 2
   Status of logical drive        : Okay (OKY)
   RAID level                     : 5 
   Size (in MB)                   : 347150
   Write cache status             : Write back (WB)
   Number of chunks               : 11 
   Stripe-unit size               : 16 KB
   Access blocked                 : No 
   Part of array                  : B 
   Part of merge group            : 207 

   Array A stripe order (Channel/SCSI ID)  : 1,0 1,1 1,2 
   Array B stripe order (Channel/SCSI ID)  : 2,1 2,2 2,3 2,4 2,5 2,8 2,9 2,10
                                             2,11 2,12 2,13
-------------------------------------------------------------------------------
Physical device information
-------------------------------------------------------------------------------
   Channel #1:
      Initiator at SCSI ID 7
      Target on SCSI ID 0
         Device is a Hard disk
         SCSI ID                  : 0
         PFA (Yes/No)             : No
         State                    : Online (ONL)
         Size (in MB)/(in sectors): 34715/71096368
         Device ID                : IBM-ESXSST336752B8413ET0G2C3
         FRU part number          : 06P5352 
      Target on SCSI ID 1
         Device is a Hard disk
         SCSI ID                  : 1
         PFA (Yes/No)             : No
         State                    : Online (ONL)
         Size (in MB)/(in sectors): 34715/71096368
         Device ID                : IBM-ESXSST336752B8413ET0FVZX
         FRU part number          : 06P5352 
      Target on SCSI ID 2
         Device is a Hard disk
         SCSI ID                  : 2
         PFA (Yes/No)             : No
         State                    : Online (ONL)
         Size (in MB)/(in sectors): 34715/71096368
         Device ID                : IBM-ESXSST336752B8413ET0GJVC
         FRU part number          : 06P5352 
      Target on SCSI ID 8
         Device is a Processor device
         SCSI ID                  : 8
         PFA (Yes/No)             : No
         State                    : Standby (SBY)
         Size (in MB)/(in sectors): 0/0
         Device ID                : IBM     YGLv3 S20   000 
   Channel #2:
      Initiator at SCSI ID 7
      Target on SCSI ID 0
         Device is a Hard disk
         SCSI ID                  : 0
         PFA (Yes/No)             : No
         State                    : Hot spare (HSP)
         Size (in MB)/(in sectors): 34715/71096368
         Device ID                : IBM-ESXSST336752B8413ET0DH5H
         FRU part number          : 06P5352 
      Target on SCSI ID 1
         Device is a Hard disk
         SCSI ID                  : 1
         PFA (Yes/No)             : No
         State                    : Online (ONL)
         Size (in MB)/(in sectors): 34715/71096368
         Device ID                : IBM-ESXSST336752B8413ET0G1ND
         FRU part number          : 06P5352 
      Target on SCSI ID 2
         Device is a Hard disk
         SCSI ID                  : 2
         PFA (Yes/No)             : No
         State                    : Online (ONL)
         Size (in MB)/(in sectors): 34715/71096368
         Device ID                : IBM-ESXSST336752B8413ET0G9RY
         FRU part number          : 06P5352 
      Target on SCSI ID 3
         Device is a Hard disk
         SCSI ID                  : 3
         PFA (Yes/No)             : No
         State                    : Online (ONL)
         Size (in MB)/(in sectors): 34715/71096368
         Device ID                : IBM-ESXSST336752B8413ET0GE60
         FRU part number          : 06P5352 
      Target on SCSI ID 4
         Device is a Hard disk
         SCSI ID                  : 4
         PFA (Yes/No)             : No
         State                    : Online (ONL)
         Size (in MB)/(in sectors): 34715/71096368
         Device ID                : IBM-ESXSST336752B8413ET0FY63
         FRU part number          : 06P5352 
      Target on SCSI ID 5
         Device is a Hard disk
         SCSI ID                  : 5
         PFA (Yes/No)             : No
         State                    : Online (ONL)
         Size (in MB)/(in sectors): 34715/71096368
         Device ID                : IBM-ESXSST336752B8413ET0GDNC
         FRU part number          : 06P5352 
      Target on SCSI ID 8
         Device is a Hard disk
         SCSI ID                  : 8
         PFA (Yes/No)             : No
         State                    : Online (ONL)
         Size (in MB)/(in sectors): 34715/71096368
         Device ID                : IBM-ESXSST336752B8433ET0M2SL
         FRU part number          : 06P5352 
      Target on SCSI ID 9
         Device is a Hard disk
         SCSI ID                  : 9
         PFA (Yes/No)             : No
         State                    : Online (ONL)
         Size (in MB)/(in sectors): 34715/71096368
         Device ID                : IBM-ESXSST336752B8413ET0FYB0
         FRU part number          : 06P5352 
      Target on SCSI ID 10
         Device is a Hard disk
         SCSI ID                  : 10
         PFA (Yes/No)             : No
         State                    : Online (ONL)
         Size (in MB)/(in sectors): 34715/71096368
         Device ID                : IBM-ESXSST336752B8413ET0G03X
         FRU part number          : 06P5352 
      Target on SCSI ID 11
         Device is a Hard disk
         SCSI ID                  : 11
         PFA (Yes/No)             : No
         State                    : Online (ONL)
         Size (in MB)/(in sectors): 34715/71096368
         Device ID                : IBM-ESXSST336752B8413ET0GDGH
         FRU part number          : 06P5352 
      Target on SCSI ID 12
         Device is a Hard disk
         SCSI ID                  : 12
         PFA (Yes/No)             : No
         State                    : Online (ONL)
         Size (in MB)/(in sectors): 34715/71096368
         Device ID                : IBM-ESXSST336753B8553HX08YMN
         FRU part number          : 06P5770 
      Target on SCSI ID 13
         Device is a Hard disk
         SCSI ID                  : 13
         PFA (Yes/No)             : No
         State                    : Online (ONL)
         Size (in MB)/(in sectors): 34715/71096368
         Device ID                : IBM-ESXSST336753B8553HX0966Y
         FRU part number          : 06P5770 
      Target on SCSI ID 15
         Device is a Processor device
         SCSI ID                  : 15
         PFA (Yes/No)             : No
         State                    : Standby (SBY)
         Size (in MB)/(in sectors): 0/0
         Device ID                : IBM     EXP300  D0146828627 
Command completed successfully.
************************************
***IPSSEND GETBST BAD STRIPE TABLE**
************************************

Found 1 IBM ServeRAID controller(s).
Get bad stripe information has been initiated for controller 1...
   Logical drive 1   -   0 bad stripe table entries
   Logical drive 2   -   0 bad stripe table entries
Command completed successfully.
************************************
***IPSSEND GETEVENT DEVICE LOG******
************************************

Found 1 IBM ServeRAID controller(s).
Get event table has been initiated for controller 1...
   BIOS version                   : 5.11.05
   Firmware version               : 5.11.05
Device event table:
   |Channel|SCSI ID|Parity |Soft   |Hard   |PFA    |Misc   |
   |-------|-------|-------|-------|-------|-------|-------|
   | 1     | 0     | 0     | 0     | 0     | No    | 0     |
   | 1     | 1     | 0     | 0     | 0     | No    | 0     |
   | 1     | 2     | 0     | 0     | 0     | No    | 0     |
   | 1     | 3     | 0     | 0     | 0     | No    | 0     |
   | 1     | 4     | 0     | 0     | 0     | No    | 0     |
   | 1     | 5     | 0     | 0     | 0     | No    | 0     |
   | 1     | 6     | 0     | 0     | 0     | No    | 0     |
   | 1     | 7     | 0     | 0     | 0     | No    | 0     |
   | 1     | 8     | 0     | 0     | 0     | No    | 0     |
   | 1     | 9     | 0     | 0     | 0     | No    | 0     |
   | 1     | 10    | 0     | 0     | 0     | No    | 0     |
   | 1     | 11    | 0     | 0     | 0     | No    | 0     |
   | 1     | 12    | 0     | 0     | 0     | No    | 0     |
   | 1     | 13    | 0     | 0     | 0     | No    | 0     |
   | 1     | 14    | 0     | 0     | 0     | No    | 0     |
   | 1     | 15    | 0     | 0     | 0     | No    | 0     |
   |-------|-------|-------|-------|-------|-------|-------|
   | 2     | 0     | 0     | 0     | 0     | No    | 0     |
   | 2     | 1     | 0     | 0     | 0     | No    | 0     |
   | 2     | 2     | 0     | 0     | 0     | No    | 0     |
   | 2     | 3     | 0     | 0     | 0     | No    | 0     |
   | 2     | 4     | 0     | 0     | 0     | No    | 0     |
   | 2     | 5     | 0     | 0     | 0     | No    | 0     |
   | 2     | 6     | 0     | 0     | 0     | No    | 0     |
   | 2     | 7     | 0     | 0     | 0     | No    | 0     |
   | 2     | 8     | 0     | 0     | 0     | No    | 0     |
   | 2     | 9     | 0     | 0     | 0     | No    | 0     |
   | 2     | 10    | 0     | 0     | 0     | No    | 0     |
   | 2     | 11    | 0     | 0     | 0     | No    | 0     |
   | 2     | 12    | 0     | 0     | 0     | No    | 0     |
   | 2     | 13    | 0     | 0     | 0     | No    | 0     |
   | 2     | 14    | 0     | 0     | 0     | No    | 0     |
   | 2     | 15    | 0     | 0     | 0     | No    | 0     |
Command completed successfully.
************************************
***IPSSEND GETEVENT SOFT LOG********
************************************

Found 1 IBM ServeRAID controller(s).
Get event table has been initiated for controller 1...
   BIOS version                   : 5.11.05
   Firmware version               : 5.11.05
Controller soft event log (16 entries):
70100001
800006F2
20FB0008
80001421
70100001
80001F03
20FB0009
80003BB1
70100001
800007ED
20FB0008
80001519
70100001
8000206D
20FB0009
80003CAB
Command completed successfully.
************************************
***IPSSEND GETEVENT HARD LOG********
************************************

Found 1 IBM ServeRAID controller(s).
Get event table has been initiated for controller 1...
   BIOS version                   : 5.11.05
   Firmware version               : 5.11.05
Controller hard event log (0 entries):
Command completed successfully.




