Re: IBM xSeries / Linux / RAID / Failure of whole system
- From: Mandy Shaw Notability com
- To: redhat-list redhat com
- Cc: Peter Lancaster Notability com, Liz Knight Notability com, Alex Boulton Notability com
- Subject: Re: IBM xSeries / Linux / RAID / Failure of whole system
- Date: Thu May 1 02:36:01 2003
Mark requested additional information. Here is what we have; any ideas
would be gratefully received!
Many thanks
Mandy Shaw
******
We have 2 logical drives, both of which are RAID 5 arrays.
It is the 2nd logical drive which has had disk failures, causing the
machine to crash.
The RAID dumplog is appended. Note the presence of a hot spare, which surely
should have made the thing even more resilient.
/ and /boot are on separate partitions but both are on logical drive 1.
Only the /home directory is on logical drive 2. Output from df is...
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/sda7               294367    253083     26084  91% /
/dev/sda1                30985     30009         0 100% /boot
/dev/sdb1            349896904 272680752  59442336  83% /home
/dev/sda8             47417172   1451600  43556876   4% /home2
none                   1158300         0   1158300   0% /dev/shm
/dev/sda2             10072676   1206136   8354880  13% /usr
/dev/sda5             10080332     97668   9470604   2% /var
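As an aside, / and /boot on logical drive 1 are close to full, which is separate from the failures on logical drive 2 but probably worth knowing about. A minimal sketch for picking such filesystems out of df output follows; the 90% threshold and the function name nearly_full are arbitrary choices for illustration, not anything taken from our setup.

```python
# df figures copied from the output above (1k-blocks).
DF_OUTPUT = """\
/dev/sda7 294367 253083 26084 91% /
/dev/sda1 30985 30009 0 100% /boot
/dev/sdb1 349896904 272680752 59442336 83% /home
/dev/sda8 47417172 1451600 43556876 4% /home2
none 1158300 0 1158300 0% /dev/shm
/dev/sda2 10072676 1206136 8354880 13% /usr
/dev/sda5 10080332 97668 9470604 2% /var
"""


def nearly_full(df_text, threshold=90):
    """Return (mount point, use%) pairs at or above `threshold` percent."""
    flagged = []
    for line in df_text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip headers or wrapped lines
        use_percent = int(fields[4].rstrip("%"))
        if use_percent >= threshold:
            flagged.append((fields[5], use_percent))
    return flagged


print(nearly_full(DF_OUTPUT))  # [('/', 91), ('/boot', 100)]
```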
The nature of the failure is that a single disk appears to fail, but then
the whole RAID set (logical drive 2) is marked as down. All
that appears in /var/log/messages is:
Apr 3 17:03:42 tin kernel: SCSI disk error : host 2 channel 0 id 1 lun 0 return code = 70000
Apr 3 17:03:42 tin kernel: I/O error: dev 08:11, sector 40894664
Apr 3 17:13:28 tin kernel: (ips0) Resetting controller.
The full log, of which this is an excerpt, is available if it would be any help.
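For what it is worth, the two kernel messages can be decoded a little further. The sketch below assumes the usual Linux conventions: "dev 08:11" is a hexadecimal major:minor pair, major 8 is the first block of SCSI disks with 16 minors per disk, and the return code is a hexadecimal result word packed as driver, host, message and status bytes. The function names are made up for illustration.

```python
def decode_dev(dev):
    """Decode a hex 'MAJOR:MINOR' pair for a SCSI disk on major 8."""
    major_hex, minor_hex = dev.split(":")
    major, minor = int(major_hex, 16), int(minor_hex, 16)
    if major != 8:
        raise ValueError("only SCSI disk major 8 is handled in this sketch")
    disk, partition = divmod(minor, 16)  # 16 minors per SCSI disk
    name = "/dev/sd" + chr(ord("a") + disk)
    return name + (str(partition) if partition else "")


def decode_result(code_hex):
    """Split a SCSI result word into its four component bytes."""
    value = int(code_hex, 16)
    return {
        "status": value & 0xFF,
        "msg": (value >> 8) & 0xFF,
        "host": (value >> 16) & 0xFF,
        "driver": (value >> 24) & 0xFF,
    }


print(decode_dev("08:11"))     # /dev/sdb1
print(decode_result("70000"))  # {'status': 0, 'msg': 0, 'host': 7, 'driver': 0}
```

So the I/O error is on /dev/sdb1, i.e. /home on logical drive 2, which matches what we see. As far as I understand it, a host byte of 7 corresponds to DID_ERROR in the kernel's SCSI headers, meaning the error was reported by the adapter driver rather than as a status from the disk itself.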
******
RAID dumplog:
Serveraid Log Collection Utility for RedHat Linux systems Version v1.0
Date Logs Taken : Tue Mar 11 11:40:13 GMT 2003
Nodename of this system: tin
Model Type : 86695RX
Serial Number : 551983W
Operating system : Linux
Kernel Version : 2.4.9-e.3smp
Raid Manager Version : package RaidMan is not installed
Driver : -rw-r--r-- 1 root root 51740 Feb 20 14:07 /lib/modules/2.4.9-e.3smp/kernel/drivers/scsi/ips.o
Number of Serveraid adapters found in this machine: 1
************************************
***IPSSEND CONFIG AL LOG************
************************************
Found 1 IBM ServeRAID controller(s).
Read configuration has been initiated for controller 1...
-------------------------------------------------------------------------------
Controller information
-------------------------------------------------------------------------------
Controller type : ServeRAID-4Mx
BIOS version : 5.11.05
Firmware version : 5.11.05
Boot block version : 5.11.05
Device driver version : 5.11.05
Controller slot information : 4
Controller Name : Main
SCSI channel description : 2 parallel SCSI wide
Initiator IDs (Channel/SCSI ID): 1/7 2/7
Maximum physical devices : 30
Defunct disk drive count : 0
Logical drives/Offline/Critical: 2/0/0
Read ahead : Adaptive
Stripe-unit size : 16 KB
Rebuild rate (Low/Medium/High) : High
Hot-swap rebuild : Enabled
Data scrubbing : Enabled
Part of cluster (Yes/No) : Yes
Unattended mode (Yes/No) : Yes
Concurrent commands supported : 96
Configuration update count : 295
-------------------------------------------------------------------------------
Logical drive information
-------------------------------------------------------------------------------
Logical drive number 1
Status of logical drive : Okay (OKY)
RAID level : 5
Size (in MB) : 69430
Write cache status : Write back (WB)
Number of chunks : 3
Stripe-unit size : 16 KB
Access blocked : No
Part of array : A
Part of merge group : 207
Logical drive number 2
Status of logical drive : Okay (OKY)
RAID level : 5
Size (in MB) : 347150
Write cache status : Write back (WB)
Number of chunks : 11
Stripe-unit size : 16 KB
Access blocked : No
Part of array : B
Part of merge group : 207
Array A stripe order (Channel/SCSI ID) : 1,0 1,1 1,2
Array B stripe order (Channel/SCSI ID) : 2,1 2,2 2,3 2,4 2,5 2,8 2,9 2,10 2,11 2,12 2,13
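As a sanity check on the figures above: a RAID 5 logical drive gives up one member's worth of space to parity, so the usable size should be (chunks - 1) times the member size. The sketch below assumes every member contributes the full 34715 MB shown in the physical device listing; the function name is only illustrative.

```python
def raid5_usable_mb(chunks, disk_mb):
    """Usable capacity of a RAID 5 set: one member's worth goes to parity."""
    return (chunks - 1) * disk_mb


DISK_MB = 34715  # per-disk size from the physical device information

print(raid5_usable_mb(3, DISK_MB))   # 69430, logical drive 1 (array A)
print(raid5_usable_mb(11, DISK_MB))  # 347150, logical drive 2 (array B)
```

Both results match the reported sizes. The same arithmetic is a reminder that RAID 5 only tolerates the loss of one member at a time: with 11 disks in array B, a second failure before a rebuild onto the hot spare has completed would take the whole logical drive down.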
-------------------------------------------------------------------------------
Physical device information
-------------------------------------------------------------------------------
Channel #1:
Initiator at SCSI ID 7
Target on SCSI ID 0
Device is a Hard disk
SCSI ID : 0
PFA (Yes/No) : No
State : Online (ONL)
Size (in MB)/(in sectors): 34715/71096368
Device ID : IBM-ESXSST336752B8413ET0G2C3
FRU part number : 06P5352
Target on SCSI ID 1
Device is a Hard disk
SCSI ID : 1
PFA (Yes/No) : No
State : Online (ONL)
Size (in MB)/(in sectors): 34715/71096368
Device ID : IBM-ESXSST336752B8413ET0FVZX
FRU part number : 06P5352
Target on SCSI ID 2
Device is a Hard disk
SCSI ID : 2
PFA (Yes/No) : No
State : Online (ONL)
Size (in MB)/(in sectors): 34715/71096368
Device ID : IBM-ESXSST336752B8413ET0GJVC
FRU part number : 06P5352
Target on SCSI ID 8
Device is a Processor device
SCSI ID : 8
PFA (Yes/No) : No
State : Standby (SBY)
Size (in MB)/(in sectors): 0/0
Device ID : IBM YGLv3 S20 000
Channel #2:
Initiator at SCSI ID 7
Target on SCSI ID 0
Device is a Hard disk
SCSI ID : 0
PFA (Yes/No) : No
State : Hot spare (HSP)
Size (in MB)/(in sectors): 34715/71096368
Device ID : IBM-ESXSST336752B8413ET0DH5H
FRU part number : 06P5352
Target on SCSI ID 1
Device is a Hard disk
SCSI ID : 1
PFA (Yes/No) : No
State : Online (ONL)
Size (in MB)/(in sectors): 34715/71096368
Device ID : IBM-ESXSST336752B8413ET0G1ND
FRU part number : 06P5352
Target on SCSI ID 2
Device is a Hard disk
SCSI ID : 2
PFA (Yes/No) : No
State : Online (ONL)
Size (in MB)/(in sectors): 34715/71096368
Device ID : IBM-ESXSST336752B8413ET0G9RY
FRU part number : 06P5352
Target on SCSI ID 3
Device is a Hard disk
SCSI ID : 3
PFA (Yes/No) : No
State : Online (ONL)
Size (in MB)/(in sectors): 34715/71096368
Device ID : IBM-ESXSST336752B8413ET0GE60
FRU part number : 06P5352
Target on SCSI ID 4
Device is a Hard disk
SCSI ID : 4
PFA (Yes/No) : No
State : Online (ONL)
Size (in MB)/(in sectors): 34715/71096368
Device ID : IBM-ESXSST336752B8413ET0FY63
FRU part number : 06P5352
Target on SCSI ID 5
Device is a Hard disk
SCSI ID : 5
PFA (Yes/No) : No
State : Online (ONL)
Size (in MB)/(in sectors): 34715/71096368
Device ID : IBM-ESXSST336752B8413ET0GDNC
FRU part number : 06P5352
Target on SCSI ID 8
Device is a Hard disk
SCSI ID : 8
PFA (Yes/No) : No
State : Online (ONL)
Size (in MB)/(in sectors): 34715/71096368
Device ID : IBM-ESXSST336752B8433ET0M2SL
FRU part number : 06P5352
Target on SCSI ID 9
Device is a Hard disk
SCSI ID : 9
PFA (Yes/No) : No
State : Online (ONL)
Size (in MB)/(in sectors): 34715/71096368
Device ID : IBM-ESXSST336752B8413ET0FYB0
FRU part number : 06P5352
Target on SCSI ID 10
Device is a Hard disk
SCSI ID : 10
PFA (Yes/No) : No
State : Online (ONL)
Size (in MB)/(in sectors): 34715/71096368
Device ID : IBM-ESXSST336752B8413ET0G03X
FRU part number : 06P5352
Target on SCSI ID 11
Device is a Hard disk
SCSI ID : 11
PFA (Yes/No) : No
State : Online (ONL)
Size (in MB)/(in sectors): 34715/71096368
Device ID : IBM-ESXSST336752B8413ET0GDGH
FRU part number : 06P5352
Target on SCSI ID 12
Device is a Hard disk
SCSI ID : 12
PFA (Yes/No) : No
State : Online (ONL)
Size (in MB)/(in sectors): 34715/71096368
Device ID : IBM-ESXSST336753B8553HX08YMN
FRU part number : 06P5770
Target on SCSI ID 13
Device is a Hard disk
SCSI ID : 13
PFA (Yes/No) : No
State : Online (ONL)
Size (in MB)/(in sectors): 34715/71096368
Device ID : IBM-ESXSST336753B8553HX0966Y
FRU part number : 06P5770
Target on SCSI ID 15
Device is a Processor device
SCSI ID : 15
PFA (Yes/No) : No
State : Standby (SBY)
Size (in MB)/(in sectors): 0/0
Device ID : IBM EXP300 D0146828627
Command completed successfully.
************************************
***IPSSEND GETBST BAD STRIPE TABLE**
************************************
Found 1 IBM ServeRAID controller(s).
Get bad stripe information has been initiated for controller 1...
Logical drive 1 - 0 bad stripe table entries
Logical drive 2 - 0 bad stripe table entries
Command completed successfully.
************************************
***IPSSEND GETEVENT DEVICE LOG******
************************************
Found 1 IBM ServeRAID controller(s).
Get event table has been initiated for controller 1...
BIOS version : 5.11.05
Firmware version : 5.11.05
Device event table:
|Channel|SCSI ID|Parity |Soft |Hard |PFA |Misc |
|-------|-------|-------|-------|-------|-------|-------|
| 1 | 0 | 0 | 0 | 0 | No | 0 |
| 1 | 1 | 0 | 0 | 0 | No | 0 |
| 1 | 2 | 0 | 0 | 0 | No | 0 |
| 1 | 3 | 0 | 0 | 0 | No | 0 |
| 1 | 4 | 0 | 0 | 0 | No | 0 |
| 1 | 5 | 0 | 0 | 0 | No | 0 |
| 1 | 6 | 0 | 0 | 0 | No | 0 |
| 1 | 7 | 0 | 0 | 0 | No | 0 |
| 1 | 8 | 0 | 0 | 0 | No | 0 |
| 1 | 9 | 0 | 0 | 0 | No | 0 |
| 1 | 10 | 0 | 0 | 0 | No | 0 |
| 1 | 11 | 0 | 0 | 0 | No | 0 |
| 1 | 12 | 0 | 0 | 0 | No | 0 |
| 1 | 13 | 0 | 0 | 0 | No | 0 |
| 1 | 14 | 0 | 0 | 0 | No | 0 |
| 1 | 15 | 0 | 0 | 0 | No | 0 |
|-------|-------|-------|-------|-------|-------|-------|
| 2 | 0 | 0 | 0 | 0 | No | 0 |
| 2 | 1 | 0 | 0 | 0 | No | 0 |
| 2 | 2 | 0 | 0 | 0 | No | 0 |
| 2 | 3 | 0 | 0 | 0 | No | 0 |
| 2 | 4 | 0 | 0 | 0 | No | 0 |
| 2 | 5 | 0 | 0 | 0 | No | 0 |
| 2 | 6 | 0 | 0 | 0 | No | 0 |
| 2 | 7 | 0 | 0 | 0 | No | 0 |
| 2 | 8 | 0 | 0 | 0 | No | 0 |
| 2 | 9 | 0 | 0 | 0 | No | 0 |
| 2 | 10 | 0 | 0 | 0 | No | 0 |
| 2 | 11 | 0 | 0 | 0 | No | 0 |
| 2 | 12 | 0 | 0 | 0 | No | 0 |
| 2 | 13 | 0 | 0 | 0 | No | 0 |
| 2 | 14 | 0 | 0 | 0 | No | 0 |
| 2 | 15 | 0 | 0 | 0 | No | 0 |
Command completed successfully.
************************************
***IPSSEND GETEVENT SOFT LOG********
************************************
Found 1 IBM ServeRAID controller(s).
Get event table has been initiated for controller 1...
BIOS version : 5.11.05
Firmware version : 5.11.05
Controller soft event log (16 entries):
70100001
800006F2
20FB0008
80001421
70100001
80001F03
20FB0009
80003BB1
70100001
800007ED
20FB0008
80001519
70100001
8000206D
20FB0009
80003CAB
Command completed successfully.
************************************
***IPSSEND GETEVENT HARD LOG********
************************************
Found 1 IBM ServeRAID controller(s).
Get event table has been initiated for controller 1...
BIOS version : 5.11.05
Firmware version : 5.11.05
Controller hard event log (0 entries):
Command completed successfully.