[linux-cluster] multipath issue... Smells of hardware issue.

Fri Jun 29 15:23:20 UTC 2007

Hi,

I have a setup with two identical RX200s3 FuSi servers talking to a SAN
(SX60 + extra controller), and that works fine with gfs1.

I do, however, see some errors on one of the servers. They show up in my
message log only now and then (always under load, but I can't reproduce
the error on demand by loading the server).

The error says:
Jun 28 15:44:17 app02 multipathd: 8:16: mark as failed
Jun 28 15:44:17 app02 multipathd: main_disk_volume1: remaining active paths: 1
Jun 28 15:44:17 app02 kernel: sd 2:0:0:0: SCSI error: return code = 0x00070000
Jun 28 15:44:17 app02 kernel: end_request: I/O error, dev sdb, sector 705160231
Jun 28 15:44:17 app02 kernel: device-mapper: multipath: Failing path 8:16.
Jun 28 15:44:22 app02 multipathd: sdb: readsector0 checker reports path is up
Jun 28 15:44:22 app02 multipathd: 8:16: reinstated
Jun 28 15:44:22 app02 multipathd: main_disk_volume1: remaining active paths: 2
Jun 28 15:46:02 app02 multipathd: 8:32: mark as failed
Jun 28 15:46:02 app02 multipathd: main_disk_volume1: remaining active paths: 1
Jun 28 15:46:02 app02 kernel: sd 3:0:0:0: SCSI error: return code = 0x00070000
Jun 28 15:46:02 app02 kernel: end_request: I/O error, dev sdc, sector 739870727
Jun 28 15:46:02 app02 kernel: device-mapper: multipath: Failing path 8:32.
Jun 28 15:46:06 app02 multipathd: sdc: readsector0 checker reports path is up
Jun 28 15:46:06 app02 multipathd: 8:32: reinstated
Jun 28 15:46:06 app02 multipathd: main_disk_volume1: remaining active paths: 2

To me it looks like a fiber link that bounces up and down. (There is no
switch involved.)
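
For anyone reading along, the return code in the kernel lines can be split
into its bytes. This is only a minimal sketch, assuming the usual Linux SCSI
result layout (status, message, host and driver byte, from low to high); the
function names are my own and not part of any tool.

```python
# Host byte values (DID_* codes) as named in the Linux kernel SCSI headers.
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}


def decode_scsi_result(result):
    """Split a 32-bit SCSI result word into its four bytes (hypothetical helper)."""
    return {
        "status": result & 0xFF,          # SCSI status returned by the target
        "msg": (result >> 8) & 0xFF,      # message byte
        "host": (result >> 16) & 0xFF,    # set by the host adapter driver
        "driver": (result >> 24) & 0xFF,  # set by the mid-layer
    }


def host_byte_name(result):
    """Return the DID_* name for the host byte, or a hex string if unknown."""
    host = decode_scsi_result(result)["host"]
    return HOST_BYTE_NAMES.get(host, hex(host))


if __name__ == "__main__":
    code = 0x00070000  # value taken from the log above
    print(decode_scsi_result(code), host_byte_name(code))
```

For 0x00070000 only the host byte is set (0x07, DID_ERROR) and the SCSI
status byte is 0, so the error seems to be reported at the HBA/driver level
rather than as a status from the target. That fits a link problem, but it
does not say which component is at fault.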

Sometimes I only get a slightly shorter version:
Jun 29 09:04:32 app02 kernel: sd 2:0:0:0: SCSI error: return code = 0x00070000
Jun 29 09:04:32 app02 kernel: end_request: I/O error, dev sdb, sector 2782490295
Jun 29 09:04:32 app02 kernel: device-mapper: multipath: Failing path 8:16.
Jun 29 09:04:32 app02 multipathd: 8:16: mark as failed
Jun 29 09:04:32 app02 multipathd: main_disk_volume1: remaining active paths: 1
Jun 29 09:04:37 app02 multipathd: sdb: readsector0 checker reports path is up
Jun 29 09:04:37 app02 multipathd: 8:16: reinstated
Jun 29 09:04:37 app02 multipathd: main_disk_volume1: remaining active paths: 2
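
To see how often each path drops and how long it stays down, the
"mark as failed" and "reinstated" lines can be paired per path. This is a
rough sketch, assuming one event per line as in the real log file (not
wrapped as in this mail) and assuming the year, which syslog does not record;
the function name and regex are my own.

```python
import re
from datetime import datetime

# Matches only the multipathd "mark as failed" / "reinstated" lines.
EVENT_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ multipathd: "
    r"(\d+:\d+): (mark as failed|reinstated)$"
)


def outage_durations(lines, year=2007):
    """Return (path, seconds) for every failed -> reinstated pair found."""
    failed_at = {}  # path (major:minor) -> time of the pending failure
    outages = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if not match:
            continue  # kernel lines, checker lines, etc. are ignored
        stamp, path, event = match.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if event == "mark as failed":
            failed_at[path] = when
        elif path in failed_at:
            start = failed_at.pop(path)
            outages.append((path, (when - start).total_seconds()))
    return outages


# Lines copied from the log excerpts above.
SAMPLE = [
    "Jun 28 15:44:17 app02 multipathd: 8:16: mark as failed",
    "Jun 28 15:44:17 app02 kernel: device-mapper: multipath: Failing path 8:16.",
    "Jun 28 15:44:22 app02 multipathd: 8:16: reinstated",
    "Jun 28 15:46:02 app02 multipathd: 8:32: mark as failed",
    "Jun 28 15:46:06 app02 multipathd: 8:32: reinstated",
    "Jun 29 09:04:32 app02 multipathd: 8:16: mark as failed",
    "Jun 29 09:04:37 app02 multipathd: 8:16: reinstated",
]

if __name__ == "__main__":
    for path, seconds in outage_durations(SAMPLE):
        print(path, seconds)
```

On the excerpts above this gives 5, 4 and 5 seconds, which is roughly what
one would expect if each path comes back at the next run of the path checker
rather than staying down.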

Any suggestions, other than to start swapping hardware?

Mvh / Kind regards

Kristoffer Lippert
Systemansvarlig
JP/Politiken A/S
Online Magasiner

Tlf. +45 8738 3032
Cell. +45 6062 8703
