
[dm-devel] Failover on StorageWorks HSG80



First I want to thank everyone for helping with my original
question:

  http://www.redhat.com/archives/dm-devel/2006-April/msg00086.html

Now that I have basic connectivity working I have started
to test failover. Here is what I see before initiating failover:

sfeehan dogwood:~$ sudo multipath -ll
red (360001fe10015bf500009947159810015)
[size=203 GB][features=1 queue_if_no_path][hwhandler=1 hp_sw]
\_ round-robin 0 [prio=2][active]
 \_ 0:0:0:1 sda 8:0   [active][ready]
 \_ 0:0:1:1 sdb 8:16  [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 0:0:2:1 sdc 8:32  [active][ghost]
 \_ 0:0:3:1 sdd 8:48  [active][ghost]

I connect to the controller that is not "active" for the unit and do:

>>> shutdown other

Things seem to go well at first. In syslog I see:

Apr 21 11:52:22 dogwood -- MARK --
Apr 21 11:53:12 dogwood kernel: [42950687.100000]  rport-0:0-0: blocked FC remote port time out: removing target and saving binding
Apr 21 11:53:12 dogwood kernel: [42950687.210000]  rport-0:0-1: blocked FC remote port time out: removing target and saving binding
Apr 21 11:53:12 dogwood kernel: [42950687.330000]  0:0:0:1: SCSI error: return code = 0x10000
Apr 21 11:53:12 dogwood kernel: [42950687.400000] end_request: I/O error, dev sda, sector 20981638
Apr 21 11:53:12 dogwood kernel: [42950687.480000] end_request: I/O error, dev sda, sector 20981646
Apr 21 11:53:12 dogwood kernel: [42950687.560000] device-mapper: dm-multipath: Failing path 8:0.
Apr 21 11:53:12 dogwood multipathd: 8:0: hp_sw checker reports path is down
Apr 21 11:53:12 dogwood kernel: [42950687.630000]  0:0:1:1: rejecting I/O to dead device
Apr 21 11:53:12 dogwood kernel: [42950687.700000] device-mapper: dm-multipath: Failing path 8:16.
Apr 21 11:53:12 dogwood kernel: [42950687.770000] device-mapper: hp_sw: queueing START_STOP command on 8:48
Apr 21 11:53:12 dogwood kernel: [42950687.860000]  0:0:1:1: rejecting I/O to dead device
Apr 21 11:53:12 dogwood multipathd: checker failed path 8:0 in map red
Apr 21 11:53:12 dogwood multipathd: red: remaining active paths: 3
Apr 21 11:53:12 dogwood kernel: [42950687.930000] device-mapper: hp_sw: hp_sw_endio 0x8000002
Apr 21 11:53:13 dogwood kernel: [42950687.930000] dm-hp-sw: Current: sense key: Unit Attention
Apr 21 11:53:13 dogwood kernel: [42950687.930000]     <<vendor>> ASC=0xa0 ASCQ=0x8ASC=0xa0 ASCQ=0x8
Apr 21 11:53:13 dogwood multipathd: 8:16: hp_sw checker reports path is down
Apr 21 11:53:13 dogwood multipathd: checker failed path 8:16 in map red
Apr 21 11:53:13 dogwood multipathd: red: remaining active paths: 2
Apr 21 11:53:13 dogwood multipathd: 8:48: hp_sw checker reports path is up
Apr 21 11:53:13 dogwood multipathd: 8:48: reinstated
Apr 21 11:53:13 dogwood kernel: [42950688.200000] device-mapper: dm-multipath: error getting device
Apr 21 11:53:13 dogwood kernel: [42950688.280000] device-mapper: error adding target to table
Apr 21 11:53:13 dogwood multipathd: sda: remove path (uevent)
Apr 21 11:53:13 dogwood multipathd: red: failed in domap for removal of path sda
Apr 21 11:53:13 dogwood multipathd: uevent trigger error
Apr 21 11:53:13 dogwood multipathd: sdb: remove path (uevent)
Apr 21 11:53:13 dogwood multipathd: red: load table [0 426583554 multipath 1 queue_if_no_path 1 hp_sw 1 1 round-robin 0 2 1 8:32 1000 8:48 1000]
Apr 21 11:53:13 dogwood kernel: [42950688.350000] device-mapper: hp_sw: queueing START_STOP command on 8:48
Apr 21 11:53:33 dogwood multipathd: 8:32: hp_sw checker reports path is up
Apr 21 11:53:33 dogwood multipathd: 8:32: reinstated


So far I think everything is OK. IO appears to continue. And this is
what 'multipath -ll' says:

sfeehan dogwood:~$ sudo multipath -ll
Password:
red (360001fe10015bf500009947159810015)
[size=203 GB][features=1 queue_if_no_path][hwhandler=1 hp_sw]
\_ round-robin 0 [prio=2][active]
 \_ 0:0:2:1 sdc 8:32  [active][ready]
 \_ 0:0:3:1 sdd 8:48  [active][ready]

And then I restart the down controller. A few seconds later,
the devices (sda and sdb) are detected and I get the "READ CAPACITY
failed" errors and these errors (repeated continuously):

Apr 21 12:04:15 dogwood multipathd: red: failed in domap for addition of new path sda
Apr 21 12:04:15 dogwood multipathd: red: uev_add_path sleep
Apr 21 12:04:16 dogwood kernel: [42951351.920000] device-mapper: device 8:0 too small for target
Apr 21 12:04:16 dogwood kernel: [42951352.000000] device-mapper: dm-multipath: error getting device
Apr 21 12:04:17 dogwood kernel: [42951352.080000] device-mapper: error adding target to table
Apr 21 12:04:17 dogwood multipathd: red: failed in domap for addition of new path sda
Apr 21 12:04:17 dogwood multipathd: red: uev_add_path sleep
Apr 21 12:04:18 dogwood kernel: [42951353.160000] device-mapper: device 8:0 too small for target
Apr 21 12:04:18 dogwood kernel: [42951353.240000] device-mapper: dm-multipath: error getting device
Apr 21 12:04:18 dogwood kernel: [42951353.320000] device-mapper: error adding target to table
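
The "too small for target" message means device-mapper sees fewer
sectors on 8:0 than the table asks for (426583554 in the "load table"
line earlier). A minimal sketch for checking that by hand, assuming
sysfs reports the size in 512-byte sectors in /sys/block/<dev>/size.
The function names are made up for illustration and are not part of
multipath-tools:

```shell
#!/bin/sh
# Sketch only: compare the size the kernel reports for a path device
# with the length of the multipath table. All function names here are
# hypothetical helpers, not multipath-tools commands.

# Print the size of block device $1 in 512-byte sectors, as exposed in
# sysfs. $2 optionally overrides the sysfs root (default: /sys).
dev_sectors() {
    cat "${2:-/sys}/block/$1/size"
}

# Succeed (exit 0) if device $1 reports at least $2 sectors.
# Returns 2 if the size could not be read at all.
dev_big_enough() {
    have=$(dev_sectors "$1" "$3") || return 2
    [ "$have" -ge "$2" ]
}

# Convert 512-byte sectors to whole GiB (2097152 sectors per GiB).
sectors_to_gib() {
    echo $(( $1 / 2097152 ))
}

# Example use (commented out so nothing runs on load):
# for d in sda sdb; do
#     dev_big_enough "$d" 426583554 || echo "$d: still too small"
# done
```

With the table length from the log, sectors_to_gib 426583554 gives 203,
which agrees with the size=203 GB that 'multipath -ll' prints.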

So I suspect that I need to do the "force path size redetection" trick
(which is also done by the init script at boot):

root dogwood:~# sg_start -start /dev/sda; sleep 1; \
  echo 1 > /sys/block/sda/device/rescan
root dogwood:~# sg_start -start /dev/sdb; sleep 1; \
  echo 1 > /sys/block/sdb/device/rescan

But it doesn't seem to make a difference. The errors continue.
And on top of all this, I see:

root dogwood:~# multipath -ll
red (360001fe10015bf500009947159810015)
[size=203 GB][features=1 queue_if_no_path][hwhandler=1 hp_sw]
\_ round-robin 0 [prio=2][enabled]
 \_ 0:0:2:1 sdc 8:32  [failed][ghost]
 \_ 0:0:3:1 sdd 8:48  [failed][ghost]

At this point, I do:

root dogwood:~# multipath -v2
device-mapper ioctl cmd 9 failed: Invalid argument
device-mapper ioctl cmd 9 failed: Invalid argument
device-mapper ioctl cmd 9 failed: Invalid argument
device-mapper ioctl cmd 9 failed: Invalid argument
root dogwood:~#
root dogwood:~# multipath -ll
red (360001fe10015bf500009947159810015)
[size=203 GB][features=1 queue_if_no_path][hwhandler=1 hp_sw]
\_ round-robin 0 [prio=2][active]
 \_ 0:0:2:1 sdc 8:32  [active][ready]
 \_ 0:0:3:1 sdd 8:48  [active][ready]


And then IO /appears/ (at least according to iostat) to continue 
on that path. But it's still complaining about the size of device
8:0 (sda). So I do the "force path size redetection" trick again,
and again it fails. And on top of that, IO stops and I get the same 
output from 'multipath -ll':

root dogwood:~# multipath -ll
red (360001fe10015bf500009947159810015)
[size=203 GB][features=1 queue_if_no_path][hwhandler=1 hp_sw]
\_ round-robin 0 [prio=2][enabled]
 \_ 0:0:2:1 sdc 8:32  [failed][ghost]
 \_ 0:0:3:1 sdd 8:48  [failed][ghost]

Doing:

root dogwood:~# multipath -v2
device-mapper ioctl cmd 9 failed: Invalid argument
device-mapper ioctl cmd 9 failed: Invalid argument
device-mapper ioctl cmd 9 failed: Invalid argument
device-mapper ioctl cmd 9 failed: Invalid argument
root dogwood:~# multipath -ll
red (360001fe10015bf500009947159810015)
[size=203 GB][features=1 queue_if_no_path][hwhandler=1 hp_sw]
\_ round-robin 0 [prio=2][active]
 \_ 0:0:2:1 sdc 8:32  [active][ready]
 \_ 0:0:3:1 sdd 8:48  [active][ready]

Seems to get me right back where I was before.
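
To watch the path states flip between the two outputs above without
reading the whole listing each time, something like the following could
be used. It is only a sketch that assumes the 'multipath -ll' layout
shown in this post; path_states and count_failed are hypothetical
helper names:

```shell
#!/bin/sh
# Sketch only: summarize path lines from 'multipath -ll' output.
# Assumes path lines look like:
#   " \_ 0:0:2:1 sdc 8:32  [failed][ghost]"

# Read multipath -ll output on stdin and print one line per path:
#   <device> <dm state> <checker state>
path_states() {
    awk '$1 == "\\_" && $2 ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/ {
        state = $5
        gsub(/\[/, " ", state)   # "[failed][ghost]" -> " failed] ghost]"
        gsub(/\]/, " ", state)   # -> " failed  ghost "
        split(state, s, " ")
        print $3, s[1], s[2]
    }'
}

# Read multipath -ll output on stdin and print how many paths the
# device-mapper layer currently marks as failed.
count_failed() {
    path_states | awk '$2 == "failed" { n++ } END { print n + 0 }'
}

# Example use (commented out so nothing runs on load):
# multipath -ll | path_states
# multipath -ll | count_failed
```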

I hope I've described the problem accurately (and I apologize
for the length of the post). So does anyone have a comment on 
what's going on and how to resolve this? How can I bring the
devices (sda and sdb) back into the multipath configuration
short of rebooting the system?

Thanks.

ps. I have started (and will continue) to document all of
this on the Wiki. Here is what I have so far (very incomplete):

http://christophe.varoqui.free.fr/wiki/wakka.php?wiki=UbuntuHsg80Install

-- 
Steve Feehan

