[dm-devel] two paths remain failed on DS6800 after code upgrade

Fri Jun 13 10:14:15 UTC 2008

Hello,
I have a test server connected to an IBM DS6800 storage system.
It is a BL480c blade with two QLogic HBAs, connected to two FC switches.
It runs RHEL 4.6 x86_64 (kernel 2.6.9-67.ELsmp) with:
device-mapper-1.02.21-1.el4
device-mapper-multipath-0.4.5-27.RHEL4

The boot messages for the HBAs show:
qla2400 0000:0c:00.0: Found an ISP2432, irq 185, iobase 0xffffff000001c000
QLogic Fibre Channel HBA Driver: 8.01.07-d4
QLogic QMH2462 - SBUS to 2Gb FC, Dual Channel
ISP2432: PCIe (2.5Gb/s x4) @ 0000:0c:00.0 hdma+, host#=0, fw=4.00.150 [IP]
Vendor: IBM       Model: 1750500           Rev: .155
Type:   Direct-Access                      ANSI SCSI revision: 05

On the storage I have access to two LUNs, so in total I get 8 paths, seen as
disks sda to sdh.
For multipath I'm using the default configuration from the OS install for the
DS6800 (product 1750500), so it should be:
#      device {
#               vendor                  "IBM"
#               product                 "1750500"
#               path_grouping_policy    group_by_prio
#               getuid_callout          "/sbin/scsi_id -g -u -s"
#               prio_callout            "/sbin/mpath_prio_alua %d"
#               features                "1 queue_if_no_path"
#               path_checker            tur
#       }

In normal operation the command "multipath -ll" gives:

[root@test-rhel-p ~]# multipath -ll
mpath1 (3600507630efe05800000000000001700)
[size=20 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=100][active]
 \_ 0:0:1:1 sdd 8:48  [active][ready]
 \_ 1:0:1:1 sdh 8:112 [active][ready]
\_ round-robin 0 [prio=20][enabled]
 \_ 0:0:0:1 sdb 8:16  [active][ready]
 \_ 1:0:0:1 sdf 8:80  [active][ready]

mpath0 (3600507630efe05800000000000001600)
[size=20 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=100][active]
 \_ 0:0:0:0 sda 8:0   [active][ready]
 \_ 1:0:0:0 sde 8:64  [active][ready]
\_ round-robin 0 [prio=20][enabled]
 \_ 0:0:1:0 sdc 8:32  [active][ready]
 \_ 1:0:1:0 sdg 8:96  [active][ready]
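
As a cross-check of the path count, here is a small Python sketch (not something running on the server) that enumerates 2 HBAs x 2 storage targets x 2 LUNs. The function name enumerate_paths and the assumption that sdX names are assigned in host, target, LUN order are only illustrative; the ordering happens to match the listing above but is not guaranteed in general.

```python
from itertools import product
from string import ascii_lowercase


def enumerate_paths(hosts=2, targets=2, luns=2):
    """Return (H:C:T:L, device name) pairs in host, target, LUN order.

    Assumption: the channel is always 0 and device names follow
    enumeration order, as in the multipath listing of this post.
    """
    combos = product(range(hosts), range(targets), range(luns))
    return [
        (f"{h}:0:{t}:{l}", f"sd{ascii_lowercase[i]}")
        for i, (h, t, l) in enumerate(combos)
    ]


paths = enumerate_paths()
for hctl, dev in paths:
    print(hctl, dev)
```

This gives eight entries, from 0:0:0:0 (sda) to 1:0:1:1 (sdh), i.e. four paths per LUN.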

We had a code update on the storage, so I wanted to test the multipath
behaviour.
The update was made in concurrent mode.
I got a first path change without problems, probably when the first controller
was updated.

mpath1:
\_ round-robin 0 [enabled]
 \_ 0:0:0:1 sdb 8:16  [failed]
 \_ 1:0:0:1 sdf 8:80  [failed]
and
mpath0:
\_ round-robin 0 [enabled]
 \_ 0:0:0:0 sda 8:0   [failed]
 \_ 1:0:0:0 sde 8:64  [failed]
while the other two path groups remained active.
At the end of the upgrade, probably with the second controller update, I got
the situation below.
While other servers running Windows and Linux (using SDD) came back with all
paths, this server retains two paths in failed state:

[root@test-rhel-p RPMS]# multipath -l
mpath1 (3600507630efe05800000000000001700)
[size=20 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [enabled]
 \_ 0:0:1:1 sdd 8:48  [failed][faulty]
 \_ 1:0:1:1 sdh 8:112 [active]
\_ round-robin 0 [enabled]
 \_ 0:0:0:1 sdb 8:16  [active]
 \_ 1:0:0:1 sdf 8:80  [active]

mpath0 (3600507630efe05800000000000001600)
[size=20 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:0 sda 8:0   [active]
 \_ 1:0:0:0 sde 8:64  [active]
\_ round-robin 0 [enabled]
 \_ 0:0:1:0 sdc 8:32  [failed][faulty]
 \_ 1:0:1:0 sdg 8:96  [active]

with messages of this type every 5 seconds:

error calling out /sbin/mpath_prio_alua /dev/sdc
error calling out /sbin/mpath_prio_alua /dev/sdd

Other information:
[root@test-rhel-p ]# sg_inq /dev/sdc
sg_inq: error opening file: /dev/sdc: No such device or address

[root@test-rhel-p RPMS]# ll /dev/sdc
brw-rw----  1 root disk 8, 32 Jun 11 19:03 /dev/sdc

[root@test-rhel-p RPMS]# sg_inq /dev/sda
standard INQUIRY:
  PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]
  [AERC=0]  [TrmTsk=0]  NormACA=1  HiSUP=1  Resp_data_format=2
  SCCS=0  ACC=0  TGPS=1  3PC=0  Protect=0  BQue=0
  EncServ=0  MultiP=1 (VS=0)  [MChngr=0]  [ACKREQQ=0]  Addr16=0
  [RelAdr=0]  WBus16=0  Sync=0  Linked=0  [TranDis=0]  CmdQue=1
  Clocking=0x0  QAS=0  IUS=0
    length=164 (0xa4)   Peripheral device type: disk
 Vendor identification: IBM
 Product identification: 1750500
 Product revision level: .441
 Unit serial number: 68778501600

Any help on getting the paths back up?
Would a SCSI rescan help? What would be the correct command in this
case?
The system is operational, with no interruption of disk access for the
users, but I don't understand why the paths don't come up again...

Thanks in advance for help or suggestions.
Gianluca