[dm-devel] IO error on DM device

Murthy, Narasimha Doraswamy (STSD) narasimha.murthy at hp.com
Wed Mar 29 16:03:03 UTC 2006


Hi Alasdair,

We are seeing I/O errors on a DM device when the HBA ports of another
host, seen through the same switch, are disabled and re-enabled. We do
not understand why paths fail on this host when ports on other hosts are
disabled; could you please explain?
Below are the problem description and the steps to reproduce.

Problem     :  I/O error on a DM device on one host when the HBA ports of
another host are disabled.
OS distros :  RHEL4.0 U2/U3.

HOW-TO reproduce the problem:

1. Configure two storage arrays (A1, A2) and two hosts (H1, H2) in the
same zone, so that both hosts can see both arrays. Create LUNs (L1, L2)
on array A1 and present them to host H1.

2. Stop the multipathd daemon (for testing purposes, to investigate why
I/O errors occur when ports of other hosts fail). With multipathd
running, the problem can take a long time to reproduce.

3. Start I/O on host H1 on the DM device(s) representing LUNs L1 and L2.
We used the dt tool to exercise I/O.

4. Disable the ports of host H2, or any port of array A2, one after the
other (a few times), OR disable and re-enable the same port of the other
host a few times (perhaps 4-5 times).

5. The application (dt) aborts with an I/O error on host H1.
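
For anyone reproducing this without dt, below is a minimal sketch of the
kind of write/read-verify loop an I/O exerciser runs. It is not dt and
does not model dt's options; exercise_device() and fill_pattern() are
hypothetical names, and the sketch assumes buffered I/O on a scratch
device (it overwrites data, so never point it at a device holding data).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: fill a block with a pattern derived from its
 * block number, so a misdirected or stale block fails verification. */
static void fill_pattern(unsigned char *buf, size_t len, long blk)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)((blk * 31 + (long)i) & 0xff);
}

/* Hypothetical stand-in for an I/O exerciser: write nblocks patterned
 * blocks, fsync, read them back and compare. Buffered I/O is used, so
 * a write-side I/O error (e.g. EIO) is typically reported at fsync().
 * Returns the number of failed operations, or -1 on setup failure. */
int exercise_device(const char *path, size_t block_size, long nblocks)
{
    assert(block_size > 0 && nblocks > 0);

    int fd = open(path, O_RDWR);
    if (fd < 0) {
        fprintf(stderr, "open %s: %s\n", path, strerror(errno));
        return -1;
    }

    unsigned char *wbuf = malloc(block_size);
    unsigned char *rbuf = malloc(block_size);
    if (wbuf == NULL || rbuf == NULL) {
        free(wbuf);
        free(rbuf);
        close(fd);
        return -1;
    }

    int failures = 0;

    /* Write pass. */
    for (long blk = 0; blk < nblocks; blk++) {
        fill_pattern(wbuf, block_size, blk);
        ssize_t n = pwrite(fd, wbuf, block_size, (off_t)blk * (off_t)block_size);
        if (n != (ssize_t)block_size) {
            fprintf(stderr, "write block %ld: %s\n", blk,
                    n < 0 ? strerror(errno) : "short write");
            failures++;
        }
    }

    /* Push dirty pages to the device so writeback errors are reported. */
    if (fsync(fd) != 0) {
        fprintf(stderr, "fsync: %s\n", strerror(errno));
        failures++;
    }

    /* Read-back and verify pass. */
    for (long blk = 0; blk < nblocks; blk++) {
        fill_pattern(wbuf, block_size, blk);
        ssize_t n = pread(fd, rbuf, block_size, (off_t)blk * (off_t)block_size);
        if (n != (ssize_t)block_size || memcmp(rbuf, wbuf, block_size) != 0) {
            fprintf(stderr, "read/verify block %ld failed\n", blk);
            failures++;
        }
    }

    free(wbuf);
    free(rbuf);
    close(fd);
    return failures;
}
```

For example, exercise_device("/dev/dm-0", 4096, 1024) would run one
pass over the first 4 MiB. Because reads may be served from the page
cache, this sketch mainly surfaces write-side errors.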


=====
 Snippet of syslog output (while doing I/O on /dev/dm-0)


Feb  1 11:47:14 apwtest52 kernel: SCSI error : <2 0 0 1> return code = 0x20000
Feb  1 11:47:14 apwtest52 kernel: end_request: I/O error, dev sda, sector 1584600
Feb  1 11:47:14 apwtest52 kernel: device-mapper: dm-multipath: Failing path 8:0.     <=================path failed after disabling/enabling the H2 host port 1
Feb  1 11:47:14 apwtest52 kernel: end_request: I/O error, dev sda, sector 1584608
Feb  1 11:47:45 apwtest52 kernel: SCSI error : <3 0 1 1> return code = 0x20000
Feb  1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sdg, sector 861400
Feb  1 11:47:45 apwtest52 kernel: device-mapper: dm-multipath: Failing path 8:96.   <=================path failed after disabling/enabling the H2 host port 2
Feb  1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sdg, sector 861408
Feb  1 11:47:45 apwtest52 kernel: SCSI error : <3 0 0 1> return code = 0x20000
Feb  1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde, sector 452760
Feb  1 11:47:45 apwtest52 kernel: device-mapper: dm-multipath: Failing path 8:64.  <=================path failed after disabling/enabling the H2 host port 1
Feb  1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde, sector 452768
Feb  1 11:47:45 apwtest52 kernel: SCSI error : <3 0 0 1> return code = 0x20000
Feb  1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde, sector 453784
Feb  1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde, sector 453792
Feb  1 11:47:45 apwtest52 kernel: SCSI error : <3 0 0 1> return code = 0x20000
Feb  1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde, sector 454808
Feb  1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde, sector 454816
Feb  1 11:47:45 apwtest52 kernel: SCSI error : <3 0 0 1> return code = 0x20000
Feb  1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde, sector 863960
Feb  1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde, sector 863968
Feb  1 11:48:40 apwtest52 kernel: SCSI error : <2 0 1 1> return code = 0x20000
Feb  1 11:48:40 apwtest52 kernel: end_request: I/O error, dev sdc, sector 935384
Feb  1 11:48:40 apwtest52 kernel: device-mapper: dm-multipath: Failing path 8:32.  <=================path failed after disabling/enabling the H2 host port 2
Feb  1 11:48:40 apwtest52 kernel: end_request: I/O error, dev sdc, sector 935392
Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116924    <============all paths to the device /dev/dm-0 failed
Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116925
Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116926
Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116927
Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116928
Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116929
Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116930
Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116931
Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116932
Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116933
Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116934
Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116935
Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116936
Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116937
Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116938
Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116939
Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116940
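
Every SCSI error above carries return code 0x20000. A minimal sketch for
splitting such a result word into its bytes, assuming the usual Linux
2.6 midlayer layout (driver byte in bits 24-31, host byte in 16-23,
message byte in 8-15, SCSI status in 0-7). decode_scsi_result(),
host_byte_name() and print_scsi_result() are hypothetical helpers; the
DID_* names are the kernel's host byte codes 0x00-0x0b.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The four bytes of a Linux SCSI midlayer result word, as printed in
 * "SCSI error : <host channel id lun> return code = 0x...". */
struct scsi_result_bytes {
    unsigned driver; /* bits 24-31: driver byte */
    unsigned host;   /* bits 16-23: host byte set by the HBA driver (DID_*) */
    unsigned msg;    /* bits 8-15:  message byte */
    unsigned status; /* bits 0-7:   SCSI status byte from the target */
};

static struct scsi_result_bytes decode_scsi_result(unsigned long result)
{
    struct scsi_result_bytes b;
    b.driver = (unsigned)((result >> 24) & 0xff);
    b.host   = (unsigned)((result >> 16) & 0xff);
    b.msg    = (unsigned)((result >> 8) & 0xff);
    b.status = (unsigned)(result & 0xff);
    return b;
}

/* Names for host byte values 0x00-0x0b; anything else is "unknown". */
static const char *host_byte_name(unsigned host)
{
    static const char *const names[] = {
        "DID_OK", "DID_NO_CONNECT", "DID_BUS_BUSY", "DID_TIME_OUT",
        "DID_BAD_TARGET", "DID_ABORT", "DID_PARITY", "DID_ERROR",
        "DID_RESET", "DID_BAD_INTR", "DID_PASSTHROUGH", "DID_SOFT_ERROR",
    };
    if (host < sizeof names / sizeof names[0])
        return names[host];
    return "unknown";
}

/* Print a one-line summary of a result word. */
static void print_scsi_result(unsigned long result)
{
    struct scsi_result_bytes b = decode_scsi_result(result);
    printf("0x%lx: driver=0x%02x host=0x%02x (%s) msg=0x%02x status=0x%02x\n",
           result, b.driver, b.host, host_byte_name(b.host), b.msg, b.status);
}
```

print_scsi_result(0x20000) prints
"0x20000: driver=0x00 host=0x02 (DID_BUS_BUSY) msg=0x00 status=0x00",
i.e. the only non-zero byte is the host byte: the HBA driver reported
the bus as busy, and the target itself returned no error status.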
                         
Observations :

      As we disable ports on the other host, paths of the DM device fail.
Each subsequent disable/enable of an A2 or H2 port fails more paths,
until all paths have failed, which in turn results in I/O errors on
RHEL4.0 U2/U3.
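
The "Failing path" messages identify paths by MAJ:MIN device number. A
minimal sketch that extracts that number from such a log line and maps
it to an sd name, assuming the standard static numbering of the first
SCSI disk major (major 8, 16 minors per disk, covering sda to sdp).
parse_failing_path() and sd_name() are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SCSI_DISK0_MAJOR 8 /* first SCSI disk major: sda..sdp */

/* Extract MAJ:MIN from a line such as
 * "... dm-multipath: Failing path 8:96."  Returns 0 on success. */
static int parse_failing_path(const char *line, unsigned *major, unsigned *minor)
{
    static const char key[] = "Failing path ";
    const char *p = strstr(line, key);
    if (p == NULL)
        return -1;
    return sscanf(p + strlen(key), "%u:%u", major, minor) == 2 ? 0 : -1;
}

/* Map a major-8 device number to its sd name: minor / 16 selects the
 * disk letter, minor % 16 the partition (0 = whole disk). */
static int sd_name(unsigned major, unsigned minor, char *out, size_t outlen)
{
    if (major != SCSI_DISK0_MAJOR || minor > 255)
        return -1;
    unsigned disk = minor / 16, part = minor % 16;
    if (part == 0)
        snprintf(out, outlen, "sd%c", 'a' + (int)disk);
    else
        snprintf(out, outlen, "sd%c%u", 'a' + (int)disk, part);
    return 0;
}
```

Applied to the log above, 8:0, 8:96, 8:64 and 8:32 map to sda, sdg, sde
and sdc, matching the end_request lines next to each "Failing path"
message. On the live host, `ls -l /dev/sd*` shows the same numbers.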

     Using the device-mapper debug driver, we find that __choose_pgpath()
finds no valid path, and that m->current_pgpath (m is a pointer to
struct multipath) is NULL when map_io() in dm-mpath.c is reached.
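
To illustrate the state described above, here is a deliberately
simplified userspace model. The names echo dm-mpath.c, but this is not
the kernel code: priority groups, path selectors, repeat counts and the
hardware handler are left out, and struct mpath_model, choose_path(),
map_io_model(), fail_path() and reinstate_path() are hypothetical. It
assumes, as we read map_io(), that an I/O finding no path is failed with
-EIO unless queue_if_no_path is set, and that a failed path stays failed
until userspace (normally multipathd's path checker) reinstates it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_PATHS 8

enum map_result { MAP_QUEUED = 0, MAP_REMAPPED = 1 }; /* failure is -EIO */

struct mpath_model {
    bool active[MAX_PATHS];
    int npaths;
    int current;           /* current path index; -1 models current_pgpath == NULL */
    bool queue_if_no_path; /* models the queue_if_no_path feature */
    int queued;            /* I/Os held back waiting for a usable path */
};

static void model_init(struct mpath_model *m, int npaths, bool queue_if_no_path)
{
    assert(npaths > 0 && npaths <= MAX_PATHS);
    m->npaths = npaths;
    for (int i = 0; i < npaths; i++)
        m->active[i] = true;
    m->current = -1;
    m->queue_if_no_path = queue_if_no_path;
    m->queued = 0;
}

/* Rough analogue of __choose_pgpath(): first active path, or none. */
static void choose_path(struct mpath_model *m)
{
    m->current = -1;
    for (int i = 0; i < m->npaths; i++)
        if (m->active[i]) {
            m->current = i;
            return;
        }
}

/* Rough analogue of map_io(): route one I/O, queue it, or fail it. */
static int map_io_model(struct mpath_model *m)
{
    if (m->current < 0)
        choose_path(m);
    if (m->current >= 0)
        return MAP_REMAPPED;
    if (m->queue_if_no_path) {
        m->queued++;
        return MAP_QUEUED;
    }
    return -EIO;
}

/* A path failed after an I/O error stays failed until reinstated. */
static void fail_path(struct mpath_model *m, int idx)
{
    m->active[idx] = false;
    if (m->current == idx)
        m->current = -1;
}

/* What a userspace path checker does once the path tests good again. */
static void reinstate_path(struct mpath_model *m, int idx)
{
    m->active[idx] = true;
}
```

With four paths, I/O keeps flowing while any path is active, fails with
-EIO once the fourth path is failed, and is queued instead when
queue_if_no_path is set. Since nothing calls reinstate_path() while
multipathd is stopped, this may explain why failures accumulate so
quickly in step 2 of the reproduction.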

Another observation is that we do not see any I/O errors when the same
test is run on SLES9 SP3/SP4.

Please provide some pointers on why we are seeing this behavior. Is this
a known issue at this point in time?

Thanks and regards
-Murthy