
Re: [dm-devel] Multipath and HSG80 phase 2



As James Smart suggested, you can try decreasing the HBA driver's nodev
timeout value to 1 second.

I run the cable unplug/replug test regularly and it works reliably
for me. I'll try the controller restart test tomorrow.

regards,
cvaroqui

On Monday, 13 December 2004 at 11:24 +0100, Nicola Ranaldo wrote:
> > Indeed,
> > can you audit your fixes in
> > http://christophe.varoqui.free.fr/multipath-tools/multipath-tools-0.4.0.tar.bz2
> > before I release it ?
> 
> OK, the tools no longer segfault. The last thing I have to check is the
> clone syscall: on my system (Slackware 10.0) I have to use fork to get
> the multipathd daemon to run.
> With clone, strace on multipathd gives:
> 
> brk(0)                                  = 0x8051000
> brk(0x8052000)                          = 0x8052000
> brk(0)                                  = 0x8052000
> brk(0)                                  = 0x8052000
> brk(0x8056000)                          = 0x8056000
> clone(child_stack=0x8055040, flags=CLONE_NEWNS) = 2443
> exit_group(0)                           = ?
> 
> and the process dies...
> Is the clone call necessary? Does the process run properly even if I use
> fork?
> 
> > ... and report on general behaviour.
> 
> OK, some progress has been made :)))
> 
> Failover initiated by an "sg_start /dev/sgx 1" works properly! I can do
> a lot of switches between the active and ghost paths, with a 1/2 second
> delay between them, with no process disruption! Great :)
> 
> However, a failover initiated by a "restart other" on the HSG80 console
> gives:
> 
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 655327
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 656343
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 656351
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 657367
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 657375
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 658391
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 658399
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 658407
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 658903
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 658911
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 659927
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 659935
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 660951
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 660959
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 660967
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 661983
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 661991
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 663007
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 663015
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 664031
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 664039
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 664047
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 664791
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 664799
> Dec 13 11:13:55 m3 kernel: Incorrect number of segments after building list
> Dec 13 11:13:55 m3 kernel: counted 8, received 1
> Dec 13 11:13:55 m3 kernel: req nr_sec 1024, cur_nr_sec 8
> Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81523
> Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
> Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81524
> Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
> Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81525
> Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
> Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81526
> Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
> Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81527
> Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
> Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81528
> Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
> Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81529
> Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
> Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81530
> Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
> Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81531
> Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
> Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81532
> Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
> Dec 13 11:13:55 m3 kernel: Incorrect number of segments after building list
> Dec 13 11:13:55 m3 kernel: counted 8, received 1
> Dec 13 11:13:55 m3 kernel: req nr_sec 1024, cur_nr_sec 8
> Dec 13 11:14:06 m3 kernel: SCSI error : <0 0 1 2> return code = 0x10000
> Dec 13 11:14:06 m3 multipathd: 8:16 : tur checker reports path is down
> Dec 13 11:14:06 m3 kernel: SCSI error : <0 0 1 2> return code = 0x10000
> Dec 13 11:14:06 m3 last message repeated 4 times
> Dec 13 11:14:06 m3 multipathd: event checker startup : disk1
> Dec 13 11:14:16 m3 multipathd: 8:0 : tur checker reports path is up
> Dec 13 11:14:18 m3 kernel: SCSI error : <0 0 1 2> return code = 0x10000
> Dec 13 11:14:18 m3 last message repeated 2 times
> Dec 13 11:14:42 m3 kernel: Incorrect number of segments after building list
> Dec 13 11:14:42 m3 kernel: counted 8, received 1
> Dec 13 11:14:42 m3 kernel: req nr_sec 1024, cur_nr_sec 8
> Dec 13 11:14:42 m3 multipathd: devmap event on disk1
> Dec 13 11:14:42 m3 kernel: Incorrect number of segments after building list
> Dec 13 11:14:42 m3 kernel: counted 8, received 1
> Dec 13 11:14:42 m3 kernel: req nr_sec 1024, cur_nr_sec 8
> Dec 13 11:14:42 m3 kernel: Incorrect number of segments after building list
> Dec 13 11:14:42 m3 kernel: counted 8, received 1
> Dec 13 11:14:42 m3 kernel: req nr_sec 1024, cur_nr_sec 8
> Dec 13 11:14:44 m3 kernel: SCSI error : <0 0 1 2> return code = 0x10000
> Dec 13 11:14:44 m3 last message repeated 2 times
> Dec 13 11:14:44 m3 multipathd: event checker startup : disk1
> 
> After a long delay, the random write operation (blocked by the failure)
> restarts!
> 
> But in the log I have:
> 
> Dec 13 11:15:43 m3 kernel: SCSI error : <0 0 1 2> return code = 0x10000
> Dec 13 11:16:16 m3 last message repeated 12 times
> Dec 13 11:16:44 m3 last message repeated 17 times
> 
> and multipath -l -v3 gives:
> 0:0:1:2: sg_io failed status 0x0 0x1 0x0 0x0
> 0:0:1:2: Unable to get INQUIRY vpd 1 page 0x0.
> disk1 (360001fe1001613800009205005470164)
> [size=33 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [active][first]
>   \_ 0:0:0:2 sda  8:0     [ready ][active]
> 
> The second path is lost!
> 
> To double-check, running sg_start on the lost path gives:
> start_stop: Host_status=0x01 [DID_NO_CONNECT]
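
The DID_NO_CONNECT reported here lines up with the "return code = 0x10000"
entries in the kernel log above. As a rough sketch (not taken from the kernel
or from multipath-tools), the 32-bit SCSI result word can be split into its
four bytes, with the host byte in bits 16-23. The function name
decode_scsi_result and the dictionary layout are made up for illustration;
the DID_* values are the ones defined in the Linux SCSI headers.

```python
# Sketch: split a Linux SCSI mid-layer result word into its four bytes.
# Layout: bits 0-7 status, 8-15 message, 16-23 host, 24-31 driver.
# decode_scsi_result is a hypothetical helper name, not a kernel API.

HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
}


def decode_scsi_result(result):
    """Return the four result bytes plus a symbolic name for the host byte."""
    host = (result >> 16) & 0xFF
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": host,
        "driver": (result >> 24) & 0xFF,
        # Fall back to a hex string for host codes not listed above.
        "host_name": HOST_BYTE_NAMES.get(host, "unknown(0x%02x)" % host),
    }


if __name__ == "__main__":
    # The two return codes that appear in the log above.
    for code in (0x20000, 0x10000):
        print("0x%05x -> %s" % (code, decode_scsi_result(code)["host_name"]))
```

Read this way, the 0x20000 errors logged at 11:13:55 carry host byte
DID_BUS_BUSY, and the 0x10000 errors logged from 11:14:06 onwards carry
DID_NO_CONNECT, the same host status that sg_start prints.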
> 
> All this without an oops.
> 
> thanks
> 
>     Nicola Ranaldo
> 
-- 
christophe varoqui <christophe varoqui free fr>


