[dm-devel] dm-multipath failover
Jimmie
dm-devel at chaj.com
Mon Nov 29 16:15:46 UTC 2004
I changed the debug level to 7 in the Makefile and recompiled, but I don't
see a daemon.log. Is it supposed to be in /var/log? Either way, I'll post the
failover sequence.
Multipath startup:
Nov 28 12:27:04 nfstest1 multipathd: --------start up--------
Nov 28 12:27:04 nfstest1 multipathd: read /etc/multipath.conf
Nov 28 12:27:04 nfstest1 multipathd: ramfs maxsize is 94344
Nov 28 12:27:04 nfstest1 multipathd: start DM events thread
Nov 28 12:27:04 nfstest1 multipathd: path checkers start up
Nov 28 12:27:04 nfstest1 multipathd: initial reconfigure multipath maps
Nov 28 12:27:04 nfstest1 multipathd: refresh devmaps list
Nov 28 12:27:04 nfstest1 multipathd: refresh failpaths list
Nov 28 12:27:04 nfstest1 multipathd: set readsector0 path checker for sdc
Nov 28 12:27:04 nfstest1 multipathd: path checker startup : 8:32
Nov 28 12:27:04 nfstest1 multipathd: set readsector0 path checker for sdd
Nov 28 12:27:04 nfstest1 multipathd: path checker startup : 8:48
Nov 28 12:27:04 nfstest1 multipathd: set readsector0 path checker for sde
Nov 28 12:27:04 nfstest1 multipathd: path checker startup : 8:64
Nov 28 12:27:04 nfstest1 multipathd: set readsector0 path checker for sdf
Nov 28 12:27:04 nfstest1 multipathd: path checker startup : 8:80
Nov 28 12:27:04 nfstest1 multipathd: set readsector0 path checker for sdg
Nov 28 12:27:04 nfstest1 multipathd: path checker startup : 8:96
Nov 28 12:27:04 nfstest1 multipathd: set readsector0 path checker for sdh
Nov 28 12:27:04 nfstest1 multipathd: path checker startup : 8:112
Nov 28 12:27:04 nfstest1 multipathd: start up event loops
Nov 28 12:27:04 nfstest1 multipathd: event checker startup : big01
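The path checker lines above identify each path by major:minor number rather than by device name. As a rough illustration (this is not part of multipathd), the sketch below reproduces that mapping under the conventional Linux assumption that SCSI disks use block major 8 with 16 minors per disk. The function name is made up for this example, and it only handles sda through sdp.

```python
import string


def sd_name_to_devnum(name, major=8, minors_per_disk=16):
    """Map a whole-disk sd name (e.g. 'sdc') to a 'major:minor' string.

    Assumption: classic numbering with major 8 and 16 minors per disk,
    which only covers sda..sdp. Partitions are not handled.
    """
    letters = string.ascii_lowercase[:16]  # 'a'..'p'
    suffix = name[2:]
    if not name.startswith("sd") or len(suffix) != 1 or suffix not in letters:
        raise ValueError("this sketch only handles sda..sdp")
    index = letters.index(suffix)
    return f"{major}:{index * minors_per_disk}"


if __name__ == "__main__":
    # The six FC paths that appear in the startup log.
    for disk in ["sdc", "sdd", "sde", "sdf", "sdg", "sdh"]:
        print(disk, sd_name_to_devnum(disk))
```

With this numbering, sdc and sdf (the two paths of big01) come out as 8:32 and 8:80, which matches the checker lines.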
When I pull out the FC cable from port 1 of the QLogic card:
Nov 29 10:48:55 nfstest1 kernel: qla2300 0000:03:0b.0: LIP reset occured (f8cb).
Nov 29 10:48:57 nfstest1 kernel: qla2300 0000:03:0b.0: LOOP DOWN detected.
Nov 29 10:49:57 nfstest1 kernel: SCSI error : <1 0 0 1> return code = 0x20000
Nov 29 10:49:57 nfstest1 kernel: end_request: I/O error, dev sdc, sector 5424
Nov 29 10:49:57 nfstest1 kernel: end_request: I/O error, dev sdc, sector 5432
Nov 29 10:49:57 nfstest1 multipathd: devmap event on big01
Nov 29 10:49:57 nfstest1 multipathd: big01 : reconfigure multipath map
Nov 29 10:50:25 nfstest1 kernel: SCSI error : <1 0 0 1> return code = 0x10000
Nov 29 10:50:25 nfstest1 kernel: SCSI error : <1 0 0 1> return code = 0x10000
Nov 29 10:50:26 nfstest1 kernel: SCSI error : <1 0 0 2> return code = 0x10000
Nov 29 10:50:26 nfstest1 kernel: SCSI error : <1 0 0 3> return code = 0x10000
Nov 29 10:50:26 nfstest1 kernel: SCSI error : <1 0 0 1> return code = 0x10000
Nov 29 10:50:26 nfstest1 kernel: SCSI error : <1 0 0 1> return code = 0x10000
Nov 29 10:50:26 nfstest1 kernel: SCSI error : <1 0 0 2> return code = 0x10000
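For reading the kernel lines above: the value after "return code =" is the 32-bit SCSI result word, which the Linux SCSI midlayer packs as status, message, host and driver bytes from low to high. The sketch below splits the two values seen here. The helper names are hypothetical, and the host byte table is taken from the kernel's DID_* definitions as found in 2.6-era kernels.

```python
# Host byte values as defined by the Linux SCSI midlayer (DID_* constants).
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}


def decode_scsi_result(result):
    """Split a 32-bit SCSI result word into its four bytes."""
    return {
        "status": result & 0xFF,          # SCSI status byte
        "msg": (result >> 8) & 0xFF,      # message byte
        "host": (result >> 16) & 0xFF,    # host adapter / midlayer byte
        "driver": (result >> 24) & 0xFF,  # driver byte
    }


def host_byte_name(result):
    """Return the DID_* name for the host byte, or a hex fallback."""
    host = decode_scsi_result(result)["host"]
    return HOST_BYTE_NAMES.get(host, f"unknown(0x{host:02x})")


if __name__ == "__main__":
    # The two return codes that appear in the log excerpt.
    for code in (0x20000, 0x10000):
        print(hex(code), host_byte_name(code))
```

Under that layout, 0x20000 decodes to a host byte of DID_BUS_BUSY and 0x10000 to DID_NO_CONNECT, so both come from the host adapter side rather than from a SCSI status returned by the array.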
Multipath detects the failure and remaps:
Nov 29 10:50:26 nfstest1 multipathd: refresh devmaps list
Nov 29 10:50:26 nfstest1 multipathd: refresh failpaths list
Nov 29 10:50:26 nfstest1 multipathd: path checker already active : 8:32
Nov 29 10:50:26 nfstest1 multipathd: path checker already active : 8:48
Nov 29 10:50:26 nfstest1 multipathd: path checker already active : 8:64
Nov 29 10:50:26 nfstest1 multipathd: path checker already active : 8:80
Nov 29 10:50:26 nfstest1 multipathd: path checker already active : 8:96
Nov 29 10:50:26 nfstest1 multipathd: path checker already active : 8:112
Nov 29 10:50:26 nfstest1 multipathd: start up event loops
Nov 29 10:50:26 nfstest1 multipathd: event checker startup : big01
After I put port 1 back in and then pull out port 2 (with a wait of a couple
of minutes in between):
Nov 29 10:53:12 nfstest1 kernel: qla2300 0000:03:0b.1: LIP reset occured (b5b5).
Nov 29 10:53:13 nfstest1 kernel: qla2300 0000:03:0b.1: LOOP DOWN detected.
Nov 29 10:54:10 nfstest1 kernel: SCSI error : <2 0 0 1> return code = 0x20000
Nov 29 10:54:10 nfstest1 kernel: end_request: I/O error, dev sdf, sector 7944
Nov 29 10:54:10 nfstest1 kernel: end_request: I/O error, dev sdf, sector 7952
Nov 29 10:54:11 nfstest1 multipathd: devmap event on big01
Nov 29 10:54:11 nfstest1 multipathd: big01 : reconfigure multipath map
Nov 29 10:54:11 nfstest1 kernel: Buffer I/O error on device dm-0, logical block 1000
Nov 29 10:54:11 nfstest1 kernel: lost page write due to I/O error on dm-0
Nov 29 10:54:11 nfstest1 kernel: Aborting journal on device dm-0.
Nov 29 10:54:12 nfstest1 kernel: ext3_abort called.
Nov 29 10:54:12 nfstest1 kernel: EXT3-fs error (device dm-0): ext3_journal_start: Detected aborted journal
Nov 29 10:54:12 nfstest1 kernel: Remounting filesystem read-only
and then a bunch of:
Nov 29 10:54:45 nfstest1 kernel: Buffer I/O error on device dm-0, logical block 56983554
Nov 29 10:54:45 nfstest1 kernel: lost page write due to I/O error on dm-0
Nov 29 10:54:45 nfstest1 kernel: Buffer I/O error on device dm-0, logical block 56983555
Nov 29 10:54:45 nfstest1 kernel: lost page write due to I/O error on dm-0
Nov 29 10:54:45 nfstest1 kernel: Buffer I/O error on device dm-0, logical block 56983556
Nov 29 10:54:45 nfstest1 kernel: lost page write due to I/O error on dm-0
Nov 29 10:54:45 nfstest1 kernel: Buffer I/O error on device dm-0, logical block 56983557
Nov 29 10:54:45 nfstest1 kernel: lost page write due to I/O error on dm-0
Nov 29 10:54:45 nfstest1 kernel: Buffer I/O error on device dm-0, logical block 56983558
Nov 29 10:54:45 nfstest1 kernel: lost page write due to I/O error on dm-0
Nov 29 10:54:45 nfstest1 kernel: Buffer I/O error on device dm-0, logical block 56983566
So basically, the first failover works, but on the second one I get SCSI
errors and the filesystem is remounted read-only. Is this normal? Does
multipath failover only work one way? Any ideas? Please help.
Jimmie
On Thu, 25 Nov 2004, christophe varoqui wrote:
> The daemon should log in daemon.log
> You can push the debug level to the max and post the trace.
>
> In the mean time, you can also make sure you didn't apply the patchset
> from Mike Christie which used to be appended at the tail of the -udm
> patchset. These patches broke the event model used by the daemon.
>
> regards,
> cvaroqui
>
> On Wednesday 24 November 2004 at 17:10 -0500, Jims wrote:
> > We have a Dell unit with 2 QLogic 23XX series cards which are providing
> > multipathing to 3 EMC volumes. We're looking to have a failover setup (with
> > /dev/sdc and /dev/sdf) so that if one of the FC connections is pulled, multipathd
> > will reroute the path to the other card and also be able to reestablish the
> > connection when the Fiber is put back.
> >
> > dmsetup is able to create the device in /udev (/udev/big01) and we're able to
> > mount it. When I pull an FC cable, the mount does indeed failover, however
> > when we put it back in and pull the other, we get a bunch of scsi errors and
> > the mount gets remounted in read-only mode. How can we remedy this? Any
> > similar experiences and/or suggestions? Thanks.
> >
> > By the way, sda and sdb are the system drives. sdd,sde,sdg,sdh are other FC
> > drives that we're not working with right now.
> >
> > our DMsetup table is as follows:
> >
> > DMsetup table <<start>>
> > 0 1885645370 multipath 2 round-robin 1 0 /dev/sdc round-robin 1 0 /dev/sdf
> > DMsetup table <<end>>
> >
> > here is our multipath.conf:
> >
> > multipath.conf <<start>>
> > defaults {
> > multipath_tool "/sbin/multipath -v 0 -S"
> > udev_dir /udev
> > polling_interval 5
> > default_selector round-robin
> > default_selector_args 0
> > default_path_grouping_policy failover
> > default_getuid_callout "/sbin/scsi_id -g -u -s"
> > default_prio_callout "/bin/false"
> > }
> >
> > devnode_blacklist {
> > devnode cciss
> > devnode fd
> > devnode hd
> > devnode md
> > devnode dm
> > devnode sr
> > devnode scd
> > devnode st
> > devnode ram
> > devnode raw
> > devnode loop
> > devnode sda
> > devnode sdb
> > }
> > multipaths {
> > multipath {
> > wwid 501566091000
> > alias big01
> > path_grouping_policy failover
> > path_selector round-robin
> > }
> > }
> > devices {
> > device {
> > vendor "SEMC "
> > product "SYMMETRIX "
> > path_grouping_policy failover
> > getuid_callout "/sbin/scsi_id -g -u -s"
> > path_checker readsector0
> > path_selector round-robin
> > }
> > }
> > multipath.conf <<end>>
> >
> > and finally output of multipath -v2
> >
> > output <<start>>
> > #
> > # all paths :
> > #
> > SEMC_____SYMMETRIX______501566091000 (1 0 0 1) sdc [ready ] (8:32) [SYMMETRIX ]
> > SEMC_____SYMMETRIX______5015660D1000 (1 0 0 2) sdd [ready ] (8:48) [SYMMETRIX ]
> > SEMC_____SYMMETRIX______501566111000 (1 0 0 3) sde [ready ] (8:64) [SYMMETRIX ]
> > SEMC_____SYMMETRIX______501566091000 (2 0 0 1) sdf [ready ] (8:80) [SYMMETRIX ]
> > SEMC_____SYMMETRIX______5015660D1000 (2 0 0 2) sdg [ready ] (8:96) [SYMMETRIX ]
> > SEMC_____SYMMETRIX______501566111000 (2 0 0 3) sdh [ready ] (8:112) [SYMMETRIX ]
> > #
> > # all multipaths :
> > #
> > SEMC_____SYMMETRIX______501566091000 [SYMMETRIX ]
> > \_(1 0 0 1) sdc [ready ] (8:32)
> > \_(2 0 0 1) sdf [ready ] (8:80)
> > SEMC_____SYMMETRIX______5015660D1000 [SYMMETRIX ]
> > \_(1 0 0 2) sdd [ready ] (8:48)
> > \_(2 0 0 2) sdg [ready ] (8:96)
> > SEMC_____SYMMETRIX______501566111000 [SYMMETRIX ]
> > \_(1 0 0 3) sde [ready ] (8:64)
> > \_(2 0 0 3) sdh [ready ] (8:112)
> > #
> > # device maps :
> > #
> > create:SEMC_____SYMMETRIX______501566091000:0 1885655040 multipath 2 round-robin 1 0 8:80 round-robin 1 0 8:32
> > create:SEMC_____SYMMETRIX______5015660D1000:0 1885655040 multipath 2 round-robin 1 0 8:96 round-robin 1 0 8:48
> > create:SEMC_____SYMMETRIX______501566111000:0 1885655040 multipath 2 round-robin 1 0 8:112 round-robin 1 0 8:64
> > output <<end>>
> >
> > Help please.
> >
> > --
> > dm-devel mailing list
> > dm-devel at redhat.com
> > https://www.redhat.com/mailman/listinfo/dm-devel
>
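The dmsetup table line quoted above ("0 1885645370 multipath 2 round-robin 1 0 /dev/sdc round-robin 1 0 /dev/sdf") can be read as two priority groups with one path each, which is what a failover grouping policy would be expected to produce. The sketch below parses it under an assumed token layout for the multipath target of that era: start, length, target name, number of priority groups, then for each group a selector name, a path count, a per-path argument count, and the paths. That layout and the function name are assumptions made for illustration, and later kernels use a longer table format that this does not handle.

```python
def parse_multipath_table(line):
    """Parse an old-style dm-multipath table line into its priority groups.

    Assumed layout (not verified against any particular kernel version):
      <start> <length> multipath <num_groups>
        then per group: <selector> <num_paths> <args_per_path>
                        followed by <path> [<arg> ...] for each path
    """
    tokens = line.split()
    start, length, target = int(tokens[0]), int(tokens[1]), tokens[2]
    if target != "multipath":
        raise ValueError(f"not a multipath target: {target!r}")

    num_groups = int(tokens[3])
    pos = 4
    groups = []
    for _ in range(num_groups):
        selector = tokens[pos]
        num_paths = int(tokens[pos + 1])
        args_per_path = int(tokens[pos + 2])
        pos += 3
        paths = []
        for _ in range(num_paths):
            paths.append(tokens[pos])
            pos += 1 + args_per_path  # skip the path and its selector args
        groups.append({"selector": selector, "paths": paths})

    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens in table line")
    return {"start": start, "length": length, "groups": groups}


if __name__ == "__main__":
    table = ("0 1885645370 multipath 2 "
             "round-robin 1 0 /dev/sdc round-robin 1 0 /dev/sdf")
    for i, group in enumerate(parse_multipath_table(table)["groups"]):
        print(f"group {i}: {group['selector']} -> {group['paths']}")
```

Read this way, /dev/sdc and /dev/sdf sit in separate groups, so I/O should move from one group to the other only when every path in the active group has failed.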