[dm-devel] dm-multipath failover

Jimmie dm-devel at chaj.com
Mon Nov 29 16:15:46 UTC 2004


I changed the debug level to 7 in the Makefile and recompiled. I don't see a
daemon.log. Is it supposed to be in /var/log? Either way, here is the
failover sequence.

Multipath startup:
Nov 28 12:27:04 nfstest1 multipathd: --------start up--------
Nov 28 12:27:04 nfstest1 multipathd: read /etc/multipath.conf
Nov 28 12:27:04 nfstest1 multipathd: ramfs maxsize is 94344
Nov 28 12:27:04 nfstest1 multipathd: start DM events thread
Nov 28 12:27:04 nfstest1 multipathd: path checkers start up
Nov 28 12:27:04 nfstest1 multipathd: initial reconfigure multipath maps
Nov 28 12:27:04 nfstest1 multipathd: refresh devmaps list
Nov 28 12:27:04 nfstest1 multipathd: refresh failpaths list
Nov 28 12:27:04 nfstest1 multipathd: set readsector0 path checker for sdc
Nov 28 12:27:04 nfstest1 multipathd: path checker startup : 8:32
Nov 28 12:27:04 nfstest1 multipathd: set readsector0 path checker for sdd
Nov 28 12:27:04 nfstest1 multipathd: path checker startup : 8:48
Nov 28 12:27:04 nfstest1 multipathd: set readsector0 path checker for sde
Nov 28 12:27:04 nfstest1 multipathd: path checker startup : 8:64
Nov 28 12:27:04 nfstest1 multipathd: set readsector0 path checker for sdf
Nov 28 12:27:04 nfstest1 multipathd: path checker startup : 8:80
Nov 28 12:27:04 nfstest1 multipathd: set readsector0 path checker for sdg
Nov 28 12:27:04 nfstest1 multipathd: path checker startup : 8:96
Nov 28 12:27:04 nfstest1 multipathd: set readsector0 path checker for sdh
Nov 28 12:27:04 nfstest1 multipathd: path checker startup : 8:112
Nov 28 12:27:04 nfstest1 multipathd: start up event loops
Nov 28 12:27:04 nfstest1 multipathd: event checker startup : big01
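
For reference, the 8:NN pairs in the path checker lines are major:minor
device numbers. The sketch below maps them back to sdX names, assuming the
classic sd numbering (block major 8, 16 minors per disk, first 16 disks
only). sd_name is a hypothetical helper written for this note, not part of
multipath-tools.

```python
import string


def sd_name(major: int, minor: int) -> str:
    """Map a major:minor pair to an sd device name.

    Assumption: block major 8 with 16 minors per disk, which only covers
    sda through sdp. Other majors are rejected rather than guessed at.
    """
    if major != 8:
        raise ValueError("this sketch only handles block major 8 (sda..sdp)")
    if not 0 <= minor <= 255:
        raise ValueError("minor out of range for block major 8")
    disk, partition = divmod(minor, 16)
    name = "sd" + string.ascii_lowercase[disk]
    # Partition 0 is the whole disk; anything else is a partition suffix.
    return name if partition == 0 else f"{name}{partition}"


if __name__ == "__main__":
    for devnum in ("8:32", "8:48", "8:64", "8:80", "8:96", "8:112"):
        major, minor = (int(part) for part in devnum.split(":"))
        print(devnum, "->", sd_name(major, minor))
```

Under that assumption, 8:32 and 8:80 come out as sdc and sdf, the two paths
behind big01.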

When I pull the FC cable out of port 1 of the QLogic card:
Nov 29 10:48:55 nfstest1 kernel: qla2300 0000:03:0b.0: LIP reset occured (f8cb).
Nov 29 10:48:57 nfstest1 kernel: qla2300 0000:03:0b.0: LOOP DOWN detected.
Nov 29 10:49:57 nfstest1 kernel: SCSI error : <1 0 0 1> return code = 0x20000
Nov 29 10:49:57 nfstest1 kernel: end_request: I/O error, dev sdc, sector 5424
Nov 29 10:49:57 nfstest1 kernel: end_request: I/O error, dev sdc, sector 5432
Nov 29 10:49:57 nfstest1 multipathd: devmap event on big01
Nov 29 10:49:57 nfstest1 multipathd: big01 : reconfigure multipath map
Nov 29 10:50:25 nfstest1 kernel: SCSI error : <1 0 0 1> return code = 0x10000
Nov 29 10:50:25 nfstest1 kernel: SCSI error : <1 0 0 1> return code = 0x10000
Nov 29 10:50:26 nfstest1 kernel: SCSI error : <1 0 0 2> return code = 0x10000
Nov 29 10:50:26 nfstest1 kernel: SCSI error : <1 0 0 3> return code = 0x10000
Nov 29 10:50:26 nfstest1 kernel: SCSI error : <1 0 0 1> return code = 0x10000
Nov 29 10:50:26 nfstest1 kernel: SCSI error : <1 0 0 1> return code = 0x10000
Nov 29 10:50:26 nfstest1 kernel: SCSI error : <1 0 0 2> return code = 0x10000
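
The "return code" values above pack several fields into one 32-bit word.
The sketch below splits them, assuming the usual Linux SCSI result layout
(status, message, host and driver bytes, low to high) and the DID_* host
byte names from the kernel's scsi headers. The table is abbreviated and
worth checking against the headers of the running kernel. decode_result and
host_byte_name are hypothetical helper names.

```python
# Host byte names as defined in the Linux SCSI headers (abbreviated list).
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}


def decode_result(code: int) -> dict:
    """Split a 32-bit SCSI result word into its four byte-wide fields."""
    return {
        "status": code & 0xFF,
        "msg": (code >> 8) & 0xFF,
        "host": (code >> 16) & 0xFF,
        "driver": (code >> 24) & 0xFF,
    }


def host_byte_name(code: int) -> str:
    """Return the symbolic host byte name, or a marker for unknown values."""
    host = decode_result(code)["host"]
    return HOST_BYTE_NAMES.get(host, f"unknown (0x{host:02x})")


if __name__ == "__main__":
    for code in (0x20000, 0x10000):
        print(hex(code), decode_result(code), host_byte_name(code))
```

Read this way, 0x20000 has host byte 0x02 (DID_BUS_BUSY) and 0x10000 has
host byte 0x01 (DID_NO_CONNECT); the other three bytes are zero in both.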

Multipath detects the failure and remaps:
Nov 29 10:50:26 nfstest1 multipathd: refresh devmaps list
Nov 29 10:50:26 nfstest1 multipathd: refresh failpaths list
Nov 29 10:50:26 nfstest1 multipathd: path checker already active : 8:32
Nov 29 10:50:26 nfstest1 multipathd: path checker already active : 8:48
Nov 29 10:50:26 nfstest1 multipathd: path checker already active : 8:64
Nov 29 10:50:26 nfstest1 multipathd: path checker already active : 8:80
Nov 29 10:50:26 nfstest1 multipathd: path checker already active : 8:96
Nov 29 10:50:26 nfstest1 multipathd: path checker already active : 8:112
Nov 29 10:50:26 nfstest1 multipathd: start up event loops
Nov 29 10:50:26 nfstest1 multipathd: event checker startup : big01

After I put port 1 back in and then pull out port 2 (with a couple of
minutes' wait in between):
Nov 29 10:53:12 nfstest1 kernel: qla2300 0000:03:0b.1: LIP reset occured (b5b5).
Nov 29 10:53:13 nfstest1 kernel: qla2300 0000:03:0b.1: LOOP DOWN detected.
Nov 29 10:54:10 nfstest1 kernel: SCSI error : <2 0 0 1> return code = 0x20000
Nov 29 10:54:10 nfstest1 kernel: end_request: I/O error, dev sdf, sector 7944
Nov 29 10:54:10 nfstest1 kernel: end_request: I/O error, dev sdf, sector 7952
Nov 29 10:54:11 nfstest1 multipathd: devmap event on big01
Nov 29 10:54:11 nfstest1 multipathd: big01 : reconfigure multipath map
Nov 29 10:54:11 nfstest1 kernel: Buffer I/O error on device dm-0, logical block 1000
Nov 29 10:54:11 nfstest1 kernel: lost page write due to I/O error on dm-0
Nov 29 10:54:11 nfstest1 kernel: Aborting journal on device dm-0.
Nov 29 10:54:12 nfstest1 kernel: ext3_abort called.
Nov 29 10:54:12 nfstest1 kernel: EXT3-fs error (device dm-0): ext3_journal_start: Detected aborted journal
Nov 29 10:54:12 nfstest1 kernel: Remounting filesystem read-only

and then a bunch of:
Nov 29 10:54:45 nfstest1 kernel: Buffer I/O error on device dm-0, logical block 56983554
Nov 29 10:54:45 nfstest1 kernel: lost page write due to I/O error on dm-0
Nov 29 10:54:45 nfstest1 kernel: Buffer I/O error on device dm-0, logical block 56983555
Nov 29 10:54:45 nfstest1 kernel: lost page write due to I/O error on dm-0
Nov 29 10:54:45 nfstest1 kernel: Buffer I/O error on device dm-0, logical block 56983556
Nov 29 10:54:45 nfstest1 kernel: lost page write due to I/O error on dm-0
Nov 29 10:54:45 nfstest1 kernel: Buffer I/O error on device dm-0, logical block 56983557
Nov 29 10:54:45 nfstest1 kernel: lost page write due to I/O error on dm-0
Nov 29 10:54:45 nfstest1 kernel: Buffer I/O error on device dm-0, logical block 56983558
Nov 29 10:54:45 nfstest1 kernel: lost page write due to I/O error on dm-0
Nov 29 10:54:45 nfstest1 kernel: Buffer I/O error on device dm-0, logical block 56983566
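
To put numbers on the two pulls, the delay between LOOP DOWN and the first
SCSI error can be read off the syslog timestamps. A small sketch, assuming
the year 2004 (syslog lines carry none) and an English locale for the month
abbreviation. syslog_time and gap_seconds are hypothetical helper names.

```python
from datetime import datetime


def syslog_time(line: str, year: int = 2004) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line.

    The year is an assumption, since classic syslog stamps omit it.
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def gap_seconds(first: str, second: str) -> float:
    """Seconds elapsed between two syslog lines."""
    return (syslog_time(second) - syslog_time(first)).total_seconds()


if __name__ == "__main__":
    port1_down = "Nov 29 10:48:57 nfstest1 kernel: qla2300 0000:03:0b.0: LOOP DOWN detected."
    port1_err = "Nov 29 10:49:57 nfstest1 kernel: SCSI error : <1 0 0 1> return code = 0x20000"
    port2_down = "Nov 29 10:53:13 nfstest1 kernel: qla2300 0000:03:0b.1: LOOP DOWN detected."
    port2_err = "Nov 29 10:54:10 nfstest1 kernel: SCSI error : <2 0 0 1> return code = 0x20000"
    print("port 1:", gap_seconds(port1_down, port1_err))
    print("port 2:", gap_seconds(port2_down, port2_err))
```

With the lines from this message, the gap is 60 seconds for port 1 and 57
seconds for port 2. In the second case the read-only remount follows 2
seconds after the first error.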

So basically, on the second failover I get SCSI-related errors and the
filesystem is remounted read-only. Is this normal? Does multipath failover
only work one way? Any ideas? Please help.

Jimmie



On Thu, 25 Nov 2004, christophe varoqui wrote:

> The daemon should log in daemon.log
> You can push the debug level to the max and post the trace.
>
> In the meantime, you can also make sure you didn't apply the patchset
> from Mike Christie which used to be appended at the tail of the -udm
> patchset. These patches broke the event model used by the daemon.
>
> regards,
> cvaroqui
>
> On Wednesday 24 November 2004 at 17:10 -0500, Jims wrote:
> > We have a Dell unit with 2 QLogic 23XX series cards which are providing
> > multipathing to 3 EMC volumes. We're looking to have a failover setup (with
> > /dev/sdc and /dev/sdf) so that if one of the FC connections is pulled,
> > multipathd will reroute the path to the other card and also be able to
> > reestablish the connection when the fiber is put back.
> >
> > dmsetup is able to create the device in /udev (/udev/big01) and we're able to
> > mount it. When I pull an FC cable, the mount does indeed fail over. However,
> > when we put it back in and pull the other, we get a bunch of SCSI errors and
> > the filesystem gets remounted read-only. How can we remedy this? Any
> > similar experiences and/or suggestions? Thanks.
> >
> > By the way, sda and sdb are the system drives. sdd,sde,sdg,sdh are other FC
> > drives that we're not working with right now.
> >
> > Our dmsetup table is as follows:
> >
> > DMsetup table <<start>>
> > 0 1885645370 multipath 2 round-robin 1 0 /dev/sdc round-robin 1 0 /dev/sdf
> > DMsetup table <<end>>
> >
> > here is our multipath.conf:
> >
> > multipath.conf <<start>>
> > defaults {
> >         multipath_tool  "/sbin/multipath -v 0 -S"
> >         udev_dir        /udev
> >         polling_interval 5
> >         default_selector        round-robin
> >         default_selector_args   0
> >         default_path_grouping_policy    failover
> >         default_getuid_callout  "/sbin/scsi_id -g -u -s"
> >         default_prio_callout    "/bin/false"
> > }
> >
> > devnode_blacklist {
> >         devnode cciss
> >         devnode fd
> >         devnode hd
> >         devnode md
> >         devnode dm
> >         devnode sr
> >         devnode scd
> >         devnode st
> >         devnode ram
> >         devnode raw
> >         devnode loop
> >         devnode sda
> >         devnode sdb
> > }
> > multipaths {
> >         multipath {
> >                 wwid    501566091000
> >                 alias   big01
> >                 path_grouping_policy    failover
> >                 path_selector           round-robin
> >         }
> > }
> > devices {
> >         device {
> >                 vendor                  "SEMC     "
> >                 product                 "SYMMETRIX      "
> >                 path_grouping_policy    failover
> >                 getuid_callout          "/sbin/scsi_id -g -u -s"
> >                 path_checker            readsector0
> >                 path_selector           round-robin
> >         }
> > }
> > multipath.conf <<end>>
> >
> > And finally, the output of multipath -v2:
> >
> > output <<start>>
> > #
> > # all paths :
> > #
> > SEMC_____SYMMETRIX______501566091000 (1 0 0 1) sdc [ready ] (8:32) [SYMMETRIX       ]
> > SEMC_____SYMMETRIX______5015660D1000 (1 0 0 2) sdd [ready ] (8:48) [SYMMETRIX       ]
> > SEMC_____SYMMETRIX______501566111000 (1 0 0 3) sde [ready ] (8:64) [SYMMETRIX       ]
> > SEMC_____SYMMETRIX______501566091000 (2 0 0 1) sdf [ready ] (8:80) [SYMMETRIX       ]
> > SEMC_____SYMMETRIX______5015660D1000 (2 0 0 2) sdg [ready ] (8:96) [SYMMETRIX       ]
> > SEMC_____SYMMETRIX______501566111000 (2 0 0 3) sdh [ready ] (8:112) [SYMMETRIX       ]
> > #
> > # all multipaths :
> > #
> > SEMC_____SYMMETRIX______501566091000 [SYMMETRIX       ]
> >  \_(1 0 0 1) sdc [ready ] (8:32)
> >  \_(2 0 0 1) sdf [ready ] (8:80)
> > SEMC_____SYMMETRIX______5015660D1000 [SYMMETRIX       ]
> >  \_(1 0 0 2) sdd [ready ] (8:48)
> >  \_(2 0 0 2) sdg [ready ] (8:96)
> > SEMC_____SYMMETRIX______501566111000 [SYMMETRIX       ]
> >  \_(1 0 0 3) sde [ready ] (8:64)
> >  \_(2 0 0 3) sdh [ready ] (8:112)
> > #
> > # device maps :
> > #
> > create:SEMC_____SYMMETRIX______501566091000:0 1885655040 multipath 2 round-robin 1 0 8:80 round-robin 1 0 8:32
> > create:SEMC_____SYMMETRIX______5015660D1000:0 1885655040 multipath 2 round-robin 1 0 8:96 round-robin 1 0 8:48
> > create:SEMC_____SYMMETRIX______501566111000:0 1885655040 multipath 2 round-robin 1 0 8:112 round-robin 1 0 8:64
> > output <<end>>
> >
> > Help please.
> >
>



