[dm-devel] rdac path failure - Sun 6140
Stewart Smith
stew at cleepdar.com
Fri Aug 14 15:58:00 UTC 2009
I will test this today.
Does everything look OK from a configuration standpoint? Should the
RDAC virtual HBA drivers from LSI be a requirement? I am not
currently using them.
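
For reference, here is a rough sketch of how I understand the three
sections of my multipath.conf (quoted below) to combine. My assumption is
that a multipaths entry overrides the matching device entry, which in turn
overrides defaults; the "(LUN setting)" and "(controller setting)"
annotations in the debug output below seem consistent with that. The
effective_settings() helper is hypothetical and only models the merge; it
is not part of multipath-tools, and only a subset of the keys is shown.

```python
# Sketch only: models multipath.conf precedence with plain dicts.
# Values are copied from the multipath.conf quoted later in this thread.

DEFAULTS = {  # defaults { ... }
    "path_grouping_policy": "multibus",
    "prio": "alua",
    "path_checker": "readsector0",
    "rr_min_io": "100",
    "rr_weight": "priorities",
    "failback": "immediate",
    "no_path_retry": "fail",
}

DEVICE = {  # devices { device { vendor "SUN" product "CSM200_R" ... } }
    "features": "0",
    "hardware_handler": "1 rdac",
    "path_grouping_policy": "group_by_prio",
    "failback": "immediate",
    "rr_weight": "uniform",
    "no_path_retry": "queue",
    "rr_min_io": "1000",
    "path_checker": "rdac",
    "prio": "rdac",
}

MULTIPATH = {  # multipaths { multipath { alias vol1 ... } }
    "rr_weight": "priorities",
    "no_path_retry": "5",
    "rr_min_io": "100",
}


def effective_settings(defaults, device, multipath):
    """Return {key: (value, source)}; later (more specific) sections win."""
    merged = {}
    for source, section in (("defaults", defaults),
                            ("device", device),
                            ("multipath", multipath)):
        for key, value in section.items():
            merged[key] = (value, source)
    return merged


if __name__ == "__main__":
    result = effective_settings(DEFAULTS, DEVICE, MULTIPATH)
    for key, (value, source) in sorted(result.items()):
        print(f"{key:22} = {value:14} ({source})")
```

If that model is right, vol1 ends up with no_path_retry 5 and rr_min_io 100
from the multipaths entry, and group_by_prio with the rdac checker and
prioritizer from the device entry.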
Thank you,
--
Stew
On Thu, Aug 13, 2009 at 5:05 PM, Moger, Babu <Babu.Moger at lsi.com> wrote:
> Stew,
>
> I don’t see much information about this failure in the logs. Right now
> device handlers don’t provide much information on failures. We are working
> on adding more debug levels. I am attaching my draft code
> (scsi_dh_rdac.c) here. Please use it only for your testing. It has not
> been approved or reviewed yet; I still need to submit it to the community
> for approval. Please replace the existing scsi_dh_rdac.c in the directory
> drivers/scsi/device_handler with the attached file and rebuild the
> kernel. This should give more information from the target point of view.
> Please send me the /var/log/messages file after the failure. Let's see if
> we can get more information.
>
>
>
> Thanks
>
> Babu Moger
>
> ________________________________
>
> From: dm-devel-bounces at redhat.com [mailto:dm-devel-bounces at redhat.com] On
> Behalf Of Stewart Smith
> Sent: Thursday, August 13, 2009 3:35 PM
> To: device-mapper development
> Subject: Re: [dm-devel] rdac path failure - Sun 6140
>
> Same sequence of events, with multipathd -v3
>
>
>
> Aug 13 16:28:48.627 kernel: device-mapper: multipath: Failing path 8:208.
>
> Aug 13 16:28:48.000 multipathd: vol1: rr_weight = 2 (LUN setting)
>
> Aug 13 16:28:48.000 multipathd: vol1: pgfailback = -2 (controller setting)
>
> Aug 13 16:28:48.000 multipathd: pg_timeout = NONE (internal default)
>
> Aug 13 16:28:48.000 multipathd: 8:208: mark as failed
>
> Aug 13 16:28:48.000 multipathd: uevent 'change' from
> '/devices/virtual/block/dm-1'
>
> Aug 13 16:28:48.000 multipathd: UDEV_LOG=3
>
> Aug 13 16:28:48.000 multipathd: ACTION=change
>
> Aug 13 16:28:48.000 multipathd: DEVPATH=/devices/virtual/block/dm-1
>
> Aug 13 16:28:48.000 multipathd: SUBSYSTEM=block
>
> Aug 13 16:28:48.000 multipathd: DM_TARGET=multipath
>
> Aug 13 16:28:48.000 multipathd: DM_ACTION=PATH_FAILED
>
> Aug 13 16:28:48.000 multipathd: DM_SEQNUM=1
>
> Aug 13 16:28:48.000 multipathd: DM_PATH=8:208
>
> Aug 13 16:28:48.000 multipathd: DM_NR_VALID_PATHS=3
>
> Aug 13 16:28:48.000 multipathd: DM_NAME=vol1
>
> Aug 13 16:28:48.000 multipathd:
> DM_UUID=mpath-3600a0b800048335200001e5d48b68a9b
>
> Aug 13 16:28:48.000 multipathd: MAJOR=253
>
> Aug 13 16:28:48.000 multipathd: MINOR=1
>
> Aug 13 16:28:48.000 multipathd: DEVTYPE=disk
>
> Aug 13 16:28:48.000 multipathd: SEQNUM=1738
>
> Aug 13 16:28:48.000 multipathd: UDEVD_EVENT=1
>
> Aug 13 16:28:48.000 multipathd: DEVNAME=/dev/dm-1
>
> Aug 13 16:28:50.000 multipathd: 8:208: reinstated
>
> Aug 13 16:28:50.000 multipathd: vol1: remaining active paths: 4
>
> Aug 13 16:28:50.000 multipathd: sdj: rdac prio = 3
>
> Aug 13 16:28:50.000 multipathd: sdn: rdac prio = 3
>
> Aug 13 16:28:50.000 multipathd: sdb: rdac prio = 0
>
> Aug 13 16:28:50.000 multipathd: sdd: rdac prio = 0
>
> Aug 13 16:28:50.763 kernel: device-mapper: multipath: Failing path 8:208.
>
> Aug 13 16:28:50.000 multipathd: uevent 'change' from
> '/devices/virtual/block/dm-1'
>
> Aug 13 16:28:50.000 multipathd: UDEV_LOG=3
>
> Aug 13 16:28:50.000 multipathd: ACTION=change
>
> Aug 13 16:28:50.000 multipathd: DEVPATH=/devices/virtual/block/dm-1
>
> Aug 13 16:28:50.000 multipathd: SUBSYSTEM=block
>
> Aug 13 16:28:50.000 multipathd: DM_TARGET=multipath
>
> Aug 13 16:28:50.000 multipathd: DM_ACTION=PATH_REINSTATED
>
> Aug 13 16:28:50.000 multipathd: DM_SEQNUM=2
>
> Aug 13 16:28:50.000 multipathd: DM_PATH=8:208
>
> Aug 13 16:28:50.000 multipathd: DM_NR_VALID_PATHS=4
>
> Aug 13 16:28:50.000 multipathd: DM_NAME=vol1
>
> Aug 13 16:28:50.000 multipathd:
> DM_UUID=mpath-3600a0b800048335200001e5d48b68a9b
>
> Aug 13 16:28:50.000 multipathd: MAJOR=253
>
> Aug 13 16:28:50.000 multipathd: MINOR=1
>
> Aug 13 16:28:50.000 multipathd: DEVTYPE=disk
>
> Aug 13 16:28:50.000 multipathd: SEQNUM=1739
>
> Aug 13 16:28:50.000 multipathd: UDEVD_EVENT=1
>
> Aug 13 16:28:50.000 multipathd: vol1: rr_weight = 2 (LUN setting)
>
> Aug 13 16:28:50.000 multipathd: vol1: pgfailback = -2 (controller setting)
>
> Aug 13 16:28:50.000 multipathd: pg_timeout = NONE (internal default)
>
> Aug 13 16:28:50.000 multipathd: 8:208: mark as failed
>
> Aug 13 16:28:50.000 multipathd: vol1: remaining active paths: 3
>
> Aug 13 16:28:50.000 multipathd: vol1: rr_weight = 2 (LUN setting)
>
> Aug 13 16:28:50.000 multipathd: vol1: pgfailback = -2 (controller setting)
>
> Aug 13 16:28:50.000 multipathd: uevent 'change' from
> '/devices/virtual/block/dm-1'
>
> Aug 13 16:28:50.000 multipathd: UDEV_LOG=3
>
> Aug 13 16:28:50.000 multipathd: ACTION=change
>
> Aug 13 16:28:50.000 multipathd: DEVPATH=/devices/virtual/block/dm-1
>
> Aug 13 16:28:50.000 multipathd: SUBSYSTEM=block
>
> Aug 13 16:28:50.000 multipathd: DM_TARGET=multipath
>
> Aug 13 16:28:50.000 multipathd: DM_ACTION=PATH_FAILED
>
> Aug 13 16:28:50.000 multipathd: DM_SEQNUM=3
>
> Aug 13 16:28:50.000 multipathd: DM_PATH=8:208
>
> Aug 13 16:28:50.000 multipathd: DM_NR_VALID_PATHS=3
>
> Aug 13 16:28:50.000 multipathd: DM_NAME=vol1
>
> Aug 13 16:28:50.000 multipathd:
> DM_UUID=mpath-3600a0b800048335200001e5d48b68a9b
>
> Aug 13 16:28:50.000 multipathd: MAJOR=253
>
> Aug 13 16:28:50.000 multipathd: MINOR=1
>
> Aug 13 16:28:50.000 multipathd: DEVTYPE=disk
>
> Aug 13 16:28:50.000 multipathd: SEQNUM=1740
>
> Aug 13 16:28:50.000 multipathd: UDEVD_EVENT=1
>
> Aug 13 16:28:50.000 multipathd: DEVNAME=/dev/dm-1
>
> Aug 13 16:29:00.000 multipathd: 8:208: reinstated
>
> Aug 13 16:29:00.000 multipathd: vol1: remaining active paths: 4
>
> Aug 13 16:29:00.000 multipathd: sdj: rdac prio = 3
>
> Aug 13 16:29:00.000 multipathd: sdn: rdac prio = 3
>
> Aug 13 16:29:00.000 multipathd: sdb: rdac prio = 0
>
> Aug 13 16:29:00.000 multipathd: sdd: rdac prio = 0
>
> Aug 13 16:29:00.000 multipathd: vol1: rr_weight = 2 (LUN setting)
>
> Aug 13 16:29:00.000 multipathd: vol1: pgfailback = -2 (controller setting)
>
> Aug 13 16:29:00.000 multipathd: uevent 'change' from
> '/devices/virtual/block/dm-1'
>
> Aug 13 16:29:00.000 multipathd: UDEV_LOG=3
>
> Aug 13 16:29:00.000 multipathd: ACTION=change
>
> Aug 13 16:29:00.000 multipathd: DEVPATH=/devices/virtual/block/dm-1
>
> Aug 13 16:29:00.000 multipathd: SUBSYSTEM=block
>
> Aug 13 16:29:00.000 multipathd: DM_TARGET=multipath
>
> Aug 13 16:29:00.000 multipathd: DM_ACTION=PATH_REINSTATED
>
> Aug 13 16:29:00.000 multipathd: DM_SEQNUM=4
>
> Aug 13 16:29:00.000 multipathd: DM_PATH=8:208
>
> Aug 13 16:29:00.000 multipathd: DM_NR_VALID_PATHS=4
>
> Aug 13 16:29:00.000 multipathd: DM_NAME=vol1
>
> Aug 13 16:29:00.000 multipathd:
> DM_UUID=mpath-3600a0b800048335200001e5d48b68a9b
>
> Aug 13 16:29:00.000 multipathd: MAJOR=253
>
> Aug 13 16:29:00.000 multipathd: MINOR=1
>
> Aug 13 16:29:00.000 multipathd: DEVTYPE=disk
>
> Aug 13 16:29:00.000 multipathd: SEQNUM=1741
>
> Aug 13 16:29:00.000 multipathd: UDEVD_EVENT=1
>
> Aug 13 16:29:00.000 multipathd: DEVNAME=/dev/dm-1
>
> Aug 13 16:29:02.753 kernel: device-mapper: multipath: Failing path 8:208.
>
> Aug 13 16:29:02.000 multipathd: vol1: rr_weight = 2 (LUN setting)
>
> Aug 13 16:29:02.000 multipathd: vol1: pgfailback = -2 (controller setting)
>
> Aug 13 16:29:02.000 multipathd: pg_timeout = NONE (internal default)
>
> Aug 13 16:29:02.000 multipathd: 8:208: mark as failed
>
> Aug 13 16:29:02.000 multipathd: uevent 'change' from
> '/devices/virtual/block/dm-1'
>
> Aug 13 16:29:02.000 multipathd: UDEV_LOG=3
>
> Aug 13 16:29:02.000 multipathd: ACTION=change
>
> Aug 13 16:29:02.000 multipathd: DEVPATH=/devices/virtual/block/dm-1
>
> Aug 13 16:29:02.000 multipathd: SUBSYSTEM=block
>
> Aug 13 16:29:02.000 multipathd: DM_TARGET=multipath
>
> Aug 13 16:29:02.000 multipathd: DM_ACTION=PATH_FAILED
>
> Aug 13 16:29:02.000 multipathd: DM_SEQNUM=5
>
> Aug 13 16:29:02.000 multipathd: DM_PATH=8:208
>
> Aug 13 16:29:02.000 multipathd: DM_NR_VALID_PATHS=3
>
> Aug 13 16:29:02.000 multipathd: DM_NAME=vol1
>
> Aug 13 16:29:02.000 multipathd:
> DM_UUID=mpath-3600a0b800048335200001e5d48b68a9b
>
> Aug 13 16:29:02.000 multipathd: MAJOR=253
>
> Aug 13 16:29:02.000 multipathd: MINOR=1
>
> Aug 13 16:29:02.000 multipathd: DEVTYPE=disk
>
> Aug 13 16:29:02.000 multipathd: SEQNUM=1742
>
> Aug 13 16:29:02.000 multipathd: UDEVD_EVENT=1
>
> Aug 13 16:29:02.000 multipathd: DEVNAME=/dev/dm-1
>
> Aug 13 16:29:10.000 multipathd: 8:208: reinstated
>
> Aug 13 16:29:10.000 multipathd: vol1: remaining active paths: 4
>
> Aug 13 16:29:10.000 multipathd: sdj: rdac prio = 3
>
> Aug 13 16:29:10.000 multipathd: sdn: rdac prio = 3
>
> Aug 13 16:29:10.000 multipathd: sdb: rdac prio = 0
>
> Aug 13 16:29:10.000 multipathd: sdd: rdac prio = 0
>
> Aug 13 16:29:10.000 multipathd: vol1: rr_weight = 2 (LUN setting)
>
> Aug 13 16:29:10.000 multipathd: vol1: pgfailback = -2 (controller setting)
>
> Aug 13 16:29:10.000 multipathd: uevent 'change' from
> '/devices/virtual/block/dm-1'
>
> Aug 13 16:29:10.000 multipathd: UDEV_LOG=3
>
> Aug 13 16:29:10.000 multipathd: ACTION=change
>
> Aug 13 16:29:10.000 multipathd: DEVPATH=/devices/virtual/block/dm-1
>
> Aug 13 16:29:10.000 multipathd: SUBSYSTEM=block
>
> Aug 13 16:29:10.000 multipathd: DM_TARGET=multipath
>
> Aug 13 16:29:10.000 multipathd: DM_ACTION=PATH_REINSTATED
>
> Aug 13 16:29:10.000 multipathd: DM_SEQNUM=6
>
> Aug 13 16:29:10.000 multipathd: DM_PATH=8:208
>
> Aug 13 16:29:10.000 multipathd: DM_NR_VALID_PATHS=4
>
> Aug 13 16:29:10.000 multipathd: DM_NAME=vol1
>
> Aug 13 16:29:10.000 multipathd:
> DM_UUID=mpath-3600a0b800048335200001e5d48b68a9b
>
> Aug 13 16:29:10.000 multipathd: MAJOR=253
>
> Aug 13 16:29:10.000 multipathd: MINOR=1
>
> Aug 13 16:29:10.000 multipathd: DEVTYPE=disk
>
> Aug 13 16:29:10.000 multipathd: SEQNUM=1743
>
> Aug 13 16:29:10.000 multipathd: UDEVD_EVENT=1
>
> Aug 13 16:29:10.000 multipathd: DEVNAME=/dev/dm-1
>
> On Thu, Aug 13, 2009 at 1:27 PM, Stewart Smith <stew at cleepdar.com> wrote:
>
>
>
> After a fresh multipath -F and starting multipathd with -v 2, I see the
> following messages.
>
>
>
> After starting multipathd I mounted /dev/mapper/vol1 and generated some
> simple I/O to it using dd
>
>
>
>
>
> Aug 13 16:23:14.888 localhost kernel: device-mapper: multipath: Failing path
> 8:208.
>
> Aug 13 16:23:14.000 localhost multipathd: 8:208: mark as failed
>
> Aug 13 16:23:16.000 localhost multipathd: 8:208: reinstated
>
> Aug 13 16:23:30.462 localhost kernel: device-mapper: multipath: Failing path
> 8:208.
>
> Aug 13 16:23:30.000 localhost multipathd: 8:208: mark as failed
>
> Aug 13 16:23:39.000 localhost multipathd: 8:208: reinstated
>
> Aug 13 16:23:46.430 localhost kernel: device-mapper: multipath: Failing path
> 8:208.
>
> Aug 13 16:23:46.000 localhost multipathd: 8:208: mark as failed
>
> Aug 13 16:23:51.041 localhost kernel: device-mapper: multipath: Failing path
> 8:208.
>
> Aug 13 16:23:51.000 localhost multipathd: 8:208: mark as failed
>
> Aug 13 16:23:59.000 localhost multipathd: 8:208: reinstated
>
> Aug 13 16:24:06.465 localhost kernel: device-mapper: multipath: Failing path
> 8:208.
>
> Aug 13 16:24:06.000 localhost multipathd: 8:208: mark as failed
>
> Aug 13 16:24:09.000 localhost multipathd: 8:208: reinstated
>
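
To put numbers on the fail/reinstate cycle in the log above, something
like the following could be run over /var/log/messages. It is a minimal
sketch, not a tested tool: the regular expression assumes the syslog line
layout shown in this thread, the outage_durations() name is made up for
illustration, and Python is simply my choice here. The sample lines are
copied from the log above.

```python
import re
from datetime import datetime

# Sample multipathd lines copied from the -v 2 log in this thread.
SAMPLE = """\
Aug 13 16:23:14.000 localhost multipathd: 8:208: mark as failed
Aug 13 16:23:16.000 localhost multipathd: 8:208: reinstated
Aug 13 16:23:30.000 localhost multipathd: 8:208: mark as failed
Aug 13 16:23:39.000 localhost multipathd: 8:208: reinstated
"""

# Assumed layout: "<Mon> <day> <HH:MM:SS>[.frac] [host] multipathd: <maj:min>: <event>"
EVENT = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d)\S* .*multipathd: "
    r"(?P<dev>\d+:\d+): (?P<what>mark as failed|reinstated)\s*$"
)


def outage_durations(lines):
    """Yield (device, seconds) for each failed -> reinstated pair."""
    failed_at = {}
    for line in lines:
        m = EVENT.match(line)
        if not m:
            continue  # kernel lines, uevent dumps, etc. are ignored
        ts = datetime.strptime(m["ts"], "%b %d %H:%M:%S")
        dev = m["dev"]
        if m["what"] == "mark as failed":
            # Keep the earliest failure if several arrive before a reinstate.
            failed_at.setdefault(dev, ts)
        elif dev in failed_at:
            yield dev, (ts - failed_at.pop(dev)).total_seconds()


if __name__ == "__main__":
    for dev, secs in outage_durations(SAMPLE.splitlines()):
        print(f"{dev}: down for {secs:.0f}s")
```

Timestamps are parsed without a year, so this only makes sense within a
single log file that does not cross a year boundary.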
>
>
>
>
> Thanks,
>
> --
>
> Stew
>
> On Thu, Aug 13, 2009 at 12:42 PM, Moger, Babu <Babu.Moger at lsi.com> wrote:
>
> Do you have /var/log/messages file for this problem?
>
> Thanks
> Babu Moger
>
>> -----Original Message-----
>> From: dm-devel-bounces at redhat.com [mailto:dm-devel-bounces at redhat.com] On
>> Behalf Of Stewart Smith
>> Sent: Thursday, August 13, 2009 1:51 PM
>> To: dm-devel at redhat.com
>> Subject: [dm-devel] rdac path failure - Sun 6140
>>
>> Hello All,
>>
>> I am seeing many of these messages when my Sun 6140 array is under heavy
>> I/O:
>> device-mapper: multipath: Failing path 8:208.
>> device-mapper: multipath: Failing path 8:208.
>> device-mapper: multipath: Failing path 8:208.
>> device-mapper: multipath: Failing path 8:208.
>> device-mapper: multipath: Failing path 8:208.
>>
>>
>> I am running a Fedora 10 server, with two fiber connections to two
>> different switches. Both controllers on the 6140 have one connection
>> to each switch as well. The end result is that I see four paths to
>> each LUN.
>>
>> When the volume is mounted and under significant load I see the
>> messages above every few seconds. They seem to appear every
>> "no_path_retry" seconds.
>>
>> The 6140 controller firmware is up to date at version 07.50.08.10 and
>> I have installed the latest firmware for my Emulex LPe11002 cards. I
>> have reproduced the problem using both Cisco MDS and Brocade Fibre
>> Channel switches as well.
>>
>> Using CAM, I have set the initiator Host Type to "Linux" at the
>> moment. I have tried other options as well without success.
>>
>> I have NOT installed the RDAC drivers from either Sun or LSI -
>> primarily because they do not seem to build on my Fedora 10 kernel.
>>
>> Any ideas would be greatly appreciated!!!
>>
>> Configs and multipathd debugging output are below.
>>
>>
>>
>>
>>
>> Kernel: 2.6.27.24-170.2.68.fc10.x86_64
>>
>> # multipath -lll
>> vol1 (3600a0b800048335200001e5d48b68a9b) dm-1 SUN,CSM200_R
>> [size=12T][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
>> \_ round-robin 0 [prio=6][active]
>> \_ 5:0:1:2 sdj 8:144 [active][ready]
>> \_ 2:0:1:2 sdn 8:208 [active][ready]
>> \_ round-robin 0 [prio=0][enabled]
>> \_ 2:0:0:2 sdb 8:16 [active][ghost]
>> \_ 5:0:0:2 sdd 8:48 [active][ghost]
>>
>>
>> # cat /etc/multipath.conf
>>
>> blacklist {
>> devnode "^sd[a-z][[0-9]*]"
>> devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
>> devnode "^hd[a-z][0-9]*"
>> devnode "^cciss!c[0-9]d[0-9](p[0-9]*)*"
>> }
>>
>> defaults {
>> udev_dir /dev
>> polling_interval 10
>> selector "round-robin 0"
>> path_grouping_policy multibus
>> getuid_callout "/sbin/scsi_id --whitelisted /dev/%n"
>> prio alua
>> path_checker readsector0
>> rr_min_io 100
>> max_fds 8192
>> rr_weight priorities
>> failback immediate
>> no_path_retry fail
>> user_friendly_names yes
>> }
>> devices {
>> device {
>> vendor "SUN"
>> product "CSM200_R"
>> product_blacklist "Universal Xport"
>> getuid_callout "/sbin/scsi_id --whitelisted /dev/%n"
>> features "0"
>> hardware_handler "1 rdac"
>> path_selector "round-robin 0"
>> path_grouping_policy group_by_prio
>> failback immediate
>> rr_weight uniform
>> no_path_retry queue
>> rr_min_io 1000
>> path_checker rdac
>> prio rdac
>> }
>> }
>>
>> multipaths {
>> multipath {
>> wwid 3600a0b800048335200001e5d48b68a9b
>> alias vol1
>> rr_weight priorities
>> no_path_retry 5
>> rr_min_io 100
>> }
>> }
>>
>>
>>
>> # multipathd -d -v3
>>
>>
>> Aug 13 14:48:53 | sdb: ownership set to vol1
>> Aug 13 14:48:53 | sdb: not found in pathvec
>> Aug 13 14:48:53 | sdb: mask = 0xc
>> Aug 13 14:48:53 | sdb: path checker = rdac (controller setting)
>> Aug 13 14:48:53 | sdb: state = 4
>> Aug 13 14:48:53 | sdb: rdac prio = 0
>> Aug 13 14:48:53 | sdd: ownership set to vol1
>> Aug 13 14:48:53 | sdd: not found in pathvec
>> Aug 13 14:48:53 | sdd: mask = 0xc
>> Aug 13 14:48:53 | sdd: path checker = rdac (controller setting)
>> Aug 13 14:48:53 | sdd: state = 4
>> Aug 13 14:48:53 | sdd: rdac prio = 0
>> Aug 13 14:48:53 | sdj: ownership set to vol1
>> Aug 13 14:48:53 | sdj: not found in pathvec
>> Aug 13 14:48:53 | sdj: mask = 0xc
>> Aug 13 14:48:53 | sdj: path checker = rdac (controller setting)
>> Aug 13 14:48:53 | sdj: state = 2
>> Aug 13 14:48:53 | sdj: rdac prio = 3
>> Aug 13 14:48:53 | sdn: ownership set to vol1
>> Aug 13 14:48:53 | sdn: not found in pathvec
>> Aug 13 14:48:53 | sdn: mask = 0xc
>> Aug 13 14:48:53 | sdn: path checker = rdac (controller setting)
>> Aug 13 14:48:53 | sdn: state = 2
>> Aug 13 14:48:53 | sdn: rdac prio = 3
>> Aug 13 14:48:53 | vol1: pgfailback = -2 (controller setting)
>> Aug 13 14:48:53 | vol1: pgpolicy = group_by_prio (controller setting)
>> Aug 13 14:48:53 | vol1: selector = round-robin 0 (controller setting)
>> Aug 13 14:48:53 | vol1: features = 0 (controller setting)
>> Aug 13 14:48:53 | vol1: hwhandler = 1 rdac (controller setting)
>> Aug 13 14:48:53 | vol1: rr_weight = 2 (LUN setting)
>> Aug 13 14:48:53 | vol1: minio = 100 (LUN setting)
>> Aug 13 14:48:53 | vol1: no_path_retry = 5 (multipath setting)
>> Aug 13 14:48:53 | pg_timeout = NONE (internal default)
>> Aug 13 14:48:53 | vol1: set ACT_CREATE (map does not exist)
>> create: vol1 (3600a0b800048335200001e5d48b68a9b) n/a SUN,CSM200_R
>> [size=12T][features=0][hwhandler=1 rdac][n/a]
>> \_ round-robin 0 [prio=6][undef]
>> \_ 5:0:1:2 sdj 8:144 [undef][ready]
>> \_ 2:0:1:2 sdn 8:208 [undef][ready]
>> \_ round-robin 0 [prio=0][undef]
>> \_ 2:0:0:2 sdb 8:16 [undef][ghost]
>> \_ 5:0:0:2 sdd 8:48 [undef][ghost]
>>
>
>> --
>> dm-devel mailing list
>> dm-devel at redhat.com
>> https://www.redhat.com/mailman/listinfo/dm-devel
>