[dm-devel] path coalescing once more

Brian Bunker brian at purestorage.com
Thu May 15 16:59:18 UTC 2014


Thanks Sean. Did you see my latest post, where I printed pp->serial after the problem happens? It looks like pp->wwid and pp->serial are pointing to the same thing. I am looking into how pp->wwid gets formed for a path, but the correct serial for that path is there. Since the dm is looked up by the wwid and not the serial, though, it gets into this problem.
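
To make the failure mode concrete, here is a minimal Python model of the lookup as it is described in the quoted text below (the scsi_id result goes into a 128-character buffer and is then strncmp'd against the wwid of each map). It is only a sketch: the class names, the helper names and the idea of a flat list of maps are assumptions for illustration, not the actual multipath-tools structures. The two WWIDs are the ones from the logs in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional

WWID_SIZE = 128  # buffer size mentioned in the thread


@dataclass
class PathSketch:
    """Hypothetical stand-in for a path; only the fields needed here."""
    dev: str
    wwid: str


@dataclass
class MapSketch:
    """Hypothetical stand-in for a multipath map, identified by its wwid."""
    wwid: str


def same_wwid(a: str, b: str) -> bool:
    # Rough analogue of strncmp(a, b, WWID_SIZE) == 0 on NUL-terminated strings.
    return a[:WWID_SIZE] == b[:WWID_SIZE]


def find_map_by_wwid(maps: List[MapSketch], wwid: str) -> Optional[MapSketch]:
    # The map is chosen purely by wwid; the serial is never consulted.
    for m in maps:
        if same_wwid(m.wwid, wwid):
            return m
    return None


maps = [
    MapSketch("3624a9370c90d0d631ef8783e00010002"),
    MapSketch("3624a9370c90d0d631ef8783e00010004"),
]

# wwid reported by the later 'multipath -v6 -d' run for sdaj
expected = PathSketch("sdaj", "3624a9370c90d0d631ef8783e00010002")
# wwid that multipathd apparently held for sdaj when it added the path
observed = PathSketch("sdaj", "3624a9370c90d0d631ef8783e00010004")

for p in (expected, observed):
    m = find_map_by_wwid(maps, p.wwid)
    print(p.dev, "->", m.wwid if m else None)
```

Under these assumptions the comparison itself behaves correctly: whatever wwid the path carries decides the map, so a path that was given the wrong wwid lands in the wrong map even though its serial is right.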

Thanks,
Brian

On May 15, 2014, at 9:48 AM, Stewart, Sean <Sean.Stewart at netapp.com> wrote:

> Hi Brian,
> 
> On Tue, 2014-05-13 at 16:55 -0700, Brian Bunker wrote:
>> We continue to run into problems with the device-mapper presumably
>> putting LUNs which do not belong with the dm under its control. Here
>> is the latest (I am picking out just dm-2 but there are others in this
>> same state):
> 
>> 
>> I see the following kernel messages in the syslog from when the device
>> sdaj arrives to when it is put in the wrong place:
>> May 13 11:28:27 rb9init4 kernel: scsi 1:0:0:1: Direct-Access     PURE     FlashArray       400B PQ: 0 ANSI: 6
>> May 13 11:28:27 rb9init4 kernel: sd 1:0:0:1: [sdaj] 1048576000 512-byte logical blocks: (536 GB/500 GiB)
>> May 13 11:28:27 rb9init4 kernel: sd 1:0:0:1: Attached scsi generic sg1 type 0
>> May 13 11:28:27 rb9init4 kernel: sd 1:0:0:1: [sdaj] Write Protect is off
>> May 13 11:28:27 rb9init4 kernel: sd 1:0:0:1: [sdaj] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
>> May 13 11:28:27 rb9init4 kernel: sdaj:
>> May 13 11:28:27 rb9init4 kernel: unknown partition table
>> May 13 11:28:27 rb9init4 kernel: sd 1:0:0:1: [sdaj] Attached SCSI disk
>> May 13 11:28:28 rb9init4 multipathd: sdaj: add path (uevent)
>> May 13 11:28:28 rb9init4 multipathd: sdaj path added to devmap 3624a9370c90d0d631ef8783e00010004
>> 
> I ran through the code a little bit the other day, and I don't see how
> it could be making a mistake here. It runs scsi_id, the result is
> placed in a 128-character buffer, and then it is strncmp'd with the wwid
> of the mpath devices.
>> 
>> I don’t understand where it gets 3624a9370c90d0d631ef8783e00010004.
>> When I run ‘multipath -v6 -d’, so that it prints out what it wants
>> to do but doesn’t do it, I see:
> 
>> May 13 16:45:15 | sdaj: getuid = /lib/udev/scsi_id --whitelisted --device=/dev/%n (config file default)
>> May 13 16:45:15 | sdaj: uid = 3624a9370c90d0d631ef8783e00010002 (callout)
>> May 13 16:45:15 | sdaj: state = running
>> May 13 16:45:15 | sdaj: detect_prio = 1 (config file default)
>> May 13 16:45:15 | sdaj: prio = const (config file default)
>> May 13 16:45:15 | sdaj: const prio = 1
>> 
> Running this command later will make it do the same thing: run scsi_id,
> which runs an inquiry that gets the wwid. I would think the only
> reason these should be different is that the inquiries returned
> different values when multipathd did it at 11:28 and when you did it
> through multipath at 16:45.
> 
> In order to see that, we'd probably need to set verbosity 3 in the
> defaults section of multipath.conf, restart the daemon, and do it
> again. Does anyone else have any thoughts on this?
>> 
>> So it seems to know that its serial ends in “02” and not “04” like
>> where it put the device. I don’t understand how to debug this further,
>> so any help would be appreciated.
> 
> 
> Thanks,
> Sean Stewart
> 

Brian Bunker
brian at purestorage.com






