[dm-devel] dm-mq and end_clone_request()

Mike Snitzer snitzer at redhat.com
Thu Jul 28 15:40:22 UTC 2016


On Thu, Jul 28 2016 at 11:23am -0400,
Bart Van Assche <bart.vanassche at sandisk.com> wrote:

> On 07/28/2016 06:33 AM, Mike Snitzer wrote:
> >On Wed, Jul 27 2016 at  7:05pm -0400,
> >Bart Van Assche <bart.vanassche at sandisk.com> wrote:
> >>Thanks again for having made this patch available. I will test it as
> >>soon as I have the time. BTW, in the meantime I ran a few tests with
> >>DM_MQ_DEFAULT=n since until now I ran all tests with
> >>DM_MQ_DEFAULT=y. The result of these tests is as follows:
> >>* v4.6.0, v4.6.5 and v4.7.0 with DM_MQ_DEFAULT=y: first simulated
> >>path removal triggers I/O errors.
> >>* v4.6.4, v4.6.5 and v4.7.0 with DM_MQ_DEFAULT=n: test passes more
> >>than 100 iterations.
> >
> >I think this may point to an SRP issue then.  Is the synthetic "cable
> >pull" (by writing to /sys/class/srp_remote_ports/port-*/delete)
> >representative of what actually happens if a cable is physically pulled?
> >
> >Or is your synthetic method hitting the device way harder than would
> >happen with an actual production fault?
> >
> >Again, there hasn't been any report of failures (EIO or otherwise) with
> >extensive scsi-mq and dm-mq testing on a larger FC testbed.
> 
> Hello Mike,
> 
> Sorry but I disagree that the ib_srp driver would be causing the EIO
> errors because:
> * All tests, including the tests that pass, were run with
>   CONFIG_SCSI_MQ_DEFAULT=y in the kernel config. The same code paths
>   were triggered in the ib_srp driver by all the tests
>   (CONFIG_DM_MQ_DEFAULT=y and CONFIG_DM_MQ_DEFAULT=n).
> * In my previous e-mails I have shown that the EIO error code is
>   generated by the dm-mpath driver after all (SRP) paths have gone. So
>   how could the ib_srp driver be involved?
> 
> There is an important difference between the SCSI FC drivers and
> ib_srp: after dev_loss_tmo expires FC drivers call
> scsi_remove_target() while the SRP transport layer triggers a call
> of scsi_remove_host().
> 
> Both writing into /sys/class/srp_remote_ports/*/delete and pulling a
> cable make the ib_srp driver call scsi_remove_host(). The only
> difference is the timing. With the former method it is more likely
> that the time between submitting I/O and calling scsi_remove_host()
> is small.

The reality is that I need a testbed where I can reproduce this.  This
back and forth isn't helping us converge on _why_ must_push_back() is
returning false in your case.  I need to know exactly what is causing
that.

As it stands, it is hard to see why using the blk-mq interface rather
than the .request_fn interface for a DM mpath device would make
must_push_back() return false instead of true.
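
To make the question concrete, here is a rough user-space model of the
decision must_push_back() makes, as I understand the v4.7-era dm-mpath
logic.  This is a sketch, not the kernel source: struct mpath_model,
its field names and check_model() are made up for illustration, and the
real code tests flag bits on struct multipath and asks DM core whether
a noflush suspend is in progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the multipath state that matters here. */
struct mpath_model {
	bool queue_if_no_path;       /* current queue_if_no_path setting */
	bool saved_queue_if_no_path; /* setting saved across a suspend */
	bool noflush_suspending;     /* a noflush suspend is in progress */
};

/*
 * true:  requeue the request because no path is usable right now
 * false: do not push back; with no paths left the caller fails the
 *        request, which is where the EIO would come from
 */
static bool must_push_back_model(const struct mpath_model *m)
{
	return m->queue_if_no_path ||
	       (m->queue_if_no_path != m->saved_queue_if_no_path &&
		m->noflush_suspending);
}

/* Walks through the cases; every assertion holds for the model above. */
static void check_model(void)
{
	struct mpath_model m = { false, false, false };

	/* Queueing disabled, no suspend: do not push back. */
	assert(!must_push_back_model(&m));

	/* queue_if_no_path set: always push back. */
	m.queue_if_no_path = true;
	assert(must_push_back_model(&m));

	/* Queueing temporarily off during a noflush suspend: push back. */
	m = (struct mpath_model){ false, true, true };
	assert(must_push_back_model(&m));

	/* Same flags without the noflush suspend: do not push back. */
	m.noflush_suspending = false;
	assert(!must_push_back_model(&m));
}
```

In this model nothing depends on whether the request came in through
blk-mq or .request_fn, which is why I would expect the answer to be the
same for both.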



