[dm-devel] block and scsi fail fast fixes

Thu Jun 5 01:41:39 UTC 2008

The following patches fix two problems I have been seeing in Red Hat
bugzillas. The patches are made over scsi-misc, but except for
0006-block-and-drivers-separate-failfast-into-multiple-b.patch
they could also apply over scsi-rc-fixes or Linus's tree.
0006-block-and-drivers-separate-failfast-into-multiple-b.patch also
converts the scsi dh modules, which is why it does not apply to
the other kernels.

The first problem is that when a transport problem is detected and
the classes/drivers block the scsi_devices, there is IO in the driver
and IO in the scsi_device queues. For fibre we have the fast IO fail
tmo infrastructure to allow us to get IO in the driver up to multipath,
but IO in the queues remains until the dev_loss_tmo fires. The
difference between the timers can be minutes, so it looks like a hang to
the application. iSCSI has something similar to FC's fast io fail
tmo, but it is called the replacement timeout. With these patches, when
that timer fires we fail all IO that is in the driver or queued, as well
as any incoming IO.

The first 5 patches try to provide common behavior across transports:
0001-scsi-add-transport-host-byte-errors-v2.patch
0002-iscsi-class-libiscsi-and-qla4xxx-convert-to-new-tr.patch
0003-fc-class-Add-support-for-new-transport-errors.patch
0004-qla2xxx-use-new-host-byte-transport-errors.patch
0005-lpfc-start-to-use-new-trasnport-errors.patch

Basically, when we block a device we fail IO with DID_TRANSPORT_DISRUPTED.
When the fast io transport timer fires we fail IO with DID_TRANSPORT_FAILFAST.

I converted qla2xxx and tried to convert lpfc (I was not sure about
some of the errors). zfcp and mpt need to be converted, but it looked
like they would be ok with the patches below. I could only test qla2xxx
and lpfc though.

The second problem is that multipath is not really good at handling a lot
of errors. It just retries all errors on a different path, so for transport
errors it makes a lot of sense to send them up to us pretty quickly. But
device errors, driver errors, or weird ones in between are better handled
by the scsi layer, because the multipath layer does not know anything
about scsi details.

The patches:
0006-block-and-drivers-separate-failfast-into-multiple-b.patch
0007-scsi-Support-fail-fast-bits.patch

are really simple and just break up the FAILFAST bits into device, driver
and transport bits, so the upper layer can ask the lower layers to fail
fast only certain types of errors. For multipath we only set the transport
failfast bit, and I thought that in the future something like RAID might
set the device failfast bit and not want transport errors failed fast
to it.