[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

[dm-devel] [PATCH 1/7] scsi: add transport host byte errors (v2)

From: Mike Christie <michaelc cs wisc edu>

Currently, if there is a transport problem the iscsi drivers will return
outstanding commands (commands being exeucted by the driver/fw/hw) with
DID_BUS_BUSY and block the session so no new commands can be queued.
Commands that are caught between the failure handling and blocking are
failed with DID_IMM_RETRY or one of the scsi ml queuecommand return values.
When the recovery_timeout fires, the iscsi drivers then fail IO with

For fcp, some drivers will fail some outstanding IO (disk but possibly not
tape) with DID_BUS_BUSY or DID_ERROR or some other value that causes a retry
and hits the scsi_error.c failfast check, block the rport, and commands
caught in the race are failed with DID_IMM_RETRY. Other drivers, will hold
onto all IO and wait for the terminate_rport_io or dev_loss_tmo_callbk to be

The following patches attempt to unify what upper layers will see drivers
like multipath can make a good guess. This relies on drivers being
hooked into their transport class.

This first patch just defines two new host byte errors so drivers can
return the same value for when a rport/session is blocked and for
when the fast_io_fail_tmo fires.

The idea is that if the LLD/class detects a problem and is going to block
a rport/session, then if the LLD wants or must return the command to scsi-ml,
then it can return it with DID_TRANSPORT_DISRUPTED. This will requeue
the IO into the same scsi queue it came from, until the fast io fail timer
fires and the class decides what to do.

When using multipath and the fast_io_fail_tmo fires then the class
can fail commands with DID_TRANSPORT_FAILFAST or drivers can use
DID_TRANSPORT_FAILFAST in their terminate_rport_io callbacks or
the equivlent in iscsi if we ever implement more advanced recovery methods.
A LLD, like lpfc, could continue to return DID_ERROR and then it will hit
the normal failfast path. The point of the patches is that upper layers will
not see a failure that could be recovered from while the rport/session is
blocked until fast_io_fail_tmo/recovery_timeout fires.

Fixed patch/diff errors and renamed DID_TRANSPORT_BLOCKED to
initial patch.

Signed-off-by: Mike Christie <michaelc cs wisc edu>
 drivers/scsi/constants.c  |    3 ++-
 drivers/scsi/scsi_error.c |   18 +++++++++++++++++-
 include/scsi/scsi.h       |    5 +++++
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/constants.c b/drivers/scsi/constants.c
index 9785d73..4003dee 100644
--- a/drivers/scsi/constants.c
+++ b/drivers/scsi/constants.c
@@ -1364,7 +1364,8 @@ EXPORT_SYMBOL(scsi_print_sense);
 static const char * const hostbyte_table[]={
 #define NUM_HOSTBYTE_STRS ARRAY_SIZE(hostbyte_table)
 static const char * const driverbyte_table[]={
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 006a959..d257210 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1343,7 +1343,23 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
 		return ADD_TO_MLQUEUE;
+		/*
+		 * LLD/transport was disrupted during processing of the IO.
+		 * The transport class is now blocked/blocking,
+		 * and the transport will decide what to do with the IO
+		 * based on its timers and recovery capablilities.
+		 *
+		 * TODO: When the target block code is merged we can block
+		 * entire target instead of just this device.
+		 */
+		return ADD_TO_MLQUEUE;
+		/*
+		 * The transport decided to failfast the IO (most likely
+		 * the fast io fail tmo fired), so send IO directly upwards.
+		 */
+		return SUCCESS;
 	case DID_ERROR:
 		if (msg_byte(scmd->result) == COMMAND_COMPLETE &&
 		    status_byte(scmd->result) == RESERVATION_CONFLICT)
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index 2b5b935..df2c775 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -363,6 +363,11 @@ struct scsi_lun {
 #define DID_IMM_RETRY   0x0c	/* Retry without decrementing retry count  */
 #define DID_REQUEUE	0x0d	/* Requeue command (no immediate retry) also
 				 * without decrementing the retry count	   */
+#define DID_TRANSPORT_DISRUPTED 0x0e /* Transport error disrupted execution
+				      * and the driver blocked the port to
+				      * recover the link. Transport class will
+				      * retry or fail IO */
+#define DID_TRANSPORT_FAILFAST	0x0f /* Transport class fastfailed the io */
 #define DRIVER_OK       0x00	/* Driver status                           */

[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]