Re: [dm-devel] do Symmetrix multipath-tools defaults need update ? or scsi-to-blk errors management ?
- From: Mike Christie <michaelc cs wisc edu>
- To: device-mapper development <dm-devel redhat com>
- Cc: Levy_Jerome emc com, linux-scsi vger kernel org
- Subject: Re: [dm-devel] do Symmetrix multipath-tools defaults need update ? or scsi-to-blk errors management ?
- Date: Wed, 10 Jun 2009 18:34:52 -0500
On 06/10/2009 04:49 PM, christophe varoqui free fr wrote:
Hi Jerome,
EMC recently asked my/one-of-your client to activate "queue_if_no_path" on Symmetrix logical units, which is not the current default setting in the upstream multipath-tools package.
I'd like to know if you intend to submit a patch to change the default setting accordingly, or if you'd rather leave the no-queueing default unchanged and work on fixing the root cause of this issue.
::: Background information, root cause :::
The Symmetrix array has been shown to return SCSI errors to I/O submitters in certain circumstances (I was told of errors on the R1+R2 network link). The Linux kernel, lacking finesse in SCSI->DM error reporting, ends up invalidating each path of the multipath in turn before the multipathd daemon gets a chance to revalidate them. With "queue_if_no_path" disabled, the I/O errors end up in the FS layer and in the userspace submitter.
::: error log on a 2.6.9 (rhel 4.7) kernel :::
For RH 4.9 I did the attached patch. So this error is not fastfailed
(upstream does not fastfail this type of error when using dm-multipath
now). So now the scsi layer will retry its normal 5 times, then fail.
SCSI error :<h b t l> return code 0x8000002
current sday: sense key Aborted Command
Additional sense: Internal target failure
end_request: I/O error, dev sday, sector XXXXX
device-mapper: dm-multipath: Failing path 67:32.
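As a rough illustration of the "retry its normal 5 times, then fail" behaviour described in the reply above, here is a minimal userspace sketch of a bounded retry check. It is not the kernel code: the enum, the struct and the function names are hypothetical, and the retry budget of 5 is only taken from the text.

```c
/* Hypothetical dispositions, loosely modeled on the idea of
 * "retry" versus "give up"; not the kernel's actual values. */
enum disposition { DISP_RETRY, DISP_FAIL };

/* Hypothetical per-command retry bookkeeping. */
struct cmd_state {
    int retries;  /* retries already consumed */
    int allowed;  /* retry budget, e.g. 5 as mentioned in the text */
};

/* Consume one retry if the budget allows it, otherwise fail. */
enum disposition check_retry_count(struct cmd_state *cmd)
{
    if (++cmd->retries <= cmd->allowed)
        return DISP_RETRY;
    return DISP_FAIL;
}

/* Count how often a persistently failing command is retried
 * before the error is finally reported upwards. */
int retries_before_failure(int allowed)
{
    struct cmd_state cmd = { .retries = 0, .allowed = allowed };
    int n = 0;

    while (check_retry_count(&cmd) == DISP_RETRY)
        n++;
    return n;
}
```

With a budget of 5, a command that keeps returning the same error is retried five times on the same path before the failure reaches the upper layers, instead of being failed on the first occurrence.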
::: unfortunate side effect of queue_if_no_path :::
Activating "queue_if_no_path" is certainly an efficient workaround for this kind of short-lived, retriable error, but this feature compromises data protection on clusters relying on persistent reservations to fence I/Os from passive nodes. Ironically, the reason is quite similar: SCSI return codes for reservation conflicts also end up invalidating each path of a multipath and, worse, the I/O causing the conflict gets queued and retried until the poor active node drops its reservation, unleashing data-corrupting I/Os from the passive nodes' queues onto the logical unit.
::: error log on a 2.6.29.x kernel for a reservation conflict :::
sd h:b:t:l: reservation conflict
sd h:b:t:l: [sdu] Unhandled error code
sd h:b:t:l: [sdu] Result: hostbyte=DID_OK driver_byte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdu, sector XXXXX
device-mapper: dm-multipath: Failing path 65:64.
::: persistent reservation + queue_if_no_path, possible solution ? :::
Seems to me scsi_lib.c::scsi_io_completion() should be able to cancel a reservation conflicting io and signal blk_end_request() with no error reported.
I was just about to post new blkerr patches. For this we just want
multipath to fail this IO right away, right? So have scsi return some
fatal error, then dm-multipath will see it and not retry that IO?
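To make the proposal above concrete, here is a minimal sketch of how a multipath layer could tell a path-level error from a fatal, target-level one such as a reservation conflict. Everything in it is an assumption for illustration: the error classes, the action names and the function are hypothetical and do not correspond to the blkerr patches or to existing dm-multipath code.

```c
/* Hypothetical error classes reported by the lower (SCSI) layer. */
enum io_error {
    IO_OK,               /* command completed successfully */
    IO_ERR_TRANSPORT,    /* path problem: another path may work */
    IO_ERR_TARGET_FATAL  /* e.g. reservation conflict: no path will help */
};

/* Hypothetical actions taken by the multipath layer. */
enum mp_action {
    MP_COMPLETE,         /* finish the I/O successfully */
    MP_FAIL_PATH_RETRY,  /* fail the path, then retry elsewhere or queue */
    MP_FAIL_IO           /* return the error at once, keep the path valid */
};

/* Decide what to do with a completed I/O based on its error class. */
enum mp_action mp_handle_completion(enum io_error err)
{
    switch (err) {
    case IO_OK:
        return MP_COMPLETE;
    case IO_ERR_TARGET_FATAL:
        /* Retrying on another path, or queueing under
         * queue_if_no_path, would only repeat the conflict. */
        return MP_FAIL_IO;
    case IO_ERR_TRANSPORT:
    default:
        return MP_FAIL_PATH_RETRY;
    }
}
```

Under this scheme a reservation conflict never fails a path and is never queued, so the I/O from a fenced node cannot reach the logical unit later when the reservation is dropped.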
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 7309f12..d5a3390 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1390,7 +1390,7 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
 	case CHECK_CONDITION:
 		rtn = scsi_check_sense(scmd);
 		if (rtn == NEEDS_RETRY)
-			goto maybe_retry;
+			goto check_retry_count;
 		/* if rtn == FAILED, we have no sense information;
 		 * returning FAILED will wake the error handler thread
 		 * to collect the sense and redo the decide