[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [dm-devel] [PATCH 2/2] scsi : fixing the new host byte settings (DID_TARGET_FAILURE and DID_NEXUS_FAILURE)



> -----Original Message-----
> From: Mike Snitzer [mailto:snitzer redhat com]
> Sent: Tuesday, January 24, 2012 5:03 PM
> To: Moger, Babu
> Cc: linux-scsi vger kernel org; device-mapper development (dm-
> devel redhat com)
> Subject: Re: [PATCH 2/2] scsi : fixing the new host byte settings
> (DID_TARGET_FAILURE and DID_NEXUS_FAILURE)
> 
> Hi Babu,
> 
> Thanks for finding this.
> 
> On Tue, Jan 24 2012 at  3:38pm -0500,
> Moger, Babu <Babu Moger netapp com> wrote:
> 
> > Resubmitting as my previous post had format issues and did not go linux-scsi.
> >
> > This patch fixes the host byte settings DID_TARGET_FAILURE and
> DID_NEXUS_FAILURE.
> > The function __scsi_error_from_host_byte,  tries to reset the host byte to
> DID_OK. But that
> > does not happen because of the OR operation.
> >
> > Here is the flow.
> > scsi_softirq_done-> scsi_decide_disposition -> __scsi_error_from_host_byte
> 
> or more accurately:
> 
> scsi_softirq_done -> scsi_decide_disposition
> scsi_softirq_done -> scsi_finish_command -> scsi_io_completion ->
> __scsi_error_from_host_byte
> 
> > Let's take an example with DID_NEXUS_FAILURE. In scsi_decide_disposition,
> result will be set as
> > DID_NEXUS_FAILURE (=0x11). Then in __scsi_error_from_host_byte, when we
> do OR with
> > DID_OK.  Purpose is to reset it back to DID_OK. But that does not happen.  This
> patch fixes this issue.
> 
> We clearly aren't properly resetting to DID_OK but I'm not seeing an
> obvious "nasty bug" that is lurking due to this.  Am I missing
> something?

Yes. It is causing some issues in our proprietary multipath driver. Normally, our assumption
is that host status  overrides all other statuses. If host status is set to status other than DID_OK
then we normally ignore other statuses(like reading the check sense).  We have worked this around.
My assumption is, most of the user Level code does the same thing.  It might give wrong impression
about the kind of error.

One question..  Did the newlines wrapped in this patch  also? 

> 
> __scsi_error_from_host_byte() is setting error which is passed back up
> via blk_end_request() and blk_end_request_all().  And in my previous
> testing I know that corresponding errors are making it out to
> dm-multipath (e.g. -EREMOTEIO).
> 
> Also, your patch header is missing the location where DID_OK is not
> properly matched (because it wasn't set exclussively due to being

I am not sure what you meant here.

> or'd).  Looks like scsi_noretry_cmd() will be made more efficient
> because it will match DID_OK immediately.  Any other locations?  Would
> be good to call them out.
> 
> Mike


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]