[dm-devel] RE: [PATCH] dm mpath: Try recover from I/O failure by re-initializing the PG if device is running on one path
- From: "Moger, Babu" <Babu Moger lsi com>
- To: Grant Grundler <grundler google com>
- Cc: "dm-devel redhat com" <dm-devel redhat com>, "Chauhan, Vijay" <Vijay Chauhan lsi com>, "linux-scsi vger kernel org" <linux-scsi vger kernel org>
- Subject: [dm-devel] RE: [PATCH] dm mpath: Try recover from I/O failure by re-initializing the PG if device is running on one path
- Date: Wed, 22 Apr 2009 11:16:21 -0700
> > Problem: Device mapper fails the path for every I/O error.
> > It does not care about the type of error.
>
> This is the fundamental problem. Different layers of the block IO
> path have to agree on how to handle each possible type of error that
> can be returned. I don't know where to find such an agreement and
> think an implementation that does discriminate is needed.
> > There are certain errors which can be recovered by re-initializing the
> > path again. I have seen this problem during my testing on rdac device
> > handler. I have observed I/O errors when there is a change in Lun
> > ownership. When Lun ownership changes device will return back with check
> > condition with sense 0x05/0x94/0x01 (SK/ASC/ASCQ, meaning Lun ownership
> > changed). Currently, device mapper fails the path for this error and
> > eventually this will lead to I/O error. We don't want to see I/O error for
> > this reason.
>
> 1) This patch isn't discriminating between transport, media, or other
> device errors. Wouldn't it make sense to discriminate?
> "LUN ownership changed" sounds like one of the events that a
> multi-initiator environment would want to be notified about and
> perhaps even take some action (renegotiate access to
We cannot discriminate between error types because error-specific information is not available to DM.
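For illustration only, here is a rough sketch of what such discrimination might look like if the sense data did reach the multipath layer. This is not what the patch does and not existing dm-mpath code: the enum, the function name mp_classify_sense() and the policy for medium errors are all hypothetical. Only the 0x05/0x94/0x01 triplet comes from the problem description above.

```c
#include <stdint.h>

/* Hypothetical actions for a multipath layer; these are not dm-mpath names. */
enum mp_action {
    MP_REINIT_PG,   /* re-initialize the path group, then retry the I/O */
    MP_FAIL_IO,     /* return the error without failing the path */
    MP_FAIL_PATH    /* what happens today: fail the path */
};

#define SK_MEDIUM_ERROR 0x03    /* standard SCSI sense key */

/*
 * Map a sense triplet (SK/ASC/ASCQ) to an action.
 * Assumption: the caller has the sense data, which DM currently does not.
 */
static enum mp_action mp_classify_sense(uint8_t sk, uint8_t asc, uint8_t ascq)
{
    /* Triplet reported in this thread for "LUN ownership changed" (rdac). */
    if (sk == 0x05 && asc == 0x94 && ascq == 0x01)
        return MP_REINIT_PG;

    /* Assumed policy: a medium error belongs to the device, not the path. */
    if (sk == SK_MEDIUM_ERROR)
        return MP_FAIL_IO;

    /* Anything else is treated as a path failure, as DM does now. */
    return MP_FAIL_PATH;
}
```

With something like this, only the ownership-change case would trigger a PG re-initialization, and every other error would keep its current handling.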
> 2) Will this result in resetting a SATA device?
> I ask because device reset may result in data loss due to WCE enabled.
> I just don't know the higher parts of the block SW stack and how
> errors flow up the stack.
I am not sure about this. I don't know whether re-activating the device (that is, calling activate_path) causes a reset.