[dm-devel] [Stable-review] [93/93] dm mpath: fix stall when requeueing io
Mikulas Patocka
mpatocka at redhat.com
Tue Feb 23 19:52:12 UTC 2010
On Tue, 23 Feb 2010, Alasdair G Kergon wrote:
> > >> spin_unlock(q->queue_lock);
> > >> - map_request(ti, rq, md);
> > >> + if (map_request(ti, rq, md))
> > >> + goto requeued;
> > >> spin_lock_irq(q->queue_lock);
>
> > In the current device-mapper code, I would like to go with
> > spin_unlock/lock here.
> > However, there was a case to enable irq in map_requst() for request
> > allocation, and this spin_lock_irq was a work-around for the case.
> > Now, there is no such case in the device-mapper code, so spin_lock should
> > be enough here. But I'm still using spin_lock_irq for safeness, since
> > there might be some more cases to enable irq during request submission
> > to underlying devices.
Either map_requst() may enable irqs --- then you should enable them with
spin_unlock_irq (then, you'd have to review all the callers of
dm_request_fn that they are fine with enabling irqs).
Or map_request() must not enable irqs --- and then there is already a bug
and there is no point in using "spin_lock_irq" for safeness, because it
doesn't fix the bug (the interrupt may come anyway, before spin_lock_irq).
> I don't understand this.
>
> Are you saying that sometimes this fn is called with local interrupts disabled
> and other times with them still enabled?
>
> If they are sometimes left enabled, and the unlock() leaves them alone,
> then the lock_irq disables them, what piece of code then reenables them?
>
> Surely this code can only be working if local interrupts are always
> disabled prior to entry?
>
> Alasdair
Another problem:
dm_request_fn can be called in an interrupt context, I scanned it for
calling process-context functions and found:
It may call rq_completed (either directly, via
dm_request_fn->map_request->dm_kill_unmapped_request->dm_complete_request
->dm_done->dm_end_request->dm_put) or indirectly, when the request is
completed from host controller interrupt. And dm_put is a process_context
function.
I believe it doesn't cause a real crash, because dm_put is called in
dm_blk_close, thus there is always at least one reference. When the device
is closed with dm_blk_close, there should be no requests on it.
But it is simply a logic error to call a process-context function from
an interrupt context. I'd remove those dm_get/dm_put from
request-based-dm --- they are not needed anyway, as long as there are
requests, the "mapped_device" structure can't disappear.
You can apply this (in 2.6.34-rc1) to catch all the errorneous users of
dm_put.
Mikulas
---
Catch errorneous calls of dm_put from an interrupt context.
Signed-off-by: Mikulas Patocka <mpatocka at redhat.com>
---
drivers/md/dm.c | 1 +
1 file changed, 1 insertion(+)
Index: linux-2.6.33-rc8-fast/drivers/md/dm.c
===================================================================
--- linux-2.6.33-rc8-fast.orig/drivers/md/dm.c 2010-02-23 20:45:31.000000000 +0100
+++ linux-2.6.33-rc8-fast/drivers/md/dm.c 2010-02-23 20:46:10.000000000 +0100
@@ -2188,6 +2188,7 @@ void dm_put(struct mapped_device *md)
struct dm_table *map;
BUG_ON(test_bit(DMF_FREEING, &md->flags));
+ might_sleep();
if (atomic_dec_and_lock(&md->holders, &_minor_lock)) {
map = dm_get_live_table(md);
More information about the dm-devel
mailing list