[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [dm-devel] block: Check that queue is alive in blk_insert_cloned_request()



On Mon, Jul 11, 2011 at 09:22:06PM -0400, Mike Snitzer wrote:
> On Mon, Jul 11 2011 at  8:52pm -0400,
> Alan Stern <stern rowland harvard edu> wrote:
> 
> > On Mon, 11 Jul 2011, Mike Snitzer wrote:
> > 
> > > [cc'ing dm-devel, vivek and tejun]
> > > 
> > > On Fri, Jul 8, 2011 at 7:04 PM, Roland Dreier <roland kernel org> wrote:
> > > > From: Roland Dreier <roland purestorage com>
> > > >
> > > > This fixes crashes such as the below that I see when the storage
> > > > underlying a dm-multipath device is hot-removed. ?The problem is that
> > > > dm requeues a request to a device whose block queue has already been
> > > > cleaned up, and blk_insert_cloned_request() doesn't check if the queue
> > > > is alive, but rather goes ahead and tries to queue the request. ?This
> > > > ends up dereferencing the elevator that was already freed in
> > > > blk_cleanup_queue().
> > > 
> > > Your patch looks fine to me:
> > > Acked-by: Mike Snitzer <snitzer redhat com>
> > 
> > There's still the issue that Stefan Richter pointed out: The test for a 
> > dead queue must be made _after_ acquiring the queue lock, not _before_.
> 
> Yes, quite important.
> 
> Jens, can you tweak the patch or should Roland send a v2?

I do not think that we should do queue dead check after taking a spinlock.
The reason being that there are life time issues of two objects.

- Validity of request queue pointer
- Validity of q->spin_lock pointer

If the dm has taken the reference to the request queue in the beginning
then it can be sure request queue pointer is valid. But spin_lock might
be coming from driver and might be in one of driver allocated structures.
So it might happen that driver has called blk_cleanup_queue() and freed
up structures which contained the spin lock.

So if queue is not dead, we know that q->spin_lock is valid. I think
only race present here is that whole operation is not atomic. First
we check for queue not dead flag and then go on to acquire request
queue lock. So this leaves a small window for race. I think I have
seen other code written in such manner (__generic_make_request()). So
it proably reasonably safe to do here too.

Thanks
Vivek


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]