Re: [dm-devel] [BUG] Oops when SCSI device under multipath is removed
- From: Alan Stern <stern rowland harvard edu>
- To: James Bottomley <James Bottomley HansenPartnership com>
- Cc: Kiyoshi Ueda <k-ueda ct jp nec com>, linux-scsi vger kernel org, jaxboe fusionio com, "linux-kernel vger kernel org" <linux-kernel vger kernel org>, roland purestorage com, device-mapper development <dm-devel redhat com>, Jun'ichi Nomura <j-nomura ce jp nec com>
- Subject: Re: [dm-devel] [BUG] Oops when SCSI device under multipath is removed
- Date: Thu, 11 Aug 2011 11:16:17 -0400 (EDT)

On Thu, 11 Aug 2011, James Bottomley wrote:

> > > Well, it's just hiding the problem. The essential problem is that only
> > > block has the correctly refcounted knowledge to know the last release of
> > > the queue reference. Until that time, the holder of the reference can
> > > use the queue regardless of whether blk_cleanup_queue() has been called.
> > > This is the race you complain about since use of the queue involves the
> > > lock which should be guarded by QUEUE_DEAD checks.
> > >
> > > This is essentially unfixable with function calls. The only way to fix
> > > it is to have a callback model for freeing the external lock.
> >
> > Assuming the queue is associated with a device, the queue could take a
> > reference to the device, dropping that reference when the queue is
> > freed. Then the lock could safely be freed at the same time as the
> > device.
>
> If that assumption is correct, there's no point refcounting the queue at
> all because its use is entirely subordinated to the lifecycle of the
> associated device.

That's true.  Why wasn't it done that way originally?  Are there queues
that aren't associated with devices?

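To make the suggestion above concrete, here is a rough user-space sketch
of the ownership model: the queue pins its device, and the lock lives
and dies with the device.  Every name in it (demo_device, demo_queue,
and the helpers) is hypothetical and is not taken from the block layer
or from SCSI.  The plain int refcounts stand in for what would be
kref-style atomic counting in the kernel, so this only illustrates the
ordering, not the locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the kernel's structures. */
struct demo_device {
	int refcount;      /* would be an atomic kref in real code */
	int lock;          /* stands in for the externally supplied queue lock */
	bool *freed_flag;  /* set to true when the device (and lock) is freed */
};

struct demo_queue {
	int refcount;
	struct demo_device *dev;  /* counted reference held until the queue is freed */
	int *lock;                /* points into dev; valid while dev is pinned */
};

static struct demo_device *demo_device_get(struct demo_device *d)
{
	d->refcount++;
	return d;
}

static void demo_device_put(struct demo_device *d)
{
	if (--d->refcount == 0) {
		*d->freed_flag = true;  /* the lock goes away with the device */
		free(d);
	}
}

static struct demo_queue *demo_queue_alloc(struct demo_device *d)
{
	struct demo_queue *q = malloc(sizeof(*q));

	if (!q)
		return NULL;
	q->refcount = 1;
	q->dev = demo_device_get(d);  /* the queue pins the device */
	q->lock = &d->lock;
	return q;
}

static void demo_queue_put(struct demo_queue *q)
{
	if (--q->refcount == 0) {
		struct demo_device *d = q->dev;

		free(q);
		demo_device_put(d);  /* dropped only when the queue is freed */
	}
}

/* Returns 1 if the lock stayed valid until the last queue reference went away. */
static int demo_lock_outlives_queue(void)
{
	bool freed = false;
	struct demo_device *d = malloc(sizeof(*d));
	struct demo_queue *q;
	bool alive_while_held;

	if (!d)
		return 0;
	d->refcount = 1;
	d->lock = 0;
	d->freed_flag = &freed;

	q = demo_queue_alloc(d);
	if (!q) {
		free(d);
		return 0;
	}
	q->refcount++;        /* a second holder, e.g. an upper layer */

	demo_device_put(d);   /* the device's owner drops its reference */
	demo_queue_put(q);    /* cleanup drops the queue's initial reference */

	alive_while_held = !freed;
	*q->lock = 1;         /* still safe: the queue keeps the device alive */

	demo_queue_put(q);    /* last holder: queue freed, then the device */
	return alive_while_held && freed;
}
```

In this sketch the remaining holder can still touch the lock after both
the owner and the cleanup path have dropped their references, and the
lock is freed only after the last queue reference is gone.  Whether the
real queue can always be tied to a device in this way is exactly the
open question.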
> Plus all the wittering about my previous patch is
> pointless, because blk_cleanup_queue() has to do the final put of the
> queue in the lock free path (otherwise the assumption is violated).
>
> However, much as I'd like to accept this rosy view, the original oops
> that started all of this in 2.6.38 was someone caught something with a
> reference to a SCSI queue after the device release function had been
> called.

Not according to your commit log.  You wrote that the reference was
taken after scsi_remove_device() had been called -- but that is not the
device release function.  The release function is
scsi_device_dev_release_usercontext().

Alan Stern