[Cluster-devel] [PATCH] RHEL fix for bz428751
Benjamin Marzinski
bmarzins at redhat.com
Fri Mar 7 15:36:23 UTC 2008
On Fri, Mar 07, 2008 at 08:52:33AM +0000, Steven Whitehouse wrote:
> Hi,
>
> On Thu, 2008-03-06 at 17:05 -0600, Benjamin Marzinski wrote:
> > On Wed, Mar 05, 2008 at 11:05:41AM +0000, Steven Whitehouse wrote:
> > > This doesn't look like it solves the problem... don't we need to move
> > > the ->go_inval call into gfs2_glock_drop_th() ? After all we know at that
> > > point that we'll be dropping the lock, so there is no reason not to
> > > invalidate there.
> >
> >
> > Moving the invalidation into the gfs2_glock_drop_th() causes problems.
> > If the page is already locked when gfs2_glock_drop_th() is called, you
> > deadlock trying to lock the pages when you invalidate the lock.
> >
> > Looking through the code, it seems like the only time this should happen
> > is during a gfs2_readpage() call, like this.
> >
> > #0 [f1d6ec2c] schedule at c06072d9
> > #1 [f1d6ec94] io_schedule at c0607974
> > #2 [f1d6eca0] sync_page at c0455074
> > #3 [f1d6eca4] __wait_on_bit_lock at c0607a89
> > #4 [f1d6ecb8] __lock_page at c0454fbf
> > #5 [f1d6ece4] truncate_inode_pages_range at c045be78
> > #6 [f1d6ed50] truncate_inode_pages at c045bed6
> > #7 [f1d6ed5c] inode_go_inval at f8e4fba2
> > #8 [f1d6ed64] gfs2_glock_drop_th at f8e4ed97
> > #9 [f1d6ed80] run_queue at f8e4ef40
> > #10 [f1d6ed9c] gfs2_glock_nq at f8e4f432
> > #11 [f1d6edb8] gfs2_glock_nq_atime at f8e5060b
> > #12 [f1d6edfc] gfs2_readpage at f8e56c95
> >
> > From the best I can tell, it looks like it would be O.K. to unlock the page
> > before calling glops->go_inval() in this case, assuming that you knew
> > that you were the process that is holding the lock to the page and which page
> > was actually locked, and you had a way to tell gfs2_readpage not to
> > bother unlocking the page once you were finished.
> >
> > Unfortunately, coming up with a good way to pass that information back
> > and forth isn't straightforward. As soon as I come up with a decent
> > answer, I'll post the modified fix.
> >
> > -Ben
> >
>
> Nothing ought to be holding the page lock while we are unlocking the
> glock, otherwise problems will occur whichever version of the code you
> use. Are you missing the fix for #432057 perhaps?
>
Yes I am. Thanks. That should make things work much better.
-Ben
> Steve.
>
More information about the Cluster-devel
mailing list