Re: [Cluster-devel] [PATCH] RHEL fix for bz428751

On Wed, Mar 05, 2008 at 11:05:41AM +0000, Steven Whitehouse wrote:
> This doesn't look like it solves the problem... don't we need to move
> the ->go_inval call into gfs2_glock_drop_th() ? After all we know at that
> point that we'll be dropping the lock, so there is no reason not to
> invalidate there.

Moving the invalidation into gfs2_glock_drop_th() causes problems.
If the page is already locked when gfs2_glock_drop_th() is called, you
deadlock trying to lock the page again when go_inval truncates the
inode's pages.

Looking through the code, it seems like the only time this should happen
is during a gfs2_readpage() call, like this:

 #0 [f1d6ec2c] schedule at c06072d9
 #1 [f1d6ec94] io_schedule at c0607974
 #2 [f1d6eca0] sync_page at c0455074
 #3 [f1d6eca4] __wait_on_bit_lock at c0607a89
 #4 [f1d6ecb8] __lock_page at c0454fbf
 #5 [f1d6ece4] truncate_inode_pages_range at c045be78
 #6 [f1d6ed50] truncate_inode_pages at c045bed6
 #7 [f1d6ed5c] inode_go_inval at f8e4fba2
 #8 [f1d6ed64] gfs2_glock_drop_th at f8e4ed97
 #9 [f1d6ed80] run_queue at f8e4ef40
#10 [f1d6ed9c] gfs2_glock_nq at f8e4f432
#11 [f1d6edb8] gfs2_glock_nq_atime at f8e5060b
#12 [f1d6edfc] gfs2_readpage at f8e56c95

As best I can tell, it would be O.K. to unlock the page before calling
glops->go_inval() in this case, provided that you knew you were the
process holding the page lock, you knew which page was locked, and you
had a way to tell gfs2_readpage() not to unlock the page again once it
was finished.

Unfortunately, coming up with a good way to pass that information back
and forth isn't straightforward. As soon as I come up with a decent
answer, I'll post the modified fix.
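
To make the idea concrete, here is a rough userspace model of the
handoff described above. It is only a sketch and not GFS2 code: every
name in it (toy_page, toy_holder, the page_unlocked flag, and the toy_*
functions) is made up for illustration, and the page lock is modelled
as a plain flag with a trylock, so the case that would deadlock in the
kernel simply returns false here instead of hanging.

```c
/* Userspace toy model, NOT GFS2 code: all names here are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
    bool locked;                    /* stands in for the page lock bit */
};

/* Hypothetical context passed from readpage down to the drop path. */
struct toy_holder {
    struct toy_page *locked_page;   /* page the caller holds locked */
    bool page_unlocked;             /* set if the drop path released it */
};

/* Non-blocking stand-in for lock_page(): returns false in the case
 * where the real lock_page() would sleep waiting on ourselves. */
static bool toy_trylock_page(struct toy_page *p)
{
    if (p->locked)
        return false;
    p->locked = true;
    return true;
}

static void toy_unlock_page(struct toy_page *p)
{
    p->locked = false;
}

/* Stand-in for go_inval -> truncate_inode_pages(): needs the page lock. */
static bool toy_go_inval(struct toy_page *p)
{
    if (!toy_trylock_page(p))
        return false;               /* the real code would block here */
    toy_unlock_page(p);
    return true;
}

/* Stand-in for the drop path with the invalidation moved into it.
 * If the caller told us which page it holds, release it first and
 * record that we did so. */
static bool toy_drop_th(struct toy_page *p, struct toy_holder *h)
{
    if (h && h->locked_page == p && p->locked) {
        toy_unlock_page(p);
        h->page_unlocked = true;
    }
    return toy_go_inval(p);
}

/* Stand-in for readpage, which runs with the page locked.
 * Returns true if the invalidation was able to take the page lock. */
static bool toy_readpage(struct toy_page *p, bool pass_holder)
{
    struct toy_holder h = { .locked_page = p, .page_unlocked = false };
    bool ok;

    p->locked = true;               /* page is locked on entry */
    ok = toy_drop_th(p, pass_holder ? &h : NULL);
    if (!h.page_unlocked)           /* skip if the drop path already did it */
        toy_unlock_page(p);
    return ok;
}
```

Calling toy_readpage(&pg, false) models the trace above: the
invalidation cannot get the page lock. Calling toy_readpage(&pg, true)
models the handoff: the drop path unlocks the page, sets the flag, and
readpage skips its own unlock. The hard part in the real code is
exactly what this model glosses over, namely getting something like
toy_holder through run_queue() to the drop path.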


