[Cluster-devel] RHEL4 Test Patch: bz 345401

Bob Peterson rpeterso at redhat.com
Wed Mar 19 16:38:05 UTC 2008


Hi,

This is a RHEL5.x gfs2 patch for bug #345401.  I just thought I'd
toss it out here to see if this fix makes sense and get comments
from people.  My explanation will be longer than the patch itself.

It seems there were a couple of things going on here, so I'll
detail each part of the fix.  I'll work from the bottom of the
patch up, because the change at the bottom may be the most
important one.

In the failing scenario, millions (literally) of files are being
opened, written, truncated and closed.  In many cases the block
numbers used for dinodes are reused, resulting in glocks and
gfs2_inode structures that are nearly indistinguishable from their
previous incarnations.  Some of those structures may still be in
memory, and therein lies the problem.

(1) First change, at the bottom of the patch:

In inode_go_lock, when a glock was locked it could call
gfs2_truncatei_resume on a new gfs2_inode whose flags were not
yet set.  The GL_SKIP gh_flag indicates that the gfs2_inode is
new, so I added code to skip the truncate-resume path in that
case.  A reused gfs2_inode / glock should therefore no longer
go down the truncatei_resume code path.
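
For context, the guarded path ends up looking roughly like this.  The
GL_SKIP test is the actual change in the glops.c hunk at the bottom of
the patch; the rest of the body is paraphrased from memory and
abbreviated, so the surrounding details may differ from the real source:

static int inode_go_lock(struct gfs2_holder *gh)
{
	struct gfs2_inode *ip = gh->gh_gl->gl_object;

	/* New: bail out when the holder was acquired with GL_SKIP, i.e.
	 * the gfs2_inode is brand new and its fields are not set up yet. */
	if (!ip || (gh->gh_flags & GL_SKIP))
		return 0;

	/* ... existing GIF_INVALID / gfs2_inode_refresh handling ... */

	/* Only a fully populated inode should ever resume a truncate. */
	if (ip->i_di.di_flags & GFS2_DIF_TRUNC_IN_PROG)
		return gfs2_truncatei_resume(ip);

	return 0;
}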

(2) Second change, in the middle of the patch:

In function scan_glock, the code was putting glocks on the
reclaim list whenever there were no holders.  However, someone
may still be waiting for the glock (i.e. on the waiters1 or
waiters3 list, or flagged via waiters2), and there may also be
entries on the active items list.  So I changed it to not put
glocks on the reclaim list if they have any waiters or active
items.  I'm not sure whether the "waiters" part of this fix is
necessary, because pending waiters should be granted the glock
immediately and become "holders" themselves; I don't know whether
scan_glock can get in there while that is happening.
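
In other words, scan_glock now bails out early when the glock still has
any pending interest.  A rough sketch (the new test is exactly the
glock.c hunk below; the surrounding flow is abbreviated from the
existing function):

static void scan_glock(struct gfs2_glock *gl)
{
	/* Never touch an inode glock that still has an object attached. */
	if (gl->gl_ops == &gfs2_inode_glops && gl->gl_object)
		return;

	/* New: leave the glock alone if anyone is still queued on the
	 * waiters1/waiters3 lists, if the waiters2 flag is set, or if
	 * there are still active (AIL) items hanging off it. */
	if (!list_empty(&gl->gl_waiters1) ||
	    !list_empty(&gl->gl_waiters3) ||
	    !list_empty(&gl->gl_ail_list) ||
	    test_bit(GLF_WAITERS2, &gl->gl_flags))
		return;

	/* ... existing "no holders and demote_ok" check that schedules
	 * the glock for reclaim ... */
}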

(3) Third change, at the top of the patch:

Function search_bucket looks up a glock by hash key.  If a dinode
block is reused, the new incarnation can have the same hash key
as the old one.  My theory is that the glock was being accessed
while it sat on the reclaim list, and the reclaim daemon was
nuking it part-way through an operation (during windows when the
glock was released); the reclaim daemon simply blasts every glock
on the reclaim list.  The change makes search_bucket refuse to
return a glock that is on the reclaim list, because there is no
guarantee it won't go away at an inconvenient time.
Previous instrumentation indicated this was happening.
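
A rough sketch of the lookup with the new check.  The list_empty test
on gl_reclaim is the actual change in the glock.c hunk below; the loop
itself is abbreviated and the names outside the hunk are from memory:

static struct gfs2_glock *search_bucket(unsigned int hash,
					const struct gfs2_sbd *sdp,
					const struct lm_lockname *name)
{
	struct gfs2_glock *gl;

	/* walk every glock hashed into this bucket */
	hlist_for_each_entry(gl, /* ... */) {
		if (!lm_name_equal(&gl->gl_name, name))
			continue;
		if (gl->gl_sbd != sdp)
			continue;
		/* New: pretend we did not find a glock that is queued for
		 * reclaim; the reclaim daemon may free it out from under us,
		 * so the caller creates a fresh glock instead. */
		if (!list_empty(&gl->gl_reclaim))
			continue;

		atomic_inc(&gl->gl_ref);
		return gl;
	}

	return NULL;
}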

Regards,

Bob Peterson
--
diff -pur a/fs/gfs2/glock.c b/fs/gfs2/glock.c
--- a/fs/gfs2/glock.c	2008-02-22 10:42:48.000000000 -0600
+++ b/fs/gfs2/glock.c	2008-03-19 10:40:23.000000000 -0500
@@ -251,6 +251,8 @@ static struct gfs2_glock *search_bucket(
 			continue;
 		if (gl->gl_sbd != sdp)
 			continue;
+		if (!list_empty(&gl->gl_reclaim))
+			continue;
 
 		atomic_inc(&gl->gl_ref);
 
@@ -1688,6 +1690,12 @@ static void scan_glock(struct gfs2_glock
 	if (gl->gl_ops == &gfs2_inode_glops && gl->gl_object)
 		return;
 
+	if (!list_empty(&gl->gl_waiters1) ||
+	    !list_empty(&gl->gl_waiters3) ||
+	    !list_empty(&gl->gl_ail_list) ||
+	    test_bit(GLF_WAITERS2, &gl->gl_flags))
+		return;
+
 	if (gfs2_glmutex_trylock(gl)) {
 		if (list_empty(&gl->gl_holders) &&
 		    gl->gl_state != LM_ST_UNLOCKED && demote_ok(gl))
diff -pur a/fs/gfs2/glops.c b/fs/gfs2/glops.c
--- a/fs/gfs2/glops.c	2008-02-22 10:42:48.000000000 -0600
+++ b/fs/gfs2/glops.c	2008-03-19 10:41:00.000000000 -0500
@@ -306,7 +306,7 @@ static int inode_go_lock(struct gfs2_hol
 	struct gfs2_inode *ip = gl->gl_object;
 	int error = 0;
 
-	if (!ip)
+	if (!ip || (gh->gh_flags & GL_SKIP))
 		return 0;
 
 	if (test_bit(GIF_INVALID, &ip->i_flags)) {
