[Cluster-devel] [GFS2 patch] fix hangup when multiple processes are trying to write to the same file

Josef Whiter jwhiter at redhat.com
Fri Feb 23 14:05:41 UTC 2007


On Fri, Feb 23, 2007 at 08:38:32AM -0500, Josef Whiter wrote:
> This fixes a problem I encountered while running bonnie++.  When you have one
> thread that opens a file and starts to write to it, and then another thread that
> tries to open and write to the same file, the second thread will loop forever
> trying to grab the inode lock for that inode.  Basically we come in through
> generic_buffered_file_write, which calls gfs2_prepare_write, which then attempts
> to grab the glock.  Because we don't own the lock, gfs2_prepare_write gets
> GLR_TRYFAILED, which returns AOP_TRUNCATED_PAGE to generic_buffered_file_write.
> At this point generic_buffered_file_write loops around again and immediately
> retries the prepare_write.  This means that the second process never gets off of
> the processor in order to allow the process that holds the lock to finish its
> work and let go of the lock.  This patch makes gfs2_glock_nq schedule() if it
> gets back a GLR_TRYFAILED, which resolves this problem.  Please let me know if
> this is in the wrong place, ie if it should go in gfs2_prepare_write instead of
> gfs2_glock_nq, or if its completely wrong :).
>

Ok at the suggestion of Steven I have made it so that we just yield in the
places where this would be a problem, so in gfs2_prepare_write and
gfs2_readpage.

Signed-off-by: Josef Whiter <jwhiter at redhat.com>

--- linux-2.6/fs/gfs2/ops_address.c.josef	2007-02-23 09:51:40.000000000 -0500
+++ linux-2.6/fs/gfs2/ops_address.c	2007-02-23 09:54:27.000000000 -0500
@@ -266,8 +266,10 @@ skip_lock:
 out:
 	return error;
 out_unlock:
-	if (error == GLR_TRYFAILED)
+	if (error == GLR_TRYFAILED) {
 		error = AOP_TRUNCATED_PAGE;
+		yield();
+	}
 	unlock_page(page);
 	if (do_unlock)
 		gfs2_holder_uninit(&gh);
@@ -364,6 +366,7 @@ static int gfs2_prepare_write(struct fil
 		if (error == GLR_TRYFAILED) {
 			unlock_page(page);
 			error = AOP_TRUNCATED_PAGE;
+			yield();
 		}
 		goto out_uninit;
 	}




More information about the Cluster-devel mailing list