[Cluster-devel] [PATCH] [GFS2] bz 276631 : GFS2: chmod hung - TRY 2
Steven Whitehouse
swhiteho at redhat.com
Fri Sep 14 12:26:42 UTC 2007
Hi,
Now in the -nmw git tree. Thanks,
Steve.
On Thu, 2007-09-13 at 23:04 -0500, Bob Peterson wrote:
> Josef's right--my bad. Here is the corrected patch for 276631.
>
> The problem boiled down to a race in gdlm_init_threads() between
> starting thread1 and recording it in ls->thread1. The new thread
> could evaluate "if (current == ls->thread1)", which decides whether
> it sets blist = 1, before the thread creator had set ls->thread1.
>
> Since thread1 is the only thread that is allowed to work on the
> blocking queue, and since neither thread thought it was thread1, no one
> was working on the queue. So everything just sat.
>
> This patch reuses the ls->async_lock spinlock to close the race.
> I've done more than 2000 iterations of the loop that was recreating
> the failure and it seems to work.
>
> Dave Teigland raised the question of whether we should do this
> another way, for example by checking for the task name "lock_dlm1"
> instead. I'm open to opinions.
> --
> Signed-off-by: Bob Peterson <rpeterso at redhat.com>
> --
> diff -pur a/fs/gfs2/locking/dlm/thread.c b/fs/gfs2/locking/dlm/thread.c
> --- a/fs/gfs2/locking/dlm/thread.c 2007-09-13 17:33:58.000000000 -0500
> +++ b/fs/gfs2/locking/dlm/thread.c 2007-09-13 22:47:14.000000000 -0500
> @@ -279,8 +279,10 @@ static int gdlm_thread(void *data)
> /* Only thread1 is allowed to do blocking callbacks since gfs
> may wait for a completion callback within a blocking cb. */
>
> + spin_lock(&ls->async_lock);
> if (current == ls->thread1)
> blist = 1;
> + spin_unlock(&ls->async_lock);
>
> while (!kthread_should_stop()) {
> set_current_state(TASK_INTERRUPTIBLE);
> @@ -338,10 +340,12 @@ int gdlm_init_threads(struct gdlm_ls *ls
> struct task_struct *p;
> int error;
>
> + spin_lock(&ls->async_lock);
> p = kthread_run(gdlm_thread, ls, "lock_dlm1");
> error = IS_ERR(p);
> if (error) {
> log_error("can't start lock_dlm1 thread %d", error);
> + spin_unlock(&ls->async_lock);
> return error;
> }
> ls->thread1 = p;
> @@ -351,9 +355,11 @@ int gdlm_init_threads(struct gdlm_ls *ls
> if (error) {
> log_error("can't start lock_dlm2 thread %d", error);
> kthread_stop(ls->thread1);
> + spin_unlock(&ls->async_lock);
> return error;
> }
> ls->thread2 = p;
> + spin_unlock(&ls->async_lock);
>
> return 0;
> }
>
>
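For readers who want to see the ordering problem outside the kernel, here is a rough userspace sketch of the same idea using POSIX threads. It is only an analogue under stated assumptions, not the GFS2 code: struct lockspace, worker() and run_trial() are hypothetical names, and a pthread mutex stands in for the ls->async_lock spinlock. The creator holds the lock while it starts the thread and records its id, and the worker takes the same lock before comparing itself against that id.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct gdlm_ls; not the kernel type. */
struct lockspace {
    pthread_mutex_t async_lock;  /* plays the role of ls->async_lock */
    pthread_t thread1;           /* written by the creator after pthread_create() */
    int thread1_set;             /* nonzero once thread1 holds a valid id */
    int blist;                   /* what the worker concluded about itself */
};

static void *worker(void *data)
{
    struct lockspace *ls = data;

    /* Blocks here until the creator has recorded thread1 and dropped the lock. */
    pthread_mutex_lock(&ls->async_lock);
    if (ls->thread1_set && pthread_equal(pthread_self(), ls->thread1))
        ls->blist = 1;
    pthread_mutex_unlock(&ls->async_lock);
    return NULL;
}

/* Returns the worker's blist value (1 is expected), or -1 if creation failed. */
int run_trial(void)
{
    struct lockspace ls = { .thread1_set = 0, .blist = 0 };
    pthread_t t;

    pthread_mutex_init(&ls.async_lock, NULL);

    /* Hold the lock across create + record so the worker cannot check early. */
    pthread_mutex_lock(&ls.async_lock);
    if (pthread_create(&t, NULL, worker, &ls) != 0) {
        pthread_mutex_unlock(&ls.async_lock);
        pthread_mutex_destroy(&ls.async_lock);
        return -1;
    }
    ls.thread1 = t;
    ls.thread1_set = 1;
    pthread_mutex_unlock(&ls.async_lock);

    pthread_join(t, NULL);
    pthread_mutex_destroy(&ls.async_lock);
    return ls.blist;
}
```

Calling run_trial() repeatedly from a small main() (built with -pthread) should return 1 every time. Without the two lock/unlock pairs, the worker may run its comparison before thread1 has been recorded and leave blist at 0, which corresponds to the case described above where neither thread services the blocking queue.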