[Cluster-devel] [PATCH] [GFS2] bz 276631 : GFS2: chmod hung - TRY 3

Steven Whitehouse swhiteho at redhat.com
Fri Sep 14 15:12:18 UTC 2007


Hi,

Now in the -nmw git tree. Thanks,

Steve.

On Fri, 2007-09-14 at 09:27 -0500, Bob Peterson wrote:
> This is a rewrite of the patch.  We decided it was a better
> approach to call separate wrapper functions than trying to work around
> the problem with a spin_lock.
> --
> The problem boiled down to a race between gdlm_init_threads()
> storing ls->thread1 and the new thread's setting of blist = 1.
> Essentially, the thread evaluated "if (current == ls->thread1)"
> before its creator had set ls->thread1.
> 
> Since thread1 is the only thread that is allowed to work on the
> blocking queue, and since neither thread believed it was thread1, no
> one was working on the queue.  So everything just sat.
> 
> This patch fixes the race by passing the blist role to each thread
> explicitly through separate wrapper functions, rather than deriving
> it from ls->thread1.  I've done more than 2000 iterations of the
> loop that was recreating the failure and it seems to work.
> 
> Dave Teigland brought up the question of whether we should do this
> another way.  For example, by checking for the task name "lock_dlm1"
> instead.  I'm open to opinions.
> --
> Signed-off-by: Bob Peterson <rpeterso at redhat.com> 
> --
> diff -pur a/fs/gfs2/locking/dlm/thread.c b/fs/gfs2/locking/dlm/thread.c
> --- a/fs/gfs2/locking/dlm/thread.c	2007-09-13 17:33:58.000000000 -0500
> +++ b/fs/gfs2/locking/dlm/thread.c	2007-09-14 09:16:07.000000000 -0500
> @@ -268,20 +268,16 @@ static inline int check_drop(struct gdlm
>  	return 0;
>  }
>  
> -static int gdlm_thread(void *data)
> +static int gdlm_thread(void *data, int blist)
>  {
>  	struct gdlm_ls *ls = (struct gdlm_ls *) data;
>  	struct gdlm_lock *lp = NULL;
> -	int blist = 0;
>  	uint8_t complete, blocking, submit, drop;
>  	DECLARE_WAITQUEUE(wait, current);
>  
>  	/* Only thread1 is allowed to do blocking callbacks since gfs
>  	   may wait for a completion callback within a blocking cb. */
>  
> -	if (current == ls->thread1)
> -		blist = 1;
> -
>  	while (!kthread_should_stop()) {
>  		set_current_state(TASK_INTERRUPTIBLE);
>  		add_wait_queue(&ls->thread_wait, &wait);
> @@ -333,12 +329,22 @@ static int gdlm_thread(void *data)
>  	return 0;
>  }
>  
> +static int gdlm_thread1(void *data)
> +{
> +	return gdlm_thread(data, 1);
> +}
> +
> +static int gdlm_thread2(void *data)
> +{
> +	return gdlm_thread(data, 0);
> +}
> +
>  int gdlm_init_threads(struct gdlm_ls *ls)
>  {
>  	struct task_struct *p;
>  	int error;
>  
> -	p = kthread_run(gdlm_thread, ls, "lock_dlm1");
> +	p = kthread_run(gdlm_thread1, ls, "lock_dlm1");
>  	error = IS_ERR(p);
>  	if (error) {
>  		log_error("can't start lock_dlm1 thread %d", error);
> @@ -346,7 +352,7 @@ int gdlm_init_threads(struct gdlm_ls *ls
>  	}
>  	ls->thread1 = p;
>  
> -	p = kthread_run(gdlm_thread, ls, "lock_dlm2");
> +	p = kthread_run(gdlm_thread2, ls, "lock_dlm2");
>  	error = IS_ERR(p);
>  	if (error) {
>  		log_error("can't start lock_dlm2 thread %d", error);
> 