Re: [Cluster-devel] [GFS2 PATCH] GFS2: Flush work queue before clearing glock hash tables


Now in the -nmw tree. Thanks,


On Thu, 2013-04-25 at 12:49 -0400, Bob Peterson wrote:
> Hi,
> There was a timing window when a GFS2 file system was unmounted
> that caused GFS2 to call BUG() and panic the kernel. The call
> to BUG() is meant to ensure that the glock reference count,
> gl_ref, never drops to zero and then bounces back up again. What was
> happening during umount is that function gfs2_put_super was dequeuing
> its glocks for well-known files. In particular, we saw it on the
> journal glock, sd_jinode_gh. The dequeue caused delayed work to be
> queued for the glock state machine, to transition the lock to an
> "unlocked" state. While the work was still queued, gfs2_put_super
> called gfs2_gl_hash_clear to clear out the glock hash tables.
> If the timing was just so, the glock work function would drop the
> reference count at the time when it was being checked for zero,
> and that caused BUG() to be called. This patch calls
> flush_workqueue before clearing the glock hash tables, thereby
> ensuring that the delayed work is executed before the hash tables
> are cleared, and therefore the reference count never goes to zero
> until the glock is cleared.
> Regards,
> Bob Peterson
> Red Hat File Systems
> Signed-off-by: Bob Peterson <rpeterso redhat com> 
> ---
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index 3b9e178..b777691 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -1577,6 +1577,7 @@ static void dump_glock_func(struct gfs2_glock *gl)
>  void gfs2_gl_hash_clear(struct gfs2_sbd *sdp)
>  {
>  	set_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags);
> +	flush_workqueue(glock_workqueue);
>  	glock_hash_walk(clear_glock, sdp);
>  	flush_workqueue(glock_workqueue);
>  	wait_event(sdp->sd_glock_wait, atomic_read(&sdp->sd_glock_disposal) == 0);
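
For anyone reading this in the archive, the ordering argument in the
description can be sketched with a small single-threaded userspace model.
This is only an illustration under stated assumptions: it is not kernel
code, and struct obj, struct work_queue, queue_work_item, unlock_work,
flush_queue and teardown are all hypothetical names invented for this
sketch. It shows only that flushing first leaves no queued work holding a
reference when teardown begins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are GFS2 or kernel APIs. */
struct obj {
	int ref;			/* stands in for a reference count such as gl_ref */
};

typedef void (*work_fn)(struct obj *);

#define MAX_WORK 8

struct work_queue {
	work_fn fn[MAX_WORK];
	struct obj *arg[MAX_WORK];
	size_t count;			/* number of items still pending */
};

/* Queue an item; the queued item takes its own reference on the object. */
static int queue_work_item(struct work_queue *q, work_fn fn, struct obj *o)
{
	if (q->count == MAX_WORK)
		return -1;
	o->ref++;
	q->fn[q->count] = fn;
	q->arg[q->count] = o;
	q->count++;
	return 0;
}

/* Work function: drop the reference that the queued item was holding. */
static void unlock_work(struct obj *o)
{
	assert(o->ref > 0);		/* guard against underflow in this model */
	o->ref--;
}

/* Run every pending item, then mark the queue empty. */
static void flush_queue(struct work_queue *q)
{
	for (size_t i = 0; i < q->count; i++)
		q->fn[i](q->arg[i]);
	q->count = 0;
}

/*
 * Model of teardown. Returns how many work items were still pending (and
 * so still able to change reference counts) at the moment teardown began
 * examining objects. Zero means the ordering is safe in this model.
 */
static size_t teardown(struct work_queue *q, int flush_first)
{
	size_t pending;

	if (flush_first)
		flush_queue(q);		/* the ordering the patch introduces */
	pending = q->count;		/* objects would be examined here */
	flush_queue(q);			/* the flush that was already present */
	return pending;
}
```

With one item queued, teardown(&q, 0) reports one pending item at the
point where objects are examined, while teardown(&q, 1) reports none.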

