[Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads

Mark Syms Mark.Syms at citrix.com
Mon Oct 8 20:53:23 UTC 2018


Thanks Bob,

We're looking at the part that Tim sent earlier to see whether switching the call order means the schedule_timeout never occurs.

As Steven said, it would be nice if we didn't have to worry about waiting for "dead" glocks to be cleaned up, but that would require considerable care to ensure that we don't just get the opposite race, where a newly created glock is freed and deleted.

Mark
________________________________
From: Bob Peterson <rpeterso at redhat.com>
Sent: Monday, 8 October 2018 21:10
To: Mark Syms <Mark.Syms at citrix.com>
CC: cluster-devel at redhat.com, Ross Lagerwall <ross.lagerwall at citrix.com>, Tim Smith <tim.smith at citrix.com>
Subject: Re: [Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads


----- Original Message -----
> From: Tim Smith <tim.smith at citrix.com>
>
> Flushing the workqueue can cause operations to happen which might
> call gfs2_log_reserve(), or get stuck waiting for locks taken by such
> operations.  gfs2_log_reserve() can io_schedule(). If this happens, it
> will never wake because the only thing which can wake it is gfs2_logd()
> which was already stopped.
>
> This causes umount of a gfs2 filesystem to wedge permanently if, for
> example, the umount immediately follows a large delete operation.
>
> When this occurred, the following stack trace was obtained from the
> umount command:
>
> [<ffffffff81087968>] flush_workqueue+0x1c8/0x520
> [<ffffffffa0666e29>] gfs2_make_fs_ro+0x69/0x160 [gfs2]
> [<ffffffffa0667279>] gfs2_put_super+0xa9/0x1c0 [gfs2]
> [<ffffffff811b7edf>] generic_shutdown_super+0x6f/0x100
> [<ffffffff811b7ff7>] kill_block_super+0x27/0x70
> [<ffffffffa0656a71>] gfs2_kill_sb+0x71/0x80 [gfs2]
> [<ffffffff811b792b>] deactivate_locked_super+0x3b/0x70
> [<ffffffff811b79b9>] deactivate_super+0x59/0x60
> [<ffffffff811d2998>] cleanup_mnt+0x58/0x80
> [<ffffffff811d2a12>] __cleanup_mnt+0x12/0x20
> [<ffffffff8108c87d>] task_work_run+0x7d/0xa0
> [<ffffffff8106d7d9>] exit_to_usermode_loop+0x73/0x98
> [<ffffffff81003961>] syscall_return_slowpath+0x41/0x50
> [<ffffffff815a594c>] int_ret_from_sys_call+0x25/0x8f
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> Signed-off-by: Tim Smith <tim.smith at citrix.com>
> Signed-off-by: Mark Syms <mark.syms at citrix.com>
> ---
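
The ordering problem described in the patch text above can be illustrated with a small userspace simulation. This is only a sketch under stated assumptions, not the GFS2 code: the `logd` thread stands in for gfs2_logd(), `delete_work` stands in for a queued delete that ends up in gfs2_log_reserve(), and the names `umount`, `flush_first`, `need_space` and `have_space` are all hypothetical. The timeout exists only so the simulation can report a hang instead of actually hanging.

```python
import threading


def umount(flush_first, timeout=0.5):
    """Simulate unmount; return True if the flush completes, False if it hangs."""
    need_space = threading.Event()  # a work item is asking for log space
    have_space = threading.Event()  # only the "logd" thread ever grants it
    stop = threading.Event()        # request for "logd" to exit
    done = threading.Event()        # the queued work item has finished

    def logd():
        # Hypothetical stand-in for gfs2_logd(): grants log space until stopped.
        while not stop.is_set():
            if need_space.wait(0.01):
                need_space.clear()
                have_space.set()

    def delete_work():
        # Hypothetical stand-in for delete work that must reserve log space.
        need_space.set()
        have_space.wait()  # sleeps until "logd" wakes it
        done.set()

    logd_thread = threading.Thread(target=logd, daemon=True)
    logd_thread.start()

    def stop_logd():
        stop.set()
        logd_thread.join()

    def flush():
        # Run the pending work and wait for it, as a workqueue flush would.
        threading.Thread(target=delete_work, daemon=True).start()
        return done.wait(timeout)

    if flush_first:
        ok = flush()   # "logd" is still running, so the work can finish
        stop_logd()
    else:
        stop_logd()    # nothing is left to wake the sleeping work item
        ok = flush()

    have_space.set()   # release any stuck worker so the simulation exits cleanly
    return ok


if __name__ == "__main__":
    print("flush before stopping logd:", umount(flush_first=True))
    print("flush after stopping logd: ", umount(flush_first=False))
```

With `flush_first=False` the simulated flush times out, which corresponds to the wedged umount in the stack trace; with `flush_first=True` the work item is woken and the flush returns.
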
Hi Mark, Tim, and all,

I pushed patch 2/2 upstream. For now I'll hold off on 1/2 but keep it
on my queue, pending our investigation.
https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/fs/gfs2?h=for-next&id=b7f5a2cd27b76e96fdc6d77b060dfdd877c9d0a9

Regards,

Bob Peterson