[Cluster-devel] [PATCH 0/2] GFS2 locking patches

Mon Oct 8 12:36:29 UTC 2018

A pair of locking issues in GFS2 observed when running VM storage stress
tests.

0001-GFS2-use-schedule-timeout-in-find-insert-glock.patch covers a case
where an application-level flock would wedge. The VM control plane makes
extensive use of flocks to control access to VM virtual disks and databases,
and we encountered several failed tests where the flocks were not acquired
even though no one was holding them. Investigation indicates that there is a
race in find_insert_glock where schedule() can be called after the expected
waker has already completed its work, so the wakeup never arrives. Replace
schedule() with schedule_timeout() and log.
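
As a rough userspace illustration of why a bounded sleep recovers from a
missed wakeup, consider the sketch below. It is an analogy only, not the
kernel code: wait_until, demo_lost_wakeup and the timings are hypothetical
names and values, and threading.Event stands in for the wait queue.

```python
import threading


def wait_until(predicate, wakeup, timeout):
    """Sleep until predicate() holds, re-checking after every wakeup or timeout.

    Returns how many sleeps ended by timeout rather than by a wakeup.
    """
    expired = 0
    while not predicate():
        # Event.wait() returns False when the timeout elapses first, which
        # plays the role of schedule_timeout() running out of time.
        if not wakeup.wait(timeout):
            expired += 1  # a real caller might log here
        wakeup.clear()
    return expired


def demo_lost_wakeup():
    state = {"released": False}
    wakeup = threading.Event()

    def release_without_wakeup():
        # Completes the work but never calls wakeup.set(): the lost wakeup.
        state["released"] = True

    timer = threading.Timer(0.05, release_without_wakeup)
    timer.start()
    expired = wait_until(lambda: state["released"], wakeup, timeout=0.01)
    timer.join()
    return expired
```

With timeout=None the waiter in this sketch would block forever, which is
the analogue of the wedge; with a finite timeout it re-checks the condition
and makes progress.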

0002-GFS2-Flush-the-GFS2-delete-workqueue-before-stopping.patch covers a
case where umount would wedge unrecoverably. The stress test finishes by
deleting the test machines and virtual disks, after which the filesystem is
unmounted on all hosts before the hosts are returned to the lab pool. umount
was found to wedge, and this has been traced to gfs2_log_reserve being
called from the flush_workqueue after the associated kthreads had already
been stopped. There was therefore nothing left to service the log reserve
request, and the code wedged.
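
The ordering problem can be sketched in userspace as follows. This is an
analogy under stated assumptions, not the GFS2 code: LogService, reserve
and unmount are hypothetical names, and the timeout exists only so that
the wrong ordering terminates here instead of hanging as it did in the
kernel.

```python
import queue
import threading


class LogService:
    """Stand-in for the kernel threads that satisfy log reservations."""

    def __init__(self):
        self.requests = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            reply = self.requests.get()
            if reply is None:  # stop request
                return
            reply.set()  # grant the reservation

    def reserve(self, timeout):
        # Returns True if the service granted the request in time.
        reply = threading.Event()
        self.requests.put(reply)
        return reply.wait(timeout)

    def stop(self):
        self.requests.put(None)
        self.thread.join()


def unmount(flush_first, timeout=0.2):
    """Run one pending delete work item either before or after the stop."""
    service = LogService()
    if flush_first:
        granted = service.reserve(timeout)  # flush while the service is alive
        service.stop()
    else:
        service.stop()
        granted = service.reserve(timeout)  # nobody is left to answer
    return granted
```

In this sketch unmount(flush_first=True) has its reservation granted, while
unmount(flush_first=False) only returns because of the added timeout.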

Mark Syms (1):
  GFS2: use schedule timeout in find insert glock

Tim Smith (1):
  GFS2: Flush the GFS2 delete workqueue before stopping the kernel
    threads

 fs/gfs2/glock.c | 3 ++-
 fs/gfs2/super.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

-- 
1.8.3.1