[Cluster-devel] GFS2: Umount recovery race fix
Steven Whitehouse
swhiteho at redhat.com
Tue Aug 11 08:42:39 UTC 2009
Hi,
On Mon, 2009-08-10 at 17:31 -0500, David Teigland wrote:
> On Thu, May 14, 2009 at 02:13:17PM +0100, Steven Whitehouse wrote:
> >
> > This patch fixes a race condition where we can receive recovery
> > requests part way through processing a umount. This was causing
> > problems since the recovery thread had already gone away.
>
> Do you have some logs showing specifically what happened in both kernel and
> userland?
>
Yes, the one you sent to me on Fri, 8 May 2009 11:34:54 -0500 (17:34
BST). Next time please file a bugzilla so that we have a proper record
of the issues.
> > Looking in more detail at the recovery code, it was really trying
> > to implement a slight variation on a work queue, and that happens to
> > align nicely with the recently introduced slow-work subsystem. As a
> > result I've updated the code to use slow-work, rather than its own home
> > grown variety of work queue.
> >
> > When using the wait_on_bit() function, I noticed that the wait function
> > that was supplied as an argument was appearing in the WCHAN field, so
> > I've updated the function names in order to produce more meaningful
> > output.
>
> That description doesn't explain how the specific bug was fixed.
>
The bug was fixed by refusing recovery requests on a filesystem once
umount has begun.
> I'm guessing that this is the patch that broke gfs2 recovery, although there
> are others that muck around with the sysfs control files.
>
> This is what appears in /var/log/messages,
>
> gfs_controld[7901]: start_journal_recovery 3 error -1
>
> And from the daemon debug log,
>
> 1249942342 foo start_journal_recovery jid 3
> 1249942342 foo set /sys/fs/gfs2/bull:foo/lock_module/recover to 3
> 1249942342 foo set open /sys/fs/gfs2/bull:foo/lock_module/recover error -1 13
> 1249942342 start_journal_recovery 3 error -1
>
> Dave
>
I'll have a look - EACCES (errno 13) is not one of the errno values
which the recovery code returns, though.
Steve.