
Re: [Cluster-devel] GFS2: Umount recovery race fix



On Thu, May 14, 2009 at 02:13:17PM +0100, Steven Whitehouse wrote:
> 
> This patch fixes a race condition where we can receive recovery
> requests part way through processing a umount. This was causing
> problems since the recovery thread had already gone away.

Do you have some logs showing specifically what happened in both kernel and
userland?

> Looking in more detail at the recovery code, it was really trying
> to implement a slight variation on a work queue, and that happens to
> align nicely with the recently introduced slow-work subsystem. As a
> result I've updated the code to use slow-work, rather than its own home
> grown variety of work queue.
> 
> When using the wait_on_bit() function, I noticed that the wait function
> that was supplied as an argument was appearing in the WCHAN field, so
> I've updated the function names in order to produce more meaningful
> output.

That description doesn't explain how the specific bug was fixed.

I'm guessing that this is the patch that broke gfs2 recovery, although there
are others that muck around with the sysfs control files.

This is what appears in /var/log/messages,

gfs_controld[7901]: start_journal_recovery 3 error -1

And from the daemon debug log,

1249942342 foo start_journal_recovery jid 3
1249942342 foo set /sys/fs/gfs2/bull:foo/lock_module/recover to 3
1249942342 foo set open /sys/fs/gfs2/bull:foo/lock_module/recover error -1 13
1249942342 start_journal_recovery 3 error -1
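
To make the failing step concrete, here is a minimal sketch of a userland write to a sysfs control file of this kind. It is not the gfs_controld source; the helper name set_sysfs_int and its log format are hypothetical and only loosely mirror the debug lines above. It assumes the trailing "13" in the log is errno, which on Linux is EACCES (permission denied), i.e. the open() itself is being refused before any value is written.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write an integer to a sysfs-style control file.
 * Returns 0 on success, -1 on failure with errno describing the cause. */
static int set_sysfs_int(const char *path, int val)
{
	char buf[32];
	int fd, len, saved;
	ssize_t rv;

	fd = open(path, O_WRONLY);
	if (fd < 0) {
		/* Save errno first: fprintf() may overwrite it. */
		saved = errno;
		fprintf(stderr, "set open %s error %d %d (%s)\n",
			path, fd, saved, strerror(saved));
		errno = saved;
		return -1;
	}

	len = snprintf(buf, sizeof(buf), "%d", val);
	rv = write(fd, buf, len);
	/* A short write leaves errno unset, so report it as EIO. */
	saved = (rv < 0) ? errno : EIO;
	close(fd);

	if (rv != len) {
		fprintf(stderr, "set write %s error %zd %d (%s)\n",
			path, rv, saved, strerror(saved));
		errno = saved;
		return -1;
	}
	return 0;
}
```

Called as set_sysfs_int("/sys/fs/gfs2/bull:foo/lock_module/recover", 3), a helper like this would return -1 with errno 13 if the file's mode no longer permits writing, which would be consistent with a change to the sysfs control files.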

Dave

