[Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

Steven Whitehouse swhiteho at redhat.com
Wed Dec 21 10:45:21 UTC 2011


Hi,

On Tue, 2011-12-20 at 16:04 -0500, David Teigland wrote:
> On Tue, Dec 20, 2011 at 02:16:43PM -0500, David Teigland wrote:
> > On Tue, Dec 20, 2011 at 10:39:08AM +0000, Steven Whitehouse wrote:
> > > > I dislike arbitrary delays also, so I'm hesitant to add them.
> > > > The choices here are:
> > > > - removing NOQUEUE from the requests below, but with NOQUEUE you have a
> > > >   much better chance of killing a mount command, which is a fairly nice
> > > >   feature, I think.
> > > > - removing the delay, which results in nodes often doing fast+repeated
> > > >   lock attempts, which could get rather excessive.  I'd be worried about
> > > >   having that kind of unlimited loop sitting there.
> > > > - using some kind of delay.
> > > > 
> > > > While I don't like the look of the delay, I like the other options less.
> > > > Do you have a preference, or any other ideas?
> > > > 
> > > Well, I'd prefer to just remove the NOQUEUE command in that case, so
> > > that we don't spin here. The dlm request is async anyway, so we should
> > > be able to wait for it in an interruptible manner and send a cancel if
> > > required.
> > 
> > I won't do async+cancel here, that would make the code unnecessarily ugly
> > and complicated.  There's really no reason to be so dogmatic about delays,
> > but since you refuse I'll just make it block, assuming I don't find any
> > new problems with that.
> 
> Now that I look at it, waiting vs blocking on the lock requests is not the
> main issue; removing NOQUEUE doesn't really do anything.  We're waiting
> for the other nodes to finish their work and update the state in the lvb.
> The only option is to periodically check the lvb, so the only choices are
> to do that as fast as possible (not nice), or introduce a delay.
> 

I don't think I understand what's going on in that case. What I thought
should be happening was this:

 - Try to get the mounter lock in EX
   - If successful, then we are the first mounter, so recover all
     journals
   - Write info into the LVB
   - Demote the mounter lock to PR so that other nodes can mount

 - If we fail to get the mounter lock in EX, then wait for the lock in
   PR state
   - This will block until the EX lock has been demoted to PR
   - Read the info from the LVB

So a node holding the mounter lock in EX knows that it is the first
mounter and will recover all journals before demoting the mounter lock
to PR. A node holding the mounter lock in PR may only recover its own
journal (at mount time).
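
In rough pseudo-code terms it would look something like the sketch
below. This is only an illustration, not the actual patch: sync_lock()
stands in for a helper that issues dlm_lock() with the given mode/flags
on the "mounter_lock" resource and waits for the completion ast
(lock_dlm.c has helpers along those lines), and recover_all_journals(),
write_mount_info_lvb() and read_mount_info_lvb() are made-up names for
whatever the real code does at each step:

/*
 * Illustration only.  sync_lock() is assumed to call dlm_lock(), wait
 * for the completion ast and return lksb->sb_status.  The lksb carries
 * the LVB (DLM_LKF_VALBLK) that the first mounter uses to publish its
 * state to the other mounters.
 */
static int first_mounter_lock(dlm_lockspace_t *ls, struct dlm_lksb *lksb)
{
	int error;

	/* Only one node can win EX; NOQUEUE lets the others fail fast. */
	error = sync_lock(ls, DLM_LOCK_EX,
			  DLM_LKF_VALBLK | DLM_LKF_NOQUEUE, lksb);
	if (!error) {
		/* First mounter: recover every journal, record the
		   result in the LVB, then demote so others can mount. */
		recover_all_journals();
		write_mount_info_lvb(lksb);
		return sync_lock(ls, DLM_LOCK_PR,
				 DLM_LKF_VALBLK | DLM_LKF_CONVERT, lksb);
	}

	if (error != -EAGAIN)
		return error;

	/* Another node holds EX: block here until it demotes to PR,
	   then pick up the state it left in the LVB. */
	error = sync_lock(ls, DLM_LOCK_PR, DLM_LKF_VALBLK, lksb);
	if (error)
		return error;

	read_mount_info_lvb(lksb);
	return 0;
}

The point being that the updated LVB travels with the demote from EX to
PR, so the waiting PR requests should see it as soon as they are
granted.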

That makes this exactly equivalent to what we currently do with the
first mounter flag from gfs_controld.

So I guess what I can't quite figure out is how it is possible for the
LVB to be out of sync with the lock state of the mounter lock.

Steve.
