[Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

Tue Dec 20 21:04:05 UTC 2011

On Tue, Dec 20, 2011 at 02:16:43PM -0500, David Teigland wrote:
> On Tue, Dec 20, 2011 at 10:39:08AM +0000, Steven Whitehouse wrote:
> > > I dislike arbitrary delays also, so I'm hesitant to add them.
> > > The choices here are:
> > > - removing NOQUEUE from the requests below, but with NOQUEUE you have a
> > >   much better chance of killing a mount command, which is a fairly nice
> > >   feature, I think.
> > > - removing the delay, which results in nodes often doing fast+repeated
> > >   lock attempts, which could get rather excessive.  I'd be worried about
> > >   having that kind of unlimited loop sitting there.
> > > - using some kind of delay.
> > > 
> > > While I don't like the look of the delay, I like the other options less.
> > > Do you have a preference, or any other ideas?
> > > 
> > Well, I'd prefer to just remove the NOQUEUE command in that case, so
> > that we don't spin here. The dlm request is async anyway, so we should
> > be able to wait for it in an interruptible manner and send a cancel if
> > required.
> 
> I won't do async+cancel here, that would make the code unnecessarily ugly
> and complicated.  There's really no reason to be so dogmatic about delays,
> but since you refuse I'll just make it block, assuming I don't find any
> new problems with that.

Now that I look at it, waiting vs blocking on the lock requests is not the
main issue; removing NOQUEUE doesn't really do anything.  We're waiting
for the other nodes to finish their work and update the state in the lvb.
The only way to see that update is to check the lvb periodically, so the
choices are to do that as fast as possible (not nice) or to introduce a delay.
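
To make the "check the lvb periodically, with a delay" option concrete, here
is a minimal userspace sketch. It is not the gfs2 or dlm code. The names
lvb_state, lvb_read_fn, wait_for_generation, fake_cluster and fake_read are
all hypothetical, and the "generation" field is only an assumed stand-in for
whatever recovery state the nodes publish in the lvb. The poll count is
bounded so that the loop cannot spin forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the recovery state that nodes publish in the lvb. */
struct lvb_state {
    uint32_t generation;    /* assumed: bumped once the other nodes are done */
};

/* Hypothetical callback that re-reads the lvb (e.g. by re-requesting the lock). */
typedef void (*lvb_read_fn)(struct lvb_state *out, void *ctx);

/*
 * Poll the lvb until its generation reaches `wanted`, sleeping `delay_ms`
 * between checks instead of retrying as fast as possible.
 * Returns the number of polls used, or -1 if `max_polls` was exhausted.
 */
static int wait_for_generation(lvb_read_fn read_lvb, void *ctx,
                               uint32_t wanted, long delay_ms, int max_polls)
{
    struct lvb_state st;
    struct timespec ts = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };

    assert(read_lvb != NULL);

    for (int i = 1; i <= max_polls; i++) {
        read_lvb(&st, ctx);
        if (st.generation >= wanted)
            return i;           /* other nodes have updated the state */
        nanosleep(&ts, NULL);   /* the delay under discussion */
    }
    return -1;                  /* gave up; caller decides what to do next */
}

/* Fake reader for demonstration: the "other nodes" finish after a few reads. */
struct fake_cluster {
    int reads;
    int done_after;
};

static void fake_read(struct lvb_state *out, void *ctx)
{
    struct fake_cluster *fc = ctx;

    fc->reads++;
    out->generation = (fc->reads >= fc->done_after) ? 1 : 0;
}
```

With delay_ms set to 0 this degenerates into the fast, repeated checking
described above; a nonzero delay trades a little latency for far fewer
requests.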