[Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

David Teigland teigland at redhat.com
Tue Dec 20 19:16:43 UTC 2011


On Tue, Dec 20, 2011 at 10:39:08AM +0000, Steven Whitehouse wrote:
> > I dislike arbitrary delays also, so I'm hesitant to add them.
> > The choices here are:
> > - removing NOQUEUE from the requests below, but with NOQUEUE you have a
> >   much better chance of killing a mount command, which is a fairly nice
> >   feature, I think.
> > - removing the delay, which results in nodes often doing fast+repeated
> >   lock attempts, which could get rather excessive.  I'd be worried about
> >   having that kind of unlimited loop sitting there.
> > - using some kind of delay.
> > 
> > While I don't like the look of the delay, I like the other options less.
> > Do you have a preference, or any other ideas?
> > 
> Well, I'd prefer to just remove the NOQUEUE flag in that case, so
> that we don't spin here. The dlm request is async anyway, so we should
> be able to wait for it in an interruptible manner and send a cancel if
> required.

I won't do async+cancel here, that would make the code unnecessarily ugly
and complicated.  There's really no reason to be so dogmatic about delays,
but since you refuse I'll just make it block, assuming I don't find any
new problems with that.
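
For illustration, the trade-off between the two remaining options can be
sketched in userspace C. This is only an analogy, not gfs2 or dlm code:
pthread_mutex_trylock() stands in for a NOQUEUE request (fail at once
instead of queueing) and pthread_mutex_lock() stands in for a blocking
request. The names lock_retry_with_delay(), lock_blocking() and sleep_ms()
are hypothetical, as are the retry limit and delay parameters.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper: sleep for the given number of milliseconds. */
static void sleep_ms(unsigned int ms)
{
    struct timespec ts = {
        .tv_sec = ms / 1000,
        .tv_nsec = (long)(ms % 1000) * 1000000L,
    };
    nanosleep(&ts, NULL);
}

/*
 * Non-blocking attempts separated by a fixed delay: the analogue of
 * NOQUEUE requests plus a delay. Returns 0 once the lock is taken, or
 * EBUSY after max_tries failed attempts. *tries reports attempts made.
 */
int lock_retry_with_delay(pthread_mutex_t *m, int max_tries,
                          unsigned int delay_ms, int *tries)
{
    int rc = EBUSY;

    assert(m != NULL && tries != NULL);
    *tries = 0;
    while (*tries < max_tries) {
        rc = pthread_mutex_trylock(m);
        (*tries)++;
        if (rc != EBUSY)
            return rc;              /* 0 on success, or a real error */
        if (*tries < max_tries)
            sleep_ms(delay_ms);     /* the delay under discussion */
    }
    return rc;
}

/* Blocking request: one call, no loop, no delay; the caller just waits. */
int lock_blocking(pthread_mutex_t *m)
{
    assert(m != NULL);
    return pthread_mutex_lock(m);
}
```

The retry variant returns to its caller between attempts, which is where
a check for a killed mount could go. The blocking variant has no loop to
bound, but it also has no such checkpoint.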

> > > Again - I don't want to add arbitrary delays into the code. Why is this
> > > waiting for half a second? Why not some other length of time? We should
> > > figure out how to wait for the end of the first mounter recovery some
> > > other way if that is what is required.
> > 
> > This msleep slows down a rare loop to wake up a couple times vs once with
> > a proper wait mechanism.  It's waiting for the next recover_done()
> > callback, which the dlm will call when it is done with recovery.  We do
> > have the option here of using a standard wait mechanism, wait_on_bit() or
> > something.  I'll see if any of those would work here without adding too
> > much to the code.
> > 
> Ok. That would be a better option I think.

Only if it doesn't make things more (unnecessarily) complex.
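
As a rough picture of what "a proper wait mechanism" means here, the
following userspace sketch waits for a completion callback instead of
sleeping in a loop. It is an analogy under stated assumptions, not the
patch: a pthread condition variable stands in for whatever kernel
primitive would be used (wait_on_bit() is the one named above), and
struct recover_state, its generation counter, and wait_recover_done()
are hypothetical. Only the recover_done() name comes from the thread.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for per-mount recovery state; not gfs2 code. */
struct recover_state {
    pthread_mutex_t lock;
    pthread_cond_t  done_cond;
    unsigned int    generation;   /* bumped once per completed recovery */
};

void recover_state_init(struct recover_state *rs)
{
    assert(rs != NULL);
    pthread_mutex_init(&rs->lock, NULL);
    pthread_cond_init(&rs->done_cond, NULL);
    rs->generation = 0;
}

/* Analogue of the recover_done() callback: record completion, wake waiters. */
void recover_done(struct recover_state *rs)
{
    pthread_mutex_lock(&rs->lock);
    rs->generation++;
    pthread_cond_broadcast(&rs->done_cond);
    pthread_mutex_unlock(&rs->lock);
}

/*
 * Block until a recovery newer than 'seen' has completed, then return
 * the new generation. The waiter wakes once per callback rather than
 * polling on a timer.
 */
unsigned int wait_recover_done(struct recover_state *rs, unsigned int seen)
{
    unsigned int gen;

    pthread_mutex_lock(&rs->lock);
    while (rs->generation == seen)
        pthread_cond_wait(&rs->done_cond, &rs->lock);
    gen = rs->generation;
    pthread_mutex_unlock(&rs->lock);
    return gen;
}
```

Compared with a sleep in the loop, this adds a lock, a condition and a
counter, which is the extra complexity being weighed against the delay.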

Dave



