
Re: [Cluster-devel] GFS2: Wait for journal id on mount if not specified on mount command line


On Tue, 2010-06-08 at 15:37 -0500, David Teigland wrote:
> On Tue, Jun 08, 2010 at 09:34:52AM +0100, Steven Whitehouse wrote:
> > > A couple obvious questions from the start...
> > > - What if gfs_controld isn't running?
> > It will hang until mount is killed, whereupon it will clean up and exit
> > gracefully.
> Right, so instead of failing with an error, it hangs and requires a user
> to intervene and kill it.  That's not good.
Well, we could make it time out, but I fail to see why it's not good. They
are not going to be able to mount the fs anyway, given that this is only
going to occur when gfs_controld is not configured.

> > > - Won't processes start to access the fs and block during this intermediate
> > > time between mount(2) and getting a journal id?  All of those processes
> > > now need errors returned if gfs_controld returns an error instead of a
> > > journal id.
> > Nothing can access the fs until mount completes successfully.
> Right, but does the access block indefinitely waiting for all the userland
> nonsense to complete?
Yes. We could make it time out, but I'm not convinced that we
need to do that.
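
For illustration only, the difference between blocking indefinitely and
timing out can be modelled in userspace as below. This is a sketch, not
the kernel patch: JournalIdWaiter, set_jid() and wait() are hypothetical
names, and a thread stands in for gfs_controld handing over the journal
id. Passing timeout=None gives the blocking behaviour discussed above.

```python
import threading


class JournalIdWaiter:
    """Userspace model of a mount that blocks until a journal id arrives.

    Hypothetical illustration only; none of these names exist in GFS2.
    """

    def __init__(self):
        self._event = threading.Event()
        self._jid = None
        self._error = None

    def set_jid(self, jid):
        # Stand-in for the cluster daemon delivering a journal id.
        # A negative value models the "invalid journal id" case.
        if jid < 0:
            self._error = ValueError("invalid journal id: %d" % jid)
        else:
            self._jid = jid
        self._event.set()

    def wait(self, timeout=None):
        # timeout=None blocks until set_jid() is called (or the caller
        # is killed); a number of seconds turns the hang into an error.
        if not self._event.wait(timeout):
            raise TimeoutError("no journal id received")
        if self._error is not None:
            raise self._error
        return self._jid


if __name__ == "__main__":
    waiter = JournalIdWaiter()
    # Simulate the daemon answering shortly after the mount starts.
    threading.Timer(0.1, waiter.set_jid, args=(3,)).start()
    print("journal id:", waiter.wait(timeout=5.0))
```

With a timeout the caller gets an error it has to handle; without one
the only way out is for the waiting process to be killed.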

> > > Current:
> > > - get all the userspace/clustering-related/error-laden overhead sorted out
> > > - then, at the very end, pull the kernel fs into the picture
> > > - collect the result of mount(2) in userspace, which is almost always
> > >   "success"
> > > 
> > But that isn't the way it works currently. The first mount result (and
> > recovery results) are collected via uevents, even in the current scheme.
> > 
> > > Proposed:
> > > - pull the kernel fs into the picture
> > > - transition to userspace to sort out all the clustering-related /
> > >   error-laden overhead
> > > - get back to the kernel with the result
> > > - collect the result of mount(2) in userspace
> > > 
> > > The further you get before you encounter errors, the harder they are to
> > > handle.  You want most errors to happen earlier, with fewer entities
> > > involved, so backing out is easier to do.
> > > 
> > There isn't a great difference between the error handling in either
> > case. There is one extra case for the kernel to handle, that of getting
> > an invalid journal id, or not getting a journal id at all. The actual
> > sequence of events is pretty similar, the main difference being that
> > gfs_controld talks directly to the fs in all cases, rather than using
> > mount.gfs2 as an intermediate point in the communications. Since
> > gfs_controld already communicates directly with the fs anyway for a
> > number of other reasons, it seems reasonable to cut out the middle man
> > in this one remaining case where we have the mount helper in order to
> > simplify the system.
> My comments on really simplifying things are at the end; this just
> rearranges the complexity.  If we're going to take the dive into reworking
> this code, at least make it worth the effort.
> > The question is then, whether this could be made backward compatible
> > with what we already have. I'm not sure how we could allocate journal
> > ids in the kernel since we have no communication mechanism other than
> > the locking.
> The dlm should work fine.  You'd want to move all the journal related
> stuff into gfs2: allocating jid, knowing which jid's are used by which
> nodes, knowing which journals need recovery, etc.  Ocfs2 does all this
> stuff itself; you may be able to copy a bunch of it (or share!)
OCFS2 appears to keep the journal state (both allocation to nodes and
recovery state) on disk. Whilst that might well be sensible, it
would be rather tricky to try and introduce that at this stage. It
couldn't possibly be backward compatible after that change, since older
nodes would not know how to read that information, and would thus still
be reliant on the existing infrastructure.

I don't want to rule out making a change if there is a sensible way to
implement it, but it's not obvious to me how we'd do that at the moment.

> > Getting rid of mount.gfs2 results in there being one fewer userland
> > program to maintain. It also removes the one major dependency between
> > gfs2-utils and the cluster packages making builds much easier.  In time
> > we can simplify gfs_controld too since it would no longer need the code
> > to talk to mount.gfs2.
> >
> > Some of these changes are a long way off at the moment, since it will
> > take some time before we can reasonably make all of the userland
> > changes. In the meantime though, I think it is important to get the
> > kernel changes in as soon as we can, in order to give us the opportunity
> > of making the userland changes at a later date.
> The changes you're suggesting may seem minor to you, but they involve
> changing how very core and delicate interactions work.  It's senseless
> to change that kind of stuff without a very good reason.  Serious
> simplification would qualify, but rearranging the deck chairs doesn't.
> Dave

This is a serious simplification. We lose a dependency between packages
and at the same time, we have one less program to maintain. Making this
change doesn't preclude further changes if we can then work out how to
implement them in a backward compatible way.

I didn't claim that the change was minor, but it is certainly a lot less
invasive than your suggestion of replacing the whole system. That's not
to say I'm against doing that, but I'd prefer to do it carefully, one
bit at a time.

