[Cluster-devel] GFS2: Wait for journal id on mount if not specified on mount command line

David Teigland teigland at redhat.com
Tue Jun 8 20:37:55 UTC 2010


On Tue, Jun 08, 2010 at 09:34:52AM +0100, Steven Whitehouse wrote:
> > A couple obvious questions from the start...
> > - What if gfs_controld isn't running?
> It will hang until mount is killed, whereupon it will clean up and exit
> gracefully.

Right, so instead of failing with an error, it hangs and requires a user
to intervene and kill it.  That's not good.

> > - Won't processes start to access the fs and block during this intermediate
> > time between mount(2) and getting a journal id?  All of those processes
> > now need errors returned if gfs_controld returns an error instead of a
> > journal id.
> Nothing can access the fs until mount completes successfully.

Right, but does the access block indefinitely waiting for all the userland
nonsense to complete?

> > Current:
> > - get all the userspace/clustering-related/error-laden overhead sorted out
> > - then, at the very end, pull the kernel fs into the picture
> > - collect the result of mount(2) in userspace, which is almost always
> >   "success"
> > 
> But that isn't the way it works currently. The first mount result (and
> recovery results) are collected via uevents, even in the current scheme.
> 
> > Proposed:
> > - pull the kernel fs into the picture
> > - transition to userspace to sort out all the clustering-related /
> >   error-laden overhead
> > - get back to the kernel with the result
> > - collect the result of mount(2) in userspace
> > 
> > The further you get before you encounter errors, the harder they are to
> > handle.  You want most errors to happen earlier, with fewer entities
> > involved, so backing out is easier to do.
> > 
> There isn't a great difference between the error handling in either
> case. There is one extra case for the kernel to handle, that of getting
> an invalid journal id, or not getting a journal id at all. The actual
> sequence of events is pretty similar, the main difference being that
> gfs_controld talks directly to the fs in all cases, rather than using
> mount.gfs2 as an intermediate point in the communications. Since
> gfs_controld already communicates directly with the fs anyway for a
> number of other reasons, it seems reasonable to cut out the middle man
> in this one remaining case where we have the mount helper in order to
> simplify the system.
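
For reference, the direct path described above amounts to gfs_controld
writing the chosen jid into the filesystem's sysfs directory instead of
handing it back through mount.gfs2.  A minimal userspace sketch of that
step (the sysfs file name here is an assumption based on the existing
lock_module interface, not necessarily the exact file the patch adds):

#include <errno.h>
#include <stdio.h>

/* Tell the kernel which journal the mounting node should use. */
int set_journal_id(const char *table, int jid)
{
        char path[256];
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/fs/gfs2/%s/lock_module/jid", table);

        f = fopen(path, "w");
        if (!f)
                return -errno;

        fprintf(f, "%d\n", jid);        /* unblocks the waiting mount */
        fclose(f);
        return 0;
}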

My comments on really simplifying things are at the end; this just
rearranges the complexity.  If we're going to take the dive into reworking
this code, at least make it worth the effort.

> The question, then, is whether this could be made backward compatible
> with what we already have. I'm not sure how we could allocate journal
> ids in the kernel since we have no communication mechanism other than
> the locking.

The dlm should work fine.  You'd want to move all the journal-related
stuff into gfs2: allocating jids, knowing which jids are used by which
nodes, knowing which journals need recovery, etc.  Ocfs2 does all this
stuff itself; you may be able to copy a bunch of it (or share!)
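
To make the comparison concrete, here is a minimal sketch of what
in-kernel jid selection along those lines could look like.  The
gfs2_try_journal_lock() helper and the num_journals argument are
hypothetical stand-ins; the idea is that each journal gets its own dlm
lock resource, a mounting node claims the first one it can take
exclusively without queueing, and it keeps that lock for the life of
the mount so other nodes can tell which journals are active and which
need recovery:

#include <linux/errno.h>

struct gfs2_sbd;                        /* gfs2 per-sb private data */

/*
 * Hypothetical helper: issue a DLM_LKF_NOQUEUE, DLM_LOCK_EX dlm_lock()
 * on the per-journal lock resource.  Returns 0 if we got the lock,
 * -EAGAIN if another node already holds that journal.
 */
int gfs2_try_journal_lock(struct gfs2_sbd *sdp, int jid);

int gfs2_pick_journal(struct gfs2_sbd *sdp, int num_journals)
{
        int jid, error;

        for (jid = 0; jid < num_journals; jid++) {
                error = gfs2_try_journal_lock(sdp, jid);
                if (!error)
                        return jid;     /* this journal is ours; keep
                                           the lock until unmount */
                if (error != -EAGAIN)
                        return error;   /* real dlm failure */
        }
        return -EBUSY;                  /* every journal is in use */
}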

> Getting rid of mount.gfs2 results in there being one fewer userland
> program to maintain. It also removes the one major dependency between
> gfs2-utils and the cluster packages, making builds much easier.  In time
> we can simplify gfs_controld too since it would no longer need the code
> to talk to mount.gfs2.
>
> Some of these changes are a long way off at the moment, since it will
> take some time before we can reasonably make all of the userland
> changes. In the meantime, though, I think it is important to get the
> kernel changes in as soon as we can, in order to give us the opportunity
> of making the userland changes at a later date.

The changes you're suggesting may seem minor to you, but they involve
changing how very core and delicate interactions work.  It's senseless
to change that kind of stuff without a very good reason.  Serious
simplification would qualify, but rearranging the deck chairs doesn't.

Dave



