
Re: [Cluster-devel] GFS2: Wait for journal id on mount if not specified on mount command line


On Mon, 2010-06-07 at 12:34 -0500, David Teigland wrote:
> On Mon, Jun 07, 2010 at 04:39:09PM +0100, Steven Whitehouse wrote:
> > 
> > This patch implements a wait for the journal id in the case that it has
> > not been specified on the command line. This is to allow the future
> > removal of the mount.gfs2 helper. The journal id would instead be
> > directly communicated by gfs_controld to the file system. Here is a
> > comparison of the two systems:
> > 
> > Current:
> > 1. mount calls mount.gfs2
> > 2. mount.gfs2 connects to gfs_controld to retrieve the journal id
> > 3. mount.gfs2 adds the journal id to the mount command line and calls
> > the mount system call
> > 4. gfs_controld receives the status of the mount request via a uevent
> > 
> > Proposed:
> > 1. mount calls the mount system call (no mount.gfs2 helper)
> > 2. gfs_controld receives a uevent for a gfs2 fs which it doesn't know
> > about already
> > 3. gfs_controld assigns a journal id to it via sysfs
> > 4. the mount system call then completes as normal (sending a uevent
> > according to status)
> Proposed is the way it originally worked.  I switched to using Current
> back in 2005... unfortunately I don't remember all the specific reasons,
> but I'm pretty sure it was the error/edge cases that were better handled
> without sitting in the kernel early in the process.  (Especially when you
> combine simultaneous mounting / mount failures / node failures / recovery.)
> A couple obvious questions from the start...
> - What if gfs_controld isn't running?
It will hang until mount is killed, whereupon it will clean up and exit.

> - Won't processes start to access the fs and block during this intermediate
> time between mount(2) and getting a journal id?  All of those processes
> now need errors returned if gfs_controld returns an error instead of a
> journal id.
Nothing can access the fs until mount completes successfully.

> Another way to compare them:
> Current:
> - get all the userspace/clustering-related/error-laden overhead sorted out
> - then, at the very end, pull the kernel fs into the picture
> - collect the result of mount(2) in userspace, which is almost always
>   "success"
But that isn't the way it works currently. The first mount result (and
recovery results) are collected via uevents, even in the current scheme.

> Proposed:
> - pull the kernel fs into the picture
> - transition to userspace to sort out all the clustering-related /
>   error-laden overhead
> - get back to the kernel with the result
> - collect the result of mount(2) in userspace
> The further you get before you encounter errors, the harder they are to
> handle.  You want most errors to happen earlier, with fewer entities
> involved, so backing out is easier to do.
There isn't a great difference between the error handling in either
case. There is one extra case for the kernel to handle, that of getting
an invalid journal id, or not getting a journal id at all. The actual
sequence of events is pretty similar, the main difference being that
gfs_controld talks directly to the fs in all cases, rather than using
mount.gfs2 as an intermediate point in the communications. Since
gfs_controld already communicates directly with the fs anyway for a
number of other reasons, it seems reasonable to cut out the middle man
in this one remaining case where we have the mount helper in order to
simplify the system.
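
To make that one extra kernel case concrete, here is a rough toy model of
the wait. It is a sketch only, not the patch itself: the class and method
names (JournalIdSlot, store, kill, wait) are hypothetical, store() stands
in for gfs_controld writing the journal id via sysfs, kill() stands in
for the mount process being killed, and the timeout exists only so that
the model can be exercised (the real wait has none).

```python
import threading


class JournalIdSlot:
    """Toy model of the per-mount journal id state; not GFS2 kernel code."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ready = threading.Event()
        self._jid = None
        self._killed = False

    def store(self, text):
        """Model gfs_controld writing the journal id (as text) via sysfs.

        Returns False for an invalid id, a second assignment, or an
        assignment that arrives after the mount was killed.
        """
        try:
            jid = int(text.strip())
        except ValueError:
            return False
        if jid < 0:
            return False
        with self._lock:
            if self._jid is not None or self._killed:
                return False
            self._jid = jid
        self._ready.set()
        return True

    def kill(self):
        """Model the mount process being killed while it waits."""
        with self._lock:
            if self._jid is None:
                self._killed = True
        self._ready.set()

    def wait(self, timeout=None):
        """Model mount blocking until a journal id arrives.

        Returns the journal id, or None if the mount was killed first
        (or the test-only timeout expired).
        """
        if not self._ready.wait(timeout):
            return None
        with self._lock:
            return None if self._killed else self._jid
```

In this model an invalid id is simply rejected and the mount keeps
waiting, while a kill before any id arrives makes wait() return None so
the caller can clean up and exit.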

> IIRC, nfs recently moved to using a mount helper after *not* using one for
> many years.  It would be interesting to ask them about their motivations.
We can ask by all means, but I'm not sure it's relevant to this.

> > The advantage of the proposed system is that it is completely backward
> > compatible with the current system both at the kernel and at the
> > userland levels. The "first" parameter can also be set the same way,
> > with the restriction that it must be set before the journal id is
> > assigned.
> That's not an "advantage" of new versus old, which is the missing bit of
> information here.  I'm not against changing it per se, but it seems we'd
> want some substantial advantage before going to all the effort of changing
> such a delicate area that has worked quite well for the past 5 years.
> There's room for real, major improvements in this whole area, but you're
> barking up the wrong tree.  gfs_controld has always been far too complex.
> But it's *not* a result of the current mount helper scheme.  It is a direct
> result of gfs_controld being required to do jobs that gfs (in kernel)
> should probably handle itself:  allocating journal id's, coordinating who
> does journal recovery, coordinating first mounter recovery, sorting out
> valid combinations of mount options from different nodes, keeping track of
> recovered journals vs journals that haven't been recovered, coordinating
> when all journals have been successfully recovered so that normal fs
> access can be continued.
> If you want to do something that's meaningful and beneficial in this area,
> you need to look at moving *those* things from gfs_controld into gfs.
> Ocfs2 is a good example here, it handles almost all of that stuff in the
> kernel, and leaves only what's really necessary for ocfs2_controld.
> In fact, this could be a perfect area for gfs2/ocfs2 unification:  adopt a
> single fs_controld, single mount/unmount scheme, single node failure/recovery
> notification scheme, single journal id/allocation scheme.
> Dave

The question, then, is whether this could be made backward compatible
with what we already have. I'm not sure how we could allocate journal
ids in the kernel since we have no communication mechanism other than
the locking.

Getting rid of mount.gfs2 results in there being one fewer userland
program to maintain. It also removes the one major dependency between
gfs2-utils and the cluster packages, making builds much easier. In time
we can simplify gfs_controld too since it would no longer need the code
to talk to mount.gfs2.

Some of these changes are a long way off at the moment, since it will
take some time before we can reasonably make all of the userland
changes. In the meantime, though, I think it is important to get the
kernel changes in as soon as we can, in order to give us the opportunity
of making the userland changes at a later date.

