[Cluster-devel] cluster4 gfs_controld

Steven Whitehouse swhiteho at redhat.com
Thu Oct 13 16:16:29 UTC 2011


Hi,

On Thu, 2011-10-13 at 11:30 -0400, David Teigland wrote:
> On Thu, Oct 13, 2011 at 03:41:31PM +0100, Steven Whitehouse wrote:
> > > cluster4
> > > . jid from dlm-kernel "slots" which will be assigned similarly
> > What is the actual algorithm used to assign these slots?
> 
> The same as picking jids: lowest unused id starting with 0.  As for
> implementation, I'll add it to the current dlm recovery messages.
> 
Yes, but the current implementation uses corosync to enforce ordering of
events, so I'm wondering how the dlm will do that after the change.
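
For reference, the "lowest unused id starting with 0" rule quoted above
can be sketched as below. This is only an illustration of the rule, not
the dlm or gfs_controld code: the function name and the array of in-use
ids are hypothetical, and the sketch says nothing about how the nodes
agree on ordering, which is the open question here.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the lowest id (starting at 0) that does
 * not appear in used[0..n_used-1].  The loop always terminates because
 * at most n_used distinct ids can be taken, so some candidate in
 * 0..n_used is free.
 */
static int lowest_unused_slot(const int *used, size_t n_used)
{
    for (int candidate = 0; ; candidate++) {
        bool taken = false;

        for (size_t i = 0; i < n_used; i++) {
            if (used[i] == candidate) {
                taken = true;
                break;
            }
        }
        if (!taken)
            return candidate;
    }
}
```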

> (Frankly, I'd really like to just set jid to nodeid-1.  Any support for
> that?  It would obviously add a slight requirement to picking nodeid's,
> which 99.9% of people already do.)
> 
The problem is that if you have a cluster with lots of nodes, but where
each fs is only mounted by a small number of them, we'd have to insist
on always creating as many journals as there are nodes in the cluster.
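
To put rough numbers on that, here is a small sketch comparing the two
schemes for one set of mounters. The function names and the example
nodeids are made up for illustration; the only assumption taken from
the thread is jid = nodeid - 1 versus lowest-unused assignment.

```c
#include <stddef.h>

/*
 * With jid = nodeid - 1, the fs needs a journal for every jid up to the
 * highest one in use, even if most of the lower jids are never mounted.
 */
static int journals_for_nodeid_scheme(const int *nodeids, size_t n)
{
    int max_jid = -1;

    for (size_t i = 0; i < n; i++) {
        int jid = nodeids[i] - 1;

        if (jid > max_jid)
            max_jid = jid;
    }
    return max_jid + 1;
}

/*
 * With lowest-unused assignment, n mounters joining one after another
 * get jids 0..n-1, so n journals are enough.
 */
static int journals_for_slot_scheme(size_t n_mounters)
{
    return (int)n_mounters;
}
```

For example, with hypothetical mounters at nodeids 7, 19 and 32, the
first function returns 32 and the second returns 3.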

> > > . first mounter using a dlm lock in lock_dlm
> > > 
> > That sounds good to me. The thing we need to resolve is how do we get
> > from one to the other. We may have to introduce a new name for the lock
> > protocol to avoid people accidentally using both schemes in the same
> > cluster.
> 
> Compatibility rests on the fact that the new dlm kernel features will only
> work when the cluster4 dlm_controld is used.
> 
> dlm_controld.v3 running: dlm_recover_register() returns an error, and
> everything falls back to working as it does now, with gfs_controld.v3 etc.
> 
> dlm_controld.v4 running: dlm_recover_register() works, lock_dlm sets jid
> and first.  (gfs_controld.v3 will fail to even run with dlm_controld.v4,
> and other combinations of v3/v4 daemons will also fail to run together.)
> 
> > > cluster4
> > > . coordination of dlm-kernel/gfs-kernel recovery is done
> > >   directly in kernel using callbacks from dlm-kernel to gfs-kernel.
> > > . gdlm_mount(struct gfs2_sbd *sdp, const char *table, int *first, int *jid)
> > >   calls dlm_recover_register(dlm, &jid, &recover_callbacks)
> > Can we not just pass the extra functions to dlm_create_lockspace? That
> > seems a bit simpler than adding an extra function just to register the
> > callbacks,
> 
> Yes we could; I may do that.  Returning the error mentioned above becomes
> less direct.  I'd have to overload the jid arg, or add another to indicate
> the callbacks are enabled.
> 
Another alternative is just to add a member to the recover_callbacks
structure: a function taking first and jid as arguments, which the dlm
can call to pass the info to gfs2.

That way, dlm users who don't care about that information would just
leave that function NULL.
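
A minimal sketch of that idea follows. Everything in it is hypothetical:
the member name, the void *arg convention, and the helper standing in
for the dlm side are assumptions for illustration, not an existing dlm
interface.

```c
#include <stddef.h>

/* Hypothetical layout: one optional member reports first and jid. */
struct recover_callbacks {
    /* May be left NULL by dlm users that do not need this info. */
    void (*slot_info)(void *arg, int first, int jid);
};

/* Stand-in for the dlm side: only call the member if it was supplied. */
static void dlm_report_slot(const struct recover_callbacks *cb, void *arg,
                            int first, int jid)
{
    if (cb && cb->slot_info)
        cb->slot_info(arg, first, jid);
}

/* Stand-in for the gfs2 side: remember what the dlm told us. */
struct mount_info {
    int first;
    int jid;
    int called;
};

static void gfs2_record_slot(void *arg, int first, int jid)
{
    struct mount_info *mi = arg;

    mi->first = first;
    mi->jid = jid;
    mi->called = 1;
}
```

gfs2 would set .slot_info = gfs2_record_slot; other users would leave
the member NULL and never be called.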

Steve.




