[Cluster-devel] cluster/group/daemon cman.c cpg.c gd_internal. ...

Tue Jun 20 19:19:51 UTC 2006

On Tue, Jun 20, 2006 at 01:56:27PM -0500, Robert Peterson wrote:
> teigland at sourceware.org wrote:
> >	Moving the cluster infrastructure to userland introduced a new
> >	problem that we didn't need to worry about before.  All cluster
> >	state now exists in userland processes which can go away and then
> >	come back like new, i.e.  unaware of the previous state.
> Hi Dave,
> 
> You know this new development cman stuff and I really don't, but I was
> just thinking:
> 
> If we used a shared memory segment, we could hold state information
> there and then cman would remember the cluster state after process
> termination and restart, possibly making this whole thing unnecessary.
> Just a thought.  Of course, one could also argue that if the process
> terminated, can we really trust the state information it had at the
> time?

Might be a good idea; I don't really know.  I'm not even sure we'd need to
save much, if any, additional state that couldn't be pulled from the gfs/dlm
instances themselves.  It seems to me the challenge would be writing the
daemons so they could put all the pieces and interconnections back
together again.
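
As a rough illustration of the shared-memory idea quoted above (this is not
anything in the cman or groupd code), a restarted daemon could reattach to a
POSIX shared memory segment and check whether what it finds still looks
valid before trusting it.  The struct, the function names, the payload field
and the XOR checksum are all made up for this sketch, and it assumes Linux
(older glibc may need -lrt at link time):

```c
/* Sketch only: keep daemon state in a POSIX shared memory segment so a
 * restarted process can find it again.  All names are hypothetical. */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define STATE_MAGIC 0x636d616eu  /* arbitrary marker for an initialized segment */

struct saved_state {
    uint32_t magic;         /* STATE_MAGIC once the segment is initialized */
    uint32_t generation;    /* bumped on every attach, i.e. every (re)start */
    uint32_t member_count;  /* example payload: number of cluster members */
    uint32_t checksum;      /* trivial checksum over the fields above */
};

static uint32_t state_checksum(const struct saved_state *s)
{
    return s->magic ^ s->generation ^ s->member_count;
}

/* Update the example payload and keep the checksum consistent. */
static void state_store(struct saved_state *s, uint32_t member_count)
{
    s->member_count = member_count;
    s->checksum = state_checksum(s);
}

/* Map the named segment, creating it if needed.  *recovered is set to 1
 * if a previous, self-consistent state was found, 0 if we start fresh.
 * Returns NULL on failure. */
static struct saved_state *state_attach(const char *name, int *recovered)
{
    assert(name != NULL && recovered != NULL);

    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;
    /* A newly created object is zero-filled after ftruncate(). */
    if (ftruncate(fd, sizeof(struct saved_state)) < 0) {
        close(fd);
        return NULL;
    }
    struct saved_state *s = mmap(NULL, sizeof(*s), PROT_READ | PROT_WRITE,
                                 MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (s == MAP_FAILED)
        return NULL;

    /* Only trust old contents if magic and checksum both match. */
    *recovered = (s->magic == STATE_MAGIC &&
                  s->checksum == state_checksum(s));
    if (!*recovered) {
        memset(s, 0, sizeof(*s));
        s->magic = STATE_MAGIC;
    }
    s->generation++;
    s->checksum = state_checksum(s);
    return s;
}

static void state_detach(struct saved_state *s)
{
    munmap(s, sizeof(*s));
}
```

A daemon would call state_attach() at startup and look at the recovered flag
to decide whether to rebuild from the saved state or start clean.  The
checksum only catches a half-written or garbage segment; it does nothing
about state that is self-consistent but stale, which is the harder part of
the "can we trust it" question.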

If this ends up being a big enough problem to get more attention, I think
the first practical improvement we could make is something like
blocking/clearing I/O from the residual filesystems (like we do in withdraw)
and adding the ability to fully purge instances of gfs/dlm from the kernel
without rebooting the node.  Then the machines could all start from
scratch without rebooting or fencing.