
Re: [Cluster-devel] cluster/group/daemon cman.c cpg.c gd_internal. ...



David Teigland wrote:
Might be a good idea, I don't really know.  I'm not even sure we'd need to
save much or any additional state that couldn't be pulled from the gfs/dlm
instances themselves.  It seems to me the challenge would be writing the
daemons so they could put all the pieces and interconnections back
together again.

If this ends up being a big enough problem to get more attention, I think
the first practical improvement we could make is something like
blocking/clearing i/o from the residual fs's (like we do in withdraw) and
adding the ability to fully purge instances of gfs/dlm from the kernel
without rebooting the node.  Then the machines could all start from
scratch without rebooting or fencing.

Here's another idea that came to me:

For critical cluster processes like cman and fenced, maybe we could use init's ability to restart processes, i.e. the "respawn" action in /etc/inittab, or something similar, so that if a critical process like fenced dies, it gets restarted automatically and immediately. Of course, that might cause problems for shutdown, etc., and it would probably make it harder to test certain things...
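
To make the idea concrete, here is a rough Python sketch of what a "respawn"-style supervisor does: start a process, wait for it to exit, and start it again. It is only an illustration, not how init implements it. The function name `respawn` and its `max_restarts` and `delay` parameters are made up for this example. It also assumes the supervised daemon stays in the foreground rather than forking, since the supervisor can only watch the process it started.

```python
import subprocess
import sys
import time


def respawn(argv, max_restarts=None, delay=0.0):
    """Run argv and start it again every time it exits.

    Returns the exit codes seen so far once max_restarts is exceeded.
    max_restarts=None restarts forever, which is the "respawn" behaviour;
    a finite limit is a hypothetical knob added here for testing.
    """
    exit_codes = []
    while True:
        # Start the child in the foreground and block until it exits.
        proc = subprocess.Popen(argv)
        exit_codes.append(proc.wait())
        if max_restarts is not None and len(exit_codes) > max_restarts:
            return exit_codes
        # Pause before restarting so a crashing child does not spin the CPU.
        time.sleep(delay)


if __name__ == "__main__":
    # Stand-in child that exits immediately; a real daemon would go here.
    child = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(respawn(child, max_restarts=1, delay=0.1))
```

The restart limit and delay hint at the concerns above: something has to tell the supervisor to stop at shutdown, and a daemon that dies immediately on every start should not be restarted in a tight loop.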

Bob Peterson
Red Hat Cluster Suite

