Re: [Linux-cluster] Graceful recover after connectivity failure

Cliff Hones wrote:
> I am using Centos5.1 with GNBD and GNBD fencing.
> Following the failure of a cluster member - eg a temporary
> loss of connectivity - which results in the node being
> fenced, is there a clean way to re-join the cluster without
> having to reboot the affected node?

Basically, no.

If a node is cut off from the cluster for any period of time, it can't
tell whether the state of the cluster changed while it was
disconnected. So it must be fenced, and it must restart the cluster
software from the beginning to rebuild its state from scratch.
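To make the reasoning concrete, here is a toy model in Python. It is a sketch only, not how cman, GNBD or GFS actually track state. The membership "generation" counter, the lock table and the `Node`/`Cluster` classes are hypothetical, invented for illustration. While the node is disconnected the survivors move on (new generation, locks possibly reassigned), and the node has no way to observe that. So the only safe rule is to discard its cached view and rebuild from the cluster's current state.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Hypothetical: last membership generation this node observed.
    seen_generation: int = 0
    # Hypothetical locally cached cluster state (e.g. lock ownership).
    cached_locks: dict = field(default_factory=dict)
    connected: bool = True


@dataclass
class Cluster:
    generation: int = 0
    locks: dict = field(default_factory=dict)

    def on_partition(self, node: Node) -> None:
        # The surviving members reconfigure without the lost node, so the
        # membership generation moves on. The lost node never sees this.
        node.connected = False
        self.generation += 1

    def may_resume_in_place(self, node: Node) -> bool:
        # A node may keep its cached state only if it was never cut off
        # and has seen the current generation. After any disconnection
        # this is False, because the node cannot know what it missed.
        return node.connected and node.seen_generation == self.generation

    def rejoin(self, node: Node) -> str:
        if self.may_resume_in_place(node):
            return "resume"
        # Analogue of "fence, reboot, start the cluster software again":
        # throw away the cached view and rebuild from the current state.
        node.cached_locks = dict(self.locks)
        node.seen_generation = self.generation
        node.connected = True
        return "restart"


if __name__ == "__main__":
    cluster = Cluster(locks={"fs1": "node2"})
    node2 = Node("node2", cached_locks={"fs1": "node2"})
    cluster.on_partition(node2)
    cluster.locks["fs1"] = "node1"  # survivors reassign the lock meanwhile
    print(cluster.rejoin(node2), node2.cached_locks)
```

The point of the sketch is that `may_resume_in_place` never inspects what actually changed. The disconnection alone is enough to force a full restart, which mirrors why the real cluster insists on fencing.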

> I am finding that it is impossible to shut down or restart the
> cluster components on the affected node, and even trying to force
> a reboot from a ssh session just hangs.
> There seems to be a chicken-and-egg situation - a gfs filesystem
> cannot be unmounted if the node is fenced, and cman/clvmd cannot
> be stopped/restarted if a filesystem is mounted.   Forcibly
> trying to kill the cluster processes also fails.


