[Cluster-devel] fence daemon problems

Dietmar Maurer dietmar at proxmox.com
Wed Oct 3 09:25:08 UTC 2012


> I observe strange problems with fencing when a cluster loses quorum for a
> short time.
> 
> After regaining quorum, fenced reports 'wait state   messages', and the whole
> cluster is blocked waiting for fenced.

Just found the following in fenced/cpg.c:

		/* This is how we deal with cpg's that are partitioned and
		   then merge back together.  When the merge happens, the
		   cpg on each side will see nodes from the other side being
		   added, and neither side will have zero started_count.  So,
		   both sides will ignore start messages from the other side.
		   This causes the domain on each side to continue waiting
		   for the missing start messages indefinitely.  To unblock
		   things, all nodes from one side of the former partition
		   need to fail. */

So is the observed behavior expected?
