[Linux-cluster] RHEL Cluster Suite + Xen Dom0 = infinite reboots

Aaron Benner tfrumbacher at gmail.com
Wed Jul 8 21:52:37 UTC 2009


I have 3 Xen Dom0 machines on which I'm trying to build a cluster
for HA DomUs.  At present the cluster config file simply lists the 3
nodes.  No fencing, services, resources or failover domains have been
defined.  I know that this is not what I will need when moving to
production.  I am using the most minimal cluster config I can in order
to confirm that my problem is the interaction of Xen and the cluster
suite.
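
For reference, a config of that shape looks roughly like the sketch
below.  This is illustrative rather than a copy of the actual file:
the cluster name and the three node names are placeholders, and the
empty fencedevices and rm elements simply reflect that no fencing or
resource-manager settings are defined.

```xml
<?xml version="1.0"?>
<!-- Illustrative sketch only: cluster and node names are placeholders -->
<cluster name="xencluster" config_version="1">
  <clusternodes>
    <clusternode name="dom0-a.example.com" nodeid="1" votes="1"/>
    <clusternode name="dom0-b.example.com" nodeid="2" votes="1"/>
    <clusternode name="dom0-c.example.com" nodeid="3" votes="1"/>
  </clusternodes>
  <!-- No fence devices, services, resources or failover domains -->
  <fencedevices/>
  <rm/>
</cluster>
```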

The problem is this: when a node reboots it joins the cluster
successfully, then Xen tears down the network to build xenbr0, vif0.0,
and peth0 (the standard /etc/xen/scripts/network-bridge).  When this
happens the rebooting node "fails" in the cluster's eyes, and the
active nodes try to fence it.  Originally I had power fencing enabled,
and this situation resulted in the shootout at the O.K. Corral, with
the failed node booting, failing, and getting fenced forever.

I did find the gem at the very bottom of the FAQ in the
GeneralQuestions section
(http://sources.redhat.com/cluster/wiki/FAQ/GeneralQuestions#xencluster)
that mentions this situation.  The "workaround", which also mentions
a "more permanent solution", seems, well, clunky, so I thought I'd ping
the list to see if the more permanent solution exists and is just not
well documented, or if others have found a solution that doesn't
require overriding the default Xen script behavior.




