Re: [Linux-cluster] RHEL Cluster Suite + Xen Dom0 = infinite reboots

On Wed, Jul 8, 2009 at 11:52 PM, Aaron Benner wrote:
I have 3 Xen Dom0 machines on which I'm trying to build a cluster for HA DomUs. At present the cluster config file simply lists the 3 nodes; no fencing, services, resources or failover domains have been defined. I know this is not what I will need when moving to production. I used the most minimal cluster config I could to confirm that my problem is the interaction of Xen and the cluster suite.

The problem is this: when a node reboots it joins the cluster successfully, then Xen tears down the network to build xenbr0, vif0.0, and peth0 (the standard /etc/xen/scripts/network-bridge). When this happens the rebooting node "fails" in the cluster's eyes, and the active nodes try to fence it. Originally I had power fencing enabled, and this situation resulted in a shootout at the O.K. Corral, with the failed node booting, failing and getting fenced forever.
So you have fencing configured in the domU cluster but not in the dom0 cluster, right? And this behavior happens in the dom0 cluster. Maybe you should configure an additional physical network interface (or a bond of interfaces), independent of the one used by Xen, as the main cluster communications interface.
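
A rough sketch of what I mean, assuming each dom0 has a second NIC (eth1 here) on a private subnet that network-bridge does not touch. The host names, addresses and cluster name below are made up for illustration. As far as I know, cman binds to whichever interface the node name in cluster.conf resolves to, so pointing the node names at the private addresses should keep cluster traffic off the bridged interface:

```xml
<?xml version="1.0"?>
<!--
  Hypothetical /etc/hosts entries on every node (private eth1 subnet):
    192.168.100.1  dom0-a-priv
    192.168.100.2  dom0-b-priv
    192.168.100.3  dom0-c-priv
-->
<cluster name="xen-dom0" config_version="2">
  <clusternodes>
    <!-- Node names resolve to the eth1 addresses, not the bridged eth0 -->
    <clusternode name="dom0-a-priv" nodeid="1" votes="1"/>
    <clusternode name="dom0-b-priv" nodeid="2" votes="1"/>
    <clusternode name="dom0-c-priv" nodeid="3" votes="1"/>
  </clusternodes>
  <!-- Fencing left out, as in your minimal test config; add it back before production -->
  <fencedevices/>
  <rm/>
</cluster>
```

This only helps if Xen keeps building its bridge on the other interface, so check which device network-bridge picks up on your machines.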


I did find the gem at the very bottom of the FAQ in the GeneralQuestions section (http://sources.redhat.com/cluster/wiki/FAQ/GeneralQuestions#xencluster) that mentions this situation. The "workaround", which also mentions a "more permanent solution", seems, well, clunky, so I thought I'd ping the list to see whether the more permanent solution exists and is just not well documented, or whether others have found a solution that doesn't require overriding the default Xen script behavior.

