[Linux-cluster] CMAN removing nodes from cluster?

Patrick Caulfield pcaulfie at redhat.com
Thu Jun 9 06:56:40 UTC 2005


Nate Carlson wrote:
> Hey all,
> 
> I've got my cluster set up with two physical nodes, and am trying to
> start a virtual machine on one of the nodes. Things seem to work, but I
> fairly quickly get the following errors (sorry, haven't set up ntp yet,
> so the clock is wrong - these messages occurred at the same time):
> 
> Jun  8 17:52:21 xen1 kernel: CMAN: removing node
> xen-test-vm1.int.technicality.org from the cluster : Missed too many
> heartbeats
> Jun  8 17:51:33 xen-test-vm1 kernel: CMAN: Being told to leave the
> cluster by node 1
> Jun  8 17:51:33 xen-test-vm1 kernel: CMAN: we are leaving the cluster.
> 
> How does the Heartbeat configuration work? How should I debug this?
> 
> All I need to do on xen-test-vm1 to rejoin is 'cman_tool join', and
> everything works for 10-15 seconds until it gets kicked again.

It sounds like there's some internal routing problem. If the node gets
kicked out after 10-15 seconds then no heartbeats are getting through at all.

Things to do:

Check that the broadcast messages are being sent onto the Xen bridge.
Check that all the nodes have the same broadcast address.
Use cman_tool status to check the node addresses in use by each of the
virtual machines.
tcpdump is the tool to use here.
If you compiled the modules yourself then make sure you used ARCH=xen
on the make command line, or all the timeouts will be way out. If you're
using the Fedora packages, make sure you have a version later than
cman-kernel-xenU-2.6.11.4-20050517.141233.
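
As a rough illustration of the "same broadcast address" check above, here is
a small Python sketch that works out each node's broadcast address from its
address and prefix length and groups the nodes by the result. The node names
are taken from the log messages, but the addresses and netmasks are made up
for the example; substitute whatever your interfaces actually report. This is
not part of CMAN, just a quick way to spot a netmask mismatch.

```python
import ipaddress
from typing import Dict, List


def broadcast_of(cidr: str) -> str:
    """Return the broadcast address for an interface given as 'address/prefix'."""
    return str(ipaddress.ip_interface(cidr).network.broadcast_address)


def group_by_broadcast(nodes: Dict[str, str]) -> Dict[str, List[str]]:
    """Group node names by the broadcast address their interface implies."""
    groups: Dict[str, List[str]] = {}
    for name, cidr in nodes.items():
        groups.setdefault(broadcast_of(cidr), []).append(name)
    return {bcast: sorted(names) for bcast, names in groups.items()}


if __name__ == "__main__":
    # Hypothetical addresses: the second node has the wrong netmask on purpose.
    nodes = {
        "xen1": "192.168.1.10/24",
        "xen-test-vm1": "192.168.1.20/16",
    }
    groups = group_by_broadcast(nodes)
    for bcast, names in groups.items():
        print(bcast, ", ".join(names))
    if len(groups) > 1:
        print("Nodes disagree on the broadcast address")
```

If more than one group comes back, the nodes are not broadcasting to the same
address, and heartbeats sent by one may never be seen by the other.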


I do know that Xen clusters work because I'm using one here!



patrick



