[Linux-cluster] trouble trying to get ccs/cman working on one machine, not the other

Lennert Buytenhek buytenh at wantstofly.org
Sat Jun 26 21:30:57 UTC 2004


Hi,

Sorry to bother you all once more.  I'm seeing two problems when trying
to get ccs/cman working.

On my Celeron 2GHz, when I try to start ccsd and cman, all is well.
I start ccsd, then run 'cman_tool join', and the machine begins
periodically broadcasting packets like these:

	23:22:26.300381 IP 10.0.0.1.6809 > 10.0.0.255.6809: UDP, length 24
	23:22:26.300491 IP 10.0.0.1.6809 > 10.0.0.255.6809: UDP, length 24

However, when I try the exact same thing on a Dual Xeon in the same
subnet, I get this:

	23:19:51.095492 IP 10.0.0.3.32770 > 255.255.255.255.50007: UDP, length 20
	23:19:51.344805 arp who-has 10.0.0.9 tell 10.0.0.3
	23:19:52.344396 arp who-has 10.0.0.9 tell 10.0.0.3
	23:19:53.344257 arp who-has 10.0.0.9 tell 10.0.0.3

The machine begins ARPing for 10.0.0.9 -- but that IP isn't even used
at all!  It doesn't broadcast like the other machine does, and after
waiting for a while, each machine decides to form a new cluster of its
own instead of trying to talk to the other.
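
To make the difference between the two traces explicit, here is a minimal
Python sketch that pulls the addresses and ports out of a tcpdump UDP line
and classifies the destination.  It assumes a 10.0.0.0/24 netmask, which is
only a guess based on the 10.0.0.255 broadcast address seen above; the
function names are made up for illustration.

```python
import ipaddress

# Assumption: both hosts sit in 10.0.0.0/24 (the netmask is not stated in the post).
SUBNET = ipaddress.ip_network("10.0.0.0/24")
LIMITED_BROADCAST = ipaddress.ip_address("255.255.255.255")


def parse_udp_line(line):
    """Split a tcpdump UDP line into (src_ip, src_port, dst_ip, dst_port).

    Expects the form shown in the traces above, e.g.
    '23:22:26.300381 IP 10.0.0.1.6809 > 10.0.0.255.6809: UDP, length 24'
    """
    fields = line.split()
    src = fields[2]
    dst = fields[4].rstrip(":")
    # tcpdump appends the port to the address as a fifth dotted component.
    src_ip, src_port = src.rsplit(".", 1)
    dst_ip, dst_port = dst.rsplit(".", 1)
    return src_ip, int(src_port), dst_ip, int(dst_port)


def classify_destination(dst_ip):
    """Describe a destination address relative to the assumed subnet."""
    addr = ipaddress.ip_address(dst_ip)
    if addr == LIMITED_BROADCAST:
        return "limited broadcast"
    if addr == SUBNET.broadcast_address:
        return "subnet broadcast"
    if addr in SUBNET:
        return "unicast in subnet"
    return "outside subnet"


if __name__ == "__main__":
    traces = [
        "23:22:26.300381 IP 10.0.0.1.6809 > 10.0.0.255.6809: UDP, length 24",
        "23:19:51.095492 IP 10.0.0.3.32770 > 255.255.255.255.50007: UDP, length 20",
    ]
    for line in traces:
        src_ip, src_port, dst_ip, dst_port = parse_udp_line(line)
        print(src_ip, src_port, "->", dst_ip, dst_port, classify_destination(dst_ip))
```

Under that assumption, the Celeron sends from and to port 6809 on the subnet
broadcast address, while the Xeon's only UDP packet goes to the limited
broadcast address on port 50007, so the two machines never address each
other on the same port.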

Furthermore, when I try to 'cman_tool leave' on the dual proc, I get:

	Jun 26 22:51:43 phi kernel: CMAN: we are leaving the cluster
	Jun 26 22:51:43 phi ccsd[9833]: Received bad communication type on cluster socket. 
	Jun 26 22:51:49 phi last message repeated 106830 times

syslogd then starts looping, until I kill ccsd.  On the uniproc, I
don't get any such error at all when I issue a leave:

	Jun 26 22:51:40 xi kernel: CMAN: we are leaving the cluster
	Jun 26 22:51:40 xi ccsd[2181]: Unable to bind cluster socket: Transport endpoint is not connected 
	Jun 26 22:51:40 xi ccsd[2181]: Exiting... 

I tried a UP kernel (exact same one as on the uniproc) on the dual proc,
but got the same result.  Does anyone have any clues?  Anything obvious
I forgot?  I've attached /etc/cluster/cluster.xml -- it's identical on
both machines, and they both run the same kernel and the same binary
packages (I hope.)  Do I have to provide more info?


cheers,
Lennert
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster.xml
Type: text/xml
Size: 461 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20040626/14f4017f/attachment.xml>

