
[Linux-cluster] trouble trying to get ccs/cman working on one machine, not the other


Sorry to bother you all once more.  I'm seeing two problems when trying
to get ccs/cman working.

On my 2GHz Celeron, starting ccsd and cman works fine.  I start ccsd,
then run 'cman_tool join', and the machine begins periodically
broadcasting packets like these:

	23:22:26.300381 IP > UDP, length 24
	23:22:26.300491 IP > UDP, length 24

However, when I try the exact same thing on a Dual Xeon in the same
subnet, I get this:

	23:19:51.095492 IP > UDP, length 20
	23:19:51.344805 arp who-has tell
	23:19:52.344396 arp who-has tell
	23:19:53.344257 arp who-has tell

The machine begins ARPing for an IP address that isn't even used at
all!  It doesn't broadcast like the other machines do, and after
waiting for a while, both machines decide to create a new cluster of
their own instead of trying to talk to each other.

Furthermore, when I run 'cman_tool leave' on the dual proc, I get:

	Jun 26 22:51:43 phi kernel: CMAN: we are leaving the cluster
	Jun 26 22:51:43 phi ccsd[9833]: Received bad communication type on cluster socket. 
	Jun 26 22:51:49 phi last message repeated 106830 times

syslogd then starts looping, until I kill ccsd.  On the uniproc, I
don't get any such error at all when I issue a leave:

	Jun 26 22:51:40 xi kernel: CMAN: we are leaving the cluster
	Jun 26 22:51:40 xi ccsd[2181]: Unable to bind cluster socket: Transport endpoint is not connected 
	Jun 26 22:51:40 xi ccsd[2181]: Exiting... 

I tried a UP kernel (the exact same one as on the uniproc) on the dual
proc, but got the same result.  Does anyone have any clues?  Is there
anything obvious I forgot?  I've attached /etc/cluster/cluster.xml --
it's identical on both machines, and they both run the same kernel and
the same binary packages (I hope.)  Do I have to provide more info?

<?xml version="1.0"?>
<cluster name="alpha" config_version="1">

<node name="phi" votes="2">
		<method name="single">
			<device name="human" ipaddr=""/>
		</method>
</node>

<node name="xi" votes="1">
		<method name="single">
			<device name="human" ipaddr=""/>
		</method>
</node>

	<device name="human" agent="fence_manual"/>

</cluster>
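
Since I can only hope the two copies of the config really are identical,
a checksum comparison would settle it.  Below is a minimal Python
sketch of that check; the helper name file_digest is made up for
illustration, and the path is the /etc/cluster/cluster.xml mentioned
above.  It only proves the files match byte for byte, nothing about
whether ccsd accepts them.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of the file at `path`.

    Hypothetical helper: run it on each node and compare the output
    to confirm that both copies of the config are byte-identical.
    """
    data = Path(path).read_bytes()
    return hashlib.sha256(data).hexdigest()
```

Running file_digest("/etc/cluster/cluster.xml") on both phi and xi
should print the same string if the files match; any difference,
including trailing whitespace, gives a different digest.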

