
[Linux-cluster] trouble trying to get ccs/cman working on one machine, not the other



Hi,

Sorry to bother you all once more.  I'm seeing two problems when trying
to get ccs/cman working.

On my Celeron 2GHz, when I try to start ccsd and cman, all is well.
I start ccsd, then run 'cman_tool join', and the machine begins
periodically broadcasting packets like these:

	23:22:26.300381 IP 10.0.0.1.6809 > 10.0.0.255.6809: UDP, length 24
	23:22:26.300491 IP 10.0.0.1.6809 > 10.0.0.255.6809: UDP, length 24

However, when I try the exact same thing on a Dual Xeon in the same
subnet, I get this:

	23:19:51.095492 IP 10.0.0.3.32770 > 255.255.255.255.50007: UDP, length 20
	23:19:51.344805 arp who-has 10.0.0.9 tell 10.0.0.3
	23:19:52.344396 arp who-has 10.0.0.9 tell 10.0.0.3
	23:19:53.344257 arp who-has 10.0.0.9 tell 10.0.0.3

The machine begins ARPing for 10.0.0.9 -- but that IP isn't used
anywhere!  It doesn't broadcast the way the other machine does, and
after waiting for a while, both machines decide to create a new cluster
instead of trying to talk to each other.
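
For comparing the two captures, it may help to work out which broadcast address each node would be expected to use. The following is only a minimal sketch: it assumes a /24 netmask on both machines (the netmask is not stated here, only suggested by the 10.0.0.255 destination), and `broadcast_for` and `in_same_subnet` are hypothetical helper names, not part of ccs or cman.

```python
import ipaddress


def broadcast_for(addr, prefix_len=24):
    """Return the directed broadcast address for addr.

    prefix_len=24 is an assumption; substitute the real netmask.
    """
    iface = ipaddress.ip_interface(f"{addr}/{prefix_len}")
    return str(iface.network.broadcast_address)


def in_same_subnet(a, b, prefix_len=24):
    """Return True if b lies in the subnet of a (same assumed prefix)."""
    net = ipaddress.ip_interface(f"{a}/{prefix_len}").network
    return ipaddress.ip_address(b) in net


# Example calls for the two nodes and the unexpected ARP target:
#   broadcast_for("10.0.0.1")
#   broadcast_for("10.0.0.3")
#   in_same_subnet("10.0.0.3", "10.0.0.9")
```

Under that assumption both nodes get the same directed broadcast address, and 10.0.0.9 counts as on-link, which would be consistent with the Xeon ARPing for it directly rather than sending to a gateway. It does not explain where 10.0.0.9 comes from.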

Furthermore, when I try to 'cman_tool leave' on the dual proc, I get:

	Jun 26 22:51:43 phi kernel: CMAN: we are leaving the cluster
	Jun 26 22:51:43 phi ccsd[9833]: Received bad communication type on cluster socket. 
	Jun 26 22:51:49 phi last message repeated 106830 times

syslogd then starts looping, until I kill ccsd.  On the uniproc, I
don't get any such error at all when I issue a leave:

	Jun 26 22:51:40 xi kernel: CMAN: we are leaving the cluster
	Jun 26 22:51:40 xi ccsd[2181]: Unable to bind cluster socket: Transport endpoint is not connected 
	Jun 26 22:51:40 xi ccsd[2181]: Exiting... 

I tried a UP kernel (the exact same one as on the uniproc) on the dual
proc, with the same result.  Does anyone have any clues?  Is there
anything obvious I forgot?  I've attached /etc/cluster/cluster.xml --
it's identical on both machines, they both run the same kernel, and the
same binary packages (I hope).  Do I have to provide more info?
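
One way to rule out a name-resolution difference between the two machines would be to run a small check on each node. This is a sketch only: it assumes the layout of the attached cluster.xml (nodes/node/fence/method/device), and `node_addresses` is a hypothetical helper, not anything shipped with ccs or cman. It resolves each node name with the local resolver so the result can be compared with the `ipaddr` in the fence device entry.

```python
import socket
import xml.etree.ElementTree as ET


def node_addresses(xml_text, resolve=socket.gethostbyname):
    """Return (name, fence_ipaddr, resolved_addr) for each <node>.

    resolve is injectable so the lookup can be replaced in tests;
    by default the local resolver is used.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.findall("./nodes/node"):
        name = node.get("name")
        dev = node.find("./fence/method/device")
        configured = dev.get("ipaddr") if dev is not None else None
        try:
            resolved = resolve(name)
        except (OSError, KeyError) as exc:
            # socket.gaierror is a subclass of OSError
            resolved = f"unresolved ({exc})"
        rows.append((name, configured, resolved))
    return rows
```

Reading /etc/cluster/cluster.xml in binary mode and passing its contents to `node_addresses` on both phi and xi would show whether either name resolves to something other than the address given in its fence entry.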


cheers,
Lennert
<?xml version="1.0"?>
<cluster name="alpha" config_version="1">

<cman>
</cman>

<nodes>
<node name="phi" votes="2">
	<fence>
		<method name="single">
			<device name="human" ipaddr="10.0.0.3"/>
		</method>
	</fence>
</node>

<node name="xi" votes="1">
	<fence>
		<method name="single">
			<device name="human" ipaddr="10.0.0.1"/>
		</method>
	</fence>
</node>
</nodes>

<fence_devices>
	<device name="human" agent="fence_manual"/>
</fence_devices>

</cluster>
