
Re: [Linux-cluster] CMAN removing nodes from cluster?



Nate Carlson wrote:
> On Thu, 9 Jun 2005, Patrick Caulfield wrote:
> 
>> It sounds like there's some internal routing problem. If the node gets
>> kicked out after 10-15 seconds then no heartbeats are getting through
>> at all.
> 
> 
> Yeah, kind of what I figured.
> 
>> Things to do:
>>
>> Check that the broadcast messages are being sent onto the xen bridge.
> 
> 
> Are these the packets on 6809/udp?


Yes

> Sniffing on both eth0 on the VM and on the Xen bridge, I am seeing
> broadcasts from the two physical hosts (xen1 and xen2, 10.20.0.201 and
> 202):
> 
> 08:52:58.953913 IP xen1.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28
> 08:53:03.893870 IP xen2.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28
> 08:53:03.953963 IP xen1.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28
> 08:53:08.893795 IP xen2.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28
> 
> ..but even when the Xen VM is in the cluster, I see some unicast 6808,
> but no broadcast. Odd.. I'll have to investigate that.

There are unicast as well as broadcast messages; that's why the nodes
can see each other to start with. I wonder if something is filtering out
the broadcasts. Is there any iptables filtering on? I seem to
remember having to turn off antispoof when starting the Xen networking.
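
One way to tell whether broadcasts are being dropped somewhere between the
bridge and the VM, independently of cman, is a small UDP probe. The following
is only a sketch: TEST_PORT is a made-up port chosen so it does not clash with
cman's 6809/udp, the function names are hypothetical, and the broadcast
address is the one from the capture above. Run the receiver on the VM and the
sender on one of the physical hosts.

```python
import socket

TEST_PORT = 46809               # hypothetical test port, NOT the cman port
BROADCAST_ADDR = "10.20.0.255"  # broadcast address seen in the tcpdump output


def make_sender():
    """UDP socket that is permitted to send to a broadcast address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return s


def make_receiver(port=TEST_PORT, timeout=10.0):
    """UDP socket bound to all interfaces, waiting for a probe."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", port))
    s.settimeout(timeout)
    return s


def send_probe(sender, addr=BROADCAST_ADDR, port=TEST_PORT):
    """Send one small datagram to the broadcast address."""
    sender.sendto(b"bcast-probe", (addr, port))


def wait_for_probe(receiver):
    """Return the sender's IP address, or None if nothing arrived in time."""
    try:
        _data, peer = receiver.recvfrom(1024)
    except socket.timeout:
        return None
    return peer[0]
```

If wait_for_probe() on the VM returns None while the same probe is visible
with tcpdump on the bridge, that points at filtering between the bridge and
the guest rather than at cman.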

So (if I read that correctly) the physical hosts are OK but the VM doesn't
want to play?
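
To make that reading of the capture explicit, the tcpdump text can be tallied
per sender. This is a rough sketch that assumes the one-line-per-packet format
shown in the quoted capture (with each wrapped line rejoined); the function
name and the regular expression are illustrative only.

```python
import re
from collections import defaultdict

# Matches e.g.
# "08:52:58.953913 IP xen1.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28"
LINE_RE = re.compile(
    r"^(\d\d):(\d\d):(\d\d\.\d+) IP (\S+)\.(\d+) > (\S+)\.(\d+):"
)

SAMPLE = """\
08:52:58.953913 IP xen1.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28
08:53:03.893870 IP xen2.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28
08:53:03.953963 IP xen1.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28
08:53:08.893795 IP xen2.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28
"""


def broadcast_senders(capture, bcast="10.20.0.255", port=6809):
    """Map each source host to the times (seconds since midnight) at which
    it sent a packet to the given broadcast address and port."""
    times = defaultdict(list)
    for line in capture.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a packet summary line
        hh, mm, ss, src, _sport, dst, dport = m.groups()
        if dst == bcast and int(dport) == port:
            times[src].append(int(hh) * 3600 + int(mm) * 60 + float(ss))
    return dict(times)


if __name__ == "__main__":
    for host, stamps in sorted(broadcast_senders(SAMPLE).items()):
        gaps = [b - a for a, b in zip(stamps, stamps[1:])]
        print(host, len(stamps), "broadcasts, gaps:", gaps)
```

Any cluster member that never shows up as a key in the result is not getting
its broadcasts onto the interface being sniffed, which is what the capture
suggests for the VM.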

If broadcast really seems not to work then you could always try multicast...

>> Check that all the nodes have the same broadcast address.
> 
> 
> First thing I checked.
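
For the record, the broadcast-address check can also be done mechanically
from each node's address and prefix length. A small sketch follows; the /24
prefix is an assumption inferred from the 10.20.0.255 broadcast address in the
capture, and the helper names are made up for illustration.

```python
import ipaddress


def broadcast_of(cidr):
    """Broadcast address of the network an interface address belongs to."""
    return ipaddress.ip_interface(cidr).network.broadcast_address


def same_broadcast(cidrs):
    """True if every interface address yields the same broadcast address."""
    return len({broadcast_of(c) for c in cidrs}) == 1


if __name__ == "__main__":
    # xen1 and xen2 from the thread; the /24 prefix is assumed, not confirmed.
    nodes = ["10.20.0.201/24", "10.20.0.202/24"]
    for n in nodes:
        print(n, "->", broadcast_of(n))
    print("all match:", same_broadcast(nodes))
```

A node configured with a different netmask would produce a different
broadcast address here even though its IP address looks right.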
> 
>> Use cman_tool status to check the node addresses in use by each of the
>> virtual machines.
> 
> 
> Sees the correct address on each node.
> 
>> tcpdump is the thing here.
> 
> 
> Yeah, tcpdump rules.  :)
> 
>> If you compiled the modules yourself then make sure you used ARCH=xen
>> on the make command line, or all the timeouts are way out. If you're
>> using the Fedora packages, make sure you have something newer than
>> cman-kernel-xenU-2.6.11.4-20050517.141233
> 
> 
> I built them myself, based on the RHEL4 tree. I'm 99.9% sure that I did
> pass ARCH=xen on all the builds, but I'll rebuild, just to make sure.
> 
>> I do know that Xen clusters work because I'm using it here!
> 
> 
> That's why it's so odd!
> 
> What version of Xen are you using? I just upgraded to a recent snapshot
> of 3.0 (with 2.0, GFS was causing the kernel to crash. Lovely.)
> 

I'm running a slightly old Xen 3.0 snapshot.

-- 

patrick

