[Linux-cluster] 2-node cluster fence loop

emmanuel segura emi2fast at gmail.com
Thu Jun 12 15:05:03 UTC 2014


I always use "tcpdump -ni bond1 port 5405" to check whether both nodes are
involved in the communication. If they are not, that points to a
multicast problem.
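
If reading the capture by eye gets tedious, the same check can be scripted. What follows is only a rough sketch, not something from a cluster tool: it assumes the usual "tcpdump -n" line layout for IPv4/UDP ("IP src.port > dst.port:"), and the node addresses, the multicast group and the helper names (senders, missing_nodes) are made up for illustration.

```python
import re

# Assumed "tcpdump -n" layout for IPv4 traffic, for example:
#   15:05:03.123456 IP 10.0.0.1.5404 > 239.192.0.1.5405: UDP, length 119
# The first group captures the source address, without the port.
LINE_RE = re.compile(r"\bIP (\d+\.\d+\.\d+\.\d+)\.\d+ > \d+\.\d+\.\d+\.\d+\.\d+:")


def senders(lines):
    """Return the set of source IPv4 addresses found in tcpdump -n output."""
    found = set()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            found.add(match.group(1))
    return found


def missing_nodes(lines, expected):
    """Return the expected node addresses that never show up as a sender."""
    return set(expected) - senders(lines)


# Hypothetical capture in which only one node is talking.
sample = [
    "15:05:03.123456 IP 10.0.0.1.5404 > 239.192.0.1.5405: UDP, length 119",
    "15:05:03.323456 IP 10.0.0.1.5404 > 239.192.0.1.5405: UDP, length 119",
]
print(sorted(missing_nodes(sample, {"10.0.0.1", "10.0.0.2"})))  # ['10.0.0.2']
```

Run the capture on each node in turn and feed its output lines to missing_nodes. If a node only ever sees its own address as a sender, the peer's traffic is not reaching it.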

2014-06-12 16:43 GMT+02:00 Kaloyan Kovachev <kkovachev at varna.net>:
> Do you have a different auth key on each node by any chance?
>
>
> On 2014-06-12 17:29, Arun G Nair wrote:
>
>> We have multicast enabled on the switch. I've also tried the multicast.py
>> tool from RH's knowledge base to test multicast and I see the expected
>> output, though the tool uses a different multicast IP (I guess that
>> shouldn't matter). I've tried increasing the post_join_delay to 360 seconds
>> to give me enough time to check everything on both nodes. One node still
>> gets fenced. `clustat` output says the other node is offline on both
>> servers. So one node can't see the other one? This again points to an
>> issue with multicast. Any other clues as to what/where to look?
>>
>> On Wed, Jun 11, 2014 at 8:33 PM, Digimer <lists at alteeve.ca> wrote:
>>
>> On 11/06/14 10:48 AM, Arun G Nair wrote:
>> Hello,
>>
>> What are the reasons for fence loops when only cman is started? We
>> have an RHEL 6.5 2-node cluster which goes into a fence loop every
>> time we start cman on both nodes: one node fences the other. Multicast
>> seems to be working properly. My understanding is that without rgmanager
>> running there won't be a multicast group subscription? I don't see the
>> multicast address in 'netstat -g' unless rgmanager is running. I've
>> tried to increase the fence post_join_delay but one of the nodes still
>> gets fenced.
>>
>> The cluster works fine if we use unicast UDP.
>>
>> Thanks,
>>
>> Hi,
>>
>> When cman starts, it waits post_join_delay seconds for the peer to
>> connect. If the peer has not joined when that time expires (6 seconds by
>> default, iirc), it gives up and calls a fence against the peer to put it
>> into a known state.
>>
>> Corosync is what determines membership, and it is started by cman. The
>> rgmanager only handles resource start/stop/relocate/recovery and has nothing
>> to do with fencing directly. Corosync is what uses multicast.
>>
>> So as you seem to have already surmised, multicast is probably not working
>> in your environment. Have you enabled multicast traffic on the firewall? Do
>> your switches support multicast properly?
>>
>> digimer
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.ca/w/ [1]
>>
>> What if the cure for cancer is trapped in the mind of a person without
>> access to education?
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster [2]
>
>
> --
> Arun G Nair
> Sr. Sysadmin
> Dimension Data | Ph: (800) 664-9973
> Feedback? We're listening [3]
>
>
>
> Links:
> ------
> [1] https://alteeve.ca/w/
> [2] https://www.redhat.com/mailman/listinfo/linux-cluster
> [3] http://www.surveymonkey.com/s/XRCYXBH
>
>



-- 
esta es mi vida e me la vivo hasta que dios quiera



