[Linux-cluster] Node is randomly fenced

Digimer lists at alteeve.ca
Thu Jun 12 04:19:50 UTC 2014


I considered that, but if multicast were the problem I would expect more than one node to be lost.

On 12/06/14 12:12 AM, Netravali, Ganesh wrote:
> Make sure multicast is enabled across the switches.
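A direct way to check this, rather than inferring it from fencing, is to send test multicast traffic between the nodes. Below is a minimal Python sketch. It assumes IPv4 and uses a hypothetical test group `239.192.100.1` on port `5999`. Neither value comes from this thread, so pick ones that do not collide with the cluster's own traffic. Run it with `receive` on each node and with `send` on one node. A receiver that prints nothing suggests the switches are not forwarding multicast between those ports. Dedicated tools such as omping do the same job more thoroughly.

```python
import socket
import struct
import sys
import time

# Hypothetical test group and port (administratively scoped 239.0.0.0/8 block),
# not taken from the cluster configuration.
GROUP = "239.192.100.1"
PORT = 5999


def make_mreq(group: str, iface_ip: str = "0.0.0.0") -> bytes:
    """Pack the ip_mreq structure (group address, local interface) for IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_ip))


def receive(group: str = GROUP, port: int = PORT, timeout: float = 300.0) -> None:
    """Join the group and print each datagram until `timeout` seconds pass in silence."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, make_mreq(group))
    sock.settimeout(timeout)
    try:
        while True:
            data, (addr, _port) = sock.recvfrom(1024)
            print(f"{addr}: {data.decode(errors='replace')}", flush=True)
    except socket.timeout:
        print(f"no multicast received for {timeout:.0f}s")
    finally:
        sock.close()


def send(group: str = GROUP, port: int = PORT, count: int = 60, interval: float = 1.0) -> None:
    """Send `count` numbered datagrams to the group, one every `interval` seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # TTL 1 keeps the test on the local subnet, which is where all four nodes sit.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    try:
        for i in range(count):
            sock.sendto(f"ping {i} from {socket.gethostname()}".encode(), (group, port))
            time.sleep(interval)
    finally:
        sock.close()


if __name__ == "__main__":
    mode = sys.argv[1] if len(sys.argv) > 1 else ""
    if mode == "send":
        send()
    elif mode == "receive":
        receive()
    else:
        print(f"usage: {sys.argv[0]} send|receive")
```

Because the fencing here is intermittent, leaving the receivers running for several minutes is more telling than a single burst.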
>
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Schaefer, Micah
> Sent: Thursday, June 12, 2014 1:20 AM
> To: linux clustering
> Subject: Re: [Linux-cluster] Node is randomly fenced
>
> Okay, I set up active/backup bonding and will watch for any change.
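To watch a new bond for changes, one option is to poll the bonding driver's status file. This is a minimal Python sketch. It assumes the usual `/proc/net/bonding/<bond>` layout with `Bonding Mode`, `Currently Active Slave`, `Slave Interface`, `MII Status` and `Link Failure Count` lines. The bond name (for example `bond0`) is hypothetical, since the thread does not give it, and is passed as the only argument. A rising failure count on one slave, or a change of active slave, points at the link the cluster keeps losing.

```python
import sys


def parse_bond_status(text: str) -> dict:
    """Extract the bonding mode, active slave and per-slave link state."""
    status = {"mode": None, "active": None, "slaves": {}}
    current = None  # slave whose section we are in; None while still in the bond header
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # blank or non key/value line
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            status["mode"] = value
        elif key == "Currently Active Slave":
            status["active"] = value
        elif key == "Slave Interface":
            current = value
            status["slaves"][current] = {"mii": None, "failures": 0}
        elif current is not None and key == "MII Status":
            status["slaves"][current]["mii"] = value
        elif current is not None and key == "Link Failure Count":
            status["slaves"][current]["failures"] = int(value)
    return status


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(f"/proc/net/bonding/{sys.argv[1]}") as fh:
        print(parse_bond_status(fh.read()))
```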
>
> This is the network side:
>       0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
>       0 output errors, 0 collisions, 0 interface resets
>
>
>
> This is the server side:
>
> em1       Link encap:Ethernet  HWaddr C8:1F:66:EB:46:FD
>            inet addr:x.x.x.x  Bcast:x.x.x.255  Mask:255.255.255.0
>            inet6 addr: fe80::ca1f:66ff:feeb:46fd/64 Scope:Link
>            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>            RX packets:41274798 errors:0 dropped:0 overruns:0 frame:0
>            TX packets:4459245 errors:0 dropped:0 overruns:0 carrier:0
>            collisions:0 txqueuelen:1000
>            RX bytes:18866207931 (17.5 GiB)  TX bytes:1135415651 (1.0 GiB)
>            Interrupt:34 Memory:d5000000-d57fffff
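The counters above are cumulative totals, and a fenced node is rebooted, which resets them. Logging increments as they happen therefore gives a record that can be matched against the time of a fence. Here is a minimal Python sketch of such a poller. It assumes the standard Linux `/sys/class/net/<iface>/statistics/` counter files; not every driver populates all of them. Pass the interface name (for example `em1`) as the only argument. It prints only counters that grew, with a syslog-style timestamp.

```python
import os
import sys
import time

# Counter files found under /sys/class/net/<iface>/statistics/ on Linux.
COUNTERS = ("rx_errors", "tx_errors", "rx_dropped", "tx_dropped",
            "rx_crc_errors", "rx_frame_errors", "rx_over_errors",
            "tx_carrier_errors", "collisions")


def read_counters(iface: str, root: str = "/sys/class/net") -> dict:
    """Read whichever of COUNTERS the driver exposes for `iface`."""
    stats_dir = os.path.join(root, iface, "statistics")
    counters = {}
    for name in COUNTERS:
        path = os.path.join(stats_dir, name)
        if os.path.exists(path):  # skip counters this driver does not provide
            with open(path) as fh:
                counters[name] = int(fh.read().strip())
    return counters


def increments(before: dict, after: dict) -> dict:
    """Return only the counters that grew between two snapshots, with the amount."""
    return {k: after[k] - before[k] for k in after
            if k in before and after[k] > before[k]}


def watch(iface: str, interval: float = 5.0) -> None:
    """Poll forever, printing a timestamped line whenever any counter increases."""
    prev = read_counters(iface)
    while True:
        time.sleep(interval)
        cur = read_counters(iface)
        grown = increments(prev, cur)
        if grown:
            print(time.strftime("%b %d %H:%M:%S"), iface, grown, flush=True)
        prev = cur


if __name__ == "__main__" and len(sys.argv) > 1:
    watch(sys.argv[1])
```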
>
>
>
> I need to run some fiber, but for now two nodes are plugged into one switch and the other two into a separate switch on the same subnet. I'll work on cross-connecting the bonded interfaces to different switches.
>
>
>
> On 6/11/14, 3:28 PM, "Digimer" <lists at alteeve.ca> wrote:
>
>> The first thing I would do is get a second NIC and configure
>> active-passive bonding. Network issues are too common to ignore in HA
>> setups. Ideally, I would span the links across separate stacked switches.
>>
>> As for debugging the issue, I can only recommend looking closely at the
>> system and switch logs for clues.
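When comparing system and switch logs, it can help to pull every line from each source within a short window around the moment corosync reported the failure. This is a minimal Python sketch under a few assumptions. Every source must use the classic syslog timestamp prefix (`Jun 11 14:10:17 ...`, as in the corosync lines quoted in this thread), and all clocks must be in sync. Because that prefix carries no year, the sketch supplies one; 2014, the year of this thread, is used as the default. Switch logs in any other format would need their own parser.

```python
import sys
from datetime import datetime, timedelta

SYSLOG_FMT = "%Y %b %d %H:%M:%S"


def parse_ts(line: str, year: int = 2014):
    """Return the datetime of a syslog-style line ("Jun 11 14:10:17 ..."), or None."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", SYSLOG_FMT)
    except ValueError:
        return None


def around(lines, center: datetime, seconds: int = 60):
    """Yield lines whose timestamp falls within +/- `seconds` of `center`."""
    window = timedelta(seconds=seconds)
    for line in lines:
        ts = parse_ts(line)
        if ts is not None and abs(ts - center) <= window:
            yield line


if __name__ == "__main__" and len(sys.argv) > 2:
    # Arguments: a timestamp such as "Jun 11 14:10:17", then one or more log files.
    center = parse_ts(sys.argv[1])
    if center is None:
        sys.exit(f"cannot parse timestamp: {sys.argv[1]!r}")
    for path in sys.argv[2:]:
        with open(path, errors="replace") as fh:
            for line in around(fh, center):
                print(f"{path}: {line.rstrip()}")
```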
>>
>> On 11/06/14 02:55 PM, Schaefer, Micah wrote:
>>> I have the issue on two of my nodes. Each node has one 10Gb connection:
>>> no bonding, single link. What else can I look at? I manage the network
>>> too. I don't see any link-down notifications or any errors on
>>> the ports.
>>>
>>>
>>>
>>>
>>> On 6/11/14, 2:29 PM, "Digimer" <lists at alteeve.ca> wrote:
>>>
>>>> On 11/06/14 02:21 PM, Schaefer, Micah wrote:
>>>>> It failed again, even after deleting all the other failover domains.
>>>>>
>>>>> Cluster conf
>>>>> http://pastebin.com/jUXkwKS4
>>>>>
>>>>> I turned corosync logging up to debug. How can I go about
>>>>> determining whether it really is a network issue or something else?
>>>>>
>>>>>
>>>>>
>>>>> Jun 09 13:06:59 corosync [QUORUM] Members[4]: 1 2 3 4
>>>>> Jun 11 14:10:17 corosync [TOTEM ] A processor failed, forming new configuration.
>>>>> Jun 11 14:10:29 corosync [QUORUM] Members[3]: 1 2 3
>>>>> Jun 11 14:10:29 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
>>>>> Jun 11 14:10:29 corosync [CPG   ] chosen downlist: sender r(0) ip(10.70.100.101) ; members(old:4 left:1)
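One concrete way to separate a network problem from something else is to line up each membership change with the node that dropped out. This is a minimal Python sketch. It assumes the line formats quoted above, as written to corosync's own log file with no hostname field: `[QUORUM] Members[n]: ...` and `[TOTEM ] A processor failed`. It reports which node IDs left or joined at each quorum change. If the same node is always the one lost, its link, NIC, or switch port is the first suspect. Several nodes dropping at once would point more toward the switches or multicast. Pass the corosync log file as the only argument.

```python
import re
import sys

# Formats copied from the corosync log lines quoted in this thread.
MEMBERS_RE = re.compile(r"^(\w{3} +\d+ [\d:]{8}) corosync \[QUORUM\] Members\[\d+\]:((?: \d+)*)")
FAILED_RE = re.compile(r"^(\w{3} +\d+ [\d:]{8}) corosync \[TOTEM \] A processor failed")


def membership_changes(lines):
    """Yield (timestamp, event, node_ids) for processor failures and membership changes."""
    previous = None  # node IDs from the last Members[] line seen
    for line in lines:
        if m := FAILED_RE.match(line):
            yield m.group(1), "processor failed", set()
        elif m := MEMBERS_RE.match(line):
            members = {int(n) for n in m.group(2).split()}
            if previous is not None:
                if previous - members:
                    yield m.group(1), "left", previous - members
                if members - previous:
                    yield m.group(1), "joined", members - previous
            previous = members


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        for ts, event, nodes in membership_changes(fh):
            print(ts, event, sorted(nodes))
```

Run over the excerpt above, this would report the failure at 14:10:17 followed by node 4 leaving at 14:10:29.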
>>>>
>>>> This is, to me, *strongly* indicative of a network issue. It's not
>>>> likely switch-wide, as only one member was lost, but I would
>>>> certainly put my money on a network problem somewhere, somehow.
>>>>
>>>> Do you use bonding?
>>>>
>>>> --
>>>> Digimer
>>>> Papers and Projects: https://alteeve.ca/w/
>>>> What if the cure for cancer is trapped in the mind of a person without
>>>> access to education?
>>>>
>>>> --
>>>> Linux-cluster mailing list
>>>> Linux-cluster at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>>
>>
>>
>
>


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?