
[Linux-cluster] 3 node cluster crashes



 

I have a 3-node cluster running cman-2.0.84-2.el5. At times we have spanning tree events that cause network storms lasting up to 9 seconds.

When these events occur (today we triggered them twice to verify the issue), all three nodes go down within seconds of the event.

The second time we tried it, I added the totem token statement shown below. Same problem.

<cman>
        <multicast addr="225.0.0.11"/>
        <totem token="21000"/>
</cman>

Aug  5 16:41:18 csarcsys2-eth0 ntpd[3484]: kernel time sync enabled 0001
Aug  5 16:41:19 csarcsys2-eth0 openais[3096]: [TOTEM] The token was lost in the OPERATIONAL state.
Aug  5 16:41:19 csarcsys2-eth0 openais[3096]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Aug  5 16:41:19 csarcsys2-eth0 openais[3096]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Aug  5 16:41:19 csarcsys2-eth0 openais[3096]: [TOTEM] entering GATHER state from 2.
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] entering GATHER state from 0.
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Creating commit token because I am the rep.
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Saving state aru 46 high seq received 46
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Storing new sequence id for ring b50
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] entering COMMIT state.
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] entering RECOVERY state.
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] position [0] member 172.xx.xx.xxx:
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] previous ring seq 2892 rep 172.xx.xxx.xx
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] aru 46 high delivered 46 received flag 1
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Did not need to originate any messages in recovery.
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Sending initial ORF token
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] CLM CONFIGURATION CHANGE
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] New Configuration:
Aug  5 16:41:24 csarcsys2-eth0 kernel: dlm: closing connection to node 1
Aug  5 16:41:24 csarcsys2-eth0 clurgmgrd[3750]: <emerg> #1: Quorum Dissolved
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ]   r(0) ip(172. xx.xxx.xx)
Aug  5 16:41:24 csarcsys2-eth0 kernel: dlm: closing connection to node 3
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] Members Left:
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ]   r(0) ip(172. xx.xxx.xx)
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ]   r(0) ip(172. xx.xxx.xx)
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] Members Joined:
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CMAN ] quorum lost, blocking activity
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] CLM CONFIGURATION CHANGE
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] New Configuration:
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ]   r(0) ip(172. xx.xxx.xx)
Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Cluster is not quorate.  Refusing connection.
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] Members Left:
Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Error while processing connect: Connection refused
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] Members Joined:
Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Invalid descriptor specified (-111).
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [SYNC ] This node is within the primary component and will provide service.
Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Someone may be attempting something evil.
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] entering OPERATIONAL state.
Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Error while processing get: Invalid request descriptor
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] got nodejoin message 172.24.86.143
Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Cluster is not quorate.  Refusing connection.
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CPG  ] got joinlist message from node 2
Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Error while processing connect: Connection refused
Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Invalid descriptor specified (-111).
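
For anyone who wants to line up the timing in a log like the one above, here is a minimal Python sketch that measures the gap between two messages, for example the token loss and the quorum loss. The helper names (parse, seconds_between) and the placeholder year are my own assumptions and not part of cman or openais. The three sample lines are copied from the log above.

```python
import re
from datetime import datetime

# Classic syslog layout: "Mon DD HH:MM:SS host proc[pid]: message".
# The "[pid]" part is optional (kernel lines do not carry one).
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<proc>[\w-]+)(?:\[\d+\])?:\s+(?P<msg>.*)$"
)

# Assumption: syslog timestamps carry no year, so a fixed placeholder
# year is used. Only differences within one log matter here.
PLACEHOLDER_YEAR = "2000"


def parse(line):
    """Return (timestamp, process, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts_text = " ".join(m.group("ts").split())  # collapse "Aug  5" to "Aug 5"
    ts = datetime.strptime(PLACEHOLDER_YEAR + " " + ts_text, "%Y %b %d %H:%M:%S")
    return ts, m.group("proc"), m.group("msg")


def seconds_between(lines, start_text, end_text):
    """Seconds from the first message containing start_text to the first
    later message containing end_text, or None if either is missing."""
    start = None
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        ts, _proc, msg = parsed
        if start is None:
            if start_text in msg:
                start = ts
        elif end_text in msg:
            return (ts - start).total_seconds()
    return None


SAMPLE = [
    "Aug  5 16:41:19 csarcsys2-eth0 openais[3096]: [TOTEM] The token was lost in the OPERATIONAL state.",
    "Aug  5 16:41:24 csarcsys2-eth0 clurgmgrd[3750]: <emerg> #1: Quorum Dissolved",
    "Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] entering OPERATIONAL state.",
]

if __name__ == "__main__":
    print(seconds_between(SAMPLE, "token was lost", "Quorum Dissolved"))  # 5.0
```

Running the same function over the full messages file from each node would show whether all three nodes logged the token loss at the same second.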

