
[Linux-cluster] node kicked out of cluster



My latest test ran 49 hours before a node got kicked out.


cl030:
Feb 18 18:07:40 cl030 kernel: CMAN: node cl030a has been removed from the cluster : No response to messages
Feb 18 18:07:40 cl030 kernel: CMAN: killed by NODEDOWN message
Feb 18 18:07:40 cl030 kernel: CMAN: we are leaving the cluster.
Feb 18 18:07:41 cl030 kernel: dlm: stripefs: recoverd_kick after exit
Feb 18 18:07:41 cl030 kernel:
Feb 18 18:07:41 cl030 kernel: SM: send_nodeid_message error -107 to 2
Feb 18 18:07:42 cl030 kernel: SM: 00000001 sm_stop: SG still joined
Feb 18 18:07:42 cl030 kernel: SM: 01000430 sm_stop: SG still joined
Feb 18 18:07:42 cl030 kernel: SM: 02000431 sm_stop: SG still joined
Feb 18 18:07:42 cl030 ccsd[3766]: [cluster_mgr.c:387] Cluster manager shutdown.

cl031:
Feb 18 18:07:40 cl031 kernel: CMAN: removing node cl030a from the cluster : No response to messages
Feb 18 18:07:41 cl031 fenced[4127]: cl030a not a cluster member after 0 sec post_fail_delay
Feb 18 18:07:41 cl031 fenced[4127]: fencing node "cl030a"
Feb 18 18:07:41 cl031 fence_manual: Node cl030a needs to be reset before recovery can procede.  Waiting for cl030a to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n cl030a)

cl032:
Feb 18 18:07:40 cl032 kernel: CMAN: node cl030a has been removed from the cluster : No response to messages
Feb 18 18:07:41 cl032 fenced[4262]: fencing deferred to cl031a
Feb 19 04:02:06 cl032 su(pam_unix)[29639]: session opened for user cyrus by (uid=0)

Does "No response to messages" mean that heartbeats got lost, so cl030 was kicked out?
Full info here:
http://developer.osdl.org/daniel/GFS/test.16feb2005/
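
To compare what each node saw, it can help to merge the three logs into one timeline. The following is only a rough Python sketch: the regex assumes the syslog layout shown above ("Mon DD HH:MM:SS host source: message"), the year is an assumption because syslog lines do not carry one, and the names parse_line and cluster_timeline are hypothetical helpers, not part of CMAN, fenced, or any cluster tool.

```python
import re
from datetime import datetime

# Assumed syslog layout: "Mon DD HH:MM:SS host source: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<src>[^:]+): ?(?P<msg>.*)$"
)


def parse_line(line, year=2005):
    """Split one syslog line into (timestamp, host, source, message).

    Returns None for lines that do not match the assumed layout.
    The year is supplied by the caller because syslog omits it.
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
    return ts, m["host"], m["src"], m["msg"]


def cluster_timeline(lines,
                     sources=("kernel", "fenced", "fence_manual", "ccsd")):
    """Merge lines from several nodes into one time-ordered event list.

    Only lines whose source (with any "[pid]" suffix dropped) is in
    `sources` are kept, so unrelated entries such as su/pam are skipped.
    """
    events = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        name = parsed[2].split("[")[0]
        if name in sources:
            events.append(parsed)
    # Stable sort: lines with the same second keep their input order.
    events.sort(key=lambda e: e[0])
    return events


if __name__ == "__main__":
    sample = [
        'Feb 18 18:07:41 cl031 fenced[4127]: fencing node "cl030a"',
        "Feb 18 18:07:40 cl030 kernel: CMAN: killed by NODEDOWN message",
    ]
    for ts, host, src, msg in cluster_timeline(sample):
        print(ts.isoformat(), host, src, msg)
```

Syslog timestamps only have one-second resolution, so the ordering of events within the same second across nodes is not reliable, and clock skew between the nodes is not accounted for.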

Thanks,

Daniel

