
RE: [Linux-cluster] Strange Behavior



Never mind. This was all due to incorrect time on a couple of the nodes: one node's clock was in the past, and another's was in the future.
 
It may be worth fixing this, since it DOES cause a kernel panic. Perhaps add a time-sync check that refuses to let a node join when its clock is not within some tolerance X of the cluster's time.
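To make the suggestion concrete, here is a minimal Python sketch of such a join-time check. It is hypothetical and not taken from CMAN: the function names `clock_offset` and `may_join`, and the 5-second value used for X, are made up for illustration. It also assumes the joining node's clock and the existing members' clocks are sampled at roughly the same moment (for example during the join handshake), ignoring network latency.

```python
from statistics import median
from typing import Sequence

# Hypothetical tolerance "X" from the suggestion above; the value is an assumption.
MAX_SKEW_SECONDS = 5.0


def clock_offset(joiner_time: float, member_times: Sequence[float]) -> float:
    """Return how far (in seconds) the joining node's clock is from the
    median clock of the current members. Positive means the joiner is ahead."""
    if not member_times:
        raise ValueError("need at least one existing member to compare against")
    return joiner_time - median(member_times)


def may_join(joiner_time: float, member_times: Sequence[float],
             max_skew: float = MAX_SKEW_SECONDS) -> bool:
    """Hypothetical admission rule: allow the join only if the joiner's clock
    is within max_skew seconds of the cluster's median clock."""
    return abs(clock_offset(joiner_time, member_times)) <= max_skew


if __name__ == "__main__":
    members = [1000.0, 1000.4, 999.8]      # member clocks, seconds since epoch
    print(may_join(1000.2, members))         # True: 0.2 s from the median
    print(may_join(1000.0 - 3600, members))  # False: one hour in the past
```

Comparing against the median of the members' clocks, rather than any single node, keeps one bad clock (like the node here that was in the future) from becoming the reference.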
 
Robert Gil
Linux Systems Administrator
American Home Mortgage
Phone: 631-622-8410
Cell: 631-827-5775
Fax: 516-495-5861
 


From: linux-cluster-bounces@redhat.com [mailto:linux-cluster-bounces@redhat.com] On Behalf Of Robert Gil
Sent: Tuesday, May 22, 2007 11:49 AM
To: linux-cluster@redhat.com
Subject: [Linux-cluster] Strange Behavior

I am getting some strange behavior on a 4-node cluster. When node dbs2 tries to connect to the cluster, node app3 either kernel panics or its ccsd and rgmanager crash. Node dbs2 reports that the heartbeats drop off and then removes itself from the cluster. I am curious why node app3 would crash and what these SM messages mean. Also, why would node dbs2 connect to the cluster, become quorate, and then drop off and crash node 1? Has anyone seen this before?
 
 
/var/log/messages
 
May 22 11:34:36 melqsjssapp03 kernel: CMAN: node melqsjssdbs02.americanhm.com rejoining
May 22 11:35:11 melqsjssapp03 kernel: CMAN: node melqsjssdbs02.americanhm.com has been removed from the cluster : Missed too many heartbeats
May 22 11:35:25 melqsjssapp03 kernel: CMAN: node melqsjssapp03.americanhm.com has been removed from the cluster : No response to messages
May 22 11:35:25 melqsjssapp03 kernel: CMAN: killed by NODEDOWN message
May 22 11:35:25 melqsjssapp03 kernel: CMAN: we are leaving the cluster. No response to messages
May 22 11:35:25 melqsjssapp03 kernel: WARNING: dlm_emergency_shutdown
May 22 11:35:25 melqsjssapp03 kernel: WARNING: dlm_emergency_shutdown
May 22 11:35:25 melqsjssapp03 kernel: SM: 00000011 sm_stop: SG still joined
May 22 11:35:25 melqsjssapp03 kernel: SM: 01000014 sm_stop: SG still joined
May 22 11:35:25 melqsjssapp03 kernel: SM: 0200001a sm_stop: SG still joined
May 22 11:35:25 melqsjssapp03 kernel: SM: 03000002 sm_stop: SG still joined
May 22 11:35:25 melqsjssapp03 clurgmgrd[5179]: <warning> #67: Shutting down uncleanly
May 22 11:35:25 melqsjssapp03 ccsd[4630]: Cluster manager shutdown.  Attemping to reconnect...
May 22 11:35:51 melqsjssapp03 ccsd[4630]: Unable to connect to cluster infrastructure after 30 seconds.
May 22 11:36:21 melqsjssapp03 ccsd[4630]: Unable to connect to cluster infrastructure after 60 seconds.
 
Thanks,
 
Robert Gil
 
