[Linux-cluster] 3rd node mount hang and kicked out of cluster
- From: Daniel McNeil <daniel osdl org>
- To: linux-cluster <linux-cluster redhat com>
- Subject: [Linux-cluster] 3rd node mount hang and kicked out of cluster
- Date: Thu, 31 Mar 2005 11:28:30 -0800
My latest test run lasted only 22 hours. It was starting
a test that mounts GFS on all 3 nodes. The first 2 nodes
mounted the GFS file system without any problem, but the 3rd
node's mount hung and the node got kicked out of the cluster:
cl032 (3rd node):
CMAN: removing node cl030a from the cluster : Missed too many heartbeats
CMAN: removing node cl031a from the cluster : No response to messages
CMAN: quorum lost, blocking activity
[-- MARK -- Wed Mar 30 09:15:00 2005]
GFS: Trying to join cluster "lock_dlm", "gfs_cluster:stripefs"
cl030 (1st node):
CMAN: removing node cl032a from the cluster : Missed too many heartbeats
GFS: Trying to join cluster "lock_dlm", "gfs_cluster:stripefs"
SM: process_reply invalid id=6764 nodeid=4294967295
GFS: fsid=gfs_cluster:stripefs.0: Joined cluster. Now mounting FS...
GFS: fsid=gfs_cluster:stripefs.0: jid=0: Trying to acquire journal lock...
GFS: fsid=gfs_cluster:stripefs.0: jid=0: Looking at journal...
GFS: fsid=gfs_cluster:stripefs.0: jid=0: Done
GFS: fsid=gfs_cluster:stripefs.0: jid=1: Trying to acquire journal lock...
GFS: fsid=gfs_cluster:stripefs.0: jid=1: Looking at journal...
GFS: fsid=gfs_cluster:stripefs.0: jid=1: Done
GFS: fsid=gfs_cluster:stripefs.0: jid=2: Trying to acquire journal lock...
GFS: fsid=gfs_cluster:stripefs.0: jid=2: Looking at journal...
GFS: fsid=gfs_cluster:stripefs.0: jid=2: Done
GFS: fsid=gfs_cluster:stripefs.0: jid=3: Trying to acquire journal lock...
GFS: fsid=gfs_cluster:stripefs.0: jid=3: Looking at journal...
GFS: fsid=gfs_cluster:stripefs.0: jid=3: Done
SM: process_reply invalid id=6764 nodeid=4294967295
SM: process_reply invalid id=6765 nodeid=4294967295
SM: process_reply invalid id=6765 nodeid=4294967295
SM: process_reply invalid id=6765 nodeid=4294967295
SM: process_reply invalid id=6765 nodeid=4294967295
GFS: Trying to join cluster "lock_dlm", "gfs_cluster:stripefs"
SM: process_reply invalid id=5553 nodeid=4294967295
SM: process_reply invalid id=5553 nodeid=4294967295
...
cl031 (2nd node):
CMAN: node cl032a has been removed from the cluster : Missed too many heartbeats
SM: process_reply invalid id=6764 nodeid=4294967295
SM: process_reply invalid id=6764 nodeid=4294967295
SM: process_reply invalid id=6764 nodeid=4294967295
GFS: Trying to join cluster "lock_dlm", "gfs_cluster:stripefs"
SM: process_reply invalid id=6765 nodeid=4294967295
GFS: fsid=gfs_cluster:stripefs.1: Joined cluster. Now mounting FS...
GFS: fsid=gfs_cluster:stripefs.1: jid=1: Trying to acquire journal lock...
GFS: fsid=gfs_cluster:stripefs.1: jid=1: Looking at journal...
GFS: fsid=gfs_cluster:stripefs.1: jid=1: Done
SM: process_reply invalid id=6765 nodeid=4294967295
SM: process_reply invalid id=6765 nodeid=4294967295
SM: process_reply invalid id=5553 nodeid=4294967295
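One detail in the SM messages: nodeid=4294967295 is 2^32 - 1, which is what -1 looks like when printed as an unsigned 32-bit integer. Whether the SM code uses -1 to mean "no node" is an assumption here, not something the logs confirm. Below is a rough Python sketch for tallying these messages by id and showing the nodeid as a signed value; the helper names (as_signed32, tally_invalid_replies) are made up for illustration and are not part of any cluster tool.

```python
import re
import struct
from collections import Counter

# Matches lines like: SM: process_reply invalid id=6764 nodeid=4294967295
SM_LINE = re.compile(r"SM: process_reply invalid id=(\d+) nodeid=(\d+)")

def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value))[0]

def tally_invalid_replies(log_text):
    """Count 'process_reply invalid' messages per (id, signed nodeid)."""
    counts = Counter()
    for match in SM_LINE.finditer(log_text):
        reply_id = int(match.group(1))
        nodeid = as_signed32(int(match.group(2)))
        counts[(reply_id, nodeid)] += 1
    return counts

# A few lines copied from the cl031 console output above.
sample = """\
SM: process_reply invalid id=6764 nodeid=4294967295
SM: process_reply invalid id=6765 nodeid=4294967295
SM: process_reply invalid id=6765 nodeid=4294967295
"""

for (reply_id, nodeid), n in sorted(tally_invalid_replies(sample).items()):
    print(f"id={reply_id} nodeid={nodeid} count={n}")
```

Running this over each node's console log shows which ids repeat on which nodes, which makes it easier to compare cl030 and cl031.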
A whole lot more info is available here:
http://developer.osdl.org/daniel/GFS/test.29mar2005/
Any ideas on what happened?
Daniel