[Linux-cluster] clustat stuck
bergman at merctech.com
Fri Apr 1 20:42:34 UTC 2011
The pithy ruminations from frederic randriamora on Oct 29, 2010 4:30:03 pm entitled "RE: [Linux-cluster] clustat stuck" were:
==> Hi,
==>
==> I have a 4 node cluster, with multipathed qdisk on a san. The nodes are
==> running redhat 5.4.
I've got a 3 node cluster, with multipathed qdisk on a SAN. The nodes are
running CentOS 5.5:
Linux 2.6.18-194.32.1.el5 #1 SMP Wed Jan 5 17:52:25 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
lvm2-cluster-2.02.56-7.el5_5.4
cman-2.0.115-34.el5_5.4
rgmanager-2.0.52-6.el5.centos.8
openais-0.80.6-16.el5_5.9
==>
==> After a minor change made in cluster.conf on node3 properly propagated
==> by ccs_tool update, clustat is no longer correctly responding in the
==> other 3 nodes.
In my case, I failed over a service from node3 to node2, but made no cluster
configuration changes.
==> node3 is neither nodeid 1 nor qdisk master.
==>
==> clustat on node3 runs fine
Similar. On node2, clustat works fine.
==>
==> clustat on the other nodes
==>
==> either hangs with
==> connect(8, {sa_family=AF_FILE, path="/var/run/cluster/rgmanager.sk"...}, 110
==> from strace
==>
==>
==> or times out with
==> Timed out waiting for a response from Resource Group Manager
==> without displaying the still running services
==>
Exactly the same behavior here.
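For anyone who wants to check the socket without running clustat under
strace, here is a minimal sketch that attempts the same connect() with a
timeout. The socket path is the one from the strace output above. The
function name, the 5 second timeout, the use of SOCK_STREAM and the
status strings are my own assumptions, and it needs Python 3, which is
not what ships with EL5. It only reports how the connect attempt ends;
it does not speak the rgmanager protocol.

```python
import errno
import socket

# Path taken from the strace output quoted above.
RGMANAGER_SOCK = "/var/run/cluster/rgmanager.sk"


def probe_unix_socket(path, timeout=5.0):
    """Try to connect to a Unix-domain socket and describe the outcome.

    Hypothetical helper for illustration. SOCK_STREAM is an assumption
    about how the daemon listens on this socket.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return "connected"
    except socket.timeout:
        # The connect did not complete within the timeout.
        return "timed out"
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            # No socket file at that path.
            return "missing"
        if exc.errno == errno.ECONNREFUSED:
            # Socket file exists but nothing is listening on it.
            return "refused"
        if exc.errno == errno.EAGAIN:
            # On Linux a non-blocking connect() to a Unix socket whose
            # listen backlog is full returns EAGAIN, i.e. the listener
            # is not accepting connections.
            return "backlog full"
        return "error: %s" % errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


if __name__ == "__main__":
    print(RGMANAGER_SOCK, "->", probe_unix_socket(RGMANAGER_SOCK))
```

Running it on a node where clustat works and on one where it hangs
should show whether the difference is already visible at connect time.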
==> cman_tool services et al. are just fine everywhere,
==>
Agreed. The actual services are running on each node. The report from cman_tool
is correct, but querying the cluster with "clustat" or running operations with
"clusvcadm" hangs or times out.
==> Although all the services are running fine, I cannot move/stop them
==> anymore with clusvcadm.
==>
==> How to get out of that situation?
Is there any solution to this issue?
Thanks,
Mark