[Linux-cluster] RHEL 4.7 fenced fails -- stuck join state: S-2,2,1
Robert Hurst
rhurst at bidmc.harvard.edu
Tue Aug 11 14:55:48 UTC 2009
Simple 4-node cluster; two of the nodes have had a GFS shared home
directory mounted for over a month. Today, I wanted to mount /home on a
third node, so:
# service fenced start [failed]
Weird. Checking /var/log/messages shows:
Aug 11 10:19:06 cerberus kernel: Lock_Harness 2.6.9-80.9.el4_7.10 (built Jan 22 2009 18:39:16) installed
Aug 11 10:19:06 cerberus kernel: GFS 2.6.9-80.9.el4_7.10 (built Jan 22 2009 18:39:32) installed
Aug 11 10:19:06 cerberus kernel: GFS: Trying to join cluster "lock_dlm", "ccc_cluster47:home"
Aug 11 10:19:06 cerberus kernel: Lock_DLM (built Jan 22 2009 18:39:18) installed
Aug 11 10:19:06 cerberus kernel: lock_dlm: fence domain not found; check fenced
Aug 11 10:19:06 cerberus kernel: GFS: can't mount proto = lock_dlm, table = ccc_cluster47:home, hostdata =
# cman_tool services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           0   2 join      S-2,2,1
[]
So, a fenced process is now hung:
root 28302 0.0 0.0 3668 192 ? Ss 10:19 0:00 fenced -t 120 -w
Q: Any idea how to "recover" from this state, without rebooting?
The other two servers are unaffected by this (thankfully) and show
normal operations:
$ cman_tool services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           2   2 run       -
[1 12]
DLM Lock Space:  "home"                              5   5 run       -
[1 12]
GFS Mount Group: "home"                              6   6 run       -
[1 12]
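
To compare nodes quickly, a small filter over the `cman_tool services` output can list any service whose State is not `run`. This is only a sketch: the function name `stuck_services` is made up for illustration, and it assumes each service row has the layout quoted in this post (service type, quoted name, GID, LID, State, Code).

```shell
#!/bin/sh
# stuck_services (hypothetical helper): reads `cman_tool services` output
# on stdin and prints one line per service whose State is not "run".
# Assumption: each service row looks like the rows quoted in this post:
#   <Service type>: "<name>" <GID> <LID> <State> <Code>
stuck_services() {
    awk -F'"' '
        # A service row has a quoted name preceded by "<type>:".
        NF >= 3 && $1 ~ /:[ \t]*$/ {
            type = $1
            sub(/:[ \t]*$/, "", type)   # drop the trailing colon and padding
            split($3, f, " ")           # f[1]=GID, f[2]=LID, f[3]=State
            if (f[3] != "run")
                printf "%s \"%s\" state=%s\n", type, $2, f[3]
        }
    '
}
```

On a node it would be used as `cman_tool services | stuck_services`. With the output from the third node it would report the fence domain in state `join`, and on the two healthy nodes it would print nothing.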