
[Linux-cluster] RHEL 4.7 fenced fails -- stuck join state: S-2,2,1



I have a simple 4-node cluster. Two of the nodes have had a shared GFS home directory mounted for over a month. Today I wanted to mount /home on a third node, so:

# service fenced start                [failed]

Weird.  Checking /var/log/messages shows:

Aug 11 10:19:06 cerberus kernel: Lock_Harness 2.6.9-80.9.el4_7.10 (built Jan 22 2009 18:39:16) installed
Aug 11 10:19:06 cerberus kernel: GFS 2.6.9-80.9.el4_7.10 (built Jan 22 2009 18:39:32) installed
Aug 11 10:19:06 cerberus kernel: GFS: Trying to join cluster "lock_dlm", "ccc_cluster47:home"
Aug 11 10:19:06 cerberus kernel: Lock_DLM (built Jan 22 2009 18:39:18) installed
Aug 11 10:19:06 cerberus kernel: lock_dlm: fence domain not found; check fenced
Aug 11 10:19:06 cerberus kernel: GFS: can't mount proto = lock_dlm, table = ccc_cluster47:home, hostdata =

# cman_tool services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           0   2 join      S-2,2,1
[]
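
A minimal sketch for checking this state from a script, assuming the column layout shown above (State is the second-to-last field on the "Fence Domain:" line). The function name fence_state is hypothetical and not part of the cluster suite; it only parses text on stdin.

```shell
# Hypothetical helper: print the State column of the "Fence Domain:" line
# from `cman_tool services` output supplied on stdin.
# Assumes the layout shown above: ... GID LID State Code
fence_state() {
    awk '/^Fence Domain:/ { print $(NF-1); exit }'
}

# Usage on a live node:
#   cman_tool services | fence_state
# Anything other than "run" (for example "join") means the node has not
# finished joining the fence domain.
```

On the node above this would print "join"; on a node that has completed the join it would print "run".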

So, a fenced process is now hung:

root     28302  0.0  0.0  3668  192 ?        Ss   10:19   0:00 fenced -t 120 -w

Q: Any idea how to "recover" from this state, without rebooting?

The other two servers are unaffected by this (thankfully) and show normal operation:

$ cman_tool services

Service          Name                              GID LID State     Code
Fence Domain:    "default"                           2   2 run       -
[1 12]

DLM Lock Space:  "home"                              5   5 run       -
[1 12]

GFS Mount Group: "home"                              6   6 run       -
[1 12]
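
To compare the fence domain membership between nodes, the bracketed line that follows "Fence Domain:" can be pulled out the same way. This is only a sketch under the assumption that the member list always sits on the line directly after the "Fence Domain:" line, as in the output above; fence_members is a hypothetical name.

```shell
# Hypothetical helper: print the node IDs listed for the fence domain
# from `cman_tool services` output supplied on stdin.
# Assumes the member list ("[1 12]") is the line right after "Fence Domain:".
fence_members() {
    awk '/^Fence Domain:/ {
        if ((getline) > 0) {     # advance to the bracketed member line
            gsub(/\[|\]/, "")    # strip the surrounding brackets
            print
        }
        exit
    }'
}

# Usage on a live node:
#   cman_tool services | fence_members
```

On the two healthy nodes this would print "1 12", while the node stuck in the join state has an empty list and would print an empty line.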

