
Re: [Linux-cluster] manual fencing not working in RHEL4 branch



On Tue, Nov 29, 2005 at 07:53:09PM -0700, busy admin wrote:
> Here's a quick summary of what I've done and the results. To
> simplify the config, I've just been running ccsd and cman via init
> scripts during boot and then manually executing 'fenced', 'fence_tool',
> or the fenced init script. The results I see are random successes and
> failures!
> 
> Initial test: rebooted both systems and then executed 'fenced -D' on
> both. Both systems joined the cluster and it was quorate. I rebooted one
> node and, to my surprise, manual fencing worked, meaning
> /tmp/fence_manual.fifo was created and I had to run 'fence_ack_manual'
> on the other node. I tried again when the first node came back up and
> again everything worked as expected.
> 
> Additional testing: rebooted both systems and then executed
> 'fence_tool join -w' on both. Both systems joined the cluster and it was
> quorate. I rebooted one node and no fencing was done (nothing logged in
> /var/log/messages).
> 
> I rebooted both systems again and this time executed 'fenced -D' on both
> nodes. I rebooted a node and fencing worked: it was logged in
> /var/log/messages and I had to manually run 'fence_ack_manual -n x64-5'.
> When that node came back up, I again manually executed 'fenced -D' on
> it and the cluster was quorate. I then rebooted the other node and
> again fencing worked!
> 
> So again I rebooted both nodes and executed 'fence_tool join -w' on
> each. I again rebooted a node and this time fencing worked. fenced
> messages were logged to /var/log/messages, /tmp/fence_manual.fifo was
> created and I had to execute 'fence_ack_manual -n x64-4' to recover.
> 
> ... more testing with mixed results ...
> 
> I modified the fenced init script to execute 'fenced -D &' instead of
> 'fence_tool join -w', used chkconfig to enable it on both systems,
> and rebooted them. Both systems restarted and joined the cluster. Once
> again I rebooted one node (x64-4) and fencing didn't work; nothing
> from fenced was logged in /var/log/messages. See the corresponding
> /var/log/messages, fenced -D output and cluster.conf below.

It's not clear what you're trying to test or what you expect to happen.
Here's the optimal way to start up a cluster from a newly rebooted state:

1. nodeA: ccsd
2. nodeB: ccsd
3. nodeA: cman_tool join -w
4. nodeB: cman_tool join -w
5. nodeA: fence_tool join
6. nodeB: fence_tool join

It's best if steps 5 & 6 only happen after both nodes are members of
the cluster (see 'cman_tool nodes').  If this is the case, then no
nodes should be fenced when starting up.
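
The startup order above can be sketched as a small shell script run on
each node. This is only a sketch under assumptions that are not from the
original post: a two-node cluster, and a 'cman_tool nodes' output made of
a header line followed by one row per node whose fourth column is the
status ("M" for a member), which is assumed here to match the RHEL4
format; check the output on your own systems. The function names
count_members, wait_for_members and start_cluster_node are hypothetical
helpers, not cluster-suite commands.

```shell
#!/bin/sh
# Sketch: per-node cluster startup in the order described above.
# ASSUMPTIONS (not from the original post): two-node cluster, and
# 'cman_tool nodes' prints a header line, then one row per node whose
# 4th column is the status ("M" = member). Verify on your system.
# Function names are hypothetical helpers, not cluster-suite commands.

# Read 'cman_tool nodes' output on stdin and print how many rows
# (after the header) have status "M".
count_members() {
    awk 'NR > 1 && $4 == "M" { n++ } END { print n + 0 }'
}

# Poll every 2 seconds until at least $1 nodes are cluster members.
wait_for_members() {
    want=$1
    while [ "$(cman_tool nodes | count_members)" -lt "$want" ]; do
        sleep 2
    done
}

# Steps 1, 3, 5 (nodeA) or 2, 4, 6 (nodeB) for the local node.
start_cluster_node() {
    ccsd || return 1
    cman_tool join -w || return 1
    # Join the fence domain only after both nodes are members, so
    # neither node fences the other at startup.
    wait_for_members 2
    fence_tool join
}
```

Source the file on each node and call start_cluster_node; the first node
simply waits before joining the fence domain until the second node shows
up as a member.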

If you use the init scripts you may lose a little control over, and
certainty about, what happens when, so I'd suggest using the commands
directly until you know that things are running correctly, then try the
init scripts.

If, from the state above, nodeB fails, then nodeA should always fence
nodeB.  With manual fencing, this means that a message should appear in
nodeA's /var/log/messages telling you to reboot nodeB and run
fence_ack_manual.  If, by chance, nodeB reboots and rejoins the cluster
before you get to running fence_ack_manual, the fencing system on nodeA
will just complete the fencing operation itself and you don't need to run
fence_ack_manual (and if you try, the fence_ack_manual command will report
an error).
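
To avoid guessing whether an acknowledgement is still needed, a small
wrapper can run fence_ack_manual and, when it fails, point at the likely
cause described above. This is a hypothetical helper, not part of the
cluster suite, and it relies on an assumption: that fence_ack_manual
exits with a non-zero status when it reports an error.

```shell
#!/bin/sh
# Hypothetical wrapper around fence_ack_manual (not a cluster-suite tool).
# ASSUMPTION: fence_ack_manual exits non-zero when it reports an error,
# e.g. when fenced has already completed the fencing operation itself.
ack_manual_fence() {
    node=$1
    if [ -z "$node" ]; then
        echo "usage: ack_manual_fence NODE" >&2
        return 2
    fi
    if fence_ack_manual -n "$node"; then
        echo "acknowledged manual fencing of $node"
    else
        echo "fence_ack_manual failed for $node; if $node has already" \
             "rebooted and rejoined, fenced may have completed the" \
             "fencing itself (check /var/log/messages)" >&2
        return 1
    fi
}
```

Run it on the surviving node, e.g. 'ack_manual_fence x64-5', and only
after you have confirmed that x64-5 really was reset.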

Dave

