[Cluster-devel] Manual override / manual fencing replacement



Hi,

I put a patch into the HEAD / RHEL4 / RHEL5 / STABLE branches that
obviates the need to configure fence_manual for use in clusters, and
allows manual override if fencing fails.

Here's how it works:
1. Try fencing as usual
2. If fencing *fails*, open a manual override socket and wait for user
input for a few seconds.
3. If we get no input, start the loop over...
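
For anyone who prefers code to prose, here is a rough Python sketch of
the loop described above. It is not the fenced implementation: the
names try_fence, wait_for_override, override_wait_s and max_rounds are
hypothetical, and max_rounds exists only so the sketch can terminate.

```python
from typing import Callable, Optional


def fence_with_override(
    try_fence: Callable[[], bool],
    wait_for_override: Callable[[float], bool],
    override_wait_s: float = 5.0,
    max_rounds: Optional[int] = None,
) -> str:
    """Illustrative fencing loop with a manual override window.

    try_fence:         hypothetical callable, True if fencing succeeded.
    wait_for_override: hypothetical callable that waits up to the given
                       number of seconds and returns True if an operator
                       acknowledged the override in that time.
    max_rounds:        assumption for this sketch only; None loops forever.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        # Step 1: try fencing as usual.
        if try_fence():
            return "fenced"
        # Step 2: fencing failed, so wait briefly for a manual override.
        if wait_for_override(override_wait_s):
            return "override"
        # Step 3: no input arrived, so start the loop over.
    return "gave-up"
```

With no fence devices configured, try_fence would return False on every
round, so the only way out of the loop is the override.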



Why does it obviate manual fencing?
* If you have no fencing configured, fencing immediately fails (this is
part of the designed-in behavior of fenced!), thereby activating the
possibility of using the manual override.  I.e., no fencing implies
"manual override only"!


How is this better than manual fencing?
* You do not have to configure fencing at all in order for this to work.
(Woooo! Simpler configurations rock!)

* This is a general manual override case which works with all types of
fencing, without hanging forever waiting for input.

* The fence devices, if configured, will be retried.  Previously, if you
used fence_manual as a backup, you *had* to use fence_ack_manual -- even
if the fence device was only suffering a temporary problem (e.g., a
network fencing device that only allows one login at a time).

* Since both methods require manual intervention, the net effect is
approximately the same in the "no-fencing" case.  In fact, I committed a
sample "fence_ack_manual" shell script replacement which works like the
original command (note: script is only in -HEAD branch).


Why did I do this?
* I think that you should not have to configure manual fencing in order
for it to work; it should be the default behavior.

* I think this is a better and more general solution to a problem where
a fence device fails.  Currently, the only way to un-break a cluster
where fencing is permanently dead is to do something like
"mv /sbin/fence_foo /sbin/fence_foo.bak; cp /bin/true /sbin/fence_foo" -
and reverse the process after fencing completes.


Why did I bother to write this up?
* I want to remove the fence_manual and fence_ack_manual commands (from
the HEAD branch), and I want to replace fence_ack_manual with the shell
script that does the same thing using the new manual override.  If
anyone has strong opinions against this, please comment.


What other information is there?
* fence_manual and fence_ack_manual will not go away in the RHEL4/RHEL5
branches.

* This should not impact your configuration -- even if you are a
fence_manual consumer.  If you are using RHEL4, STABLE, or RHEL5
branches, your configuration will still work.

* Even if you are using HEAD, the removal of fence_manual from your
system (but not your configuration) with this new feature will simply
cause fencing to immediately fail, activating the manual override.
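
To illustrate that last point, here is a small Python sketch of how a
missing agent binary can be treated as an ordinary fencing failure.
This is an assumption about the general shape of the logic, not a copy
of what fenced does; run_agent is a hypothetical helper.

```python
import subprocess
from typing import Sequence


def run_agent(argv: Sequence[str]) -> bool:
    """Run a fence agent command; return True only if it exits with 0.

    Hypothetical helper for illustration. A missing or non-executable
    binary is reported as a failed fencing attempt, not as a crash.
    """
    try:
        result = subprocess.run(list(argv), stdin=subprocess.DEVNULL)
    except OSError:
        # Raised when the agent cannot be executed at all, for example
        # because the binary was removed from the system.
        return False
    return result.returncode == 0
```

A caller that gets False here would fall through to the manual
override wait, exactly as for any other fencing failure.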


Comments?  Can I nuke fence_manual and fence_ack_manual from the HEAD
trunk of CVS? :)

-- Lon

