[Linux-cluster] Fencing Methods

Mon Mar 20 20:51:51 UTC 2006

On Mon, Mar 20, 2006 at 03:32:17PM -0500, Hendershot, Zach wrote:
> That's a good point, I wasn't thinking. Oracle and (I assume) Veritas do
> this by relying on a kernel thread that writes out timestamps and if it
> doesn't write an expected timestamp (and other nodes see it as dead) it
> panics itself to self-fence.

That still doesn't work.  The node can stall for an arbitrary length of
time, then wake up and write to shared storage before it gets around to
panicking.
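To make the race concrete, here is a minimal Python simulation. It is a sketch under stated assumptions, not how Oracle, Veritas or RHCS are implemented: `Cluster`, `node1_write_path` and `writes_after_death` are hypothetical names, and time is a plain number. The node checks whether it is still a member and then writes. Those are two separate steps, so a stall between them lets the write land after the rest of the cluster has already declared the node dead.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    # Hypothetical time at which the surviving nodes declare node1 dead.
    declared_dead_at: float
    # Writes that reached shared storage: (time, node, payload).
    disk_log: list = field(default_factory=list)


def node1_write_path(cluster: Cluster, check_at: float, stall: float) -> float:
    """Hypothetical self-fencing node: check liveness, then write.

    The check and the write are not atomic. A stall between them (a long
    scheduler delay, a hung I/O path, ...) lets the write land after the
    cluster already considers the node dead.
    """
    # Step 1: the node only knows what was true when it checked.
    if check_at >= cluster.declared_dead_at:
        return check_at  # it would panic (self-fence) instead of writing
    # Step 2: the node stalls, resumes, and completes the write
    # before its own watchdog runs and panics the machine.
    write_at = check_at + stall
    cluster.disk_log.append((write_at, "node1", "stale data"))
    return write_at


def writes_after_death(cluster: Cluster) -> list:
    """Writes that hit the disk after recovery may already have started."""
    return [w for w in cluster.disk_log if w[0] > cluster.declared_dead_at]


if __name__ == "__main__":
    c = Cluster(declared_dead_at=10.0)
    node1_write_path(c, check_at=9.0, stall=5.0)
    print(writes_after_death(c))  # [(14.0, 'node1', 'stale data')]
```

Any stall longer than the gap between the check and the death declaration produces a stale write. Making the timestamp thread more careful does not close that window.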

> How does RHCS decide if a node is dead? I was under the understanding
> that if the other nodes don't receive a heartbeat from the node for a
> timeout period they execute the fence command on the node. 

One of the remaining nodes in the cluster fences the node that has been
declared dead by the cluster manager.  Fencing the dead node does not
involve running anything on the dead node; it just amounts to turning its
power off, or something equivalent such as cutting off its access to the
shared storage.
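For contrast, here is a minimal sketch of external fencing, again in hypothetical Python. `FencedDisk`, `fence`, `recover` and the `agent_ok` flag are illustrative names, not RHCS or fence-agent APIs. The sketch assumes that a surviving node fences the victim first, and that recovery goes ahead only once the fence has succeeded. Because the fence is enforced outside the dead node, a late write from that node never reaches the disk, whatever the node believes about itself.

```python
from dataclasses import dataclass, field


@dataclass
class FencedDisk:
    """Hypothetical shared storage guarded by an external fence.

    Once a node is fenced (power cut, or its storage access disabled),
    nothing that node does can reach the disk.
    """
    fenced: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def fence(self, node: str, agent_ok: bool = True) -> bool:
        # Run by a *surviving* node; the victim takes no part in it.
        # agent_ok stands in for whether the (hypothetical) fence device worked.
        if not agent_ok:
            return False
        self.fenced.add(node)
        return True

    def write(self, node: str, payload: str) -> bool:
        if node in self.fenced:
            return False  # the write never reaches the storage
        self.log.append((node, payload))
        return True


def recover(disk: FencedDisk, dead_node: str, agent_ok: bool = True) -> str:
    """Hypothetical recovery on a surviving node: fence first, recover after."""
    if not disk.fence(dead_node, agent_ok):
        # In this sketch, recovery waits rather than guessing the node is gone.
        return "blocked"
    # Only now is it safe to, e.g., replay the dead node's journal.
    return "recovered"


if __name__ == "__main__":
    disk = FencedDisk()
    print(recover(disk, "node1"))              # recovered
    print(disk.write("node1", "stale data"))   # False
    print(disk.log)                            # []
```

The design choice is that safety comes from something the dead node cannot override. The dead node's own good behaviour plays no part.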

> I'm interested why that choice was made, was it a technical problem with
> the above method or a design decision? Have a good one.

Self-fencing is simply not correct and will lead to file system corruption.
That has not been acceptable to us.

Dave