[Linux-cluster] post_fail_delay

Thu Jun 8 18:49:42 UTC 2006

On Tue, Jun 06, 2006 at 03:23:01PM +0200, riaan at obsidian.co.za wrote:

> I would like for an errant GFS node to be able to create network/disk
> dumps before being power fenced. Am I missing something, or is this
> leaving the errant node unfenced for any significant amount of time
> (enough to complete the dump, assuming it is upwards of a few seconds)
> just a bad idea?

No, adding a delay before fencing is just fine. It only prolongs the time
until the affected resources can be recovered and used normally again.

> AFAIUnderstand, the whole idea of fencing is to prevent the node from
> damaging the file system in the first place, making the collection of
> dumps and power fencing fundamentally at odds with each other.

The only way the failed node is going to damage anything is if it happens
to write to the fs after its journal has been recovered.  That's why the
only requirement for fencing is that it happens prior to GFS journal
recovery.  If a failed node writes to the fs before journal recovery, it's
no problem.

If you want a failed node to disk/net-dump, then set post_fail_delay to
some number of seconds just greater than the typical time a dump takes.
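
As a rough sketch, the delay is set on the fence_daemon element in
/etc/cluster/cluster.conf.  The value 30 below is only an assumed
placeholder for a dump that typically takes around 25 seconds, and the
post_join_delay value is likewise illustrative.  Measure your own dump
time and adjust.

```xml
<!-- Fragment of cluster.conf, placed inside the <cluster> element. -->
<!-- post_fail_delay: seconds fenced waits after a node fails before
     fencing it. 30 is an assumed example value, not a recommendation. -->
<fence_daemon post_fail_delay="30" post_join_delay="3"/>
```

Recovery of the failed node's journal is held off for the same number
of seconds, so keep the value no larger than the dump really needs.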

Dave