Re: [Linux-cluster] Post-Join and Post-Fail Delay


On Tue, Mar 23, 2010 at 12:42:08PM -0300,
Cris <cryptogrid gmail com> wrote:
> Does anyone have any recommendation or experience to set Post-Join
> Delay and Post-Fail Delay. Default values are 3 and 0, but in the
> documentation they mention that these default values might be too
> short. Can someone explain why?
> Post-Fail Delay=0, means that the node is fenced immediately after a
> fail. Is it ok, or is recommended to wait some seconds to fence a
> failed node?

post-fail-delay defines the number of seconds to wait before a failed
node is fenced. I usually set this to about 20-30 seconds, depending
on the cluster, so the failed node has a chance to rejoin within that
window. If it is not able to rejoin within this time, it will be
fenced.

post-join-delay is used during startup to give all nodes the chance
to successfully join the cluster before being fenced. I usually set
this to 30 seconds. It can also be set to -1, which tells fenced to
wait forever.
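
As a rough sketch of how this looks in /etc/cluster/cluster.conf: both
delays are attributes of the fence_daemon element (spelled with
underscores there). The values are just the ones I mentioned above,
not a recommendation for every cluster:

```xml
<!-- Goes inside the <cluster> element of /etc/cluster/cluster.conf. -->
<!-- post_join_delay: seconds fenced waits after a node joins the    -->
<!--                  fence domain before fencing (-1 = wait forever) -->
<!-- post_fail_delay: seconds fenced waits after a member fails       -->
<!--                  before fencing it                               -->
<fence_daemon post_join_delay="30" post_fail_delay="30"/>
```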

You can also avoid fencing during startup completely by setting
clean_start to 1.
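
For illustration only, clean_start sits on the same fence_daemon
element. As far as I know it makes fenced assume that all nodes are in
a clean state at startup, so only use it if you are sure that is true:

```xml
<!-- Sketch: skip startup fencing; fenced assumes all nodes are clean. -->
<fence_daemon clean_start="1"/>
```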


