[Linux-cluster] CS4 Update 2 / problem with systems dump ?

Alain Moulle Alain.Moulle at bull.net
Wed Mar 22 07:59:17 UTC 2006


On Tue, Mar 21, 2006 at 06:02:22PM +0100, Alain Moulle wrote:
>> >> How do you manage the problem of dumping machines ?
>> >>
>> >> I mean, if in a HA pair, the 1st node is crashing
>> >> and proceeding to dump, the CS4 on peer node will
>> >> automatically power off the first one, and therefore
>> >> interrupt the dump process.
>> >>
>> >> Does that mean that we never can have dump when
>> >> the nodes are "under" CS4 ?
>> >> Or is there a way to manage this point ?

>> You might set a fencing delay that would allow the dump to complete, e.g.
>>   <fence_daemon post_fail_delay="10">
>>   </fence_daemon>
>> Dave
OK, but does that mean that once we have patched this, the peer node will
always wait for this delay before fencing the failed node, even
if that node is not dumping?
So the workaround that you propose is to be used only this way:
1. A node has crashed and was about to dump, but has been fenced.
2. Patch the post_fail_delay.
3. Restart CS4 on both nodes.
4. Wait for a new crash and dump; in this case, the failover
   will take at least the post_fail_delay value.
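
For step 2, I assume the change would look roughly like this in
/etc/cluster/cluster.conf. This is only a sketch: the cluster name, the
config_version and the 10-second value are placeholders, and the delay
really needed depends on how long a dump takes on our machines.

  <cluster name="example" config_version="2">
    <!-- post_fail_delay: seconds to wait after a node failure before fencing it -->
    <fence_daemon post_fail_delay="10">
    </fence_daemon>
    <!-- rest of the configuration unchanged -->
  </cluster>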

Am I right?

Thanks
Alain
