[Linux-cluster] Cluster Suite v3 software watchdog

Celso K. Webber celso at webbertek.com.br
Wed Dec 21 18:50:58 UTC 2005


Hi Lon,

Thank you very much for your reply. I'll try your tips.

Now another question: is it really necessary to pass the
"nmi_watchdog=1" parameter to the kernel, or is it enabled by default
under RHEL v3 or v4?
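Whether the NMI watchdog is on by default depends on the kernel build, so it may be safer to check the running node than to assume. The sketch below is illustrative only. It makes two assumptions. The first is that the kernel command line can be read from /proc/cmdline. The second is that /proc/interrupts contains an "NMI:" line whose per-CPU counts keep rising while the NMI watchdog is active, which is the check the kernel's nmi_watchdog documentation of that era suggests. The helper names nmi_watchdog_param, nmi_total and check_running_system are hypothetical.

```python
import time
from typing import Optional


def nmi_watchdog_param(cmdline: str) -> Optional[str]:
    """Return the value given to nmi_watchdog= on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        if token.startswith("nmi_watchdog="):
            value = token.split("=", 1)[1]  # keep the last occurrence seen
    return value


def nmi_total(interrupts: str) -> Optional[int]:
    """Sum the per-CPU counts on the "NMI:" line of /proc/interrupts text."""
    for line in interrupts.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            total = 0
            for field in fields[1:]:
                if not field.isdigit():
                    break  # stop at trailing description text, if present
                total += int(field)
            return total
    return None  # no NMI line found


def check_running_system(wait_seconds: float = 2.0) -> None:
    """Hypothetical live check: print the cmdline setting and whether NMIs are rising."""
    with open("/proc/cmdline") as f:
        print("nmi_watchdog on cmdline:", nmi_watchdog_param(f.read()))
    with open("/proc/interrupts") as f:
        first = nmi_total(f.read())
    time.sleep(wait_seconds)
    with open("/proc/interrupts") as f:
        second = nmi_total(f.read())
    rising = first is not None and second is not None and second > first
    print("NMI count rising:", rising)
```

Calling check_running_system() on a cluster member gives a quick hint. A rising NMI count suggests the watchdog is active even if nmi_watchdog=1 was not passed explicitly.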

Regards,

Celso.

Lon Hohberger wrote:

>On Wed, 2005-12-21 at 16:25 -0200, Celso K. Webber wrote:
>
>  
>
>>Has anyone had this issue before? Or am I missing a step in
>>configuring the software watchdog feature?
>>
>>Another question for the Red Hat people on the list: does this "software
>>watchdog" work properly? I ask because it's enabled by default when you add a
>>new member to the cluster, and the Cluster Suite v3 manual says nothing
>>about this feature.
>>    
>>
>
>Yes, it works fine.
>
>A few things could be happening:
>
>(1) The NMI watchdog will reboot the machine if it detects an NMI hang.
>The timeout for this is only a few seconds.
>
>(2) The cluster is extremely paranoid because you are not using a
>STONITH device (power controller), and it's detecting internal hangs.
>Try increasing the failover time.
>
>(3) The cluster is not getting scheduled due to system load.  See the
>man page for cludb(8) about clumembd%rtp - both may help.
>
>
>-- Lon
>  
>