[Linux-cluster] Cluster Suite v3 software watchdog
Celso K. Webber
celso at webbertek.com.br
Wed Dec 21 18:50:58 UTC 2005
Hi Lon,
Thank you very much for your reply. I'll try your tips.
Now another question: is it really necessary to pass the
"nmi_watchdog=1" parameter to the kernel? Or is it enabled by default
under RHEL v3 or v4?
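
For reference, a minimal sketch of how one might check this on a running node. It assumes a Linux system that exposes /proc/cmdline and /proc/interrupts. The function names are made up for illustration and are not part of Cluster Suite. It only reports whether the parameter was passed at boot and what the NMI counters currently read; it does not say what the RHEL default is.

```python
def nmi_watchdog_setting(cmdline):
    """Return the value of nmi_watchdog= on a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("nmi_watchdog="):
            return token.split("=", 1)[1]
    return None


def nmi_counts(interrupts_text):
    """Return per-CPU NMI counts from /proc/interrupts text ([] if no NMI line)."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            # Keep only the numeric columns; newer kernels append a description.
            return [int(f) for f in fields[1:] if f.isdigit()]
    return []


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print("nmi_watchdog parameter:", nmi_watchdog_setting(f.read()))
        with open("/proc/interrupts") as f:
            print("NMI counts per CPU:", nmi_counts(f.read()))
    except OSError as exc:
        # Assumption: /proc is only available on Linux.
        print("could not read /proc:", exc)
```

Running it twice a few seconds apart and comparing the NMI counts should show whether the counters are increasing, which is the usual sign that the NMI watchdog is active.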
Regards,
Celso.
Lon Hohberger wrote:
>On Wed, 2005-12-21 at 16:25 -0200, Celso K. Webber wrote:
>
>
>
>>Has anyone had this issue before? Or am I missing a step in
>>configuring the software watchdog feature?
>>
>>Another question for the Red Hat people on the list: does this "software
>>watchdog" work OK? I ask because it's enabled by default when you add a
>>new member to the cluster. The Cluster Suite v3 manual says nothing
>>about this resource either.
>>
>>
>
>Yes, it works fine.
>
>A few things could be happening:
>
>(1) The NMI watchdog will reboot the machine if it detects a hang.
>Its timeout is only a few seconds.
>
>(2) The cluster is extremely paranoid because you are not using a
>STONITH device (power controller), and it's detecting internal hangs.
>Try increasing the failover time.
>
>(3) The cluster is not getting scheduled due to system load. See the
>man page for cludb(8) about clumembd%rtp - both may help.
>
>
>-- Lon
>
>