Watchdog and supporting programs


I have been playing with watchdog on my systems over the past few months: an older Tyan board running Fedora 8 and a new Asus server board running Fedora 9, both with a hardware watchdog. In its normal configuration everything works fine and is stable, but I have run into one problem. While overloading my new server with I/O-intensive work, I pushed the one-minute load average to 27, which tripped the software watchdog's default limit of 24 for the one-minute load average. It did what it was supposed to and rebooted the system. Some tuning problems remain on this box, and I still need to set it up to survive a full reboot: it fails a check during boot and reboots itself halfway through the boot, which is a glaring configuration error on my part.
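For anyone who wants to see the load check in isolation, here is a rough Python sketch of the comparison described above. It is not the watchdog daemon's own code. The function names are made up for illustration, and the strict "greater than" comparison is my assumption about how the limit is applied. As far as I know, the limit itself corresponds to the max-load-1 setting in watchdog.conf.

```python
def load_exceeds(loadavg_text, max_load_1=24.0):
    """Return True if the 1-minute load average is above max_load_1.

    loadavg_text is the content of /proc/loadavg, whose first field
    is the 1-minute load average.
    """
    one_minute = float(loadavg_text.split()[0])
    # Assumption: the limit trips only when the load is strictly above it.
    return one_minute > max_load_1


def read_loadavg(path="/proc/loadavg"):
    """Read the raw load average line (Linux only)."""
    with open(path) as f:
        return f.read()
```

On a Linux box, load_exceeds(read_loadavg()) reports whether the current load would be over the limit of 24, and load_exceeds(read_loadavg(), max_load_1=40) tries a looser limit before changing any real configuration.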

The other system is quite stable; the watchdog runs there as I require and causes no problems. What I would like to hear from the group is which repair-style programs people are using with watchdog, with examples if anyone has them. I am working on writing my own, but I am not sure which direction to take, and I am looking for good ideas and gotchas as well.
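To make the question concrete, this is roughly the shape of repair program I have in mind, as a minimal Python sketch and not a tested solution. It assumes the watchdog daemon runs the configured repair binary with the error code as the first argument and treats exit status 0 as "repaired". The value 253 for "load average too high" is from memory and should be checked against the watchdog man page. The handler and every name here are hypothetical.

```python
import sys

# Assumed error code for "load average too high"; verify against watchdog(8).
LOAD_TOO_HIGH = 253


def handle_high_load():
    """Hypothetical handler: a real one might stop or renice the heavy job.

    Returns False so that, until real repair logic exists, the watchdog
    is never told a problem was fixed when nothing was done.
    """
    return False


# Map error codes to handlers; anything not listed is left to the watchdog.
HANDLERS = {LOAD_TOO_HIGH: handle_high_load}


def repair(code):
    """Return 0 if the problem was repaired, otherwise a non-zero status."""
    failure = code if 1 <= code <= 255 else 1
    handler = HANDLERS.get(code)
    if handler is None:
        return failure
    try:
        return 0 if handler() else failure
    except Exception:
        # A crashing handler must never look like a successful repair.
        return failure


def main(argv):
    """Parse the error code from argv[1] and run the matching handler."""
    if len(argv) < 2:
        return 1
    try:
        code = int(argv[1])
    except ValueError:
        return 1
    return repair(code)
```

To use it as an actual script, the file would end with sys.exit(main(sys.argv)). The one gotcha I already see is that returning 0 without really fixing anything would keep the watchdog from rebooting a box that needs it, so every failure path above returns non-zero.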

Thanks in advance,

