
NMI Watchdog Timers (RH Cluster)



I'm testing a two-node cluster on two identical HP ProLiant DL560 servers
running RHEL AS 3.0. Both servers have the software watchdog timer and the
NMI watchdog timer enabled, and both are connected to hardware power
switches as well.

Today I started installing the HP ProLiant Support Pack components from
http://h18023.www1.hp.com/support/files/server/us/locate/101_4765.html#0.
After I installed the hpasm utility (HP Server Management Drivers and
Agents for Red Hat Enterprise Linux 3) and tried to start it, the server
rebooted. It then hung during system startup when hpasm tried to start,
because of the NMI watchdog timer. This is the error:


starting hpasm:  cevt:   hp ProLiant Event Logging Driver (rev 7.1.0-CUSTOM)
casm: hp ProLiant Advanced Server Management driver (rev 7.1.0-CUSTOM)
casm: NMI Handler has been called on processor 3!
casm: NMI Handler has been called on processor 4!
casm: NMI Handler has been called on processor 6!
casm: NMI Handler has been called on processor 0!
casm: spinning for 2 seconds!
CRITICAL: casm: Unknown non-maskable interrupt (NMI error (0x7f) Hour 0 - 0/0/0)
eax: 00000001 ebx: 0000000 ecx: f7f60038 edx: f7f60000
f7f61ebc <2>casm: casm: NMI Handler has been called on processor 1!
casm: NMI Handler has been called on processor 4!



We have 4 physical CPUs, but the system shows 8 because of Intel P4
Hyper-Threading. If I boot into runlevel 1, the server boots fine, but as
soon as I start hpasm it hangs or reboots. If I remove "nmi_watchdog=1"
from the kernel line in /etc/grub.conf, everything works fine, including
hpasm.
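Removing the option by hand is the workaround described above. On several cluster nodes the same edit can be scripted. The sketch below assumes GRUB legacy syntax, where boot options follow the `kernel` keyword on a line in /etc/grub.conf; the function name `strip_nmi_watchdog` and the sample stanza are hypothetical. It only transforms text, so back up the real file before writing the result back; the change takes effect at the next reboot.

```python
import re

# Matches an "nmi_watchdog=<value>" boot option plus the whitespace before it.
NMI_OPT = re.compile(r"\s+nmi_watchdog=\S+")


def strip_nmi_watchdog(grub_conf: str) -> str:
    """Return grub.conf text with nmi_watchdog=... removed from kernel lines.

    Assumes GRUB legacy syntax: only lines whose first word is 'kernel'
    are modified; every other line is passed through unchanged.
    """
    out = []
    for line in grub_conf.splitlines(keepends=True):
        if line.split()[:1] == ["kernel"]:
            line = NMI_OPT.sub("", line)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical stanza for illustration only.
    sample = (
        "title test\n"
        "\troot (hd0,0)\n"
        "\tkernel /vmlinuz ro root=LABEL=/ nmi_watchdog=1\n"
    )
    print(strip_nmi_watchdog(sample), end="")
```

Restricting the edit to `kernel` lines avoids touching a `title` or comment line that happens to mention the option.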

Do I really need the software watchdog and NMI watchdog timers when I
already have hardware power switches (WTI NPS-2) installed?

Thanks,
Chris



