
RE: NMI Watchdog Timers (RH Cluster)



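The boot option nmi_watchdog=1 is not supported with hpasm package
versions 7.1.0 and earlier. The reason is that we register our own NMI
handler, which treats any NMI as a fatal condition and therefore halts
the processor(s). Because the NMI watchdog works by raising NMIs
periodically, our handler is triggered once the driver is loaded.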
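
To check whether a given kernel command line carries the option, something
along these lines could be used. This is only a minimal sketch: the function
name nmi_watchdog_setting is made up for illustration, and it assumes that
when the option is repeated the last occurrence is the one that counts.

```python
def nmi_watchdog_setting(cmdline):
    """Return the value given to nmi_watchdog= on a kernel command line.

    Returns None when the option is absent. The name and behavior are
    illustrative; reporting the last occurrence is an assumption.
    """
    value = None
    for token in cmdline.split():
        if token.startswith("nmi_watchdog="):
            value = token.split("=", 1)[1]
    return value


# Hypothetical command line, shaped like the contents of /proc/cmdline.
example = "ro root=LABEL=/ nmi_watchdog=1"
if nmi_watchdog_setting(example) not in (None, "0"):
    print("NMI watchdog requested; not supported with hpasm 7.1.0 and earlier")
```

On a running system the string to pass in can be read from /proc/cmdline.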

-----Original Message-----
From: taroon-list-bounces redhat com
[mailto:taroon-list-bounces redhat com] On Behalf Of Chris Purcell
Sent: Monday, June 28, 2004 3:08 PM
To: taroon-list redhat com
Subject: NMI Watchdog Timers (RH Cluster)

I'm testing a two-node cluster on two identical HP ProLiant DL560 servers
running RHEL AS 3.0. Both servers have the software watchdog timers and
NMI watchdog timers enabled. They are both connected to hardware power
switches as well.

I started installing the HP ProLiant support pack components today:
http://h18023.www1.hp.com/support/files/server/us/locate/101_4765.html#0

After I installed the hpasm utility (HP Server Management Drivers and
Agents for Red Hat Enterprise Linux 3) and tried to start it, the server
rebooted. The server hung during system startup after the hpasm utility
tried to start because of the NMI watchdog timer. This is the error:


starting hpasm:  cevt:   hp ProLiant Event Logging Driver (rev 7.1.0-CUSTOM)
casm: hp ProLiant Advanced Server Management driver (rev 7.1.0-CUSTOM)
casm: NMI Handler has been called on processor 3!
casm: NMI Handler has been called on processor 4!
casm: NMI Handler has been called on processor 6!
casm: NMI Handler has been called on processor 0!
casm: spinning for 2 seconds!
CRITICAL: casm: Unknown non-maskable interrupt (NMI error (0x7f) Hour 0 - 0/0/0)
eax: 00000001 ebx: 0000000 ecx: f7f60038 edx: f7f60000
f7f61ebc <2>casm: casm: NMI Handler has been called on processor 1!
casm: NMI Handler has been called on processor 4!



We have 4 CPUs, but they show up as 8 because of the Intel P4
hyper-threading technology. If I boot into runlevel 1, the system boots
fine, but once I start the hpasm utility it hangs or reboots. If I remove
"nmi_watchdog=1" from the kernel line in /etc/grub.conf, then everything
works fine, including hpasm.
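
The grub.conf change described above could be sketched roughly as follows.
This is an illustration under assumptions, not a tested tool: the function
name strip_nmi_watchdog and the sample stanza are hypothetical, and the
sketch rewrites kernel lines with single spaces between options.

```python
def strip_nmi_watchdog(grub_conf_text):
    """Return grub.conf text with nmi_watchdog=... removed from kernel lines.

    Only lines whose first word is "kernel" are touched; every other line
    is passed through unchanged. Leading indentation is preserved.
    """
    out = []
    for line in grub_conf_text.splitlines():
        stripped = line.lstrip()
        if stripped.split()[:1] == ["kernel"]:
            indent = line[:len(line) - len(stripped)]
            kept = [t for t in stripped.split()
                    if not t.startswith("nmi_watchdog=")]
            line = indent + " ".join(kept)
        out.append(line)
    trailing = "\n" if grub_conf_text.endswith("\n") else ""
    return "\n".join(out) + trailing


# Hypothetical stanza for illustration; real paths and labels will differ.
sample = (
    "title Red Hat Enterprise Linux AS\n"
    "\troot (hd0,0)\n"
    "\tkernel /vmlinuz ro root=LABEL=/ nmi_watchdog=1\n"
    "\tinitrd /initrd.img\n"
)
print(strip_nmi_watchdog(sample))
```

The function only transforms text; writing the result back to
/etc/grub.conf (after keeping a backup copy) is left to the caller.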

Do I really need the software watchdog and NMI watchdog timers since I
already have the hardware power switches installed (WTI NPS-2)?

Thanks,
Chris



--
Taroon-list mailing list
Taroon-list redhat com
http://www.redhat.com/mailman/listinfo/taroon-list


