Re: How to diagnose lockup?
- From: Chris Kloiber <ckloiber redhat com>
- To: taroon-beta-list redhat com
- Subject: Re: How to diagnose lockup?
- Date: Sat, 30 Aug 2003 22:06:07 -0400
On Sat, 2003-08-30 at 20:43, Denis Hennessy wrote:
> On my test machine (dual-athlon), fresh install of beta 2 with all
> updates applied (-411 kernel), I'm getting frequent lockups which I've
> just figured out how to make happen on demand. The problem is: the
> machine needs a power cycle once it happens and there's no log files, or
> core dumps to diagnose the problem. I'd like to have a bit more
> information before filing a bug report.
>
> To make it happen, I start mozilla and type anything into the web
> address field. Within a few characters the machine will lock hard with
> no mouse, keyboard or network response. I have to power it off to recover.
>
> What should I do?
Try hooking up a serial console via a null-modem cable to another system
running a terminal application (minicom, hyperterminal) that is set for
115200 baud with no flow control and is logging to disk.
In the grub.conf, add to the end of the kernel line you are booting
(assuming you use ttyS0 for the null-modem connection):
console=ttyS0,115200 console=tty0
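For illustration only, a grub.conf entry with those options appended might
look roughly like the sketch below. Everything in angle brackets is a
placeholder, not a value from this thread; keep whatever your existing entry
already contains and only append the two console= options.

```
title <your existing title>
        root <your existing root device>
        kernel /vmlinuz-<version> ro root=<root-device> console=ttyS0,115200 console=tty0
        initrd /initrd-<version>.img
```

Kernel messages are sent to every console listed, so the local screen keeps
working while the serial line gets a copy.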
In /etc/sysctl.conf, change the existing line to read:
kernel.sysrq = 1
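If you would rather script that edit, here is a minimal sketch. The function
name enable_sysrq is made up for this example, and it assumes the file uses
the usual "key = value" layout. It rewrites an existing kernel.sysrq line, or
appends one if none is present.

```shell
#!/bin/sh
# Hypothetical helper: make sure "kernel.sysrq = 1" is set in a sysctl.conf-style file.
enable_sysrq() {
    conf="$1"
    tmp="$conf.tmp.$$"
    if grep -q '^[[:space:]]*kernel\.sysrq[[:space:]]*=' "$conf"; then
        # Replace the existing setting; write back with cat so the file's
        # ownership and permissions are left untouched.
        sed 's/^[[:space:]]*kernel\.sysrq[[:space:]]*=.*/kernel.sysrq = 1/' "$conf" > "$tmp" &&
            cat "$tmp" > "$conf"
        rm -f "$tmp"
    else
        # No existing line: append one.
        echo 'kernel.sysrq = 1' >> "$conf"
    fi
}

# Usage (as root):
#   enable_sysrq /etc/sysctl.conf
#   sysctl -p        # reloads /etc/sysctl.conf without waiting for the reboot
```

The reboot is still needed for the grub.conf change, so running sysctl -p is
optional here.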
Reboot to activate the changes made to grub.conf. If you see the kernel
boot messages on the remote console, you have it working and can proceed
to crash the system. With luck, an Oops or kernel panic will be logged
to the remote system that you can attach to a bugzilla report. Sometimes
enabling nmi_watchdog=1 (or nmi_watchdog=2) on the kernel boot line of
SMP machines may help force the kernel to oops on demand for diagnostic
purposes. Whether or not you see the Oops or panic message, try
activating the "magic" sysrq combinations to try to get some more
information from the machine.
<ALT><SysRq><T> - Dumps kernel stack of each process.
<ALT><SysRq><P> - Shows which process is running on each processor.
                  (Press this several times, as it picks a processor
                  at random to report on. Aim for at least 2-3 times
                  the number of processors you have.)
<ALT><SysRq><M> - Prints the kernel memory summary. Do this last, as
                  it can make the kernel lock up even harder, if you
                  can imagine it.
Once you have this information, reboot and add to your report the output
of the following commands:
# sysreport (Install the 'sysreport' rpm if necessary)
# lspci -vv
# lsmod
# cat /proc/meminfo
# cat /proc/cpuinfo
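To collect most of that in one pass, something along these lines should work.
It is only a sketch: the report file name and the run_section helper are
invented for this example. sysreport is left out because it builds its own
archive, so run it separately.

```shell
#!/bin/sh
# Sketch: gather the diagnostic output listed above into one file.
# OUT is a hypothetical file name; override it in the environment if you like.
OUT="${OUT:-./lockup-report.txt}"

run_section() {
    # Write a header naming the command, then its output (stdout and stderr).
    echo "===== $* =====" >> "$OUT"
    "$@" >> "$OUT" 2>&1 || echo "(command failed or not available: $*)" >> "$OUT"
    echo >> "$OUT"
}

: > "$OUT"                      # start with an empty report file
run_section lspci -vv
run_section lsmod
run_section cat /proc/meminfo
run_section cat /proc/cpuinfo
echo "Report written to $OUT"
```

Run it as root so that lspci -vv can read the full device details, then attach
the resulting file to the bug report along with the serial console log.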
The bug owner may ask you for additional information as well, but this
is a decent start.
A possible alternative to the serial console is to set up a netdump
client and server. However, netdump and nmi_watchdog are currently
mutually exclusive, and netdump reboots the sick machine, so the output
of sysrq commands can be harder to capture.
--
Chris Kloiber
Red Hat, Inc.