SMP Kernel Crash

Wade Hampton wade.hampton at nsc1.net
Mon Jan 12 15:17:00 UTC 2004


I just experienced a kernel crash on an SMP machine with
FC 1.0.  Prior to the lockup, this machine had been up
over 25 days without problems and with a moderate load
(moved many GB of data to/from NFS and to/from SMB). 
System info:

  Kernel 2.4.20-1115 (FC 1.0 stock kernel)
  Supermicro dual XEON 2.2, hyperthread not enabled in BIOS
  Soft raid (dual 120G HD)
  Dual 1G ethernet
     - one has several NFS- and SMB-mounted partitions (read and write)
     - one has an NFS partition (Solaris 7 server)
  2G RAM

The machine had both NFS and SMB mounts, but the NFS
server was down at the time (cable removed).  Also, I ran
df as an ordinary user yesterday and left it running.

This morning, the machine was locked up and would only
respond to pings.  I could not log in, so I had to hard reboot.

/var/log/messages reported:

  ... smb_request:  result -104, setting invalid
  ... smb_retry:  successful, new pid=9141, generation =2

This was repeated every hour, with generation 3, 4, 5, then 6.
That was the last message in /var/log/messages.
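
For anyone who wants to check their own logs for the same pattern, here
is a minimal Python sketch that pulls the pid and generation out of
smb_retry lines.  The pattern is modeled only on the line quoted above,
the function name smb_retries is made up for illustration, and the exact
spacing in real logs is an assumption, so whitespace is matched loosely.

```python
import re

# Modeled on the smb_retry line quoted above; spacing around "=" and after
# the colon is an assumption, so it is matched loosely.
RETRY_RE = re.compile(
    r"smb_retry:\s*successful, new pid=(\d+), generation\s*=\s*(\d+)"
)

def smb_retries(lines):
    """Return (pid, generation) pairs for every smb_retry line found."""
    found = []
    for line in lines:
        match = RETRY_RE.search(line)
        if match:
            found.append((int(match.group(1)), int(match.group(2))))
    return found

# Example using the two lines quoted in this message.
sample = [
    "... smb_request:  result -104, setting invalid",
    "... smb_retry:  successful, new pid=9141, generation =2",
]
for pid, generation in smb_retries(sample):
    print(f"pid={pid} generation={generation}")
```

Passing an open file object (for example /var/log/messages, which
usually needs root to read) instead of the sample list should work the
same way, since the function only iterates over lines.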

I found two Bugzilla reports on kernel lockups, but judging from
them, this is still an open problem (last messages dated 1/8).

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=109497
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=113148

Note:  I am loading the latest kernel and will retry, but I
really need a STABLE box....

Questions:

1)  Should I move to RH Enterprise?

2)  Should I use a stock 2.4.24 kernel (all I need is basic stuff:
     soft RAID, e1000, NFS, SAMBA, CD-ROM)?

3)  Do you think that the latest kernel will fix it?

4)  Any help on how to test this (e.g., Stress?)?

Cheers,
--
Wade Hampton




