Re: hard locks / high memory
- From: "Oisin C. Feeley" <ofeeley yahoo com>
- To: enigma-list redhat com
- Subject: Re: hard locks / high memory
- Date: Tue, 5 Nov 2002 10:56:30 -0800 (PST)
On Tue, 5 Nov 2002, Pete Huckelba wrote:
> I have three boxes, two problems (one serious problem partially resolved,
> one question), all three boxes are running completely up2date, 2.4.18-17.7
> kernels.
>
> The first problem is that the enigma boxes lock up completely: no
> keyboard or mouse input is accepted, and they do not respond to ping
> or to any TCP requests. User interaction at the time of lock-up would
> be anything from vi'ing a file, to browsing the web. It took forever to
> track down the problem since the boxes would crash in the user's office,
> but when I tried to replicate the behavior after moving the box to my
> office, it behaved like a dream. While it seemed to be completely random,
> and viewing the system-logs did not lend much to diagnosing the problem, I
> think I may have tracked it down. Both machines were connected to a 10
> megabit hub (different hubs, different offices, different segments of the
> network), one machine had a win2k box on the hub, the other has a two
> Sparc, SunOS 5.8 and SunOS 5.1 boxes on its hub. Sometimes data would move
> through eth0 fine, other times the machine would lock. A snipet from one of
> the kernel logs shows:
>
> Nov 4 16:42:14 sundown kernel: nfs: server marta OK
> Nov 4 16:42:14 sundown last message repeated 3 times
> Nov 4 17:09:54 sundown kernel: eepro100: wait_for_cmd_done timeout!
> Nov 4 17:10:00 sundown last message repeated 16 times
> Nov 4 17:10:04 sundown kernel: nfs: server marta not responding, still trying
> Nov 4 17:10:04 sundown kernel: eepro100: wait_for_cmd_done timeout!
> Nov 4 17:10:04 sundown kernel: nfs: server marta not responding, still trying
> Nov 4 17:10:04 sundown kernel: eepro100: wait_for_cmd_done timeout!
> Nov 4 17:10:04 sundown kernel: nfs: server marta not responding, still trying
> Nov 4 17:10:04 sundown kernel: eepro100: wait_for_cmd_done timeout!
> Nov 4 17:10:04 sundown kernel: nfs: server marta not responding, still trying
> Nov 4 17:10:04 sundown kernel: eepro100: wait_for_cmd_done timeout!
> Nov 4 17:10:40 sundown last message repeated 25 times
> Nov 4 17:11:12 sundown last message repeated 13 times
> Nov 4 17:11:14 sundown kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Nov 4 17:11:14 sundown kernel: eth0: Transmit timed out: status 0050 0cf0 at 17683/17743 command 000c0000.
> Nov 4 17:11:23 sundown kernel: nfs: server marta OK
>
> Shortly after this, the machine was locked hard. The other box would have a
> similar entry showing smb activity shortly before a lock-up. I moved the
> 7.2 and Sparc boxes to a 100 megabit hub and the other 7.2 box off its hub
> directly to the 100 megabit network and everything seems to be ok. While
> the hardware in the boxes is completely different from motherboard to the
> mouse, they do share one common factor. An lsmod shows they are using the
> Intel eepro100 kernel module. With nothing left to blame, that is my
> supposition. Has anyone else noticed similar behavior in this or other
> versions of RH or other distros? A fairly extensive google did not show
> anything remotely similar to my problem, but with the machines being up for
> over 16 hours under high network and CPU load with no problems, I am fairly
> confident in this diagnosis. Questions/comments/other suppositions are
> welcome...
>
<snip>
Hi Pete,
I had exactly the same log messages on a laptop with an Intel
82801BA/BAM/CA/CAM ethernet controller, which eepro100-diag reports as an
i82555. I was not getting system hangs from it, though; downloads would
just stall and the network would become unreachable. I resolved the problem
by first getting Intel's own e100 driver from their website, then getting
the eepro100-diag program from Donald Becker's site (www.scyld.org) and
finding out that "sleep mode" was actually set on the chip. See Donald
Becker's message:
http://www.tux.org/hypermail/linux-eepro100/2001-Dec/0022.html
To diagnose, run:
    eepro100-diag -e -f
To fix, run:
    eepro100-diag -G -w -w -w -f
It then reported that it was writing 49a0 to configuration word 10, and
repeating the diagnostic gave a correct checksum. It seems to work fine
now. So perhaps you could run the diagnostic first to see if sleep mode is
set, then try to set the correct flags, and then see whether you're still
getting the 'wait_for_cmd_done timeout!' messages. If not, and you're still
getting system hangs, then it's some other issue.
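If it helps with that last check, here is a rough sketch (mine, not part of any eepro100 tooling) that counts the timeout messages in a syslog-style kernel log, so you can compare before and after changing the chip settings. The function name count_timeouts is made up for illustration, and it assumes that a "last message repeated N times" line refers to the message logged immediately before it, as in your excerpt.

```python
import re

# Message text as it appears in the quoted kernel log.
TIMEOUT_MARK = "eepro100: wait_for_cmd_done timeout!"
# syslogd collapses consecutive duplicates into a line of this form.
REPEAT_RE = re.compile(r"last message repeated (\d+) times")


def count_timeouts(lines):
    """Count wait_for_cmd_done timeouts, including collapsed repeats.

    Assumption: a "last message repeated N times" line refers to the
    log message immediately preceding it.
    """
    total = 0
    prev_was_timeout = False
    for line in lines:
        if TIMEOUT_MARK in line:
            total += 1
            prev_was_timeout = True
            continue
        match = REPEAT_RE.search(line)
        if match:
            # Only add the repeats if the repeated message was a timeout.
            if prev_was_timeout:
                total += int(match.group(1))
            continue
        prev_was_timeout = False
    return total
```

To use it, pass an open log file, for example count_timeouts(open("/var/log/messages")); that path is an assumption about where your syslog writes kernel messages. If the count stays at zero under load after the fix and the boxes still hang, the driver timeouts are probably not the cause.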
HTH
Oisin Feeley