Re: hard locks / high memory
- From: Keith Mastin <kmastin beechtree ca>
- To: enigma-list redhat com
- Subject: Re: hard locks / high memory
- Date: Tue, 5 Nov 2002 14:30:36 -0500 (EST)
>I have three boxes and two problems (one serious problem, partially
>resolved, and one question). All three boxes are fully updated with
>up2date and run 2.4.18-17.7 kernels.
>
>The first problem is that the enigma boxes lock up completely: they accept
>no keyboard or mouse input and respond to neither ping nor any TCP
>requests. At the time of a lock-up the user might be doing anything from
>editing a file in vi to browsing the web. It took a long time to track
>down, because the boxes would crash in the user's office, but when I tried
>to reproduce the behavior after moving a box to my office, it behaved
>perfectly. The crashes seemed completely random and the system logs did not
>help much, but I think I may have tracked it down. Both machines were
>connected to a 10 megabit hub (different hubs, different offices, different
>segments of the network); one hub also had a Win2k box on it, and the other
>had two Sparc boxes (SunOS 5.8 and SunOS 5.1). Sometimes data would move
>through eth0 fine; other times the machine would lock. A snippet from one
>of the kernel logs:
>
>Nov 4 16:42:14 sundown kernel: nfs: server marta OK
>Nov 4 16:42:14 sundown last message repeated 3 times
>Nov 4 17:09:54 sundown kernel: eepro100: wait_for_cmd_done timeout!
>Nov 4 17:10:00 sundown last message repeated 16 times
>Nov 4 17:10:04 sundown kernel: nfs: server marta not responding, still trying
>Nov 4 17:10:04 sundown kernel: eepro100: wait_for_cmd_done timeout!
>Nov 4 17:10:04 sundown kernel: nfs: server marta not responding, still trying
>Nov 4 17:10:04 sundown kernel: eepro100: wait_for_cmd_done timeout!
>Nov 4 17:10:04 sundown kernel: nfs: server marta not responding, still trying
>Nov 4 17:10:04 sundown kernel: eepro100: wait_for_cmd_done timeout!
>Nov 4 17:10:04 sundown kernel: nfs: server marta not responding, still trying
>Nov 4 17:10:04 sundown kernel: eepro100: wait_for_cmd_done timeout!
>Nov 4 17:10:40 sundown last message repeated 25 times
>Nov 4 17:11:12 sundown last message repeated 13 times
>Nov 4 17:11:14 sundown kernel: NETDEV WATCHDOG: eth0: transmit timed out
>Nov 4 17:11:14 sundown kernel: eth0: Transmit timed out: status 0050 0cf0 at 17683/17743 command 000c0000.
>Nov 4 17:11:23 sundown kernel: nfs: server marta OK
>
>Shortly after this, the machine locked hard. The other box showed a similar
>entry, with SMB activity, shortly before its lock-ups. I moved the 7.2 box
>and the Sparc boxes to a 100 megabit hub, and moved the other 7.2 box off
>its hub and directly onto the 100 megabit network, and everything seems to
>be OK. Although the hardware in the two boxes is completely different, from
>the motherboard to the mouse, they share one common factor: lsmod shows
>both are using the Intel eepro100 kernel module. With nothing else left to
>blame, that is my suspicion. Has anyone else seen similar behavior in this
>or other versions of Red Hat or other distributions? A fairly extensive
>Google search turned up nothing remotely similar, but with the machines up
>for over 16 hours under high network and CPU load with no problems, I am
>fairly confident in this diagnosis. Questions, comments, and other theories
>are welcome...
Unfortunately, NFS is notorious for this behavior. I use eepro100 cards in
a few different machines where I need multihomed hosts, and have not had
any problems with them in any *nix environment. NFS, on the other hand, is
a different animal. Turn your logging up to debug level and try making a
few more connections. Watch the network traffic while you do, as the hubs
might also be contributing factors. It might be worth considering moving
the network to switches.
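To make "watch the logs" a bit more concrete, here is a minimal Python sketch (an illustration, not something from the original thread) that tallies eepro100 command timeouts, NFS "not responding" messages, and transmit timeouts in a syslog file. It assumes the "Mon D HH:MM:SS host message" format seen in the kernel log excerpt quoted above, credits syslogd's "last message repeated N times" lines to the preceding matching message, and uses hypothetical category names.

```python
import re
from collections import Counter

# Assumed syslog layout, as in the quoted excerpt: "Nov 4 17:10:04 sundown kernel: ..."
LINE_RE = re.compile(r"^(\w{3})\s+(\d+)\s+(\d\d:\d\d:\d\d)\s+(\S+)\s+(.*)$")
REPEAT_RE = re.compile(r"last message repeated (\d+) times")

# Hypothetical category names; the lowercase substrings come from the quoted log.
PATTERNS = {
    "eepro100_timeout": "eepro100: wait_for_cmd_done timeout!",
    "nfs_not_responding": "not responding, still trying",
    "tx_timeout": "transmit timed out",
}


def tally(lines):
    """Count matching log messages, expanding syslogd's repeat summaries."""
    counts = Counter()
    last_kind = None  # category of the most recent real (non-repeat) message
    for raw in lines:
        m = LINE_RE.match(raw.rstrip("\n"))
        if not m:
            continue  # skip wrapped continuation lines
        message = m.group(5).lower()
        rep = REPEAT_RE.search(message)
        if rep:
            # "last message repeated N times" refers to the previous message
            if last_kind is not None:
                counts[last_kind] += int(rep.group(1))
            continue
        last_kind = None
        for kind, needle in PATTERNS.items():
            if needle in message:
                counts[kind] += 1
                last_kind = kind
                break
    return counts
```

For example, `with open("/var/log/messages") as f: print(tally(f))` after each lock-up gives a rough count to compare between the two boxes; it only counts messages and says nothing about which event came first.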
Good luck