
hard locks / high memory



I have three boxes and two issues (one serious problem, partially resolved, and one question). All three boxes are completely up2date and running 2.4.18-17.7 kernels.

The first problem manifests itself when the enigma boxes lock up completely: no keyboard or mouse input, no response to ping, no response to any TCP requests. User activity at the time of a lock-up could be anything from vi'ing a file to browsing the web. It took forever to track down, since the boxes would crash in the user's office, but when I moved a box to my office and tried to replicate the behavior, it ran like a dream. The lock-ups seemed completely random, and the system logs did not help much with the diagnosis, but I think I may have tracked it down. Both machines were connected to 10 megabit hubs (different hubs, different offices, different segments of the network). One machine shared its hub with a win2k box; the other shares its hub with two Sparc boxes running SunOS 5.8 and SunOS 5.1. Sometimes data would move through eth0 fine, other times the machine would lock. A snippet from one of the kernel logs shows:

Nov 4 16:42:14 sundown kernel: nfs: server marta OK
Nov 4 16:42:14 sundown last message repeated 3 times
Nov 4 17:09:54 sundown kernel: eepro100: wait_for_cmd_done timeout!
Nov 4 17:10:00 sundown last message repeated 16 times
Nov 4 17:10:04 sundown kernel: nfs: server marta not responding, still trying
Nov 4 17:10:04 sundown kernel: eepro100: wait_for_cmd_done timeout!
Nov 4 17:10:04 sundown kernel: nfs: server marta not responding, still trying
Nov 4 17:10:04 sundown kernel: eepro100: wait_for_cmd_done timeout!
Nov 4 17:10:04 sundown kernel: nfs: server marta not responding, still trying
Nov 4 17:10:04 sundown kernel: eepro100: wait_for_cmd_done timeout!
Nov 4 17:10:04 sundown kernel: nfs: server marta not responding, still trying
Nov 4 17:10:04 sundown kernel: eepro100: wait_for_cmd_done timeout!
Nov 4 17:10:40 sundown last message repeated 25 times
Nov 4 17:11:12 sundown last message repeated 13 times
Nov 4 17:11:14 sundown kernel: NETDEV WATCHDOG: eth0: transmit timed out
Nov 4 17:11:14 sundown kernel: eth0: Transmit timed out: status 0050 0cf0 at 17683/17743 command 000c0000.
Nov 4 17:11:23 sundown kernel: nfs: server marta OK


Shortly after this, the machine locked hard. The other box shows a similar entry, with smb activity shortly before a lock-up. I moved the 7.2 and Sparc boxes to a 100 megabit hub, and moved the other 7.2 box off its hub and directly onto the 100 megabit network, and everything seems to be OK. The hardware in the two boxes is completely different, from the motherboard to the mouse, but they do share one common factor: lsmod shows that both are using the Intel eepro100 kernel module. With nothing else left to blame, my supposition is that the eepro100 driver misbehaves on the 10 megabit hubs. Has anyone else noticed similar behavior in this or other versions of RH, or in other distros? A fairly extensive Google search did not turn up anything remotely similar to my problem, but since the machines have now been up for over 16 hours under high network and CPU load with no problems, I am fairly confident in this diagnosis. Questions, comments, and other suppositions are welcome...
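
For anyone who wants to compare their own logs, here is a minimal sketch of how the timeouts could be tallied. It assumes syslog lines shaped like the excerpt above; the function name and the sample list are just for illustration, and it treats a "last message repeated N times" line as N more timeouts only when the last real message before it was an eepro100 timeout.

```python
import re

# Marker text copied from the kernel log excerpt in this message.
TIMEOUT_MARKER = "eepro100: wait_for_cmd_done timeout!"
# syslog collapses duplicates into "last message repeated N times".
REPEAT_RE = re.compile(r"last message repeated (\d+) times?")


def count_timeouts(lines):
    """Count eepro100 command timeouts, expanding syslog repeat lines.

    Hypothetical helper: a repeat line is credited to the timeout count
    only if the most recent non-repeat message was itself a timeout.
    """
    total = 0
    last_was_timeout = False
    for line in lines:
        repeat = REPEAT_RE.search(line)
        if repeat:
            if last_was_timeout:
                total += int(repeat.group(1))
            continue  # a repeat line does not change the "last message"
        last_was_timeout = TIMEOUT_MARKER in line
        if last_was_timeout:
            total += 1
    return total


if __name__ == "__main__":
    sample = [
        "Nov 4 16:42:14 sundown kernel: nfs: server marta OK",
        "Nov 4 16:42:14 sundown last message repeated 3 times",
        "Nov 4 17:09:54 sundown kernel: eepro100: wait_for_cmd_done timeout!",
        "Nov 4 17:10:00 sundown last message repeated 16 times",
        "Nov 4 17:10:04 sundown kernel: nfs: server marta not responding, still trying",
    ]
    print(count_timeouts(sample))  # 1 timeout line + 16 repeats = 17
```

Running the same function over a whole log file (for example, the lines of /var/log/messages, if that is where kernel messages land on your box) gives a rough count to compare before and after moving a machine off a 10 megabit hub.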

The next "problem" is more of a question. Has anyone successfully recompiled a kernel with high-mem support? I am looking for a way to exceed the 2GB user-space limit imposed by the default kernels. Google searches turn up cases where the system recognized 8GB or more, but I have not found any case where someone was able to malloc more than 2GB in a single process. Is this possible and worth the experiment, and has anyone been successful?
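
To make the question concrete, here is a rough sketch of the kind of test I have in mind. It is an assumption-laden probe, not a malloc benchmark: it maps anonymous private memory in fixed-size chunks until the kernel refuses, which should approximate how much address space one process can reserve (an upper bound on what malloc could hand back). The function name, chunk size, and 4 GiB cap are all hypothetical choices, and the pages are never touched, so this says nothing about how much physical RAM is actually usable.

```python
import mmap


def probe_mappable_bytes(chunk_bytes=64 * 1024 * 1024,
                         limit_bytes=4 * 1024 ** 3):
    """Map anonymous memory chunk by chunk; return total bytes mapped.

    Stops at the first failed mapping or when the next chunk would
    exceed limit_bytes. All mappings are released before returning.
    """
    maps = []
    total = 0
    try:
        while total + chunk_bytes <= limit_bytes:
            try:
                # Private anonymous mapping, similar in spirit to what a
                # large malloc would request from the kernel (Unix only).
                m = mmap.mmap(-1, chunk_bytes,
                              flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
            except (OSError, MemoryError, ValueError, OverflowError):
                break  # address space (or an overcommit limit) ran out
            maps.append(m)
            total += chunk_bytes
    finally:
        for m in maps:
            m.close()
    return total


if __name__ == "__main__":
    mapped = probe_mappable_bytes()
    print("mapped %.2f GiB before stopping" % (mapped / float(1024 ** 3)))
```

If a box with a high-mem kernel stops at about the same figure as a stock kernel, that would suggest the limit comes from the per-process address space rather than from how much RAM the kernel recognizes.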

Thanks,

Pete




-------------------------- Pete Huckelba

Stata Corporation
4905 Lakeway Drive
College Station, TX 77845
(979)696-4600




