[K12OSN] Server Help! (a little desperate)


Things have been going great this year; our entire district is now using thin clients. Here's a very brief breakdown of how things are set up:

1 server handles DNS, TFTP, DHCP, NIS
1 server handles NFS (/home), SMB
1 server handles LTSP (running 4.0.1, but TFTP and DHCP are farmed out to the first server)

For some reason, I've had two major "glitches" this year.

Last week, eth0 (the interface the clients connect to) just quit responding. The server itself appeared fine, but it was not pingable. After a brief panic, I ran ifdown eth0 and then ifup eth0, and I had no problems until today. Today's trouble started right after I left for lunch, of course.

Today, the LTSP server quit responding altogether. When I went to the console, I couldn't even get THAT to come up. I power-cycled the machine and everything came back up just peachy, BUT now I'm very worried.

I'm getting some "I told you so"s from the staff, who say that putting all my eggs in one basket was a bad idea, that with Linux you get what you pay for, etc., etc., etc.

My question: where do I start looking for the cause? I've read just about every file in /var/log, and nothing looks fishy. At 13:00, messages simply stopped being written to /var/log/messages, and there were no odd entries before that.

Are there other logs I should be checking? Perhaps after school today I'll take the server down and run memtest. Especially during this first year, I need close to 100% uptime, and I've had bad luck so far.

