
Re: [K12OSN] Random logouts, inability to log in



On Fri, 2 Jan 2004, Les Mikesell wrote:
> On Fri, 2004-01-02 at 11:18, Julius Szelagiewicz wrote:
> > 	the value i was referring to is 'memory in use'. The system was
> > done by fresh install. I freely admit to being very weakly versed in
> > Linux, which all too often leads me to expect behavior similar to that of
> > HP-UX. I was spooked by the fact that 20 terminals could essentially max
> > out the 4GB server memory, i was even more spooked by the apparent *non*
> > releasing of memory as processes went away - going down to 8 users didn't
> > change memory usage display in top. Reading between the lines in your
> > message, i gather that memory is not added explicitly to the free pool, but
> >  just marked as unused?
>
> Actually it is still used, buffering valid contents of the previous
> activity but will be released and re-used as needed for something
> else.  All disk read/write activity and program loads are buffered
> in available memory so the slow disk activity doesn't have to be
> repeated in the likely event that something else wants to read the
> same thing.  In the also-likely event that something wants memory for
> a new use, the least-recently used buffers are invalidated and reused
> or, if you are down to the minimum, something is pushed into swap.
> If things are working normally, you should see a general slowdown
> as you start to need swap activity but nothing should completely die
> until swap space is all consumed.
>
> > 	I agree that the lack of swap use doesn't suggest memory
> > starvation, but sudden logouts accompanied by apparent data loss and/or
> > corruption *feel* like it.
>
> Is there a difference between activity on the clients and server
> in this respect?  That is, if you log in on the console does anything
> ever go wrong?  If it really is memory the failures should be the
> same.  If it only happens on the clients you more likely have some
> kind of network problem that is either killing your X connection or
> failing as some program code tries to page in over NFS.
>
> > 	My big problem is that this is a production box in the middle of the
> > fight over the desktop. Managers and users want M$, I want Linux. For the
> > time being i'm making slow progress (at high political cost). What i can't
> > afford is a system that fails - I got them spoiled, the business system
> > stays up forever (3 years, 3 outages, 2 caused by vast power fluctuations
> > and power failures and 1 caused by a mysterious hardware problem - no
> > data loss).
> >
>
> Note that your business system can probably survive a lot of network
> issues without affecting data much.  The ltsp clients can't, although
> the server should retain everything successfully saved.
>
> > 	my previous experiences with upgrades of K12 (2.x to 3.0 to 3.1.0)
> > were very positive, so i just went for it. now i'm thinking that maybe i
> > want to hear about a few reasonably large installs doing well for a few
> > months first ;-)
>
> I've used the recent kernels enough to know that there is not a generic
> problem that causes random program crashes and others would have
> reported more problems if there were.  You have some specific issue
> with your hardware and/or network.  If updating the kernel triggered
> it, a likely suspect is the network card.  I had trouble recently
> with a 3com chip on an old Dell motherboard but blamed it more on age
> than the new driver.  Switching to an Intel PCI card fixed it
> regardless.
>
Les,
	ok, i am beginning to get the difference between "used but not
really" and "free" memory. what has confused me is that all the unices i
played with returned memory from user processes to the free pool right
after log out.
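
	for anyone else puzzled by the same thing, here is a rough sketch
(python, not something either of us actually ran) of the arithmetic Les
describes: memory held as buffers and page cache counts as "used" in top,
but it can be given back on demand. it assumes the MemFree, Buffers and
Cached fields of /proc/meminfo; the function names are made up for
illustration, and the sum is only an estimate because part of Cached
(shared memory, for instance) cannot simply be dropped.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: value_in_kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <number> ..." lines; skip anything else.
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def reclaimable_summary(info):
    """Split memory into truly free vs. held as buffers/cache.

    Assumption: Buffers + Cached approximates memory the kernel can
    reclaim when a process asks for more. Missing fields count as 0.
    """
    free = info.get("MemFree", 0)
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    return {
        "free_kb": free,
        "cache_kb": cache,
        "effectively_free_kb": free + cache,
    }


if __name__ == "__main__":
    # Only works on a Linux box that exposes /proc/meminfo.
    with open("/proc/meminfo") as f:
        print(reclaimable_summary(parse_meminfo(f.read())))
```

	if effectively_free_kb stays large while free_kb is close to zero,
the box is caching, not starving.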

	as to whether the problem is with the hardware or network, my answer is
an unequivocal "maybe". the server worked with the same wiring and switches and
clients without problems running v3.1.2, it has problems with 4.0.0. i
*have* seen hardware problems manifest themselves after an upgrade before,
so, yes, it is possible, but at the same time it is highly unlikely. the
server in question is a very new gateway dual 2.4GHz xeon box with 4GB
memory and a megaraid controlled 6 disk (146GB each) array. the array was
changed at upgrade time - it was 4 disks, i added 2 and reconfigured
it. if the array is the problem, we'll find out soon enough.
	the plan is to load 3.1.2 from scratch, blow the backup on top of
it (saving the /etc/fstab) and let her rip on monday. if all goes well for
2 days, upgrade ltsp to ver 4 and see if anything goes bad. if all is ok,
upgrade mozilla and oo to current versions. if it takes 3 months, so be
it. the pain here is the loss of apt-get dist-upgrade.

	i have 4.0.0 running on my laptop and on 3 test servers - no
problems for 2 months now (started with beta)

	normally i wouldn't push fast with upgrades, but here i want to
give users the best of what open software has to offer to drag them away
from the clutches of billg. you might say i went "one upgrade too far".
julius



