[Linux-cluster] OOM issues with GFS, NFS, Samba on RHEL3-AS cluster
Jonathan Woytek
woytek+ at cmu.edu
Sun Jan 23 23:12:18 UTC 2005
Sorry about the duplicate message. I originally sent this with a mistake
in my email address, and when I fixed it, this message apparently went
through to the list.
jonathan
Jonathan Woytek wrote:
> Hello. I've tried to read up on the lists here to see what I can find
> about these sorts of issues, but the information appears to be somewhat
> sparse.
>
> Here's my situation: I have a two-member cluster built on RHEL 3 AS
> (with all current updates installed), which means kernel
> 2.4.21-27.0.2.EL with GFS (6.0.2-25) and cluster services (1.2.16-1)
> built from SRPMs distributed by Red Hat. My storage is iSCSI-based over
> Gigabit Ethernet. The hardware is Dell PowerEdge 1860s with 4GB of RAM
> and dual 2.4GHz processors.
>
> My problem is that the node serving the disk via NFS and Samba gets into
> a strange state where the kernel starts reporting out-of-memory errors
> and killing processes off. The machine then reboots itself and comes
> back up with no issues. In the process, of course, it wreaks havoc with
> lock_gulmd and a host of other things, and it upsets a lot of users
> (it probably didn't help that we've been dealing with unstable storage
> here for a while, and I put this system together with the idea that it
> would be more reliable).
>
> I plan on adding a third node, which would fix the lock_gulmd
> craziness. That's not my big problem, though. I NEED to figure out why
> this is happening. My analysis so far seems to indicate that the
> crashes happen mostly when there are a lot of files open (or at least a
> lot of disk activity). The failures seem to occur most often when
> another machine is accessing data on GFS through an NFS mount from this
> server, but they also seem to occur when the machine has seen a day's
> worth of that sort of usage and the backup system then runs its nightly
> backup between 11PM and 2AM. When memory starts to get low, kswapd
> shows up and starts eating serious CPU cycles, along with the nfsd
> threads. I've tried increasing the number of nfsd threads, but that
> didn't seem to have an effect.
>
> Any ideas on things I should be checking? Interestingly enough, no swap
> seems to be used when this happens. The load average normally creeps up
> right before death, and the machine gets down to less than 18MB free
> (though a lot of the 4GB is tied up in cache).
>
> jonathan
> --
> Jonathan Woytek w: 412-681-3463 woytek+ at cmu.edu
> NREC Computing Manager c: 412-401-1627 KB3HOZ
> PGP Key available upon request
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> http://www.redhat.com/mailman/listinfo/linux-cluster
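One way to act on the question "Any ideas on things I should be checking?" is to log memory and open-file counts at regular intervals, so the next OOM event can be lined up against NFS load and the 11PM to 2AM backup window. The sketch below is a hypothetical helper, not part of GFS, NFS, or Samba tooling. It assumes the usual Linux layouts of /proc/meminfo and /proc/sys/fs/file-nr. All function names and the 32MB threshold are illustrative. It also watches LowFree. On a standard 32-bit kernel with highmem enabled, which is likely with 4GB of RAM, the kernel's own data structures must come from low memory (roughly the first 896MB). That would let the machine run out of memory there while much of the 4GB sits in cache and no swap is touched. This is an assumption worth checking, not a diagnosis.

```python
"""Hypothetical diagnostics sampler (not part of GFS, NFS, or Samba).

Assumptions: a Linux /proc layout; on 32-bit kernels with highmem support,
/proc/meminfo also reports LowFree/HighFree.
"""

import time
from typing import Dict, Tuple


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse 'Key:   value kB' lines into {key: kilobytes}.

    Lines that do not end in 'kB' (such as the byte-count summary table
    that 2.4 kernels print at the top of /proc/meminfo) are skipped.
    """
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and len(fields) == 2 and fields[1] == "kB" and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def low_memory_pressure(info: Dict[str, int], threshold_kb: int = 32 * 1024) -> bool:
    """True if LowFree (or MemFree when LowFree is absent) is below threshold_kb."""
    free_kb = info.get("LowFree", info.get("MemFree", 0))
    return free_kb < threshold_kb


def parse_file_nr(text: str) -> Tuple[int, int]:
    """Return (in_use, maximum) from /proc/sys/fs/file-nr.

    The file holds three numbers: allocated handles, allocated-but-unused
    handles, and the system-wide maximum.
    """
    allocated, unused, maximum = (int(x) for x in text.split()[:3])
    return allocated - unused, maximum


def in_backup_window(hour: int, start: int = 23, end: int = 2) -> bool:
    """True if hour (0-23) is in [start, end); the window may wrap midnight."""
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end


def sample_line(meminfo_path: str = "/proc/meminfo",
                file_nr_path: str = "/proc/sys/fs/file-nr") -> str:
    """Read both files once and return one tab-separated, timestamped line."""
    with open(meminfo_path) as fh:
        mem = parse_meminfo(fh.read())
    with open(file_nr_path) as fh:
        in_use, _maximum = parse_file_nr(fh.read())
    now = time.localtime()
    fields = [
        time.strftime("%Y-%m-%d %H:%M:%S", now),
        f"MemFree={mem.get('MemFree', -1)}",   # -1 means "not reported"
        f"LowFree={mem.get('LowFree', -1)}",
        f"Cached={mem.get('Cached', -1)}",
        f"SwapFree={mem.get('SwapFree', -1)}",
        f"files_in_use={in_use}",
        "backup_window" if in_backup_window(now.tm_hour) else "-",
        "LOW_MEM" if low_memory_pressure(mem) else "ok",
    ]
    return "\t".join(fields)
```

For example, a cron job could call `sample_line()` once a minute and append the result to a file on local disk rather than on GFS. The log then survives the reboot, and its last lines before the crash should show whether low memory, open files, or both were moving.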