
Re: "Out of Memory: Killed process" errors on server running Oracle



On Wed, 2007-07-25 at 07:53 -0600, Eric Sisler wrote:
> > If you really think the kernel is not out of memory you could simply
> > turn the OOM killer off (sysctl vm.oom-kill=0).  I think this needs a
> > fairly recent kernel for RHEL3 (I think around U6).  Of course, if the system
> > is actually out of memory then it will simply hang or panic rather than
> > kill a process and survive.
> 
> Been there, done that - got mixed results at best.

If you got mixed results with this, then that's even more of an
indicator that the system really is OOM.  Otherwise it would work.


> I might try this, but I don't understand why a lightly-loaded server
> with 16GB of RAM is having this problem.  The total RAM configured for
> all VMs is 7GB - less than half of what's available.  Obviously running
> VMs will generally use much less than they've been configured for.  I
> understand that low memory starvation might be the culprit, but what's
> frustrating is this has *never* been an issue on our older servers
> running RHEL 3.  Even servers where the total RAM configured for all VMs
> is much closer to the server's total RAM (say 3GB out of 4GB) never had
> this problem. Starting multiple VMs simultaneously slowed the server
> down considerably, which is to be expected, but no processes were ever
> killed and the load average never got as high as I'm seeing with RHEL 4.
> What changed between the 2.4.x & 2.6.x kernels to cause this problem?

Well, do you run RHEL3 on servers with 16GB?  It's not fair to compare a
system with 16GB to a system with 4GB.  The kernel uses low memory to
track allocations of all memory, so a system with 16GB of memory will use
significantly more low memory than a system with 4GB, perhaps as much as
4 times.  This extra pressure is there from the moment you turn the
system on, before you do anything at all, because the kernel structures
have to be sized for the potential of tracking allocations in four times
as much memory.
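
To put a rough number on that scaling, here is a back-of-the-envelope
sketch.  The 4KB page size and the figure of about 32 bytes of
struct page per physical page are assumptions for a 32-bit 2.6 kernel,
not values taken from this thread, and the page array is only one of
several low-memory consumers.

```python
# Rough estimate of low memory consumed by the kernel's per-page bookkeeping.
# ASSUMPTIONS (not from the thread): 4 KiB pages and roughly 32 bytes of
# struct page per physical page on a 32-bit 2.6 kernel.
PAGE_SIZE = 4096
STRUCT_PAGE_BYTES = 32  # assumed; the real size depends on kernel config


def mem_map_mib(ram_gib):
    """Return the approximate size of the page array in MiB for ram_gib of RAM."""
    pages = ram_gib * 1024**3 // PAGE_SIZE
    return pages * STRUCT_PAGE_BYTES / 1024**2


for ram_gib in (4, 16):
    print(f"{ram_gib:2d} GiB RAM -> about {mem_map_mib(ram_gib):.0f} MiB of page structures")
```

Under those assumptions the 16GB box spends four times as much low
memory on this bookkeeping as the 4GB box, before any workload starts.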

> Any idea which would be better - switching to 64-bit or running the
> hugemem kernel on 32-bit?

The hugemem kernel has some overhead, so I would go with 64-bit
personally, but it's up to you.  Hugemem is certainly easier because
it's just a quick kernel change.

> Free pages:       15072kB (1600kB HighMem)
> Active:2230508 inactive:1839773 dirty:9223 writeback:0 unstable:0
> free:3768 slab:46213 mapped:890178 pagetables:3166
> DMA free:12544kB min:16kB low:32kB high:48kB active:0kB inactive:0kB
> present:16384kB pages_scanned:10691 all_unreclaimable? yes
> protections[]: 0 0 0
> Normal free:928kB min:928kB low:1856kB high:2784kB active:0kB
> inactive:522028kB present:901120kB pages_scanned:954558
> all_unreclaimable? yes
> protections[]: 0 0 0
> HighMem free:1600kB min:512kB low:1024kB high:1536kB active:8922032kB
> inactive:6837064kB present:16646144kB pages_scanned:0 all_unreclaimable?
> no

OK, so this shows normal zone starvation.  You have exactly 928kB free
out of roughly 900MB in the normal zone, which is right at that zone's
min watermark, and none of it is reclaimable.  That's about 0.1% free
and is OOM as far as the kernel is concerned.  Your system has free
memory in the HighMem zone, and reclaimable memory as well (probably
cache), but practically none in the "normal" zone.
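
The same reading can be pulled out of the report mechanically.  This is
a minimal sketch that parses only the per-zone lines quoted above
(joined onto one line per zone); the helper name parse_zones is made up
for illustration and the regular expression only covers the kB fields
in this particular dump.

```python
import re

# Zone lines copied from the OOM report quoted above, one zone per line.
REPORT = """\
DMA free:12544kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
Normal free:928kB min:928kB low:1856kB high:2784kB active:0kB inactive:522028kB present:901120kB
HighMem free:1600kB min:512kB low:1024kB high:1536kB active:8922032kB inactive:6837064kB present:16646144kB
"""


def parse_zones(text):
    """Return {zone_name: {field: kB}} for each zone line in text."""
    zones = {}
    for line in text.splitlines():
        name, _, rest = line.partition(" ")
        fields = {k: int(v) for k, v in re.findall(r"(\w+):(\d+)kB", rest)}
        if fields:
            zones[name] = fields
    return zones


zones = parse_zones(REPORT)
for name, z in zones.items():
    pct_free = 100.0 * z["free"] / z["present"]
    at_min = z["free"] <= z["min"]  # zone has hit its min watermark
    print(f"{name:8s} free {z['free']:>6d}kB of {z['present']:>9d}kB "
          f"({pct_free:.2f}%)  at_or_below_min={at_min}")
```

Only the Normal zone comes out at or below its min watermark, which is
the condition described above.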

You could try setting /proc/sys/vm/lower_zone_protection to a high
value, say 250 or even more.  This will cause the kernel to try to be
more aggressive in defending the normal zone from allocating memory that
could potentially be allocated in the high memory zone.
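
As a sketch of how that change could be scripted, assuming the tunable
exists on your kernel at the path given above and that you run it as
root: the function name set_lower_zone_protection is hypothetical, and
250 is just the starting value suggested here, not a tested
recommendation.

```python
from pathlib import Path

# Path named in the message; only present on kernels that expose this tunable.
TUNABLE = Path("/proc/sys/vm/lower_zone_protection")


def set_lower_zone_protection(value, path=TUNABLE):
    """Write a new value to the tunable and return the previous one.

    Needs root on a real system.  The path argument exists so the
    function can be tried against an ordinary file first.
    """
    path = Path(path)
    old = int(path.read_text().split()[0])
    path.write_text(f"{value}\n")
    return old


if __name__ == "__main__":
    if TUNABLE.exists():
        print("previous value:", set_lower_zone_protection(250))
    else:
        print(f"{TUNABLE} is not present on this kernel")
```

A value written this way lasts only until the next reboot.  To keep it,
the usual route is a vm.lower_zone_protection line in /etc/sysctl.conf.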

Later,
Tom


