
Re: "Out of Memory: Killed process" errors on server running Oracle



Hi Tom,

First off, thanks for all your thoughts & ideas.

> I think the KB article in question references Fedora Core 3 and
> basically says FC3 is not supported so you're on your own.

Yes it does, but the symptoms seem to be the same.

> If you really think the kernel is not out of memory you could simply
> turn the OOM killer off (sysctl vm.oom-kill=0).  I think this needs a
> fairly recent update for RHEL3 (I think around U6).  Of course, if the system
> is actually out of memory then it will simply hang or panic rather than
> kill a process and survive.
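Setting a knob with sysctl is the same as writing to the matching file under /proc/sys, with the dots in the name turned into path separators. The Python sketch below assumes a Linux host that exposes /proc/sys. vm.oom-kill is the RHEL 3 knob mentioned above and simply reads as missing on kernels that lack it. The helper names are illustrative only.

```python
from pathlib import Path
from typing import Optional

PROC_SYS = Path("/proc/sys")


def sysctl_path(name: str) -> Path:
    """Map a dotted sysctl name to its /proc/sys file (vm.oom-kill -> /proc/sys/vm/oom-kill)."""
    return PROC_SYS.joinpath(*name.split("."))


def read_sysctl(name: str) -> Optional[str]:
    """Return the knob's current value, or None if it is missing or unreadable."""
    try:
        return sysctl_path(name).read_text().strip()
    except OSError:
        return None


def write_sysctl(name: str, value: str) -> None:
    """Same effect as `sysctl -w name=value`: needs root and is lost at reboot."""
    sysctl_path(name).write_text(f"{value}\n")


if __name__ == "__main__":
    # vm.oom-kill only exists on kernels that carry that patch (e.g. later RHEL 3).
    for knob in ("vm.oom-kill", "vm.overcommit_memory", "vm.overcommit_ratio"):
        print(f"{knob} = {read_sysctl(knob)}")
```

To keep a setting across reboots, the same `name = value` line goes in /etc/sysctl.conf.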

Been there, done that - got mixed results at best.

> You could also try playing with overcommit_memory and overcommit_ratio
> settings.  By default the Linux kernel will allow memory overcommit
> which will actually allow you to start more VMs than you could ever
> really support.  If you've started more VMs than your memory will
> support they will still start up because of overcommit, but as the
> individual VMs actually use their pages they will hit the real limit
> and be killed.

I might try this, but I don't understand why a lightly-loaded server
with 16GB of RAM is having this problem.  The total RAM configured for
all VMs is 7GB - less than half of what's available.  Obviously running
VMs will generally use much less than they've been configured for.  I
understand that low memory starvation might be the culprit, but what's
frustrating is this has *never* been an issue on our older servers
running RHEL 3.  Even servers where the total RAM configured for all VMs
is much closer to the server's total RAM (say 3GB out of 4GB) never had
this problem. Starting multiple VMs simultaneously slowed the server
down considerably, which is to be expected, but no processes were ever
killed and the load average never got as high as I'm seeing with RHEL 4.
What changed between the 2.4.x & 2.6.x kernels to cause this problem?

> Using the overcommit_memory and overcommit_ratio settings you can
> control how much memory the kernel is willing to overcommit, this might
> keep your VMs from even starting if you have allocated more memory than
> the system has.
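As a rough illustration of what strict accounting would allow: with vm.overcommit_memory=2, the kernel's overcommit-accounting documentation says total committed address space may not exceed swap plus vm.overcommit_ratio percent of physical RAM (default 50). The sketch below assumes round figures of 16 GiB RAM and 16 GiB swap for this box; the function names are hypothetical.

```python
GIB_KB = 1024 * 1024  # kB per GiB


def commit_limit_kb(ram_kb: int, swap_kb: int, overcommit_ratio: int = 50) -> int:
    """Commit limit in strict mode (vm.overcommit_memory=2): swap + ratio% of RAM."""
    return swap_kb + ram_kb * overcommit_ratio // 100


def headroom_kb(committed_kb: int, ram_kb: int, swap_kb: int,
                overcommit_ratio: int = 50) -> int:
    """How much more could be committed before new allocations are refused."""
    return commit_limit_kb(ram_kb, swap_kb, overcommit_ratio) - committed_kb


if __name__ == "__main__":
    ram = 16 * GIB_KB   # assumed: 16 GiB of RAM
    swap = 16 * GIB_KB  # assumed: roughly 16 GiB of swap
    vms = 7 * GIB_KB    # total memory configured for all guests
    for ratio in (50, 80, 100):
        limit = commit_limit_kb(ram, swap, ratio)
        spare = headroom_kb(vms, ram, swap, ratio)
        print(f"ratio={ratio}: limit {limit / GIB_KB:.1f} GiB, "
              f"headroom after guests {spare / GIB_KB:.1f} GiB")
```

With these numbers even the default ratio leaves plenty of room for 7GB of guests, so strict commit accounting on its own would not be expected to block them here.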
> 
> Also, keep in mind that the OOM killer has to monitor memory in multiple
> zones and just because you have 4GB of free memory on your system does
> not mean that there is free memory in the low memory zone (this is
> normally less of an issue on 64-bit).
>
> If you are running 32-bit and seeing low-memory starvation (which can
> lead to OOM killer killing processes even if there appears to be
> "plenty" of free memory) then there are several options, like tweaking
> the lower_zone_protection settings (RHEL4 and above only I think) or
> running a hugemem kernel which, although officially only required for
> systems with >16GB of RAM, can help if your system is starved for lower
> zone memory.
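To make the zone split concrete, here is a sketch of how physical memory divides on a 32-bit x86 kernel, assuming the standard (non-hugemem) 3G/1G split: ZONE_DMA covers the first 16 MiB, ZONE_NORMAL runs up to 896 MiB, and everything above is HighMem. These boundaries match the log below, which reports DMA present:16384kB and Normal present:901120kB (880 MiB). The function names are illustrative.

```python
MIB = 1024 * 1024
DMA_LIMIT = 16 * MIB       # ZONE_DMA: first 16 MiB (legacy ISA DMA range)
LOWMEM_LIMIT = 896 * MIB   # ZONE_NORMAL ends here with the default 3G/1G split


def zone_for(phys_addr: int) -> str:
    """Which zone a physical address falls into on a non-hugemem i386 kernel."""
    if phys_addr < DMA_LIMIT:
        return "DMA"
    if phys_addr < LOWMEM_LIMIT:
        return "Normal"
    return "HighMem"


def zone_sizes(total_ram_bytes: int) -> dict:
    """Split total RAM into DMA / Normal / HighMem (ignores holes and reserved areas)."""
    dma = min(total_ram_bytes, DMA_LIMIT)
    normal = min(total_ram_bytes, LOWMEM_LIMIT) - dma
    high = max(total_ram_bytes - LOWMEM_LIMIT, 0)
    return {"DMA": dma, "Normal": normal, "HighMem": high}


if __name__ == "__main__":
    for gib in (4, 16):
        total = gib * 1024 * MIB
        sizes = zone_sizes(total)
        low = sizes["DMA"] + sizes["Normal"]
        print(f"{gib} GiB RAM: lowmem {low // MIB} MiB ({low / total:.1%}), "
              f"HighMem {sizes['HighMem'] // MIB} MiB")
```

Lowmem stays the same size whether the machine has 4GB or 16GB, so on the larger box it is a much smaller fraction of the memory the kernel has to manage.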

Any idea which would be better - switching to 64-bit or running the
hugemem kernel on 32-bit?

> Perhaps if you post some of the log output from the OOM killer we can
> offer more specific suggestions.

Here's one from /var/log/messages. Does the OOM killer log anywhere else?

oom-killer: gfp_mask=0xd0
Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
cpu 2 hot: low 2, high 6, batch 1
cpu 2 cold: low 0, high 2, batch 1
cpu 3 hot: low 2, high 6, batch 1
cpu 3 cold: low 0, high 2, batch 1
cpu 4 hot: low 2, high 6, batch 1
cpu 4 cold: low 0, high 2, batch 1
cpu 5 hot: low 2, high 6, batch 1
cpu 5 cold: low 0, high 2, batch 1
cpu 6 hot: low 2, high 6, batch 1
cpu 6 cold: low 0, high 2, batch 1
cpu 7 hot: low 2, high 6, batch 1
cpu 7 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
cpu 2 hot: low 32, high 96, batch 16
cpu 2 cold: low 0, high 32, batch 16
cpu 3 hot: low 32, high 96, batch 16
cpu 3 cold: low 0, high 32, batch 16
cpu 4 hot: low 32, high 96, batch 16
cpu 4 cold: low 0, high 32, batch 16
cpu 5 hot: low 32, high 96, batch 16
cpu 5 cold: low 0, high 32, batch 16
cpu 6 hot: low 32, high 96, batch 16
cpu 6 cold: low 0, high 32, batch 16
cpu 7 hot: low 32, high 96, batch 16
cpu 7 cold: low 0, high 32, batch 16
HighMem per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
cpu 2 hot: low 32, high 96, batch 16
cpu 2 cold: low 0, high 32, batch 16
cpu 3 hot: low 32, high 96, batch 16
cpu 3 cold: low 0, high 32, batch 16
cpu 4 hot: low 32, high 96, batch 16
cpu 4 cold: low 0, high 32, batch 16
cpu 5 hot: low 32, high 96, batch 16
cpu 5 cold: low 0, high 32, batch 16
cpu 6 hot: low 32, high 96, batch 16
cpu 6 cold: low 0, high 32, batch 16
cpu 7 hot: low 32, high 96, batch 16
cpu 7 cold: low 0, high 32, batch 16
Free pages:       15072kB (1600kB HighMem)
Active:2230508 inactive:1839773 dirty:9223 writeback:0 unstable:0
free:3768 slab:46213 mapped:890178 pagetables:3166
DMA free:12544kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:10691 all_unreclaimable? yes
protections[]: 0 0 0
Normal free:928kB min:928kB low:1856kB high:2784kB active:0kB inactive:522028kB present:901120kB pages_scanned:954558 all_unreclaimable? yes
protections[]: 0 0 0
HighMem free:1600kB min:512kB low:1024kB high:1536kB active:8922032kB inactive:6837064kB present:16646144kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 4*4kB 4*8kB 3*16kB 3*32kB 3*64kB 3*128kB 2*256kB 0*512kB 1*1024kB 1*2048kB 2*4096kB = 12544kB
Normal: 16*4kB 12*8kB 6*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 928kB
HighMem: 2*4kB 3*8kB 2*16kB 2*32kB 1*64kB 1*128kB 5*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1600kB
Swap cache: add 35926, delete 29354, find 8841/12488, race 0+0
0 bounce buffer pages
Free swap:       15974324kB
4390912 pages of RAM
3964840 pages of HIGHMEM
232575 reserved pages
3014401 pages shared
6572 pages swap cached
Out of Memory: Killed process 6344 (vmware-vmx).
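The dump can be read mechanically. The Python sketch below decodes gfp_mask and extracts each zone's free/min watermarks and buddy free lists from an excerpt of the log above. It assumes the flag values from 2.6-era include/linux/gfp.h apply to this RHEL 4 kernel (later kernels renumbered them); `decode_gfp` and `analyze` are illustrative names. On this excerpt it reports 0xd0 as __GFP_WAIT | __GFP_IO | __GFP_FS, an ordinary GFP_KERNEL request without __GFP_HIGHMEM, so it must be satisfied from lowmem. The Normal zone sits exactly at its min watermark (free:928kB min:928kB) with all_unreclaimable set, while almost all of the inactive memory is in HighMem.

```python
import re
from typing import Dict, List

# Assumed: flag bits as defined in 2.6-era include/linux/gfp.h.
GFP_FLAGS = {0x01: "__GFP_DMA", 0x02: "__GFP_HIGHMEM", 0x10: "__GFP_WAIT",
             0x20: "__GFP_HIGH", 0x40: "__GFP_IO", 0x80: "__GFP_FS"}

# Excerpt of the OOM report above, left wrapped the way the mail client wrapped it.
LOG = """oom-killer: gfp_mask=0xd0
DMA free:12544kB min:16kB low:32kB high:48kB active:0kB inactive:0kB
present:16384kB pages_scanned:10691 all_unreclaimable? yes
Normal free:928kB min:928kB low:1856kB high:2784kB active:0kB
inactive:522028kB present:901120kB pages_scanned:954558
all_unreclaimable? yes
HighMem free:1600kB min:512kB low:1024kB high:1536kB active:8922032kB
inactive:6837064kB present:16646144kB pages_scanned:0 all_unreclaimable?
no
DMA: 4*4kB 4*8kB 3*16kB 3*32kB 3*64kB 3*128kB 2*256kB 0*512kB 1*1024kB
1*2048kB 2*4096kB = 12544kB
Normal: 16*4kB 12*8kB 6*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB
0*1024kB 0*2048kB 0*4096kB = 928kB
HighMem: 2*4kB 3*8kB 2*16kB 2*32kB 1*64kB 1*128kB 5*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 1600kB
"""

ZONE_RE = re.compile(r"(DMA|Normal|HighMem) free:(\d+)kB min:(\d+)kB"
                     r".*?all_unreclaimable\? (yes|no)")
BUDDY_RE = re.compile(r"(DMA|Normal|HighMem): ((?:\d+\*\d+kB )+)= (\d+)kB")


def decode_gfp(mask: int) -> List[str]:
    """Name the known bits in a gfp_mask; leftover bits are shown in hex."""
    names, known = [], 0
    for bit, name in sorted(GFP_FLAGS.items()):
        known |= bit
        if mask & bit:
            names.append(name)
    if mask & ~known:
        names.append(hex(mask & ~known))
    return names


def analyze(log: str) -> Dict[str, dict]:
    """Per-zone watermark state and buddy free-list totals."""
    text = " ".join(log.split())  # undo mail-client line wrapping
    zones: Dict[str, dict] = {}
    for name, free, wmin, unrecl in ZONE_RE.findall(text):
        zones[name] = {"free_kb": int(free), "min_kb": int(wmin),
                       "at_or_below_min": int(free) <= int(wmin),
                       "all_unreclaimable": unrecl == "yes"}
    for name, blocks, total in BUDDY_RE.findall(text):
        # Each "N*SIZEkB" entry: N free blocks of SIZE kB at one buddy order.
        pairs = [(int(n), int(s)) for n, s in re.findall(r"(\d+)\*(\d+)kB", blocks)]
        info = zones.setdefault(name, {})
        info["buddy_sum_kb"] = sum(n * s for n, s in pairs)
        info["buddy_total_kb"] = int(total)
        info["largest_free_block_kb"] = max((s for n, s in pairs if n), default=0)
    return zones


if __name__ == "__main__":
    m = re.search(r"gfp_mask=(0x[0-9a-fA-F]+)", LOG)
    if m:
        print("gfp_mask", m.group(1), "=", " | ".join(decode_gfp(int(m.group(1), 16))))
    for zone, info in analyze(LOG).items():
        print(zone, info)
```

The buddy lines also show that the largest free block left in the Normal zone is 512kB, versus 4096kB blocks still free in DMA.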

It's mostly Greek to me.  Any suggestions or insights are greatly
appreciated.  Thanks!

-Eric

-- 
Eric Sisler <esisler westminster lib co us>
Library Network Specialist
Westminster Public Library
Westminster, CO USA

Linux - Don't fear the Penguin.
Want to know what we use Linux for?
Visit http://wallace.westminster.lib.co.us/linux

