[olpc-software] graceful handling of out-of-memory conditions

Jim Gettys jg at laptop.org
Mon Mar 27 16:19:54 UTC 2006


On Mon, 2006-03-27 at 08:03 -0500, Alan Cox wrote:
> On Sun, Mar 26, 2006 at 02:03:00PM -0500, Havoc Pennington wrote:
> > Making something sane happen on OOM is a lot more work than just adding 
> > "if (!ptr)" checks.
> 
> That depends on the programming design. If it wasn't designed properly you are
> in deep poo. In the case of stuff like dbus it's hard because you have to keep
> going. Most applications can handle OOM by writing a recovery file and exiting
> or in many cases where changes are committed by just going boom

Certainly it was painful to get the X server to be robust against
OOM, and the retrofit was difficult (done 15 or so years ago).  And
whether it is still robust against OOM is far from clear (it should
generally return BadAlloc and continue, possibly killing a connection);
we may have to do some more serious testing of at least our driver (and
see how much of the DIX part of the X server leaks and/or behaves in OOM
conditions, and whether it is still robust).
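
To illustrate the "return an error and continue" pattern described above,
here is a minimal sketch in C.  It is not X server code: the status codes,
the allocator hook, and the names create_resource, limited_alloc and
failing_alloc are all hypothetical, chosen only to show a request handler
that cleans up partial allocations and reports failure instead of aborting.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes; STATUS_BAD_ALLOC plays the role that a
 * BadAlloc-style error would play in a real server. */
enum status { STATUS_OK = 0, STATUS_BAD_ALLOC = 1 };

/* Allocator hook (an assumption of this sketch) so that out-of-memory
 * can be simulated without exhausting real memory. */
typedef void *(*alloc_fn)(size_t);

/* Always fails, as if memory were already exhausted. */
static void *failing_alloc(size_t n) { (void)n; return NULL; }

/* Succeeds for the next allocs_left calls, then fails. */
static int allocs_left = 0;
static void *limited_alloc(size_t n)
{
    if (allocs_left <= 0)
        return NULL;
    allocs_left--;
    return malloc(n);
}

struct resource {
    size_t len;
    char *data;
};

/* Handle one "create resource" request.  On allocation failure, free
 * anything already allocated, leave *out untouched, and return an error
 * so the caller can report it and keep serving other requests. */
static enum status create_resource(alloc_fn alloc, const char *src,
                                   size_t len, struct resource **out)
{
    struct resource *r = alloc(sizeof *r);
    if (r == NULL)
        return STATUS_BAD_ALLOC;

    r->data = alloc(len ? len : 1);
    if (r->data == NULL) {
        free(r);            /* undo the partial allocation */
        return STATUS_BAD_ALLOC;
    }

    memcpy(r->data, src, len);
    r->len = len;
    *out = r;
    return STATUS_OK;
}

static void destroy_resource(struct resource *r)
{
    if (r != NULL) {
        free(r->data);
        free(r);
    }
}
```

The hard part of a retrofit is not this function but making sure every
caller up the stack treats the error the same way; the allocator hook is
one way to exercise those paths in testing.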

This experience makes me as pessimistic as Havoc about getting many
applications fixed to be robust against OOM.  System components, on the
other hand, should be robust against it.  By "system components" I mean
more than just the Linux kernel: also the X server, window manager, and
session manager, but not much else.  I'd like the base environment to be
rock solid.

I'm more optimistic about the "save your state" and resume strategy,
which I think holds promise.  As I said before, if we have control over
the order of when applications get shot when memory is low, we can also
likely warn them to save state before this happens, and have
applications transparently restart when again visible.  I believe the
Maemo folks have been doing some work along these lines.
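
A rough sketch of the application side of the "save your state" idea,
in C.  Everything here is an assumption made for illustration: the use of
SIGUSR1 as the low-memory warning, the state being a single string, and
the names install_low_memory_handler and save_state_if_warned.  No actual
warning mechanism has been specified.

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Set by the signal handler, polled by the application's main loop. */
static volatile sig_atomic_t low_memory_warned = 0;

/* Do as little as possible in the handler: just record the warning. */
static void on_low_memory(int signo)
{
    (void)signo;
    low_memory_warned = 1;
}

/* Assumption: the system signals low memory with SIGUSR1. */
static int install_low_memory_handler(void)
{
    return signal(SIGUSR1, on_low_memory) == SIG_ERR ? -1 : 0;
}

/* Called from the main loop.  Returns 1 if state was saved, 0 if no
 * warning was pending, -1 on I/O error.  The state is written to a
 * temporary file and renamed into place so that a kill mid-write does
 * not leave a truncated state file behind. */
static int save_state_if_warned(const char *path, const char *state)
{
    char tmp[512];
    size_t len = strlen(state);
    FILE *f;
    int n;

    if (!low_memory_warned)
        return 0;

    n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    f = fopen(tmp, "w");
    if (f == NULL)
        return -1;
    if (fwrite(state, 1, len, f) != len) {
        fclose(f);
        remove(tmp);
        return -1;
    }
    if (fclose(f) != 0 || rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }

    low_memory_warned = 0;
    return 1;
}
```

On restart, the application would read the file back and resume where it
left off.  Note that saving must itself not need much new memory, or the
warning arrives too late to be useful.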

> 
> > Another complexity that applies to a normal Linux system but perhaps not 
> > to OLPC is that with e.g. the default Fedora swap configuration, the 
> > system is unusably slow and thoroughly locked up long before malloc 

Right now, some applications are hemorrhaging memory; this is literally
forcing the system to start to page out stuff you need. If you start
paging, your performance goes to hell in a hurry. And our most important
application is doing this at the moment (Firefox; though it sounds like
progress is being made toward fixing it; tabs are currently your enemy).
Paging back in a leaking, bloated application that got paged out takes a
long while indeed.

And I think it isn't too much to ask the community to fix their memory
leaks and gratuitously large memory usage: everyone will benefit.

> 
> This is because if we turn on the no-overcommit mode some bits of gnome are such
> a pile of leaky, bloated garbage you basically can't run stuff like evolution
> at all. In no-overcommit mode the OOM situation is controlled and the malloc
> fail return is very fast indeed.
> 

I don't think we're likely to run many of the 'standard' gnome
applications as a result of this; equally, many of those applications
are not even appropriate for the target audience.
I'm certainly going to talk about this at GUADEC.  I hope observations
about our memory wastage will be a wakeup call to the community.

The paging system should be recovering unused pages of text when more
heap/stack is needed; we certainly observed this behavior on the iPAQ.
But there comes a minimum point when you end up paging text too much out
of flash (which wastes power and hurts speed).
                        - Jim

-- 
Jim Gettys
One Laptop Per Child




