[olpc-software] graceful handling of out-of-memory conditions

Tue Mar 28 02:36:07 UTC 2006

On Mon, 2006-03-27 at 20:03 -0500, Jim Gettys wrote:
> Now it's your turn to be caught out on a limb and have someone saw it
> off ;-).
> 
> The window manager is the item that knows what applications are most
> likely to be used by the user, and will likely be key to decent OOM
> behavior.  It knows what's on top, what's iconified, what is covered.
> It is the process most likely to be telling the OS what processes have
> to be killed in extremis.  Consider it an absolutely essential
> component.

Yup, I agree the WM should determine who to kill when running low on
memory; I was just making the point that should it temporarily die, the
session can be restored when the WM comes back up...

Reading this whole thread about how to handle OOM in applications is
kinda surreal - I mean, as much as some people are unhappy with the
reality today, it's not like we're in a position to rewrite all the
libraries and applications to handle OOM... and even if we were, you
would find it hard to get upstream acceptance for, say, 30% more code to
handle OOM. I bet it wouldn't even be possible with the current
architecture. It's simply a non-starter to change this and thus
pointless to even discuss it.

What's wrong with looking at it this way? What we're doing with this
project is to provide a platform for applications. Part of this platform
includes a shell for launching applications. The platform in itself is
not really interesting for the end user; sure, it will include a panel,
an applet to see how much power you have left, an applet to connect to
networks, an applet to switch between applications etc. It also includes
some daemons (D-BUS, HAL, NM etc.) to do the heavy lifting.

As such, I'd argue that the "shell" part of the platform (panel, WM,
power mgmt, network applets) is under our control insofar as we should
be able to assume that it doesn't grow without bound or hog a lot of
resources. Of course, this may not be reality yet but it's fixable. And
no matter what we do we need to keep at least the shell under control as
the shell is always running.

I guess I'm just trying to say that... by default processes should be
protected from being nuked by the OOM killer in the kernel. When our
shell launches an application such as Firefox or Abiword it should
simply set a flag for that process such that the OOM killer in the
kernel may nuke it. Child processes should inherit this flag of course.
Hence, only apps that are not part of the "shell" may get nuked and this
should be sufficient as we will already have verified (through QA and
testing) that our shell isn't leaking.
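
To make the launch-time flag concrete, here is a rough sketch in Python.
It assumes the kernel exposes a per-process knob at
/proc/<pid>/oom_score_adj (-1000 meaning "never kill", +1000 meaning
"kill first") and that the value is inherited across fork; whether that
is the right interface for us is exactly the open question.
set_oom_score_adj() and launch_killable() are made-up names, and the
score of 500 is an arbitrary pick. The shell itself would have to be
started with a protected value by something privileged, which is not
shown here.

```python
import os
import subprocess

# Assumption: the per-process OOM knob is /proc/<pid>/oom_score_adj,
# with -1000 = never kill and +1000 = kill first.
OOM_MIN, OOM_MAX = -1000, 1000


def set_oom_score_adj(value, pid="self", proc_root="/proc"):
    """Write an OOM adjustment for `pid`; proc_root is overridable for tests."""
    if not OOM_MIN <= value <= OOM_MAX:
        raise ValueError("oom_score_adj must be in [-1000, 1000]")
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path, "w") as f:
        f.write(str(value))
    return path


def launch_killable(argv, score=500):
    """Hypothetical shell helper: start an app the OOM killer may nuke.

    The adjustment is applied in the child between fork and exec, so the
    shell keeps its own protected setting while the app (and anything the
    app forks later) carries the killable value.
    """
    return subprocess.Popen(argv, preexec_fn=lambda: set_oom_score_adj(score))
```

The shell would then call something like launch_killable(["firefox"])
instead of exec'ing the app directly. Nothing outside the shell needs to
know about the flag, which is the whole appeal.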

What about this simple low-tech solution? If it's not 100% stupid, I
suppose the next question is 1) what kernel changes are necessary; and
2) can we expect the Linux kernel people to accept such a patch
upstream?

Of course, down the road we may add further things like instrumenting
apps to deal with SIGDANGER or whatever signals from the kernel, save
their session and automatically exit, cooperate with our WM, restore
from hibernation etc. 
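
For the "save their session and exit" part, an app-side handler could
look roughly like the sketch below. Linux has no SIGDANGER, so SIGUSR1
is used purely as a stand-in for whatever low-memory signal we end up
with; SessionSaver and the JSON snapshot format are invented for
illustration only.

```python
import json
import signal

# Assumption: SIGUSR1 stands in for a "low memory" warning signal.
LOW_MEMORY_SIGNAL = signal.SIGUSR1


class SessionSaver:
    """Hypothetical app-side helper: save state when warned, then ask to exit."""

    def __init__(self, state, path):
        self.state = state          # JSON-serializable application state
        self.path = path            # where the session snapshot is written
        self.should_exit = False    # polled by the app's main loop

    def install(self, signum=LOW_MEMORY_SIGNAL):
        signal.signal(signum, self._on_low_memory)

    def _on_low_memory(self, signum, frame):
        # Python runs this handler in the main thread between bytecodes,
        # so ordinary file I/O is acceptable here.
        with open(self.path, "w") as f:
            json.dump(self.state, f)
        self.should_exit = True
```

The app's main loop checks should_exit and quits cleanly; the shell can
later restart the app and hand it the snapshot to restore from.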

> And session management of some sort, whether built into a window manager
> or separate, is what I'm referring to as a general concept of
> understanding what the set of applications the user is using right now,
> whether they are instantiated in runnable processes or not.
> Maybe I should have used a less loaded term.

Yeah, I'm not sure X11 session management is particularly relevant these
days, but I could be wrong. Or maybe it's just the GNOME session manager
that has done nothing but get in my way :-)

    David