NetworkManager service is shut down too early

Simo Sorce ssorce at redhat.com
Sun Jun 1 23:07:32 UTC 2008


On Sun, 2008-06-01 at 11:49 -0400, Colin Walters wrote:
> On Sun, Jun 1, 2008 at 11:02 AM, Simo Sorce <ssorce at redhat.com> wrote:
>         
>         So far we can only consider DBUS as a sort of local UDP
>         transport; if all goes well messages get to their destination
>         but are not guaranteed.
> 
> The argument here is that presently what we tell application authors
> is much more like TCP than UDP; if we allowed distributors to restart
> it in %post or the like automatically on upgrades, then we either have
> to change our guarantee, or try to "hide" the fact that the bus gets
> restarted under the covers.
> 
> I think the only sensible solution is the latter.  Which is certainly
> *possible*, just like how everything short of the halting problem is
> possible; but it would not be trivial.

Yes, I agree: handling the restart inside the dbus libraries is the best
choice.

> For many likely classes of DBus flaws, porting the Ksplice
> (http://web.mit.edu/ksplice/) style approach would probably be
> easiest.

To make that work you would have to separate functions very clearly from
data structures, standardize the latter and hang them off the main code,
and put the former into a loadable library; then you might be able to get
to a state where you can dlclose() and dlopen() the library again and it
all just works.
That's the theory, anyway. I have never seen code clean enough to work
this way, but usually that is just because it does not need to. :-)
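
Roughly what I have in mind, as a minimal sketch (the library name, the
handler function and the state layout are all made up here just to
illustrate the idea):

/* Persistent data stays in the core; code lives in a shared object
 * that can be unloaded and reloaded underneath it. */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

struct bus_state {
    int connections;            /* standardized data, owned by the core */
};

typedef void (*handler_fn)(struct bus_state *);

static void *load_handlers(handler_fn *fn)
{
    void *lib = dlopen("./libhandlers.so", RTLD_NOW);

    if (!lib) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        exit(1);
    }
    *fn = (handler_fn)dlsym(lib, "handle_message");
    if (!*fn) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        exit(1);
    }
    return lib;
}

int main(void)
{
    struct bus_state state = { .connections = 0 };
    handler_fn handle;
    void *lib = load_handlers(&handle);

    handle(&state);                   /* old code works on the state   */

    dlclose(lib);                     /* "upgrade": drop the old code  */
    lib = load_handlers(&handle);     /* and pull in the new version   */

    handle(&state);                   /* new code sees the same data   */
    dlclose(lib);
    return 0;
}

The whole trick is that nothing may keep pointers into the old library
across the swap, which is exactly the discipline I was referring to
above.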

>   But to handle the general case, I can imagine a system where we send
> a special message to all clients like org.freedesktop.Local.Restart
> and this causes them to enter a mode where they queue pending
> messages, waiting via inotify for the socket to reappear.

Yes, this would be a decent solution too, although it would require
some handling in the client apps.
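
On the client side I imagine something along these lines (just a
sketch; the socket path is the usual system bus one, a real client
would queue its outgoing messages while it waits, and it would also
check whether the socket already exists before blocking so it does not
race with the bus coming back):

#include <sys/inotify.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Block until the bus socket shows up again in /var/run/dbus. */
static void wait_for_bus_socket(void)
{
    char buf[4096];
    int fd = inotify_init();

    if (fd < 0) {
        perror("inotify_init");
        return;
    }
    if (inotify_add_watch(fd, "/var/run/dbus", IN_CREATE) < 0) {
        perror("inotify_add_watch");
        close(fd);
        return;
    }
    for (;;) {
        ssize_t len = read(fd, buf, sizeof(buf));
        ssize_t i = 0;

        if (len <= 0)
            break;
        while (i < len) {
            struct inotify_event *ev = (struct inotify_event *)&buf[i];

            if (ev->len && strcmp(ev->name, "system_bus_socket") == 0) {
                close(fd);
                return;   /* socket is back: reconnect, flush the queue */
            }
            i += sizeof(*ev) + ev->len;
        }
    }
    close(fd);
}

int main(void)
{
    wait_for_bus_socket();
    puts("bus socket is back, would reconnect now");
    return 0;
}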

>   The bus itself would try to flush all pending messages and save the
> current map of connections->service names and other state I'm not
> thinking of right now to JSON/XML/whatever.
> 
> Then on startup you'd need to wait for all of the previous clients to
> connect, probably with some timeout; I can't think offhand of how to
> make this non-racy.  After that we need to handle anything that
> changed in the meantime like clients having exited and thus lost their
> service name (this will happen for sure if we make other software
> restart on upgrade like setroubleshoot does).  So we compute that
> delta and then send the relevant signals off to clients.  

yup
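
Something like this for the saved map and the delta, very roughly (the
file format, the table layout and the names in the example are all
invented for illustration):

#include <stdio.h>
#include <string.h>

struct owner {
    char unique[32];    /* unique connection name, e.g. ":1.42" */
    char name[128];     /* well-known service name               */
};

/* Dump the connection -> service-name map before going down. */
static void save_names(const struct owner *tab, int n, const char *path)
{
    FILE *f = fopen(path, "w");

    if (!f)
        return;
    for (int i = 0; i < n; i++)
        fprintf(f, "%s %s\n", tab[i].unique, tab[i].name);
    fclose(f);
}

/* After restart: find every saved name whose owner did not come back;
 * the bus would then send the corresponding NameOwnerChanged-style
 * signals to the clients (here we just print them). */
static void report_lost(const struct owner *saved, int n_saved,
                        const struct owner *now, int n_now)
{
    for (int i = 0; i < n_saved; i++) {
        int found = 0;

        for (int j = 0; j < n_now; j++)
            if (strcmp(saved[i].name, now[j].name) == 0)
                found = 1;
        if (!found)
            printf("lost: %s (was owned by %s)\n",
                   saved[i].name, saved[i].unique);
    }
}

int main(void)
{
    struct owner before[] = {
        { ":1.7",  "org.example.ServiceA" },
        { ":1.12", "org.example.ServiceB" },
    };
    struct owner after[] = {
        { ":1.2",  "org.example.ServiceA" },  /* ServiceB never came back */
    };

    save_names(before, 2, "/tmp/bus-state.txt");
    report_lost(before, 2, after, 1);
    return 0;
}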

> For someone who knew the code and was an A+ hacker it might only be a
> two week or so job, though to actually know this worked you'd have to
> spend a lot of time creating test cases.  

It may be tricky, but with a good test suite that ensures this works it
would be really worth it.


>         What was the cost/benefit analysis in this case?
> 
> The original cost/benefit was "Absolutely nothing happens when I put
> my USB key into a Linux desktop" and "The networking system is a
> static mess of shell script that we edit via UIs run as root" =)

eh.. :-)

>         Given some people are thinking of using NM by default also on
>         servers, this issue becomes more critical; servers do serve
>         clients,
> 
> Let's back up a second; if our overall goal is to make applying
> security/important-reliability updates happen more transparently, I
> think the best bang for the buck is going to be Linux.  For example,
> we could spend the engineering time figuring out how to get Ksplice
> (http://web.mit.edu/ksplice/) to work under the RPM hood.  
> 
> DBus has so far had a pretty good security and reliability track
> record; while it's not simple software, it has simple goals and this
> has limited complexity.  Something like the Linux kernel clearly has a
> much bigger goal and so is orders of magnitude more complex, and with
> this complexity have come the concomitant security/reliability issues.
> 
> And if I had the ability to herd security/reliability cats, I'd have
> them spend time on Firefox and try to take what Dan Walsh has been
> doing even farther - break it up into multiple processes with locked
> down security contexts and evaluate changes to the desktop to better
> handle the concept of processes with different privilege for example.

Security updates are just one aspect, IMO; reliability and self-repair
with minimal service disruption are another very important goal.

Simo.

-- 
Simo Sorce * Red Hat, Inc * New York



