Re: NetworkManager service is shut down too early

On Sun, 2008-06-01 at 09:14 -0400, Dan Williams wrote:
> On Fri, 2008-05-30 at 16:49 -0400, Alan Cox wrote:
> > On Fri, May 30, 2008 at 03:33:37PM -0400, Colin Walters wrote:
> > > DBus is not the same as any other random software because it is explicitly
> > > designed to provide reliable communication *between* components, much like
> > > the kernel.  If you restart it at random times that reliability guarantee is
> > > destroyed.
> > 
> > So the questions you should ask are
> > - Why does restarting dbus have to be unreliable
> It's a communication pipe; restarting D-Bus itself is reliable because
> it's just like TCP.  It's the transport.  But making what gets
> _transported_ reliable is the kicker.
> It's exactly like all those Cingular/AT&T dropped call commercials from
> a while ago:
> http://youtube.com/watch?v=DR26BZUo3Dk
> http://youtube.com/watch?v=GEd3pS1jXJ4
> http://www.spike.com/video/2839248 (spoof)

Except that in NM's case it isn't a one-to-one conversation like those.
NM is more like the foreman of a construction site who talks to the
bulldozer, crane, and structural welders over a cell phone.  If the
carrier has an outage at the base station his/her phone is patched
through and every one of those calls drops, then when service comes
back s/he can't just assume that what the bulldozer, crane, and welders
did in the meantime was exactly what needed to happen.  S/he has to go
and verify that everything is exactly as s/he expects, and s/he can't
yell "All Stop!" to do it; everything has to be checked and verified
while the work keeps going.  Not simple.
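To make the foreman analogy concrete, here is a minimal Python sketch of what "verify everything while the work keeps going" could look like. It assumes each peer daemon can be asked for a snapshot of its current state; `Peer`, `get_state()`, `Foreman`, and `resync()` are hypothetical names for illustration, not NetworkManager or D-Bus APIs. Signals are treated as incremental updates, and after an outage the cache is rebuilt from fresh snapshots instead of being trusted.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical stand-in for another daemon on the bus.

    `state` is the daemon's authoritative state. It keeps changing whether
    or not anyone is listening: the work never stops.
    """
    name: str
    state: dict = field(default_factory=dict)

    def get_state(self) -> dict:
        # Assumed query method: returns a copy of the current state.
        return dict(self.state)


class Foreman:
    """Caches what each peer is believed to be doing."""

    def __init__(self, peers):
        self.peers = {p.name: p for p in peers}
        self.cache = {p.name: p.get_state() for p in peers}

    def on_signal(self, peer_name, key, value):
        # Normal operation: incremental updates arrive as signals.
        self.cache[peer_name][key] = value

    def resync(self):
        """After an outage, treat every cached value as suspect.

        Queries each peer for a fresh snapshot, records what differs from
        the cache as {peer: {key: (old, new)}}, and replaces the cache.
        Peers are never paused while this runs.
        """
        changes = {}
        for name, peer in self.peers.items():
            fresh = peer.get_state()
            old = self.cache.get(name, {})
            diff = {k: (old.get(k), v) for k, v in fresh.items() if old.get(k) != v}
            for k in old.keys() - fresh.keys():
                diff[k] = (old[k], None)  # key disappeared on the peer
            if diff:
                changes[name] = diff
            self.cache[name] = fresh
        return changes


crane = Peer("crane", {"load": "beam-1"})
dozer = Peer("bulldozer", {"site": "north"})
foreman = Foreman([crane, dozer])

# Outage: the crane keeps working, but no signal reaches the foreman.
crane.state["load"] = "beam-2"

changes = foreman.resync()
print(changes)  # {'crane': {'load': ('beam-1', 'beam-2')}}
```

Signals only report changes, so once any of them may have been lost, the only safe recovery in this sketch is to re-read current state from every peer the cached state depends on.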


> Suddenly all the state dependent on a D-Bus service is suspect, because
> you have no idea what's going on while the bus is down.  You have to
> re-synchronize your state after the bus comes back, and that's not a
> simple task.
> > - Why isn't there a recovery mechanism
> The recovery mechanism would be in each service, because the service
> knows whether or not it needs recovery, and would know how to
> merge/synchronize its state with the services that it depends on.  Some
> don't need to.  But ones with state dependent on other D-Bus services
> would.
> > - Why does network manager have to do the work itself not the support code
> Like above, because NM has specific state, and when D-Bus goes away,
> its communication channels with the daemons that affect that
> NM-specific state are gone, and NM can't make any assumptions about
> what's happening in any other daemon while the bus is gone.  Maybe your
> VPN just came up for rekeying, but the signal got lost because D-Bus
> isn't around.  So when the bus comes back, your VPN connection is
> already dropped.
> Or DHCP re-bound while the bus was down, and your sysadmin changed DNS
> servers on you, and the signal from dhclient got lost (because the bus
> was down).  Unless you re-do the entire DHCP transaction (or teach
> dhclient about dbus properly so it can answer questions without having
> to exec() stupid scripts that then re-emit state back over D-Bus) NM
> would have no idea that the returned DHCP options had changed.  And thus
> your DNS is dead.
> > And more fundamentally
> > 
> > Why the ... are people still writing software which doesn't try and tolerate
> > faults that are recoverable to a useful extent.  Yes dbus might have to lose
> > a few messages and send everyone a "duh whoops" event so they can recover but
> > "oh dear it broke everyone reboot" is not good engineering.
> In some cases, it's a cost/benefit analysis.  Is the cost of writing and
> maintaining a pile of code that handles a D-Bus restart, which shouldn't
> ever happen, worth the benefit?  In some cases, definitely.  In other
> cases, probably not.  That isn't an excuse to write crappy software, but
> it's certainly not as simple a problem as you present it.
> Dan
> > So I'm likewise pleased the Debian people raised a sensible point.
> > 
> > Alan
> > 
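Alan's "duh whoops" event and Dan's point that recovery has to live in each service can be combined in one small sketch, using Dan's DHCP example. Everything here is hypothetical and invented for illustration: a toy `Bus` that silently drops messages while it is down and broadcasts a made-up `Resync` event when it returns, a `DhcpClient` whose current lease can be queried, and an NM-like `NetworkDaemon`. None of these are real D-Bus, dhclient, or NetworkManager interfaces.

```python
class Bus:
    """Toy message bus (hypothetical). While down, messages are lost."""

    def __init__(self):
        self.up = True
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def emit(self, event, payload=None):
        if not self.up:
            return False  # dropped: nobody will ever see this signal
        for cb in self.subscribers:
            cb(event, payload)
        return True

    def go_down(self):
        self.up = False

    def come_back(self):
        self.up = True
        # Invented "duh whoops" event: messages may have been lost.
        self.emit("Resync")


class DhcpClient:
    """Hypothetical DHCP client that can be queried for its current lease."""

    def __init__(self, bus, dns):
        self.bus = bus
        self.dns = list(dns)

    def rebind(self, new_dns):
        self.dns = list(new_dns)
        self.bus.emit("LeaseChanged", list(new_dns))

    def current_dns(self):
        return list(self.dns)


class NetworkDaemon:
    """NM-like consumer. The recovery logic lives here, because only this
    service knows which of its state depends on which peers."""

    def __init__(self, bus, dhcp):
        self.dhcp = dhcp
        self.dns = dhcp.current_dns()
        bus.subscribe(self.on_event)

    def on_event(self, event, payload):
        if event == "LeaseChanged":
            self.dns = payload
        elif event == "Resync":
            # Don't trust the cache; ask the authoritative source again.
            self.dns = self.dhcp.current_dns()


bus = Bus()
dhcp = DhcpClient(bus, ["192.0.2.1"])
nm = NetworkDaemon(bus, dhcp)

bus.go_down()
dhcp.rebind(["198.51.100.53"])  # signal lost: the bus is down
stale = list(nm.dns)            # still the old server
bus.come_back()                 # broadcast Resync
print(stale, nm.dns)            # ['192.0.2.1'] ['198.51.100.53']
```

In this sketch the bus-wide event only says that something may have been missed; the daemon still needs its own query path to an authoritative source (`current_dns()` here), and that per-service code is the cost Dan is weighing.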
