Races / crashes in shutdown of libvirtd daemon
Nikolay Shirokovskiy
nshirokovskiy at virtuozzo.com
Tue Apr 28 15:18:50 UTC 2020
On 27.04.2020 18:54, Daniel P. Berrangé wrote:
> We got a new BZ filed about a libvirtd crash in shutdown
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1828207
>
> We can see from the stack trace that the "interface" driver is in
> the middle of servicing an RPC call for virConnectListAllInterfaces()
>
> Meanwhile the libvirtd daemon is doing virObjectUnref(dmn) on the
> virNetDaemonPtr object.
>
> The fact that it is doing this unref, means that it must have already
> call virStateCleanup(), given the code sequence:
>
>
> /* Run event loop. */
> virNetDaemonRun(dmn);
>
> ret = 0;
>
> virHookCall(VIR_HOOK_DRIVER_DAEMON, "-", VIR_HOOK_DAEMON_OP_SHUTDOWN,
> 0, "shutdown", NULL, NULL);
>
> cleanup:
> /* Keep cleanup order in inverse order of startup */
> virNetDaemonClose(dmn);
>
> virNetlinkEventServiceStopAll();
>
> if (driversInitialized) {
> /* NB: Possible issue with timing window between driversInitialized
> * setting if virNetlinkEventServerStart fails */
> driversInitialized = false;
> virStateCleanup();
> }
>
> virObjectUnref(adminProgram);
> virObjectUnref(srvAdm);
> virObjectUnref(qemuProgram);
> virObjectUnref(lxcProgram);
> virObjectUnref(remoteProgram);
> virObjectUnref(srv);
> virObjectUnref(dmn);
>
>
> Unless I'm missing something non-obvious, this cleanup code path is
> inherantly broken & racy. When virNetDaemonRun() returns the RPC
> worker threads are all still active. They are all liable to still
> be executing RPC calls, which means any of the drivers may be in
> use. So calling virStateCleanup() is an inherantly dangerous
> thing to do. There is the further complication that once we have
> exitted the main loop we may prevent the RPC calls from ever
> completing, as they may be waiting on an event to be dispatched.
>
> I know we're had various patch proposals in the past to improve the
> robustness of shutdown cleanup but I can't remember the outcome of the
> reviews. Hopefully people involved in those threads can jump in here...
Hi, all!
Yeah I and John Ferlan attempted to address the issue in the past.
The last reference I found is [1]. One can dig down the history
of the issue thru the links inside.
IIRC the latest approach was similar to what you suggested below namely
run event loop for a while after quit is seen in order to let
RPC worker threads waiting for event to finish.
[1] https://www.redhat.com/archives/libvir-list/2018-July/msg00382.html
>
>
> IMHO the key problem here is the virNetDeamonRun() method which just
> looks at the "quit" flag and immediately returns if it is set.
> This needs to be changed so that when it sees quit == true, it takes
> the following actions
>
> 1. Call virNetDaemonClose() to drop all RPC clients and thus prevent
> new RPC calls arriving
> 2. Flush any RPC calls which are queued but not yet assigned to a
> worker thread
> 3. Tell worker threads to exit after finishing their current job
> 4. Wait for all worker threads to exit
> 5. Now virNetDaemonRun may return
>
> At this point we can call virStateCleanup and the various other
> things, as we know no drivers are still active in RPC calls.
>
> Having said that, there could be background threads in the the
> drivers which are doing work that uses the event loop thread.
>
> So we probably need a virStateClose() method that we call from
> virNetDaemonRun, *after* all worker threads are gone, which would
> cleanup any background threads while the event loop is still
> running.
I guess RPC worker threads may need some signal to finish in time too.
For example in case of migration.
Nikolay
>
>
> The issue is that step 4 above ("Wait for all worker threads to exit")
> may take too long, or indeed never complete. To deal with this, it
> will need a timeout. In the remote_daemon.c cleanup code path, if
> there are still worker threads present, then we need to skip all
> cleanup and simply call _exit(0) to terminate the process with no
> attempt at cleanup, since it would be unsafe to try anything else.
>
> Regards,
> Daniel
>
More information about the libvir-list
mailing list