[libvirt] [PATCH v2] daemon: Fix a crash during virNetlinkEventServiceStopAll

Haitaoliu Haitao.Liu at windriver.com
Tue Aug 13 06:54:34 UTC 2019


Hi Peter,

Could you please review this patch? I sent it about two months ago.

Thanks,

Haitao

On 6/12/19 3:18 PM, Liu Haitao wrote:
> When the host is rebooted, the daemon crashes and a core dump file is generated.
>
> The call traces are:
>
> Note: in this case, the main thread is thread 5.
>
> (gdb) thread 5
> [Switching to thread 5 (LWP 4142)]
> (gdb) bt
> 0  0x00007f00a6838273 in futex_wait_cancelable (private=<optimized out>,
>      expected=0, futex_word=0x7f004c0125c0)
>      at /usr/src/debug/glibc/2.24-r0/git/sysdeps/unix/sysv/linux/futex-internal.h:88
> 1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7f004c012540,
>      cond=0x7f004c012598)
>      at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_cond_wait.c:502
> 2  __pthread_cond_wait (cond=0x7f004c012598, mutex=0x7f004c012540)
>      at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_cond_wait.c:655
> 3  0x00007f00aa467246 in virCondWait (c=<optimized out>, m=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthread.c:154
> 4  0x00007f00aa467eb0 in virThreadPoolFree (pool=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthreadpool.c:286
> 5  0x00007f0074349f9d in qemuStateCleanup ()
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_driver.c:1036
> 6  0x00007f00aa5e9486 in virStateCleanup ()
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/libvirt.c:682
> 7  0x000055a687ab86a4 in main (argc=<optimized out>, argv=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/remote/remote_daemon.c:1473
>
> (gdb) thread 1
> [Switching to thread 1 (LWP 4403)]
> (gdb) bt
> 0  __GI___pthread_mutex_lock (mutex=mutex@entry=0x0)
>      at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_mutex_lock.c:67
> 1  0x00007f00aa467165 in virMutexLock (m=m@entry=0x0)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthread.c:89
> 2  0x00007f00aa43c0f9 in virNetlinkEventServerLock (driver=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virnetlink.c:799
> 3  virNetlinkEventRemoveClient (watch=watch@entry=0,
>      macaddr=macaddr@entry=0x7f0088014944, protocol=protocol@entry=0)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virnetlink.c:1197
> 4  0x00007f00aa4341df in virNetDevMacVLanDeleteWithVPortProfile (
>      ifname=<optimized out>, macaddr=macaddr@entry=0x7f0088014944,
>      linkdev=0x7f0088014920 "eth1", mode=mode@entry=1,
>      virtPortProfile=virtPortProfile@entry=0x0,
>      stateDir=stateDir@entry=0x7f004c12fa90 "/var/run/libvirt/qemu")
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virnetdevmacvlan.c:1112
> 5  0x00007f0074312251 in qemuProcessStop (driver=driver@entry=0x7f004c0ecef0,
>      vm=vm@entry=0x7f0088000b00,
>      reason=reason@entry=VIR_DOMAIN_SHUTOFF_SHUTDOWN,
>      asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_NONE, flags=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_process.c:7291
> 6  0x00007f007437a5ea in processMonitorEOFEvent (vm=0x7f0088000b00, driver=0x7f004c0ecef0)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_driver.c:4756
> 7  qemuProcessEventHandler (data=0x55a687d6df10, opaque=0x7f004c0ecef0)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_driver.c:4859
> 8  0x00007f00aa467c5b in virThreadPoolWorker (
>      opaque=opaque@entry=0x55a687d6c110)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthreadpool.c:163
> 9  0x00007f00aa466fe8 in virThreadHelper (data=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthread.c:206
> 10 0x00007f00a68323f4 in start_thread (arg=0x7f00699df700)
>      at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_create.c:456
> 11 0x00007f00a616e10f in clone ()
>      at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
>
>
> 1. The execution flow of the main thread (thread 5, LWP 4142):
> main()
>    -->virNetDaemonRun()
>    -->virNetDaemonClose(dmn)  //cleanup
>    -->virNetlinkEventServiceStopAll()
>    -->virStateCleanup()
> 	 -->qemuStateCleanup()
> 	   -->virThreadPoolFree()
> 	     -->__pthread_cond_wait()
>
> virNetDaemonRun()
>      -->virEventRunDefaultImpl
>        -->virEventPollRunOnce
>         -->virEventPollDispatchHandles
>          -->qemuMonitorIO
>            -->qemuProcessHandleMonitorEOF
>              -->processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF
>               -->virThreadPoolSendJob()
>
> After the reboot command is issued on the host, the main thread sends an event
> message to another thread. Here thread 1 is asked to handle the shutdown of the
> qemu process, but that job is not executed immediately.
>
> virNetlinkEventServiceStopAll()
> 	--> virNetlinkEventServiceStop()
> 	  --> server[protocol] = NULL;   // set server to null
>
> In virNetlinkEventServiceStopAll(), some netlink-related variables are freed,
> such as (static virNetlinkEventSrvPrivatePtr server).
>
> virStateCleanup()
> 	-->qemuStateCleanup()
> 	   -->virThreadPoolFree()
> 	     -->__pthread_cond_wait()
>
> virThreadPoolFree() waits for the other threads to finish.
>
> 2. The execution flow of thread 1 (LWP 4403):
> qemuProcessStop()
>     -->virNetDevMacVLanDeleteWithVPortProfile()
> 	  -->virNetlinkEventRemoveClient()
> 	     --> srv = server[protocol]
>
>
> Although the main thread had already sent the message to thread 1 (LWP 4403),
> the job did not run right away. This means virNetlinkEventServiceStopAll() can
> be executed before virNetlinkEventRemoveClient(), as the log file shows:
>
> "
> 2019-06-12 00:10:09.230+0000: 4142: info : virNetlinkEventServiceStopAll:941 : stopping all netlink event services
> 2019-06-12 00:10:09.230+0000: 4142: info : virNetlinkEventServiceStop:904 : stopping netlink event service
> 2019-06-12 00:10:21.165+0000: 4403: debug : virNetlinkEventRemoveClient:1190 : removing client watch=0, mac=0x7f0088014944.
> "
>
> virNetlinkEventRemoveClient() then uses the variable server again, but by now
> it has been set to NULL by virNetlinkEventServiceStopAll(), which causes the crash.
>
> Therefore virNetlinkEventServiceStopAll() should be called after virStateCleanup().
>
> Signed-off-by: Liu Haitao <haitao.liu at windriver.com>
> ---
>   src/remote/remote_daemon.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/remote/remote_daemon.c b/src/remote/remote_daemon.c
> index c3782971f1..7da20a6644 100644
> --- a/src/remote/remote_daemon.c
> +++ b/src/remote/remote_daemon.c
> @@ -1464,8 +1464,6 @@ int main(int argc, char **argv) {
>       /* Keep cleanup order in inverse order of startup */
>       virNetDaemonClose(dmn);
>   
> -    virNetlinkEventServiceStopAll();
> -
>       if (driversInitialized) {
>           /* NB: Possible issue with timing window between driversInitialized
>            * setting if virNetlinkEventServerStart fails */
> @@ -1473,6 +1471,8 @@ int main(int argc, char **argv) {
>           virStateCleanup();
>       }
>   
> +    virNetlinkEventServiceStopAll();
> +
>       virObjectUnref(adminProgram);
>       virObjectUnref(srvAdm);
>       virObjectUnref(qemuProgram);
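
For reference, here is a minimal standalone sketch of the ordering problem
described in the commit message. It is not libvirt code: struct event_server,
service_start(), service_stop_all(), worker() and run_teardown() are
hypothetical stand-ins for the netlink event service, a thread pool job and
the cleanup path in main(). Unlike the code in the backtrace, the stand-in
worker checks for NULL, so the sketch reports the problem instead of crashing.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-protocol netlink server state. */
struct event_server {
    pthread_mutex_t lock;
    int clients;
};

static struct event_server storage = { PTHREAD_MUTEX_INITIALIZER, 0 };
static struct event_server *server;   /* plays the role of server[protocol] */

static void service_start(void)    { storage.clients = 1; server = &storage; }
static void service_stop_all(void) { server = NULL; }

/* Stand-in for the worker job that ends up removing a netlink client. */
static void *worker(void *opaque)
{
    int *result = opaque;
    struct event_server *srv = server;

    if (srv == NULL) {
        /* In the backtrace above, this is where the mutex is locked
         * with m=0x0. The sketch only records the failure. */
        *result = -1;
        return NULL;
    }
    pthread_mutex_lock(&srv->lock);
    srv->clients--;
    pthread_mutex_unlock(&srv->lock);
    *result = 0;
    return NULL;
}

/* Runs one teardown and returns the worker's result.
 * stop_service_first != 0 models the old order (service stopped before
 * the worker has run); 0 models the order proposed in the patch (wait
 * for the worker, then stop the service). */
static int run_teardown(int stop_service_first)
{
    pthread_t tid;
    int result = 1;
    int rc;

    service_start();
    if (stop_service_first)
        service_stop_all();
    if (pthread_create(&tid, NULL, worker, &result) != 0)
        return -2;
    rc = pthread_join(tid, NULL);      /* stand-in for waiting on the pool */
    assert(rc == 0);
    (void)rc;
    if (!stop_service_first)
        service_stop_all();
    return result;
}
```

With the old order the worker finds the server pointer already cleared, so
run_teardown(1) returns -1. With the proposed order the worker completes
before the service is stopped, so run_teardown(0) returns 0. The old order is
forced deterministically here; in the real daemon it depends on when the
queued job gets scheduled.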
