[libvirt] [PATCH v2] qemu_migration: Avoid crashing if domain dies too quickly

Wangyufei (A) james.wangyufei at huawei.com
Fri Oct 18 04:06:07 UTC 2013


Thanks at first, this patch some kinda solve my problem until now. But I still have a doubt about this patch.

> -----Original Message-----
> From: libvir-list-bounces at redhat.com
> [mailto:libvir-list-bounces at redhat.com] On Behalf Of Michal Privoznik
> Sent: Friday, October 11, 2013 8:15 PM
> To: libvir-list at redhat.com
> Subject: [libvirt] [PATCH v2] qemu_migration: Avoid crashing if domain dies
> too quickly
> 
> I've noticed a SIGSEGV-ing libvirtd on the destination when the qemu
> died too quickly = in Prepare phase. What is happening here is:
> 
> 1) [Thread 3493] We are in qemuMigrationPrepareAny() and calling
> qemuProcessStart() which subsequently calls qemuProcessWaitForMonitor()
> and qemuConnectMonitor(). So far so good. The qemuMonitorOpen()
> succeeds, however switching monitor to QMP mode fails as qemu died
> meanwhile. That is qemuMonitorSetCapabilities() returns -1.
> 
> 2013-10-08 15:54:10.629+0000: 3493: debug :
> qemuMonitorSetCapabilities:1356 : mon=0x14a53da0
> 2013-10-08 15:54:10.630+0000: 3493: debug :
> qemuMonitorJSONCommandWithFd:262 : Send command
> '{"execute":"qmp_capabilities","id":"libvirt-1"}' for write with FD -1
> 2013-10-08 15:54:10.630+0000: 3493: debug :
> virEventPollUpdateHandle:147 : EVENT_POLL_UPDATE_HANDLE: watch=17
> events=13
> ...
> 2013-10-08 15:54:10.631+0000: 3493: debug : qemuMonitorSend:956 :
> QEMU_MONITOR_SEND_MSG: mon=0x14a53da0
> msg={"execute":"qmp_capabilities","id":"libvirt-1"}
>  fd=-1
> 2013-10-08 15:54:10.631+0000: 3262: debug : virEventPollRunOnce:641 :
> Poll got 1 event(s)
> 
> 2) [Thread 3262] The event loop is trying to do the talking to monitor.
> However, qemu is dead already, remember?
> 
> 2013-10-08 15:54:13.436+0000: 3262: error : qemuMonitorIORead:551 :
> Unable to read from monitor: Connection reset by peer
> 2013-10-08 15:54:13.516+0000: 3262: debug : virFileClose:90 : Closed fd 25
> ...
> 2013-10-08 15:54:13.533+0000: 3493: debug : qemuMonitorSend:968 : Send
> command resulted in error internal error: early end of file from monitor:
> possible problem:
> 
> 3) [Thread 3493] qemuProcessStart() failed. No big deal. Go to the
> 'endjob' label and subsequently to the 'cleanup'. Since the domain is
> not persistent and ret is -1, the qemuDomainRemoveInactive() is called.
> This has an (unpleasant) effect of virObjectUnref()-in the @vm object.
> Unpleasant because the event loop which is about to trigger EOF callback
> still holds a pointer to the @vm (not the reference). See the valgrind
> output below.
> 
> 4) [Thread 3262] So the even loop starts triggering EOF:
> 
> 2013-10-08 15:54:13.542+0000: 3262: debug : qemuMonitorIO:729 :
> Triggering EOF callback
> 2013-10-08 15:54:13.543+0000: 3262: debug :
> qemuProcessHandleMonitorEOF:294 : Received EOF on 0x14549110 'migt10'
> 
> And the monitor is cleaned up. This results in calling
> qemuProcessHandleMonitorEOF with the @vm pointer passed. The pointer
> is
> kept in qemuMonitor struct.
> 
> ==3262== Thread 1:
> ==3262== Invalid read of size 4
> ==3262==    at 0x77ECCAA: pthread_mutex_lock (in
> /lib64/libpthread-2.15.so)
> ==3262==    by 0x52FAA06: virMutexLock (virthreadpthread.c:85)
> ==3262==    by 0x52E3891: virObjectLock (virobject.c:320)
> ==3262==    by 0x11626743: qemuProcessHandleMonitorEOF
> (qemu_process.c:296)
> ==3262==    by 0x11642593: qemuMonitorIO (qemu_monitor.c:730)
> ==3262==    by 0x52BD526: virEventPollDispatchHandles
> (vireventpoll.c:501)
> ==3262==    by 0x52BDD49: virEventPollRunOnce (vireventpoll.c:648)
> ==3262==    by 0x52BBC68: virEventRunDefaultImpl (virevent.c:274)
> ==3262==    by 0x542D3D9: virNetServerRun (virnetserver.c:1112)
> ==3262==    by 0x11F368: main (libvirtd.c:1513)
> ==3262==  Address 0x14549128 is 24 bytes inside a block of size 136 free'd
> ==3262==    at 0x4C2AF5C: free (in
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==3262==    by 0x529B1FF: virFree (viralloc.c:580)
> ==3262==    by 0x52E3703: virObjectUnref (virobject.c:270)
> ==3262==    by 0x531557E: virDomainObjListRemove
> (domain_conf.c:2355)
> ==3262==    by 0x1160E899: qemuDomainRemoveInactive
> (qemu_domain.c:2061)
> ==3262==    by 0x1163A0C6: qemuMigrationPrepareAny
> (qemu_migration.c:2450)
> ==3262==    by 0x1163A923: qemuMigrationPrepareDirect
> (qemu_migration.c:2626)
> ==3262==    by 0x11682D71: qemuDomainMigratePrepare3Params
> (qemu_driver.c:10309)
> ==3262==    by 0x53B0976: virDomainMigratePrepare3Params
> (libvirt.c:7266)
> ==3262==    by 0x1502D3:
> remoteDispatchDomainMigratePrepare3Params (remote.c:4797)
> ==3262==    by 0x12DECA:
> remoteDispatchDomainMigratePrepare3ParamsHelper
> (remote_dispatch.h:5741)
> ==3262==    by 0x54322EB: virNetServerProgramDispatchCall
> (virnetserverprogram.c:435)
> 
> The mon->vm is set in qemuMonitorOpenInternal() which is the correct
> place to increase @vm ref counter. The correct place to decrease the ref
> counter is then qemuMonitorDispose().
> 
> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
> ---
> 
> It turned out the hack from v1 is no longer needed. In fact, my reasoning
> there
> flavouring the hack was flawed.
> 
>  src/qemu/qemu_capabilities.c | 14 ++++++++++----
>  src/qemu/qemu_monitor.c      |  4 +++-
>  2 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
> index 7c39c1c..17095b4 100644
> --- a/src/qemu/qemu_capabilities.c
> +++ b/src/qemu/qemu_capabilities.c
> @@ -2587,7 +2587,8 @@ virQEMUCapsInitQMP(virQEMUCapsPtr
> qemuCaps,
>      char *monpath = NULL;
>      char *pidfile = NULL;
>      pid_t pid = 0;
> -    virDomainObj vm;
> +    virDomainObjPtr vm = NULL;
> +    virDomainXMLOptionPtr xmlopt = NULL;
> 
>      /* the ".sock" sufix is important to avoid a possible clash with a qemu
>       * domain called "capabilities"
> @@ -2650,10 +2651,13 @@ virQEMUCapsInitQMP(virQEMUCapsPtr
> qemuCaps,
>          goto cleanup;
>      }
> 
> -    memset(&vm, 0, sizeof(vm));
> -    vm.pid = pid;
> +    if (!(xmlopt = virDomainXMLOptionNew(NULL, NULL, NULL)) ||
> +        !(vm = virDomainObjNew(xmlopt)))
> +        goto cleanup;
> +
> +    vm->pid = pid;
> 
> -    if (!(mon = qemuMonitorOpen(&vm, &config, true, &callbacks, NULL)))
> {
> +    if (!(mon = qemuMonitorOpen(vm, &config, true, &callbacks, NULL))) {
>          ret = 0;
>          goto cleanup;
>      }
> @@ -2673,6 +2677,8 @@ cleanup:
>      virCommandFree(cmd);
>      VIR_FREE(monarg);
>      VIR_FREE(monpath);
> +    virObjectUnref(vm);

Is this virObjectUnref(vm) corresponding to mon->vm = virObjectRef(vm) added in qemuMonitorOpenInternal?
If it is, how can we confirm virObjectRef(vm) has been done in qemuMonitorOpenInternal? If an error (anyone follows)happened in qemuMonitorOpenInternal is before mon->vm = virObjectRef(vm),
then we still goto cleanup and do virObjectUnref(vm), the refs will be wrong. Am I right?

    if (!cb->eofNotify) {
        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                       _("EOF notify callback must be supplied"));
        return NULL;
    }
    if (!cb->errorNotify) {
        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                       _("Error notify callback must be supplied"));
        return NULL;
    }

    if (qemuMonitorInitialize() < 0)
        return NULL;

    if (!(mon = virObjectLockableNew(qemuMonitorClass)))
        return NULL;

    mon->fd = -1;
    mon->logfd = -1;
    if (virCondInit(&mon->notify) < 0) {
        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                       _("cannot initialize monitor condition"));
        goto cleanup;
    }
    mon->fd = fd;
    mon->hasSendFD = hasSendFD;
    mon->vm = virObjectRef(vm);

> +    virObjectUnref(xmlopt);
> 
>      if (pid != 0) {
>          char ebuf[1024];
> diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
> index a601ee0..2bafe28 100644
> --- a/src/qemu/qemu_monitor.c
> +++ b/src/qemu/qemu_monitor.c
> @@ -255,6 +255,8 @@ static void qemuMonitorDispose(void *obj)
>      VIR_DEBUG("mon=%p", mon);
>      if (mon->cb && mon->cb->destroy)
>          (mon->cb->destroy)(mon, mon->vm, mon->callbackOpaque);
> +    virObjectUnref(mon->vm);
> +
>      virCondDestroy(&mon->notify);
>      VIR_FREE(mon->buffer);
>      virJSONValueFree(mon->options);
> @@ -781,7 +783,7 @@ qemuMonitorOpenInternal(virDomainObjPtr vm,
>      }
>      mon->fd = fd;
>      mon->hasSendFD = hasSendFD;
> -    mon->vm = vm;
> +    mon->vm = virObjectRef(vm);
>      mon->json = json;
>      if (json)
>          mon->waitGreeting = true;
> --
> 1.8.1.5
> 
> --
> libvir-list mailing list
> libvir-list at redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list




More information about the libvir-list mailing list