[libvirt] [PATCH v2] qemu_migration: Avoid crashing if domain dies too quickly
Wangyufei (A)
james.wangyufei at huawei.com
Fri Oct 18 04:06:07 UTC 2013
Thanks at first, this patch some kinda solve my problem until now. But I still have a doubt about this patch.
> -----Original Message-----
> From: libvir-list-bounces at redhat.com
> [mailto:libvir-list-bounces at redhat.com] On Behalf Of Michal Privoznik
> Sent: Friday, October 11, 2013 8:15 PM
> To: libvir-list at redhat.com
> Subject: [libvirt] [PATCH v2] qemu_migration: Avoid crashing if domain dies
> too quickly
>
> I've noticed a SIGSEGV-ing libvirtd on the destination when the qemu
> died too quickly = in Prepare phase. What is happening here is:
>
> 1) [Thread 3493] We are in qemuMigrationPrepareAny() and calling
> qemuProcessStart() which subsequently calls qemuProcessWaitForMonitor()
> and qemuConnectMonitor(). So far so good. The qemuMonitorOpen()
> succeeds, however switching monitor to QMP mode fails as qemu died
> meanwhile. That is qemuMonitorSetCapabilities() returns -1.
>
> 2013-10-08 15:54:10.629+0000: 3493: debug :
> qemuMonitorSetCapabilities:1356 : mon=0x14a53da0
> 2013-10-08 15:54:10.630+0000: 3493: debug :
> qemuMonitorJSONCommandWithFd:262 : Send command
> '{"execute":"qmp_capabilities","id":"libvirt-1"}' for write with FD -1
> 2013-10-08 15:54:10.630+0000: 3493: debug :
> virEventPollUpdateHandle:147 : EVENT_POLL_UPDATE_HANDLE: watch=17
> events=13
> ...
> 2013-10-08 15:54:10.631+0000: 3493: debug : qemuMonitorSend:956 :
> QEMU_MONITOR_SEND_MSG: mon=0x14a53da0
> msg={"execute":"qmp_capabilities","id":"libvirt-1"}
> fd=-1
> 2013-10-08 15:54:10.631+0000: 3262: debug : virEventPollRunOnce:641 :
> Poll got 1 event(s)
>
> 2) [Thread 3262] The event loop is trying to do the talking to monitor.
> However, qemu is dead already, remember?
>
> 2013-10-08 15:54:13.436+0000: 3262: error : qemuMonitorIORead:551 :
> Unable to read from monitor: Connection reset by peer
> 2013-10-08 15:54:13.516+0000: 3262: debug : virFileClose:90 : Closed fd 25
> ...
> 2013-10-08 15:54:13.533+0000: 3493: debug : qemuMonitorSend:968 : Send
> command resulted in error internal error: early end of file from monitor:
> possible problem:
>
> 3) [Thread 3493] qemuProcessStart() failed. No big deal. Go to the
> 'endjob' label and subsequently to the 'cleanup'. Since the domain is
> not persistent and ret is -1, the qemuDomainRemoveInactive() is called.
> This has an (unpleasant) effect of virObjectUnref()-in the @vm object.
> Unpleasant because the event loop which is about to trigger EOF callback
> still holds a pointer to the @vm (not the reference). See the valgrind
> output below.
>
> 4) [Thread 3262] So the even loop starts triggering EOF:
>
> 2013-10-08 15:54:13.542+0000: 3262: debug : qemuMonitorIO:729 :
> Triggering EOF callback
> 2013-10-08 15:54:13.543+0000: 3262: debug :
> qemuProcessHandleMonitorEOF:294 : Received EOF on 0x14549110 'migt10'
>
> And the monitor is cleaned up. This results in calling
> qemuProcessHandleMonitorEOF with the @vm pointer passed. The pointer
> is
> kept in qemuMonitor struct.
>
> ==3262== Thread 1:
> ==3262== Invalid read of size 4
> ==3262== at 0x77ECCAA: pthread_mutex_lock (in
> /lib64/libpthread-2.15.so)
> ==3262== by 0x52FAA06: virMutexLock (virthreadpthread.c:85)
> ==3262== by 0x52E3891: virObjectLock (virobject.c:320)
> ==3262== by 0x11626743: qemuProcessHandleMonitorEOF
> (qemu_process.c:296)
> ==3262== by 0x11642593: qemuMonitorIO (qemu_monitor.c:730)
> ==3262== by 0x52BD526: virEventPollDispatchHandles
> (vireventpoll.c:501)
> ==3262== by 0x52BDD49: virEventPollRunOnce (vireventpoll.c:648)
> ==3262== by 0x52BBC68: virEventRunDefaultImpl (virevent.c:274)
> ==3262== by 0x542D3D9: virNetServerRun (virnetserver.c:1112)
> ==3262== by 0x11F368: main (libvirtd.c:1513)
> ==3262== Address 0x14549128 is 24 bytes inside a block of size 136 free'd
> ==3262== at 0x4C2AF5C: free (in
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==3262== by 0x529B1FF: virFree (viralloc.c:580)
> ==3262== by 0x52E3703: virObjectUnref (virobject.c:270)
> ==3262== by 0x531557E: virDomainObjListRemove
> (domain_conf.c:2355)
> ==3262== by 0x1160E899: qemuDomainRemoveInactive
> (qemu_domain.c:2061)
> ==3262== by 0x1163A0C6: qemuMigrationPrepareAny
> (qemu_migration.c:2450)
> ==3262== by 0x1163A923: qemuMigrationPrepareDirect
> (qemu_migration.c:2626)
> ==3262== by 0x11682D71: qemuDomainMigratePrepare3Params
> (qemu_driver.c:10309)
> ==3262== by 0x53B0976: virDomainMigratePrepare3Params
> (libvirt.c:7266)
> ==3262== by 0x1502D3:
> remoteDispatchDomainMigratePrepare3Params (remote.c:4797)
> ==3262== by 0x12DECA:
> remoteDispatchDomainMigratePrepare3ParamsHelper
> (remote_dispatch.h:5741)
> ==3262== by 0x54322EB: virNetServerProgramDispatchCall
> (virnetserverprogram.c:435)
>
> The mon->vm is set in qemuMonitorOpenInternal() which is the correct
> place to increase @vm ref counter. The correct place to decrease the ref
> counter is then qemuMonitorDispose().
>
> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
> ---
>
> It turned out the hack from v1 is no longer needed. In fact, my reasoning
> there
> flavouring the hack was flawed.
>
> src/qemu/qemu_capabilities.c | 14 ++++++++++----
> src/qemu/qemu_monitor.c | 4 +++-
> 2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
> index 7c39c1c..17095b4 100644
> --- a/src/qemu/qemu_capabilities.c
> +++ b/src/qemu/qemu_capabilities.c
> @@ -2587,7 +2587,8 @@ virQEMUCapsInitQMP(virQEMUCapsPtr
> qemuCaps,
> char *monpath = NULL;
> char *pidfile = NULL;
> pid_t pid = 0;
> - virDomainObj vm;
> + virDomainObjPtr vm = NULL;
> + virDomainXMLOptionPtr xmlopt = NULL;
>
> /* the ".sock" sufix is important to avoid a possible clash with a qemu
> * domain called "capabilities"
> @@ -2650,10 +2651,13 @@ virQEMUCapsInitQMP(virQEMUCapsPtr
> qemuCaps,
> goto cleanup;
> }
>
> - memset(&vm, 0, sizeof(vm));
> - vm.pid = pid;
> + if (!(xmlopt = virDomainXMLOptionNew(NULL, NULL, NULL)) ||
> + !(vm = virDomainObjNew(xmlopt)))
> + goto cleanup;
> +
> + vm->pid = pid;
>
> - if (!(mon = qemuMonitorOpen(&vm, &config, true, &callbacks, NULL)))
> {
> + if (!(mon = qemuMonitorOpen(vm, &config, true, &callbacks, NULL))) {
> ret = 0;
> goto cleanup;
> }
> @@ -2673,6 +2677,8 @@ cleanup:
> virCommandFree(cmd);
> VIR_FREE(monarg);
> VIR_FREE(monpath);
> + virObjectUnref(vm);
Is this virObjectUnref(vm) corresponding to mon->vm = virObjectRef(vm) added in qemuMonitorOpenInternal?
If it is, how can we confirm virObjectRef(vm) has been done in qemuMonitorOpenInternal? If an error (anyone follows)happened in qemuMonitorOpenInternal is before mon->vm = virObjectRef(vm),
then we still goto cleanup and do virObjectUnref(vm), the refs will be wrong. Am I right?
if (!cb->eofNotify) {
virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
_("EOF notify callback must be supplied"));
return NULL;
}
if (!cb->errorNotify) {
virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
_("Error notify callback must be supplied"));
return NULL;
}
if (qemuMonitorInitialize() < 0)
return NULL;
if (!(mon = virObjectLockableNew(qemuMonitorClass)))
return NULL;
mon->fd = -1;
mon->logfd = -1;
if (virCondInit(&mon->notify) < 0) {
virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
_("cannot initialize monitor condition"));
goto cleanup;
}
mon->fd = fd;
mon->hasSendFD = hasSendFD;
mon->vm = virObjectRef(vm);
> + virObjectUnref(xmlopt);
>
> if (pid != 0) {
> char ebuf[1024];
> diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
> index a601ee0..2bafe28 100644
> --- a/src/qemu/qemu_monitor.c
> +++ b/src/qemu/qemu_monitor.c
> @@ -255,6 +255,8 @@ static void qemuMonitorDispose(void *obj)
> VIR_DEBUG("mon=%p", mon);
> if (mon->cb && mon->cb->destroy)
> (mon->cb->destroy)(mon, mon->vm, mon->callbackOpaque);
> + virObjectUnref(mon->vm);
> +
> virCondDestroy(&mon->notify);
> VIR_FREE(mon->buffer);
> virJSONValueFree(mon->options);
> @@ -781,7 +783,7 @@ qemuMonitorOpenInternal(virDomainObjPtr vm,
> }
> mon->fd = fd;
> mon->hasSendFD = hasSendFD;
> - mon->vm = vm;
> + mon->vm = virObjectRef(vm);
> mon->json = json;
> if (json)
> mon->waitGreeting = true;
> --
> 1.8.1.5
>
> --
> libvir-list mailing list
> libvir-list at redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list
More information about the libvir-list
mailing list