[libvirt] [PATCH] qemu: Avoid libvirtd crash in qemuDomainObjExitAgentInternal
Daniel P. Berrange
berrange at redhat.com
Wed Aug 8 10:27:20 UTC 2012
On Wed, Aug 08, 2012 at 02:26:26PM +0800, Alex Jia wrote:
> On 08/07/2012 07:34 PM, Daniel P. Berrange wrote:
> >On Tue, Aug 07, 2012 at 03:18:38PM +0800, Alex Jia wrote:
> >>* src/qemu/qemu_domain.c (qemuDomainObjExitAgentInternal): fix crashing
> >> libvirtd due to derefing a NULL pointer.
> >>
> >>For details, please see bug:
> >>RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=845966
> >>
> >>Signed-off-by: Alex Jia<ajia at redhat.com>
> >>---
> >> src/qemu/qemu_domain.c | 10 ++++++----
> >> 1 files changed, 6 insertions(+), 4 deletions(-)
> >>
> >>diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
> >>index 86f0265..8667b6c 100644
> >>--- a/src/qemu/qemu_domain.c
> >>+++ b/src/qemu/qemu_domain.c
> >>@@ -1136,12 +1136,14 @@ qemuDomainObjExitAgentInternal(struct qemud_driver *driver,
> >> virDomainObjPtr obj)
> >> {
> >> qemuDomainObjPrivatePtr priv = obj->privateData;
> >>- int refs;
> >>+ int refs = -1;
> >>
> >>- refs = qemuAgentUnref(priv->agent);
> >>+ if (priv->agent) {
> >>+ refs = qemuAgentUnref(priv->agent);
> >>
> >>- if (refs> 0)
> >>- qemuAgentUnlock(priv->agent);
> >>+ if (refs> 0)
> >>+ qemuAgentUnlock(priv->agent);
> >>+ }
> >>
> >> if (driver_locked)
> >> qemuDriverLock(driver);
> >I'm not convinced this is the right fix. The whole point of the Enter/ExitAgent
> >methods is to hold an extra reference on priv->agent, so that it is *not*
> >deleted while a agent command is run.
> >
> >What is setting priv->agent to NULL while the command is still active ?
>
> In fact, the command 'guest-suspend-disk' is freed by
> virJSONValueFree() in qemuAgentSuspend() after the command is
> successfully sent via 'qemuAgentCommand()':
>
> (gdb) s
> qemuDomainPMSuspendForDuration (dom=<value optimized out>, target=1,
> duration=<value optimized out>, flags=<value optimized out>) at
> qemu/qemu_driver.c:13123
> 13123 qemuDomainObjExitAgent(driver, vm);
> (gdb) p *vm
> $68 = {object = {magic = 3405643788, refs = 4, klass =
> 0x7f4ce815a9b0}, lock = {lock = {__data = {__lock = 1, __count = 0,
> __owner = 20285, __nusers = 1, __kind = 0, __spins = 0, __list =
> {__prev = 0x0,
> __next = 0x0}}, __size =
> "\001\000\000\000\000\000\000\000=O\000\000\001", '\000' <repeats 26
> times>, __align = 1}}, pid = 20379, *state = {state = 4, reason =
> 0}*, autostart = 0, persistent = 1,
> updated = 0, def = 0x7f4ce815e500, newDef = 0x7f4ce8069b80,
> snapshots = {objs = 0x7f4ce815e240, metaroot = {def = 0x0, parent =
> 0x0, sibling = 0x0, nchildren = 0, first_child = 0x0}},
> current_snapshot = 0x0, hasManagedSave = false, privateData =
> 0x7f4ce8154660, privateDataFreeFunc = 0x7f4cefada190
> <qemuDomainObjPrivateFree>, taint = 4}
> (gdb) s
> 13122 ret = qemuAgentSuspend(priv->agent, target);
>
> (gdb) p *priv
> $70 = {job = {cond = {cond = {__data = {__lock = 0, __futex = 0,
> __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0,
> __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47
> times>,
> __align = 0}}, active = QEMU_JOB_MODIFY, owner = 20286,
> asyncCond = {cond = {__data = {__lock = 0, __futex = 0, __total_seq
> = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters =
> 0,
> __broadcast_seq = 0}, __size = '\000' <repeats 47 times>,
> __align = 0}}, asyncJob = QEMU_ASYNC_JOB_NONE, asyncOwner = 0, phase
> = 0, mask = 0, start = 0, dump_memory_only = false, info = {
> type = 0, timeElapsed = 0, timeRemaining = 0, dataTotal = 0,
> dataProcessed = 0, dataRemaining = 0, memTotal = 0, memProcessed =
> 0, memRemaining = 0, fileTotal = 0, fileProcessed = 0,
> fileRemaining = 0}}, mon = 0x7f4ce80a3570, monConfig = 0x0,
> monJSON = 1, monError = false, monStart = 0, *agent = 0x0*,
> agentError = false, agentStart = 1344402957193, gotShutdown = true,
> beingDestroyed = false, pidfile = 0x7f4ce816ba90
> "/var/run/libvirt/qemu/myRHEL6.pid", nvcpupids = 1, vcpupids =
> 0x7f4ce8146be0, pciaddrs = 0x7f4ce8171b30, persistentAddrs = 1,
> qemuCaps = 0x7f4ce80b4030,
> lockState = 0x0, fakeReboot = false, jobs_queued = 1,
> migMaxBandwidth = 32, origname = 0x0, cons = 0x7f4ce8165ce0,
> cleanupCallbacks = 0x0, ncleanupCallbacks = 0, ncleanupCallbacks_max
> = 0}
> (gdb) p priv->agent
> $71 =*(qemuAgentPtr) 0x0*
> (gdb) s
> 1138 refs = qemuAgentUnref(priv->agent);
> (gdb) s
> qemuAgentUnref (mon=0x0) at qemu/qemu_agent.c:168
> (gdb) s
> 170 VIR_DEBUG("%d", mon->refs);
> (gdb) s
> 168 {
> (gdb) s
> 169 mon->refs--;
> (gdb) s
>
> Program received signal SIGSEGV, Segmentation fault.
> qemuAgentUnref (*mon=0x0*) at qemu/qemu_agent.c:169
> 169 *mon->refs--*;
> (gdb) s
> virNetServerFatalSignal (sig=11, siginfo=0x7f4cf748f630,
> context=0x7f4cf748f500) at rpc/virnetserver.c:296
>
>
> In addition, the old qemuAgentUnref(mon) hasn't judge whether its
> parameter is NULL then will deref a NULL pointer, I should simply
> fix it in qemuAgentUnref(), for example, if 'mon' is NULL then
> directly return.
>
> Fortunately, your commit b57ee09 potentially fix this issue via
> using virObjectUnref() instead of qemuAgentUnref(), if the parameter
> 'priv->agent' is NULL then the virObjectUnref(priv->agent) will
> directly return false:
That is just lucky and we should not rely on that. There is still a
bug here. The 'priv->agent' field should *not* be set to NULL in the
first place, while a monitor command is active. The GDB trace above
does not show *what* is setting priv->agent to NULL, and that is
what we need to find out.
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
More information about the libvir-list
mailing list