[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [libvirt] [PATCH 1/2] qemu: Add missing lock of virDomainObj before calling virDomainUnref

On 03/04/2011 02:46 PM, Eric Blake wrote:
>> +++ b/src/qemu/qemu_process.c
>> @@ -596,7 +596,9 @@ static void qemuProcessHandleMonitorDestroy(qemuMonitorPtr mon,
>>      qemuDomainObjPrivatePtr priv = vm->privateData;
>>      if (priv->mon == mon)
>>          priv->mon = NULL;
>> -    virDomainObjUnref(vm);
>> +    virDomainObjLock(vm);
>> +    if (virDomainObjUnref(vm) > 0)
>> +        virDomainObjUnlock(vm);
> Phooey, this causes 'virsh save domain file' to deadlock.
> I'm still looking into why, but it may be that we're going about fixing
> this wrong.  Maybe we should be making sure that qemuProcessStop ensures
> that the monitor callback function won't trigger if there is no domain
> for it to trigger on.

I think the real reason for the deadlock is that:

qemudDomainSaveFlag holds the domain lock while it calls
qemuDomainEventQueue, which requests the event loop lock

the event loop thread holds the event loop lock, and can make several
different callbacks that can result in requesting a domain lock

I counted 29 callers of qemuDomainEventQueue; the majority of those were
in situations where the domain lock had already been dropped (in fact,
it makes sense to generate the event while you own the domain, but wait
to fire it off until after you are done with the domain).  But these
culprits either held a domain lock, or I couldn't quickly see if they
might hold a domain lock higher in the call stack:

qemuMigrationSetOffline in qemu_migration.c has the domain lock
qemudDomainSaveFlag in qemu_driver.c
qemuDomainRevertToSnapshot in qemu_driver.c

Not sure about:
qemuMigrationFinish in qemu_migration.c
qemudNotifyLoadDomain in qemu_driver.c
qemudDomainSaveImageStart  in qemu_driver.c
qemuDomainObjStart in qemu_driver.c

Of those, qemuDomainRevertToSnapshot can easily be fixed by swapping the
unlock and event queue calls.  But the rest could probably use a helper
method that increases the vm ref count, unlocks the domain lock, fires
the event, grabs the domain lock, reduces the vm ref count, and checks
that the domain is still active.  However, I ran out of time to code
that up for now, and am not positive that it is the right solution.

Dan, your thoughts?

Eric Blake   eblake redhat com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature

[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]