[libvirt] [PATCH 2/2] qemu: completely rework reference counting
Peter Krempa
pkrempa at redhat.com
Wed Dec 3 10:45:37 UTC 2014
On 12/03/14 10:54, Peter Krempa wrote:
> On 12/03/14 06:49, Martin Kletzander wrote:
>> There is one problem that causes various errors in the daemon. When
>> domain is waiting for a job, it is unlocked while waiting on the
>> condition. However, if that domain is for example transient and
>> being removed in another API (e.g. cancelling incoming migration), it
>> get's unref'd. If the first call, that was waiting, fails to get the
>> job, it unref's the domain object, and because it was the last
>> reference, it causes clearing of the whole domain object. However,
>> when finishing the call, the domain must be unlocked, but there is no
>> way for the API to know whether it was cleaned or not (unless there
>> is some ugly temporary variable, but let's scratch that).
>>
>> The root cause is that our APIs don't ref the objects they are using
>> and all use the implicit reference that the object has when it is in
>> the domain list. That reference can be removed when the API is
>> waiting for a job. And because each domain doesn't do its ref'ing,
>> it results in the ugly checking of the return value of
>> virObjectUnref() that we have everywhere.
>>
>> This patch changes qemuDomObjFromDomain() to ref the domain (using
>> virDomainObjListFindByUUIDRef()) and adds qemuDomObjEndAPI() which
>> should be the only function in which the return value of
>> virObjectUnref() is checked. This makes all reference counting
>> deterministic and makes the code a bit clearer.
>>
>> Signed-off-by: Martin Kletzander <mkletzan at redhat.com> ---
>> src/qemu/THREADS.txt | 40 ++- src/qemu/qemu_domain.c | 28
>> +- src/qemu/qemu_domain.h | 12 +- src/qemu/qemu_driver.c | 708
>> ++++++++++++++++------------------------------
>> src/qemu/qemu_migration.c | 108 +++---- src/qemu/qemu_migration.h |
>> 10 +- src/qemu/qemu_process.c | 126 ++++----- 7 files changed, 379
>> insertions(+), 653 deletions(-)
>>
>
> ...
>
>> index 08d6b7c..a0582d3 100644 --- a/src/qemu/qemu_process.c +++
>> b/src/qemu/qemu_process.c
>
> ...
>
>> @@ -3756,6 +3736,9 @@ qemuProcessReconnectHelper(virDomainObjPtr
>> obj, struct qemuProcessReconnectData *src = opaque; struct
>> qemuProcessReconnectData *data;
>>
>> + virObjectLock(obj); + virObjectRef(obj); + if (!obj->pid)
>> return 0;
>>
>
> This hunk causes deadlock of the daemon when starting if you have at
> least one inactive VM. You lock the object, find out that the pid
> is 0 and return without unlocking.
>
> The daemon then locks up when trying to reload snapshot list for
> the domain as the domain is still left locked:
>
> Thread 2 (Thread 0x7fd6165f1700 (LWP 847791)):
> #0 0x00007fd62c16faf4 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1 0x00007fd62c16b459 in _L_lock_535 () from /lib64/libpthread.so.0
> #2 0x00007fd62c16b280 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3 0x00007fd62f99b453 in virMutexLock (m=0x7fd61c0e4810) at util/virthread.c:88
> #4 0x00007fd62f97cdc6 in virObjectLock (anyobj=0x7fd61c0e4800) at util/virobject.c:323
> #5 0x00007fd617672af5 in qemuDomainSnapshotLoad (vm=0x7fd61c0e4800, data=0x7fd61c0023f0) at qemu/qemu_driver.c:474
> #6 0x00007fd62f9f20b8 in virDomainObjListHelper (payload=0x7fd61c0e4800, name=0x7fd61c0f0de0, opaque=0x7fd6165f0810) at conf/domain_conf.c:20454
> #7 0x00007fd62f9559be in virHashForEach (table=0x7fd61c008050, iter=0x7fd62f9f2072 <virDomainObjListHelper>, data=0x7fd6165f0810) at util/virhash.c:521
> #8 0x00007fd62f9f213e in virDomainObjListForEach (doms=0x7fd61c03bdd0, callback=0x7fd617672a55 <qemuDomainSnapshotLoad>, opaque=0x7fd61c0023f0) at conf/domain_conf.c:20467
> #9 0x00007fd61767445c in qemuStateInitialize (privileged=true, callback=0x7fd630584ab2 <daemonInhibitCallback>, opaque=0x7fd63258dba0) at qemu/qemu_driver.c:877
> #10 0x00007fd62fa5f05a in virStateInitialize (privileged=true, callback=0x7fd630584ab2 <daemonInhibitCallback>, opaque=0x7fd63258dba0) at libvirt.c:742
> #11 0x00007fd630584b7a in daemonRunStateInit (opaque=0x7fd63258dba0) at libvirtd.c:918
> #12 0x00007fd62f99b8b4 in virThreadHelper (data=0x7fd6325b4480) at util/virthread.c:197
> #13 0x00007fd62c169023 in start_thread () from /lib64/libpthread.so.0
> #14 0x00007fd62bea470d in clone () from /lib64/libc.so.6
>
>
>
>
>> @@ -3780,8 +3763,6 @@ qemuProcessReconnectHelper(virDomainObjPtr
>> obj, * this early phase. */
>>
>> - virObjectLock(obj); - qemuDomainObjRestoreJob(obj,
>> &data->oldjob);
>>
>> if (qemuDomainObjBeginJob(src->driver, obj, QEMU_JOB_MODIFY) < 0)
>
> Rest of the review will follow once I find a way to solve that issue.
>
I think the best solution will be to refactor the reconnect function
before this change so that it makes more sense.
I'll post patches later today hopefully.
Peter
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20141203/05f7627a/attachment-0001.sig>
More information about the libvir-list
mailing list