[libvirt] [PATCHv2 2/3] util: fix domain object leaks on closecallbacks

Wed Jan 18 23:34:50 UTC 2017

On 01/10/2017 01:23 AM, Wang King wrote:
> Suppose that we have two hosts A and B, migrate VM from A to B
> by virDomainMigrateToURI2.
> 1.qemuProcessLaunch was been called on host B with
> VIR_QEMU_PROCESS_START_AUTODESTROY flag, and VM's object
> reference was been increased in virCloseCallbacksSet called by
> qemuProcessAutoDestroyAdd.
> 
> 2. Restart host A's libvirtd service to interrupt migration job,
> virCloseCallbacksRun was been called on host B by qemuConnectClose,

I assume: s/was been/is (there's 3 occurrences).

> VM's virDriverCloseDef struct was been removed from connection
> callback list before execute qemuProcessAutoDestroy callback function.
> 
> 3. Then qemuProcessAutoDestroy was been called on host B to destroy
> the transient VM.
> -->qemuProcessAutoDestroy
>   -->qemuProcessStop
>     -->qemuProcessAutoDestroyRemove
>       -->virCloseCallbacksUnset
> 
> At last, VM's object reference has been decreased in
> virCloseCallbacksUnset expectably, however VM's virDriverCloseDef
> struct cannot be found in callback list[2] lead to VM's object leak.

I'm not sure I'm following your description... So, restated in order to
help me understand:

1. Host A domain migrates to Host B

2. Host B adds a closeCallback during qemuProcessLaunch

3. Host A's libvirtd is restarted causing Host B to destroy the VM

4. Host B calls virCloseCallbacksRun in order to run closeCallbacks for
the domain from qemuConnectClose.

5. Host B creates the look-side list by removing the entry from
closeCallbacks and placing it on the look-aside list for "later" processing.

6. Host B removes the closeCallbacks lock

7. Host B runs through the list->nentries, searches by UUID for the
domain, returning with a locked domain

8. Host B calls the callback function qemuProcessAutoDestroy

9. During that processing the call to qemuProcessAutoDestroyRemove will
fail because this UUID isn't on that list, but that's no big deal since
the failure is ignored.

10. After returning from the callback there's a call to virObjectUnlock

However, we still own a reference on the domain because
virCloseCallbacksUnset that wasn't run would virObjectUnref(vm)

BUT, if you move the list, I believe you run into the possibility of
running the callback function twice (one may cause an error).

So I'm not convinced yet that moving the list will do as you claim and
resolve the object leak.
> 
> Signed-off-by: Wang King <king.wang at huawei.com>
> ---
>  src/util/virclosecallbacks.c | 21 +++++++++++----------
>  1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/src/util/virclosecallbacks.c b/src/util/virclosecallbacks.c
> index 1fa9596..633b22c 100644
> --- a/src/util/virclosecallbacks.c
> +++ b/src/util/virclosecallbacks.c
> @@ -331,17 +331,9 @@ virCloseCallbacksRun(virCloseCallbacksPtr closeCallbacks,
>  
>      virObjectLock(closeCallbacks);
>      list = virCloseCallbacksGetForConn(closeCallbacks, conn);
> -    if (!list) {
> -        virObjectLock(closeCallbacks);
> +    virObjectLock(closeCallbacks);
> +    if (!list)

Something's not right here. Has this been tested using the scenario
described? You're taking out the closeCallbacks lock twice and
attempting to call the closeCallback with the lock.

So let's assume the Lock should be an Unlock...

I don't see how moving the list to after resolves the leak and
furthermore I think it causes more problems.

>          return;
> -    }
> -
> -    for (i = 0; i < list->nentries; i++) {
> -        char uuidstr[VIR_UUID_STRING_BUFLEN];
> -        virUUIDFormat(list->entries[i].uuid, uuidstr);
> -        virHashRemoveEntry(closeCallbacks->list, uuidstr);
> -    }
> -    virObjectUnlock(closeCallbacks);
>  
>      for (i = 0; i < list->nentries; i++) {
>          virDomainObjPtr vm;
> @@ -358,6 +350,15 @@ virCloseCallbacksRun(virCloseCallbacksPtr closeCallbacks,
>          if (vm)
>              virObjectUnlock(vm);

In order to resolve the leak, it would seem that after the callback has
been run we can run virObjectUnlock(); however, doing so on 'vm' is a
no-op (at least for the path you're concerned about) since the end of
qemuProcessAutoDestroy there is:

virDomainObjEndAPI(&dom);
return dom;

The *EndAPI will set *dom = NULL;, which means we return NULL, which
means that virObjectUnlock generally doesn't do what it claims I think.

So wouldn't a virUnrefObject(vm); do what you expect?  That is unref the
vm just like virCloseCallbacksUnset would have done when it removed the
callback from the list.

So not only does this code remove/process the callback, but it reduces
the refcount when it's done.

John
>      }
> +
> +    virObjectLock(closeCallbacks);
> +    for (i = 0; i < list->nentries; i++) {
> +        char uuidstr[VIR_UUID_STRING_BUFLEN];
> +        virUUIDFormat(list->entries[i].uuid, uuidstr);
> +        virHashRemoveEntry(closeCallbacks->list, uuidstr);
> +    }
> +    virObjectUnlock(closeCallbacks);
> +
>      VIR_FREE(list->entries);
>      VIR_FREE(list);
>  }
>