[libvirt] [PATCH] Fix race condition reconnecting to vms & loading configs

John Ferlan jferlan at redhat.com
Mon Oct 28 13:17:16 UTC 2013


On 10/28/2013 07:52 AM, Daniel P. Berrange wrote:
> From: "Daniel P. Berrange" <berrange at redhat.com>
> 
> The following sequence
> 
>  1. Define a persistent QMEU guest

s/QMEU/QEMU

>  2. Start the QEMU guest
>  3. Stop libvirtd
>  4. Kill the QEMU process
>  5. Start libvirtd
>  6. List persistent guets

s/guets/guests

> 
> At the last step, the previously running persistent guest
> will be missing. This is because of a race condition in the
> QEMU driver startup code. It does
> 
>  1. Load all VM state files
>  2. Spawn thread to reconnect to each VM
>  3. Load all VM config files
> 
> Only at the end of step 3, does the 'virDomainObjPtr' get
> marked as "persistent". There is therefore a window where
> the thread reconnecting to the VM will remove the persistent
> VM from the list.
> 
> The easy fix is to simply switch the order of steps 2 & 3.
> 
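Just to make sure I'm reading that window correctly, this is the
interleaving I have in mind (paraphrased, not the literal code - in
particular qemuProcessStop()/qemuDomainRemoveInactive() below are my
guess at what the reconnect worker's failure path ends up doing):

/*  main thread                          reconnect worker
 *  -----------                          ----------------
 *  load status XMLs; vm->persistent
 *  is still 0 for every domain
 *
 *  qemuProcessReconnectAll() spawns --> monitor reconnect fails (QEMU was
 *  the workers and returns              killed while libvirtd was down),
 *                                       so the worker cleans up:
 *
 *                                         qemuProcessStop(driver, vm, ...);
 *                                         if (!vm->persistent)   <-- still 0
 *                                             qemuDomainRemoveInactive(driver, vm);
 *
 *  virDomainObjListLoadAllConfigs()
 *  then never finds the object, so the
 *  persistent guest is gone from the
 *  list until the next libvirtd restart.
 */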
> Signed-off-by: Daniel P. Berrange <berrange at redhat.com>
> ---
>  src/qemu/qemu_driver.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 

The fix seems reasonable, although I immediately wondered why "at some
time" it was considered OK to reconnect before the persistent flag was
set for inactive guests. The condition/fault initially described is
essentially what would happen across a host reboot; in that case (I
would assume) the guest would be restarted if autostart was set. The
external action of kill()'ing a guest outside of libvirtd's control
leaves the guest in some unknown/unpredictable state. Is there
something in that initial load that could detect this condition better?

I tried following the steps without the patch, but on my host the guest
was still listed after the restart - so yes, a timing condition - but
what causes that timing condition?

Would setting dom->persistent before the virObjectUnlock(dom) in
virDomainObjListLoadAllConfigs() change the results?
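i.e. something like this in the !liveStatus leg of that loop (going
from memory, so the surrounding code may not match exactly):

    if (dom) {
        if (!liveStatus)
            dom->persistent = 1;  /* flag it while we still hold the lock */
        virObjectUnlock(dom);     /* ...instead of setting it after this point */
    }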

Beyond that, keeping the virConnectOpen() and qemuProcessReconnectAll()
calls "together" after the loading of the inactive persistent configs
seems to keep the code flow more natural. Whether that comes before or
after the Snapshot/ManagedSave load is, I suppose, just an
"implementation detail".
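Something along these lines for the qemuStateInitialize() flow
(skeleton only - arguments elided and cfg->stateDir written from
memory, so treat it as a sketch rather than the exact code):

    /* 1. Load the running domains from their status XMLs */
    virDomainObjListLoadAllConfigs(qemu_driver->domains, cfg->stateDir, ...);

    /* 2. Then the inactive persistent configs, which marks vm->persistent */
    virDomainObjListLoadAllConfigs(qemu_driver->domains, cfg->configDir, ...);

    /* 3. Only now open the connection and reconnect, back to back */
    conn = virConnectOpen(cfg->uri);
    qemuProcessReconnectAll(conn, qemu_driver);

    /* 4. Snapshot/ManagedSave loading could come before or after 3 */
    virDomainObjListForEach(qemu_driver->domains, qemuDomainSnapshotLoad, ...);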

Also, other drivers follow the same pattern of loading running domains,
reconnecting, and then loading the inactive/persistent configs. Should
those get similar patches as well?

John


> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index c613967..9c3daad 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -816,8 +816,6 @@ qemuStateInitialize(bool privileged,
>  
>      conn = virConnectOpen(cfg->uri);
>  
> -    qemuProcessReconnectAll(conn, qemu_driver);
> -
>      /* Then inactive persistent configs */
>      if (virDomainObjListLoadAllConfigs(qemu_driver->domains,
>                                         cfg->configDir,
> @@ -828,6 +826,7 @@ qemuStateInitialize(bool privileged,
>                                         NULL, NULL) < 0)
>          goto error;
>  
> +    qemuProcessReconnectAll(conn, qemu_driver);
>  
>      virDomainObjListForEach(qemu_driver->domains,
>                              qemuDomainSnapshotLoad,
> 



