[libvirt] [PATCH v2] Close the source fd if the destination qemu exits during tunnelled migration

John Ferlan jferlan at redhat.com
Fri Sep 18 17:04:27 UTC 2015



On 09/14/2015 10:44 AM, Shivaprasad G Bhat wrote:
> Tunnelled migration can hang if the destination qemu exits despite all the
> ABI checks. This happens whenever the destination qemu exits before the
> complete transfer is noticed by source qemu. The savevm state checks at
> runtime can fail at destination and cause qemu to error out.
> The source qemu cant notice it as the EPIPE is not propogated to it.
> The qemuMigrationIOFunc() notices the stream being broken from virStreamSend()
> and it cleans up the stream alone. The qemuMigrationWaitForCompletion() would
> never get to 100% transfer completion.
> The qemuMigrationWaitForCompletion() never breaks out as well since
> the ssh connection to destination is healthy, and the source qemu also thinks
> the migration is ongoing as the Fd to which it transfers, is never
> closed or broken. So, the migration will hang forever. Even Ctrl-C on the
> virsh migrate wouldn't be honoured. Close the source side FD when there is
> an error in the stream. That way, the source qemu updates itself and
> qemuMigrationWaitForCompletion() notices the failure.
> 
> Close the FD for all kinds of errors to be sure. The error message is not
> copied for EPIPE so that the destination error is copied instead later.
> 
> Note:
> Reproducible with repeated migrations between Power hosts running in different
> subcores-per-core modes.
> 
> Changes from v1 -> v2:
>    VIR_FORCE_CLOSE() was called twice for this use case which would log
>    unneccessary warnings. So, move the fd close to qemuMigrationIOFunc
>    so that there are no unnecessary duplicate attempts.(Would this trigger
>    a Coverity error? I don't have a setup to check.)
> 
> Signed-off-by: Shivaprasad G Bhat <sbhat at linux.vnet.ibm.com>
> ---
>  src/qemu/qemu_migration.c |    8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
> index ff89ab5..9602fb2 100644
> --- a/src/qemu/qemu_migration.c
> +++ b/src/qemu/qemu_migration.c
> @@ -4012,6 +4012,7 @@ static void qemuMigrationIOFunc(void *arg)
>      if (virStreamFinish(data->st) < 0)
>          goto error;
>  
> +    VIR_FORCE_CLOSE(data->sock);
>      VIR_FREE(buffer);
>  
>      return;
> @@ -4029,7 +4030,11 @@ static void qemuMigrationIOFunc(void *arg)
>      }
>  
>   error:
> -    virCopyLastError(&data->err);
> +    /* Let the source qemu know that the transfer cant continue anymore.
> +     * Don't copy the error for EPIPE as destination has the actual error. */
> +    VIR_FORCE_CLOSE(data->sock);
> +    if (!virLastErrorIsSystemErrno(EPIPE))
> +        virCopyLastError(&data->err);
>      virResetLastError();
>      VIR_FREE(buffer);
>  }
> @@ -4397,7 +4402,6 @@ qemuMigrationRun(virQEMUDriverPtr driver,
>          if (iothread && qemuMigrationStopTunnel(iothread, ret < 0) < 0)
>              ret = -1;
>      }
> -    VIR_FORCE_CLOSE(fd);

^^^

This causes Coverity to claim a RESOURCE_LEAK

Feels like this was a mistake edit...

The rest of the patch looks reasonable; however, I'm far from the expert
in these matters.

John
>  
>      if (priv->job.completed) {
>          qemuDomainJobInfoUpdateTime(priv->job.completed);
> 
> --
> libvir-list mailing list
> libvir-list at redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list
> 




More information about the libvir-list mailing list