
Re: [libvirt] [PATCH 2/5] qemu: Avoid dangling migration-in job on shutoff domains



On 03/19/2012 10:18 AM, Jiri Denemark wrote:
> Destination daemon should not rely on the client or source daemon
> (depending on the type of migration) to call Finish when migration
> fails, because the client may crash before it can do so. The domain
> prepared for incoming migration is set to be destroyed (and migration
> job cleaned up) when connection with the client closes but this is not
> enough. If the associated qemu process crashes after the Prepare step and
> the domain is cleaned up before the connection gets closed, autodestroy
> is not called for the domain and the migration job remains set. In case the
> domain is defined on destination host (i.e., it is not completely
> removed once destroyed) we keep the job set forever. To fix this, we
> register a cleanup callback which is responsible for cleaning up the
> migration-in job when a domain dies anywhere between the Prepare and
> Finish steps. Note
> that we can't blindly clean any job when spotting EOF on monitor since
> normally an API is running at that time.
> ---
>  src/qemu/qemu_domain.c    |    2 --
>  src/qemu/qemu_domain.h    |    2 ++
>  src/qemu/qemu_migration.c |   22 ++++++++++++++++++++++
>  3 files changed, 24 insertions(+), 2 deletions(-)

I'm restating my understanding of the bug, to make sure I understand why
your patch helps:

- src requests a migration
- dest starts a qemu process using information from the src, but the
destination happens to be running an older qemu that can't support the
full migration
- qemu dies, but the destination hasn't seen a 'Finish' from the source,
so the job remains open and the domain remains
- connection is broken, but the open job prevents reclaiming the
autodestroy domain on the destination
- new connection is made, but source can't migrate because destination
is already locked up on the stale attempt

The fix adds a new callback: if qemu dies while the callback is
registered, we cancel the migration job. Therefore, even without a
'Finish' from the source, autodestroy can now kick in.
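
To check my own reading, here is a toy model of that sequence. It is only
a sketch: the class, method, and attribute names are hypothetical and are
not libvirt's API, and it models nothing beyond "a job is held" and
"callbacks run when qemu dies".

```python
# Toy model only; every name here is hypothetical, not libvirt code.

class Domain:
    def __init__(self, name):
        self.name = name
        self.job = None               # currently held job, if any
        self.cleanup_callbacks = []   # run when the qemu process dies

    def prepare(self, register_cleanup=True):
        """Prepare step: take the migration-in job on the destination."""
        self.job = "migration-in"
        if register_cleanup:
            self.cleanup_callbacks.append(self._discard_migration_in)

    def _discard_migration_in(self):
        # Drop only the migration-in job. Any other job belongs to an
        # API call that is still running and must not be cleaned blindly.
        if self.job == "migration-in":
            self.job = None

    def finish(self):
        """Finish step: the migration ended, so cleanup is unregistered."""
        self.cleanup_callbacks.clear()
        self.job = None

    def qemu_died(self):
        """The qemu process exited between Prepare and Finish."""
        callbacks, self.cleanup_callbacks = self.cleanup_callbacks, []
        for callback in callbacks:
            callback()

    def can_accept_migration(self):
        return self.job is None


# Without the callback: the job stays set after qemu dies.
stale = Domain("guest")
stale.prepare(register_cleanup=False)
stale.qemu_died()
print(stale.can_accept_migration())   # False

# With the callback: the job is dropped even though Finish never arrived.
fixed = Domain("guest")
fixed.prepare()
fixed.qemu_died()
print(fixed.can_accept_migration())   # True
```

The callback only clears the job it was registered for, which matches the
note in the commit message about not blindly cleaning any job on monitor
EOF.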

ACK.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
