[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [libvirt] [PATCH] qemu: Fix post-copy migration on the source



On Fri, Nov 16, 2018 at 02:16:33PM +0100, Jiri Denemark wrote:
Post-copy migration has been broken on the source since commit
v3.8.0-245-g32c29f10db which implemented support for
pause-before-switchover QEMU migration capability.

Even though the migration itself went well, the source did not really
know when it switched to the post-copy mode despite the messages logged
by MIGRATION event handler. As a result of this, the events emitted by
source libvirtd were not accurate and statistics of the completed
migration would cover only the pre-copy part of migration. Moreover, if
migration failed during the post-copy phase for some reason, the source
libvirtd would just happily resume the domain, which could lead to disk
corruption.

With the pause-before-switchover capability enabled, the order of events
emitted by QEMU changed:

                   pause-before-switchover
          disabled                        enabled
   MIGRATION, potcopy-active       STOP

s/pot/post/

But I guess it's just a matter of time until someone invents a
virtualization technology using plants, pots and gardening for the
terminology.

   STOP                            MIGRATION, pre-switchover
                                   MIGRATION, postcopy-active

The STOP even handler checks the migration status (postcopy-active) and
sets the domain state accordingly. Which is sufficient when
pause-before-switchover is disabled, but once we enable it, the
migration status is still active when we get STOP from QEMU. Thus the
domain state set in the STOP handler has to be corrected once we are
notified that migration changed to postcopy-active.

This results in two SUSPENDED events to be emitted by the source
libvirtd during post-copy migration. The first one with
VIR_DOMAIN_EVENT_SUSPENDED_MIGRATED detail, while the second one reports
the corrected VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY detail. This is
inevitable because we don't know whether migration will eventually
switch to post-copy at the time we emit the first event.

https://bugzilla.redhat.com/show_bug.cgi?id=1647365

Signed-off-by: Jiri Denemark <jdenemar redhat com>
---
src/qemu/qemu_process.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)


Reviewed-by: Ján Tomko <jtomko redhat com>

Jano

Attachment: signature.asc
Description: PGP signature


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]