[libvirt] [PATCH v2 02/15] Add public APIs for post-copy migration

Jiri Denemark jdenemar at redhat.com
Fri Jan 22 15:22:08 UTC 2016


On Fri, Jan 22, 2016 at 15:10:16 +0000, Daniel P. Berrange wrote:
> On Thu, Jan 21, 2016 at 11:20:47AM +0100, Jiri Denemark wrote:
> > From: Cristian Klein <cristiklein at gmail.com>
> > 
> > To use post-copy, one has to start the migration with the
> > VIR_MIGRATE_POSTCOPY flag and, while migration is in progress, call
> > virDomainMigrateStartPostCopy() to switch from pre-copy to post-copy.
> > 
> > Signed-off-by: Cristian Klein <cristiklein at gmail.com>
> > Signed-off-by: Jiri Denemark <jdenemar at redhat.com>
> > ---
> > 
> > Notes:
> >     Version 2:
> >     - POSTCOPY_AFTER_PRECOPY flag removed
> > 
> >  include/libvirt/libvirt-domain.h |   4 ++
> >  src/driver-hypervisor.h          |   5 ++
> >  src/libvirt-domain.c             | 114 +++++++++++++++++++++++++++++++++++++++
> >  src/libvirt_public.syms          |   4 ++
> >  src/remote/remote_driver.c       |   1 +
> >  src/remote/remote_protocol.x     |  13 ++++-
> >  src/remote_protocol-structs      |   5 ++
> >  7 files changed, 145 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
> > index 50342d2..ccbf6a7 100644
> > --- a/include/libvirt/libvirt-domain.h
> > +++ b/include/libvirt/libvirt-domain.h
> > @@ -662,6 +662,7 @@ typedef enum {
> >      VIR_MIGRATE_ABORT_ON_ERROR    = (1 << 12), /* abort migration on I/O errors happened during migration */
> >      VIR_MIGRATE_AUTO_CONVERGE     = (1 << 13), /* force convergence */
> >      VIR_MIGRATE_RDMA_PIN_ALL      = (1 << 14), /* RDMA memory pinning */
> > +    VIR_MIGRATE_POSTCOPY          = (1 << 15), /* enable (but do not start) post-copy migration */
> >  } virDomainMigrateFlags;
> >  
> >  
> > @@ -810,6 +811,9 @@ int virDomainMigrateGetMaxSpeed(virDomainPtr domain,
> >                                  unsigned long *bandwidth,
> >                                  unsigned int flags);
> >  
> > +int virDomainMigrateStartPostCopy(virDomainPtr domain,
> > +                                  unsigned int flags);
> > +
> >  char * virConnectGetDomainCapabilities(virConnectPtr conn,
> >                                         const char *emulatorbin,
> >                                         const char *arch,
> > diff --git a/src/driver-hypervisor.h b/src/driver-hypervisor.h
> > index ae2ec4d..68a7730 100644
> > --- a/src/driver-hypervisor.h
> > +++ b/src/driver-hypervisor.h
> > @@ -638,6 +638,10 @@ typedef int
> >                                      const char *dom_xml);
> >  
> >  typedef int
> > +(*virDrvDomainMigrateStartPostCopy)(virDomainPtr domain,
> > +                                    unsigned int flags);
> > +
> > +typedef int
> >  (*virDrvConnectIsEncrypted)(virConnectPtr conn);
> >  
> >  typedef int
> > @@ -1443,6 +1447,7 @@ struct _virHypervisorDriver {
> >      virDrvDomainGetFSInfo domainGetFSInfo;
> >      virDrvDomainInterfaceAddresses domainInterfaceAddresses;
> >      virDrvDomainSetUserPassword domainSetUserPassword;
> > +    virDrvDomainMigrateStartPostCopy domainMigrateStartPostCopy;
> >  };
> >  
> >  
> > diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
> > index 9491845..676f0f7 100644
> > --- a/src/libvirt-domain.c
> > +++ b/src/libvirt-domain.c
> > @@ -3537,6 +3537,7 @@ virDomainMigrateUnmanaged(virDomainPtr domain,
> >   *                                 automatically when supported).
> >   *   VIR_MIGRATE_UNSAFE    Force migration even if it is considered unsafe.
> >   *   VIR_MIGRATE_OFFLINE Migrate offline
> > + *   VIR_MIGRATE_POSTCOPY Enable (but do not start) post-copy
> >   *
> >   * VIR_MIGRATE_TUNNELLED requires that VIR_MIGRATE_PEER2PEER be set.
> >   * Applications using the VIR_MIGRATE_PEER2PEER flag will probably
> > @@ -3573,6 +3574,11 @@ virDomainMigrateUnmanaged(virDomainPtr domain,
> >   * not support this feature and will return an error if bandwidth
> >   * is not 0.
> >   *
> > + * Enabling the VIR_MIGRATE_POSTCOPY flag tells libvirt to enable post-copy
> > + * migration.  Use virDomainMigrateStartPostCopy to switch migration into
> > + * the post-copy mode.  See virDomainMigrateStartPostCopy for more details
> > + * about post-copy.
> > + *
> >   * To see which features are supported by the current hypervisor,
> >   * see virConnectGetCapabilities, /capabilities/host/migration_features.
> >   *
> > @@ -3748,6 +3754,7 @@ virDomainMigrate(virDomainPtr domain,
> >   *                                 automatically when supported).
> >   *   VIR_MIGRATE_UNSAFE    Force migration even if it is considered unsafe.
> >   *   VIR_MIGRATE_OFFLINE Migrate offline
> > + *   VIR_MIGRATE_POSTCOPY Enable (but do not start) post-copy
> >   *
> >   * VIR_MIGRATE_TUNNELLED requires that VIR_MIGRATE_PEER2PEER be set.
> >   * Applications using the VIR_MIGRATE_PEER2PEER flag will probably
> > @@ -3784,6 +3791,11 @@ virDomainMigrate(virDomainPtr domain,
> >   * not support this feature and will return an error if bandwidth
> >   * is not 0.
> >   *
> > + * Enabling the VIR_MIGRATE_POSTCOPY flag tells libvirt to enable post-copy
> > + * migration.  Use virDomainMigrateStartPostCopy to switch migration into
> > + * the post-copy mode.  See virDomainMigrateStartPostCopy for more details
> > + * about post-copy.
> > + *
> >   * To see which features are supported by the current hypervisor,
> >   * see virConnectGetCapabilities, /capabilities/host/migration_features.
> >   *
> > @@ -3968,6 +3980,11 @@ virDomainMigrate2(virDomainPtr domain,
> >   * can use either VIR_MIGRATE_NON_SHARED_DISK or
> >   * VIR_MIGRATE_NON_SHARED_INC as they are mutually exclusive.
> >   *
> > + * Enabling the VIR_MIGRATE_POSTCOPY flag tells libvirt to enable post-copy
> > + * migration.  Use virDomainMigrateStartPostCopy to switch migration into
> > + * the post-copy mode.  See virDomainMigrateStartPostCopy for more details
> > + * about post-copy.
> > + *
> >   * There are many limitations on migration imposed by the underlying
> >   * technology - for example it may not be possible to migrate between
> >   * different processors even with the same architecture, or between
> > @@ -4208,6 +4225,7 @@ int virDomainMigrateUnmanagedCheckCompat(virDomainPtr domain,
> >   *                                 automatically when supported).
> >   *   VIR_MIGRATE_UNSAFE    Force migration even if it is considered unsafe.
> >   *   VIR_MIGRATE_OFFLINE Migrate offline
> > + *   VIR_MIGRATE_POSTCOPY Enable (but do not start) post-copy
> >   *
> >   * The operation of this API hinges on the VIR_MIGRATE_PEER2PEER flag.
> >   * If the VIR_MIGRATE_PEER2PEER flag is NOT set, the duri parameter
> > @@ -4240,6 +4258,11 @@ int virDomainMigrateUnmanagedCheckCompat(virDomainPtr domain,
> >   * not support this feature and will return an error if bandwidth
> >   * is not 0.
> >   *
> > + * Enabling the VIR_MIGRATE_POSTCOPY flag tells libvirt to enable post-copy
> > + * migration.  Use virDomainMigrateStartPostCopy to switch migration into
> > + * the post-copy mode.  See virDomainMigrateStartPostCopy for more details
> > + * about post-copy.
> > + *
> >   * To see which features are supported by the current hypervisor,
> >   * see virConnectGetCapabilities, /capabilities/host/migration_features.
> >   *
> > @@ -4321,6 +4344,7 @@ virDomainMigrateToURI(virDomainPtr domain,
> >   *                                 automatically when supported).
> >   *   VIR_MIGRATE_UNSAFE    Force migration even if it is considered unsafe.
> >   *   VIR_MIGRATE_OFFLINE Migrate offline
> > + *   VIR_MIGRATE_POSTCOPY Enable (but do not start) post-copy
> >   *
> >   * The operation of this API hinges on the VIR_MIGRATE_PEER2PEER flag.
> >   *
> > @@ -4366,6 +4390,11 @@ virDomainMigrateToURI(virDomainPtr domain,
> >   * not support this feature and will return an error if bandwidth
> >   * is not 0.
> >   *
> > + * Enabling the VIR_MIGRATE_POSTCOPY flag tells libvirt to enable post-copy
> > + * migration.  Use virDomainMigrateStartPostCopy to switch migration into
> > + * the post-copy mode.  See virDomainMigrateStartPostCopy for more details
> > + * about post-copy.
> > + *
> >   * To see which features are supported by the current hypervisor,
> >   * see virConnectGetCapabilities, /capabilities/host/migration_features.
> >   *
> > @@ -4446,6 +4475,11 @@ virDomainMigrateToURI2(virDomainPtr domain,
> >   * can use either VIR_MIGRATE_NON_SHARED_DISK or
> >   * VIR_MIGRATE_NON_SHARED_INC as they are mutually exclusive.
> >   *
> > + * Enabling the VIR_MIGRATE_POSTCOPY flag tells libvirt to enable post-copy
> > + * migration.  Use virDomainMigrateStartPostCopy to switch migration into
> > + * the post-copy mode.  See virDomainMigrateStartPostCopy for more details
> > + * about post-copy.
> > + *
> >   * There are many limitations on migration imposed by the underlying
> >   * technology - for example it may not be possible to migrate between
> >   * different processors even with the same architecture, or between
> > @@ -9163,6 +9197,86 @@ virDomainMigrateGetMaxSpeed(virDomainPtr domain,
> >  
> >  
> >  /**
> > + * virDomainMigrateStartPostCopy:
> > + * @domain: a domain object
> > + * @flags: extra flags; not used yet, so callers should always pass 0
> > + *
> > + * Starts post-copy migration. This function has to be called while
> > + * migration (initiated with VIR_MIGRATE_POSTCOPY flag) is in progress.
> > + *
> > + * Traditional pre-copy migration iteratively walks through guest memory
> > + * pages and migrates those that changed since the previous iteration. The
> > + * iterative phase stops when the number of dirty pages is low enough so that
> > + * the virtual CPUs can be paused, all dirty pages transferred to the
> > + * destination, where the virtual CPUs are unpaused, and all this can happen
> > + * within a predefined downtime period. It's clear that this process may never
> > + * converge if downtime is too short and/or the guest keeps changing a lot of
> > + * memory pages.
> > + *
> > + * When migration is switched to post-copy mode, the virtual CPUs are paused
> > + * immediately, only a minimum set of pages is transferred, and the CPUs are
> > + * unpaused on destination. The source keeps sending all remaining memory pages
> > + * to the destination while the guest is already running there. Whenever the
> > + * guest tries to read a memory page which has not been migrated yet, the
> > + * hypervisor has to tell the source to transfer that page in a priority
> > + * channel. To minimize such page faults, it is a good idea to run at least one
> > + * iteration of pre-copy migration before switching to post-copy.
> > + *
> > + * Post-copy migration is guaranteed to converge since each page is transferred
> > + * at most once no matter how fast it changes. On the other hand, once the
> > + * guest is running on the destination host, the migration can no longer be
> > + * rolled back because neither host has a complete guest state. If migration
> > + * fails at this point, libvirt will leave the domain paused on both hosts with
> > + * the VIR_DOMAIN_PAUSED_POSTCOPY_FAILED reason. It's up to the upper layer to
> > + * decide what to do in such a case.
> > + *
> > + * The following domain life cycle events are emitted during post-copy
> > + * migration:
> > + *  VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY (on the source) -- migration entered
> > + *      post-copy mode.
> > + *  VIR_DOMAIN_EVENT_RESUMED_POSTCOPY (on the destination) -- the guest is
> > + *      running on the destination host while some of its memory pages still
> > + *      remain on the source host; neither the source nor the destination host
> > + *      contains a complete guest state from this point until migration
> > + *      finishes.
> > + *  VIR_DOMAIN_EVENT_RESUMED_MIGRATED (on the destination),
> > + *  VIR_DOMAIN_EVENT_STOPPED_MIGRATED (on the source) -- migration finished
> > + *      successfully and the destination host holds a complete guest state.
> > + *  VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY_FAILED (on either side) -- emitted
> > + *      when migration fails in post-copy mode and it's unclear whether any
> > + *      of the hosts has a complete guest state.
> 
> You say either side, but IMHO issuing a suspended event when we are already
> in a suspended state is not correct, as the VM is not undergoing a lifecycle
> change in that scenario - you're essentially just using the reason field as
> a way to report failure which I don't think we should really do.
> 
> So IMHO POSTCOPY_FAILED is only something to emit on the target.
> 
> The domain job on the source will show the migration failure in any
> case, so I don't think it's important on the source.

Thinking about it a bit more... you're right, setting POSTCOPY_FAILED
reason for PAUSED state makes sense even on the source, but the event is
redundant there.
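
To make the distinction concrete, here is a minimal sketch of the rule as I
understand it from this thread. It is not libvirt code: the enums, the
DomainStatus struct and postcopy_failed() are hypothetical stand-ins, and the
only assumption is that the source is already paused when post-copy fails
while the destination is running. The failure reason is recorded on both
sides, but a lifecycle event is warranted only where the state changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real libvirt enums. */
typedef enum { STATE_RUNNING, STATE_PAUSED } DomainState;
typedef enum {
    REASON_NONE,
    REASON_POSTCOPY,        /* paused on the source during post-copy */
    REASON_POSTCOPY_FAILED  /* post-copy migration failed */
} PausedReason;

typedef struct {
    DomainState state;
    PausedReason reason;
} DomainStatus;

/*
 * Record a post-copy failure for one side of the migration.
 * The POSTCOPY_FAILED reason is always set, but the return value
 * says whether a "suspended" lifecycle event should be emitted:
 * only when the domain was not already paused.
 */
bool
postcopy_failed(DomainStatus *dom)
{
    assert(dom != NULL);

    bool lifecycle_change = (dom->state != STATE_PAUSED);

    dom->state = STATE_PAUSED;
    dom->reason = REASON_POSTCOPY_FAILED;

    return lifecycle_change;
}
```

With a source that starts as { STATE_PAUSED, REASON_POSTCOPY } the function
only updates the reason and reports that no event is needed; with a
destination that starts as { STATE_RUNNING, REASON_NONE } it pauses the
domain and reports that an event should be emitted.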

Jirka



