
Re: [libvirt] [PATCH 04/10] qemu: Recover from interrupted migration

On 07/18/2011 06:27 PM, Jiri Denemark wrote:
  src/qemu/qemu_process.c |  110 ++++++++++++++++++++++++++++++++++++++++++++++-
  1 files changed, 109 insertions(+), 1 deletions(-)

  static int
+qemuProcessRecoverMigration(struct qemud_driver *driver,
+                            virDomainObjPtr vm,
+                            virConnectPtr conn,
+                            enum qemuDomainAsyncJob job,
+                            enum qemuMigrationJobPhase phase,
+                            virDomainState state,
+                            int reason)
+    if (job == QEMU_ASYNC_JOB_MIGRATION_IN) {
+        switch (phase) {

Should we reject as impossible the phases that should never be encountered on MIGRATION_IN? For example, QEMU_MIGRATION_PHASE_BEGIN3 belongs to MIGRATION_OUT, so if our job is MIGRATION_IN but we see that phase, we should probably fail rather than return 0.
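A rough sketch of what such a check could look like, outside of libvirt. The enum and function names here are hypothetical stand-ins, and apart from BEGIN3 belonging to the outgoing side, the phase-to-direction mapping is an assumption made for illustration rather than something taken from the patch:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the libvirt enums; only BEGIN3 is named
 * in this thread, the other phases are illustrative assumptions. */
enum sketch_job { SKETCH_JOB_MIGRATION_OUT, SKETCH_JOB_MIGRATION_IN };

enum sketch_phase {
    SKETCH_PHASE_BEGIN3,    /* assumed: outgoing side only */
    SKETCH_PHASE_PERFORM3,  /* assumed: outgoing side only */
    SKETCH_PHASE_PREPARE,   /* assumed: incoming side only */
    SKETCH_PHASE_FINISH3    /* assumed: incoming side only */
};

/* Return 0 if the phase can occur for the given job direction,
 * -1 if the combination should be treated as impossible. */
static int
sketch_check_phase(enum sketch_job job, enum sketch_phase phase)
{
    bool incoming_phase;

    switch (phase) {
    case SKETCH_PHASE_PREPARE:
    case SKETCH_PHASE_FINISH3:
        incoming_phase = true;
        break;
    case SKETCH_PHASE_BEGIN3:
    case SKETCH_PHASE_PERFORM3:
        incoming_phase = false;
        break;
    default:
        /* Unknown phase value: fail rather than guess. */
        return -1;
    }

    if ((job == SKETCH_JOB_MIGRATION_IN) != incoming_phase)
        return -1;  /* e.g. BEGIN3 seen while the job is MIGRATION_IN */

    return 0;
}
```

With a helper along these lines, the recovery function could bail out before entering either switch, instead of falling through and returning 0 for a mismatched phase.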

+            /* migration is still in progress, let's cancel it and resume the
+             * domain */
+            VIR_DEBUG("Canceling unfinished outgoing migration of domain %s",
+                      vm->def->name);
+            /* TODO cancel possibly running migrate operation */

As in, issue qemuMonitorMigrateCancel but ignore any failure? That might be reasonable, but it probably belongs in a separate patch.

+            /* resume the domain but only if it was paused as a result of
+             * migration */
+            if (state == VIR_DOMAIN_PAUSED &&
+                (reason == VIR_DOMAIN_PAUSED_MIGRATION ||
+                 reason == VIR_DOMAIN_PAUSED_UNKNOWN)) {
+                if (qemuProcessStartCPUs(driver, vm, conn,
+                                         VIR_DOMAIN_RUNNING_UNPAUSED) < 0) {

On the other hand, will the monitor command to restart the CPUs even work while a migration is still pending? If not, we may have to call qemuMonitorMigrateCancel unconditionally, to ensure the monitor will let us resume.
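To make the ordering concern concrete, here is a toy model of "always cancel first, ignore a cancel failure, then resume only if the pause came from migration". Everything in it is hypothetical: the fake monitor struct and the helper names are invented for the sketch, and the rule that starting CPUs fails while a migration is pending is exactly the open question above, modelled here as an assumption:

```c
#include <stdio.h>

/* Toy monitor state; not a libvirt or QEMU structure. */
struct fake_monitor {
    int migration_active;  /* 1 while a migration is still pending */
    int cpus_running;      /* 1 once the guest CPUs were resumed */
    int cancel_fails;      /* set to 1 to simulate a failing cancel */
};

/* Stand-in for a migrate-cancel monitor command. */
static int
fake_migrate_cancel(struct fake_monitor *mon)
{
    if (mon->cancel_fails)
        return -1;
    mon->migration_active = 0;
    return 0;
}

/* Stand-in for resuming CPUs. ASSUMPTION: the monitor refuses to
 * resume while a migration is pending. */
static int
fake_start_cpus(struct fake_monitor *mon)
{
    if (mon->migration_active)
        return -1;
    mon->cpus_running = 1;
    return 0;
}

/* Cancel unconditionally, tolerate a cancel failure, then resume the
 * domain only if it was paused as a result of migration. */
static int
sketch_recover_outgoing(struct fake_monitor *mon, int paused_by_migration)
{
    if (fake_migrate_cancel(mon) < 0)
        fprintf(stderr, "migrate cancel failed, continuing anyway\n");

    if (!paused_by_migration)
        return 0;

    return fake_start_cpus(mon);
}
```

Under that assumption, skipping the cancel would leave the resume failing every time, while a failed cancel merely surfaces later as a failed resume.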

I think what you have works (strict improvement over what we have now), even if it can be further improved with later patches, so:


Eric Blake   eblake redhat com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
