[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [libvirt] [PATCHv4 4/4] snapshot: qemu: Implement reverting of external snapshots



On 11/19/2012 09:06 AM, Peter Krempa wrote:
> This patch adds support for reverting of external snapshots. The support
> is somewhat limited yet (you can only revert to a snapshot that has no
> children or delete the children that would have their image chains
> invalidated).
> 
> While reverting an external snapshot, the domain has to be taken offline
> as there's no possibility to start a migration from file on a running
> machine. This poses a few difficulties when the user has virt-viewer
> attached as appropriate events need to be re-emitted even if the machine
> doesn't change states.
> ---
> Adapt for the revert flag and a few minor fixes.
> ---
>  src/qemu/qemu_driver.c | 330 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 325 insertions(+), 5 deletions(-)
> 
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index 2178798..d66551b 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -12112,6 +12112,315 @@ qemuDomainSnapshotRevertInactive(struct qemud_driver *driver,
>      return ret > 0 ? -1 : ret;
>  }
> 
> +/* helper to create lifecycle for possible outcomes of reverting snapshots
> + * the reemit parameter is used when specific events need to be reemitted

Punctuation helps; since these two lines are different sentences:

s/snapshots the/snapshots. The/

> + * so that virt-viewer for example is able to re-connect */
> +static int
> +qemuDomainSnapshotCreateEvent(struct qemud_driver *driver,

> +        }
> +        break;
> +
> +    /* fallthrough */

Spurious comment.


> +/* The domain and driver is expected to be locked */
> +static int
> +qemuDomainSnapshotRevertExternal(virConnectPtr conn,
> +                                 struct qemud_driver *driver,
> +                                 virDomainObjPtr vm,
> +                                 virDomainSnapshotObjPtr snap,
> +                                 unsigned int flags)
> +{
> +    int ret = -1;
> +    int loadfd = -1;
> +
> +    virDomainDefPtr save_def = NULL;
> +    struct qemud_save_header header;
> +    virFileWrapperFdPtr wrapperFd = NULL;
> +
> +    virDomainState old_state = virDomainObjGetState(vm, NULL);
> +    virDomainDefPtr config = NULL;
> +
> +    /* check if domain is running and should be running afterwards:
> +     * qemu has to be re-started to allow the migration from file
> +     */
> +    if (!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_RESPAWN) &&
> +        virDomainObjIsActive(vm) &&
> +        !(snap->def->state == VIR_DOMAIN_DISK_SNAPSHOT ||
> +          snap->def->state == VIR_DOMAIN_SHUTOFF)) {
> +        virReportError(VIR_ERR_SNAPSHOT_REVERT_RISKY,
> +                       _("Reverting of snapshot '%s' needs to restart the "
> +                         "hypervisor."), snap->def->name);
> +        goto cleanup;
> +    }

Makes sense.

> +
> +    /* check if snapshot has children */
> +    if (snap->nchildren > 0) {
> +        if (!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_FORCE)) {
> +            virReportError(VIR_ERR_SNAPSHOT_REVERT_RISKY, "%s",
> +                           _("Reverting of snapshots with children "
> +                             "is not implemented yet."));
> +            goto cleanup;
> +        } else {
> +            VIR_WARN("Invalidating image chain of snapshot '%s'.",
> +                     snap->def->name);
> +        }
> +    }

Aha - the difference between --force and --allow-respawn becomes even
more apparent.

> +
> +    if (vm->current_snapshot) {
> +        vm->current_snapshot->def->current = false;
> +        if (qemuDomainSnapshotWriteMetadata(vm, vm->current_snapshot,
> +                                            driver->snapshotDir) < 0)
> +            goto cleanup;
> +        vm->current_snapshot = NULL;
> +        /* XXX Should we restore vm->current_snapshot after this point
> +         * in the failure cases where we know there was no change?  */
> +    }
> +
> +    /* Prepare to copy the snapshot inactive xml as the config of this
> +     * domain.  Easiest way is by a round trip through xml.

I've got a pending patch that may make this easier; depending on which
of us gets something working and pushed first, I can rebase it to cover
this additional call site.

> +     *
> +     * XXX Should domain snapshots track live xml rather
> +     * than inactive xml?  */

I think we can do a subsequent patch that cleans up all instances of
this comment - by now, we should be pretty happy with non-live XML for
external checkpoints, as we have proven that it is sufficient for our
needs, and matches with the fact that we have to respawn qemu (live XML
is only useful when re-using an existing qemu process).

> +    snap->def->current = true;
> +    if (snap->def->dom) {
> +        char *xml;
> +        if (!(xml = qemuDomainDefFormatXML(driver,
> +                                           snap->def->dom,
> +                                           VIR_DOMAIN_XML_INACTIVE |
> +                                           VIR_DOMAIN_XML_SECURE |
> +                                           VIR_DOMAIN_XML_MIGRATABLE)))
> +            goto cleanup;
> +
> +        config = virDomainDefParseString(driver->caps, xml,
> +                                         QEMU_EXPECTED_VIRT_TYPES,
> +                                         VIR_DOMAIN_XML_INACTIVE);
> +        VIR_FREE(xml);
> +        if (!config)
> +            goto cleanup;
> +    } else {
> +        virReportError(VIR_ERR_INTERNAL_ERROR,
> +                       _("Domain definition not found in snapshot '%s'"),
> +                       snap->def->name);
> +        goto cleanup;
> +    }
> +
> +    /* try to open the memory save image if one exists and it's needed */
> +    if (!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_STOPPED) &&
> +        (snap->def->state == VIR_DOMAIN_RUNNING ||
> +         snap->def->state == VIR_DOMAIN_PAUSED ||
> +         flags & VIR_DOMAIN_SNAPSHOT_REVERT_RUNNING ||
> +         flags & VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED)) {
> +
> +        if (snap->def->memory == VIR_DOMAIN_SNAPSHOT_LOCATION_EXTERNAL) {
> +            if (!snap->def->file) {
> +                virReportError(VIR_ERR_INTERNAL_ERROR,
> +                               _("Memory image path does not exist for "
> +                                 "snapshot '%s'"), snap->def->name);
> +                goto cleanup;
> +            }
> +
> +            if (!virFileExists(snap->def->file)) {
> +                virReportError(VIR_ERR_INTERNAL_ERROR,
> +                               _("memory save file '%s' does not exist."),
> +                               snap->def->file);
> +                goto cleanup;
> +            }
> +
> +            /* open the save image file */
> +            if ((loadfd = qemuDomainSaveImageOpen(driver, snap->def->file,
> +                                                  &save_def, &header,
> +                                                  false, &wrapperFd, NULL,
> +                                                  -1, false, false)) < 0)
> +                goto cleanup;
> +
> +            /* opening succeeded, there's a big probability the revert will work.
> +             * We can now get rid of the active qemu, if it runs */
> +        } else {
> +            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
> +                           _("Can't revert snapshot to running state: "
> +                             "Memory image not found"));

Well, _technically_ we could revert into a running state by reverting
the disk image and then booting it (of course, the guest would end up
doing an fsck or such).  But saving that for a followup patch is okay,
given that we are still doing incremental improvements.

In fact, for a disk-snapshot or offline external snapshot, there is no
memory state to revert to, but that's expected.  The only time where a
missing memory state is unexpected is if the snapshot state was running
or paused.

> +            goto cleanup;
> +        }
> +    }
> +
> +    if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0)
> +        goto cleanup;
> +
> +    /* Now we destroy the (possibly) running qemu process.
> +     * Unfortunately, the guest can be restored only using incoming migration.
> +     */
> +    if (virDomainObjIsActive(vm)) {
> +        qemuProcessStop(driver, vm,
> +                        VIR_DOMAIN_SHUTOFF_FROM_SNAPSHOT, 0);
> +        virDomainAuditStop(vm, "from-snapshot");
> +    }
> +
> +    /* assign domain definition */
> +    virDomainObjAssignDef(vm, config, false);
> +    config = NULL;
> +
> +    /* wipe and re-create disk images - qemu-img doesn't care if it exists*/
> +    if (qemuDomainSnapshotCreateInactiveExternal(driver, vm, snap, false) < 0)
> +        goto endjob;

This comment says you are throwing away state, but without the use of
any flag guarding things that the user meant to throw away state.  I'm a
bit worried that we're missing a flag here to state that we are
reverting to the point of the external snapshot _and_ want to throw away
all changes that happened in the external file.

Also, does qemuDomainSnapshotCreateInactiveExternal(,,,false) actually
do what you want here?  I thought that invocation fails if the file
already exists (and passing true assumes that the file already exists,
but does not otherwise touch it).  Do we instead need to change that
boolean into an 'int', with multiple values (0 and 1 for when the user
first creates the snapshot, depending on whether they passed the
REUSE_EXT flag, and 2 for this function, which says to forcefully recreate)?

> +
> +    /* save the config files */
> +    if (virDomainSaveConfig(driver->configDir, vm->def) < 0) {
> +        virReportError(VIR_ERR_OPERATION_INVALID, "%s",
> +                       _("Write to config file failed"));
> +        goto endjob;
> +    }
> +
> +    /* re-start the guest if necessary */
> +    if (loadfd > 0) {

>= 0 (not that we plan on it ever being libvirtd's stdin, but otherwise
tools like coverity will warn about a possible off-by-one)

> +        if ((ret = qemuDomainSaveImageLoad(conn, driver,
> +                                           vm, &loadfd,
> +                                           &header, snap->def->file)) < 0) {
> +            virDomainAuditStart(vm, "from-snapshot", false);
> +            goto endjob;
> +        }
> +
> +        virDomainAuditStart(vm, "from-snapshot", true);
> +
> +        if ((snap->def->state == VIR_DOMAIN_RUNNING &&
> +             !(flags & VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED)) ||
> +            flags & VIR_DOMAIN_SNAPSHOT_REVERT_RUNNING) {
> +            if (qemuProcessStartCPUs(driver, vm, conn,
> +                                     VIR_DOMAIN_RUNNING_FROM_SNAPSHOT,
> +                                     QEMU_ASYNC_JOB_NONE) < 0)
> +                goto endjob;
> +        }
> +    }
> +
> +    /* create corresponding life cycle events */
> +    if (qemuDomainSnapshotCreateEvent(driver, vm, old_state,
> +                                      virDomainObjGetState(vm, NULL),
> +                                      !!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_RESPAWN)) < 0)
> +        goto endjob;
> +
> +    ret = 0;
> +endjob:
> +    if (qemuDomainObjEndJob(driver, vm) == 0) {
> +        vm = NULL;
> +    } else if (ret < 0 && !vm->persistent) {
> +        qemuDomainRemoveInactive(driver, vm);
> +        vm = NULL;
> +    }
> +
> +cleanup:
> +    virDomainDefFree(config);
> +    virDomainDefFree(save_def);
> +    VIR_FORCE_CLOSE(loadfd);
> +    virFileWrapperFdFree(wrapperFd);
> +

Overall, looks relatively good.

> +    return ret;
> +}
> +
>  static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
>                                        unsigned int flags)
>  {
> @@ -12128,8 +12437,9 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
>      virDomainDefPtr config = NULL;
> 
>      virCheckFlags(VIR_DOMAIN_SNAPSHOT_REVERT_RUNNING |
> -                  VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED |
> -                  VIR_DOMAIN_SNAPSHOT_REVERT_FORCE |
> +                  VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED  |
> +                  VIR_DOMAIN_SNAPSHOT_REVERT_STOPPED |
> +                  VIR_DOMAIN_SNAPSHOT_REVERT_FORCE   |
>                    VIR_DOMAIN_SNAPSHOT_REVERT_RESPAWN, -1);
> 
>      if (flags & VIR_DOMAIN_SNAPSHOT_REVERT_FORCE)
> @@ -12176,12 +12486,21 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
>                           "to revert to inactive snapshot"));
>          goto cleanup;
>      }
> -    if (snap->def->state == VIR_DOMAIN_DISK_SNAPSHOT) {
> +
> +    /* Check if reverting an external snapshot */
> +    if (virDomainSnapshotIsExternal(snap)) {
> +        ret = qemuDomainSnapshotRevertExternal(snapshot->domain->conn,
> +                                               driver, vm, snap, flags);
> +        goto cleanup;
> +    }
> +
> +    if (flags & (VIR_DOMAIN_SNAPSHOT_REVERT_STOPPED)) {
>          virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
> -                       _("revert to external disk snapshot not supported "
> -                         "yet"));
> +                      _("VIR_DOMAIN_SNAPSHOT_REVERT_STOPPED is not supported "
> +                        "with internal snapshots"));

Technically, it _could_ be; but as a followup patch (that is, do a
loadvm into paused state, then kill the VM; thus effectively reverting
to _just_ the disk state of that internal checkpoint).

>          goto cleanup;
>      }
> +
>      if (!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_FORCE)) {
>          if (!snap->def->dom) {
>              virReportError(VIR_ERR_SNAPSHOT_REVERT_RISKY,
> @@ -12233,6 +12552,7 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
>              goto cleanup;
>      }
> 
> +    /* Internal snapshot revert code starts below */
>      if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0)
>          goto cleanup;
> 

I'm still the most worried about the concept of whether we are blindly
discarding user disk state since the snapshot was created, and whether
we are properly recreating external files if that's what the user really
wanted.  I'm not sure whether to say ACK and then fix the fallout, or to
wait a bit longer to see what else you come up with in the series.  I
guess we still have a few more development days before the 1.0.1 freeze
to decide, so for now, I'd like to see what you have in store for
snapshot deletion, and to also get my patches posted for snapshot
revert-and-create, before I decide on this one.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]