Re: [libvirt] RFC: mirrored live block migration in libvirt 0.9.11
- From: Paolo Bonzini <bonzini gnu org>
- To: Eric Blake <eblake redhat com>
- Cc: "libvir-list redhat com" <libvir-list redhat com>
- Subject: Re: [libvirt] RFC: mirrored live block migration in libvirt 0.9.11
- Date: Wed, 14 Mar 2012 09:16:20 +0100
On 13/03/2012 23:20, Eric Blake wrote:
> virDomainSnapshotCreateXML will learn a new flag:
> VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC. If this flag is present, then
> libvirt guarantees that the snapshot operation will either succeed, or
> that failure will be reported without changing domain XML or qemu
> runtime state. If present, the creation API will fail if qemu lacks the
> 'transaction' command and more than one disk snapshot was requested in
> the <domainsnapshot> XML. If this flag is not present, then libvirt
> will use 'transaction' if available, but fall back to
> 'blockdev-snapshot-sync', so that it works with older qemu, but where
> the caller then has to check virDomainGetXMLDesc on failure to see if a
> partial snapshot occurred. This flag will be implied by any other part
> of the API that requires the use of 'transaction'.
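The flag semantics quoted above amount to a small decision table. Below is a rough Python sketch of that table, not libvirt code: the function name, the `Strategy` enum and the boolean parameters are illustrative assumptions, and only the two monitor command names come from the proposal.

```python
from enum import Enum


class Strategy(Enum):
    """Monitor command used to take the disk snapshots."""
    TRANSACTION = "transaction"
    PER_DISK = "blockdev-snapshot-sync"


def pick_snapshot_strategy(atomic, has_transaction, num_disks):
    """Pick a strategy following the proposed ATOMIC flag rules.

    atomic          -- caller passed VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC
    has_transaction -- qemu supports the 'transaction' command
    num_disks       -- number of disk snapshots requested
    """
    # 'transaction' is always preferred when qemu offers it.
    if has_transaction:
        return Strategy.TRANSACTION
    # Without it, several disks cannot be snapshotted atomically.
    if atomic and num_disks > 1:
        raise RuntimeError("atomic multi-disk snapshot needs 'transaction'")
    # Older qemu: fall back to one command per disk. If the flag was
    # absent, the caller must check the domain XML after a failure.
    return Strategy.PER_DISK
```

With this sketch, `pick_snapshot_strategy(True, False, 2)` raises, while the same request without the flag falls back to per-disk snapshots and may leave a partial result behind.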
> The VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT flag was added to
> virDomainSnapshotCreateXML in 0.9.10, with semantics that it would stop
> libvirt from complaining if a regular file already existed as the
> snapshot destination, but without interacting with qemu, which would
> blindly overwrite the contents of that file. Since this flag is
> relatively new, and has not had much use, I propose to slightly alter
> its documented semantics to now interact with the qemu 1.1 feature being
> added as part of 'transaction'. If qemu supports 'transaction', then
> presence of this flag implies that libvirt will explicitly request
> 'mode':'existing' for each snapshot, which tells qemu to open the
> existing file without writing any new metadata, and that the caller is
> responsible to ensure that the file has identical guest contents
> (generally by creating a qcow2 file with the current file as backing
> image and no additional contents). Additionally, libvirt will now
> require the file to already exist (in 0.9.10, libvirt silently ignored
> the fact if the flag was requested but the file did not exist).
> Presence of the flag without qemu support for 'transaction' will now
> fail (that is, VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT will now imply
Also looks ok.
> Absence of the flag means that
> libvirt will rely on qemu's default to 'mode':'absolute-paths', and will
> require that the file does not exist as a regular file; this maps to
> qemu 1.0 always writing a new qcow2 header with absolute backing file
> name. If we want to later expose additional modes, like
> 'no-backing-file', it would be done via per-<disk> annotations in the
> <domainsnapshot> XML rather than via new flags, but for this proposal, I
> think oVirt is okay using the flag to set a single policy for all disks
> mentioned in a given snapshot request.
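As a sketch of what the REUSE_EXT mapping could look like on the monitor side, the Python below builds a QMP 'transaction' command in which every snapshot action carries 'mode':'existing' when the flag is set, and omits 'mode' otherwise so that qemu's default applies. The helper name, the device name in the usage note and the hard-coded qcow2 format are assumptions for illustration only.

```python
import json


def build_transaction(snapshots, reuse_ext=False):
    """Build a QMP 'transaction' command for external disk snapshots.

    snapshots -- list of (device, snapshot_file) pairs
    reuse_ext -- models VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT
    """
    actions = []
    for device, path in snapshots:
        data = {"device": device, "snapshot-file": path, "format": "qcow2"}
        if reuse_ext:
            # The file already exists and holds the right guest contents;
            # qemu must open it without writing new metadata.
            data["mode"] = "existing"
        # Without the flag, 'mode' is left out and qemu uses its default
        # ('absolute-paths' according to the proposal).
        actions.append({"type": "blockdev-snapshot-sync", "data": data})
    return {"execute": "transaction", "arguments": {"actions": actions}}


if __name__ == "__main__":
    cmd = build_transaction([("drive-virtio-disk0", "/src/snap.img")],
                            reuse_ext=True)
    print(json.dumps(cmd, indent=2))
```

A single flag sets one policy for every disk in the request, which matches the proposal; per-disk modes would need extra input that this sketch does not model.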
> virDomainSnapshotCreateXML's xml argument, <domainsnapshot>, will learn
> an optional <mirror> sub-element to each <disk>. While the
> 'transaction' command supports multiple mirrors in one transaction, for
> now, libvirt will enforce at most one mirror, which should be sufficient
> for oVirt's needs. (Adding more support for the rest of the power of
> 'transaction' is probably best left for new libvirt API, but that's
> outside the scope of this proposal). As an example,
> <disk name='/src/base.img' snapshot='external'>
>   <source file='/src/snap.img'/>
>   <mirror file='/dest/snap.img'/>
> </disk>
> would create a new libvirt snapshot object with /src/snap.img as the
> read-write new image, and /dest/snap.img as the new write-only mirror.
> On success, this rewrites the domain's live XML to point to
> /src/snap.img as its current file.
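For illustration, a caller could assemble such a request with a few lines of Python. This is only a sketch: the `<mirror>` element is the one proposed in this RFC, not an existing libvirt element, and the helper name and tuple layout are assumptions. The check mirrors the proposed limit of at most one mirror per request.

```python
import xml.etree.ElementTree as ET


def snapshot_xml(disks):
    """Build <domainsnapshot> XML for external disk snapshots.

    disks -- list of (name, source_file, mirror_file_or_None) tuples
    """
    # The proposal limits a snapshot request to a single mirror.
    if sum(1 for _, _, mirror in disks if mirror is not None) > 1:
        raise ValueError("at most one mirror per snapshot request")

    root = ET.Element("domainsnapshot")
    disks_el = ET.SubElement(root, "disks")
    for name, source, mirror in disks:
        disk = ET.SubElement(disks_el, "disk",
                             {"name": name, "snapshot": "external"})
        ET.SubElement(disk, "source", {"file": source})
        if mirror is not None:
            # Proposed element: write-only mirror of the new image.
            ET.SubElement(disk, "mirror", {"file": mirror})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(snapshot_xml([("/src/base.img", "/src/snap.img", "/dest/snap.img")]))
```

The resulting string is what would be handed to virDomainSnapshotCreateXML as its xml argument.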
This is an awfully low-level API; you're designing for oVirt rather than
for everything else. The problem here is twofold:
1) you're defining a snapshot that cannot be started without losing the
2) in case the snapshotting is aborted early for any reason, oVirt has
to do a rebase operation manually. This is currently O(size-of-disk),
not O(changes-in-the-last-image), so it wastes both disk space and time.
If it works, I cannot really say "don't do it", but I think the oVirt
mirrored snapshots idea is a dead-end and a workaround for lack of block
device streaming (which is now supported). You could have a simpler,
high-level API based on streaming rather than snapshotting. So, if you
have /src/disk.img as your image, you would have a new API:
which would do all that is needed:
- start mirroring writes to /dst/disk.img; no snapshotting needed. A
flag VIR_DOMAIN_BLOCK_COPY_REUSE_EXT would let you specify the
"existing" mode. Another flag VIR_DOMAIN_BLOCK_COPY_CREATE_RAW would
use the raw format on the destination and specify the no-backing-file
mode (of course only valid if base == NULL).
- call virDomainBlockRebase(dom, "disk", "/src/base.img", bandwidth, 0)
to start the streaming job.
If something doesn't work here, it's a QEMU bug.
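To make the flag rules for the suggested copy API concrete, here is a small Python sketch of the validation step only. The two flag names come from the text above, but their numeric values, the function name and the returned dictionary are hypothetical. The message does not say how the two flags combine, so the sketch simply lets CREATE_RAW take precedence.

```python
# Hypothetical bit values; the message names the flags but not their values.
VIR_DOMAIN_BLOCK_COPY_REUSE_EXT = 1 << 0
VIR_DOMAIN_BLOCK_COPY_CREATE_RAW = 1 << 1


def plan_block_copy(dest, base=None, flags=0):
    """Translate the proposed flags into a mirroring plan.

    Returns a dict with the destination, the creation mode (None means
    "qemu default") and the destination format (None means "unspecified").
    """
    want_raw = bool(flags & VIR_DOMAIN_BLOCK_COPY_CREATE_RAW)
    reuse = bool(flags & VIR_DOMAIN_BLOCK_COPY_REUSE_EXT)

    # A raw destination cannot reference a backing file, so a base
    # image makes no sense together with CREATE_RAW.
    if want_raw and base is not None:
        raise ValueError("CREATE_RAW is only valid when base is None")

    if want_raw:
        return {"dest": dest, "mode": "no-backing-file", "format": "raw"}
    if reuse:
        return {"dest": dest, "mode": "existing", "format": None}
    return {"dest": dest, "mode": None, "format": None}
```

After the mirror is in place, the streaming job would be started with the virDomainBlockRebase call shown in the list above.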
> Finally, virDomainSnapshotDelete will learn a new flag,
> VIR_DOMAIN_SNAPSHOT_DELETE_REOPEN_MIRROR, which says that the libvirt
> snapshot object will be deleted, but only after first calling the qemu
> 'drive-reopen' monitor command for all disks that had a <mirror> in the
> associated snapshot object. That is, for the above example, this would
> reopen the disk from its current read-write of /src/snap.img over to
> the second storage domain's /dest/snap.img with its accompanying
> mirrored backing chain. On success, this rewrites the domain's live XML
> to point to the just-opened mirror location. This flag will fail if the
> libvirt snapshot being deleted is not the current image, or if the
> snapshot being deleted does not have any mirrored disks.
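The two failure conditions for the proposed REOPEN_MIRROR flag can be sketched as a precondition check. The Python below is a model under assumptions, not libvirt code: the `Disk` and `Snapshot` classes and the function name are hypothetical, and the returned pairs only stand in for the disks that would be passed to 'drive-reopen'.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Disk:
    name: str
    source: str
    mirror: Optional[str] = None  # mirror destination, if any


@dataclass
class Snapshot:
    name: str
    disks: List[Disk]


def disks_to_reopen(snapshot: Snapshot, current_name: str) -> List[Tuple[str, str]]:
    """Return (disk name, mirror path) pairs to reopen before deletion.

    Raises ValueError in the two cases where the proposed flag fails.
    """
    if snapshot.name != current_name:
        raise ValueError("snapshot being deleted is not the current image")
    mirrored = [(d.name, d.mirror) for d in snapshot.disks if d.mirror]
    if not mirrored:
        raise ValueError("snapshot has no mirrored disks")
    return mirrored
```

Only after every returned disk has been reopened on its mirror would the snapshot object be deleted and the live XML rewritten.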
I think you also need VIR_DOMAIN_SNAPSHOT_DELETE_REMOVE_MIRROR, to be
used in case of abort so that the domain can actually be started. Or it
could be an event MIRROR_DROPPED or something like that.