[libvirt] RFC: API additions for enhanced snapshot support

Tue Jul 5 21:51:26 UTC 2011

On 07/05/2011 04:02 AM, Stefan Hajnoczi wrote:
> On Tue, Jul 5, 2011 at 2:53 AM, Jagane Sundar <jagane at sundar.org> wrote:
>>> /* Create a snapshot of a storage volume.  XML is optional, if non-NULL,
>>>  * it would be a new top-level element <volsnapshot> which is similar to
>>>  * the top-level <domainsnapshot> for virDomainSnapshotCreateXML, to
>>>  * specify name and description. Flags is 0 for now.
>>>  */
>>> virStorageVolSnapshotPtr virStorageVolSnapshotCreateXML(
>>>     virStorageVolPtr vol, const char *xml, unsigned int flags);
>>>
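
To make the XML argument concrete, here is a minimal sketch of building
a <volsnapshot> document in C.  The element is only proposed in this
RFC, so the <name> and <description> children are an assumption that
mirrors <domainsnapshot>, and format_volsnapshot_xml is a hypothetical
helper, not a libvirt function.  It does no XML escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a <volsnapshot> document into buf.
 * The child elements are assumed to mirror <domainsnapshot>; the real
 * schema is still undecided.  No XML escaping is done on the inputs.
 * Returns 0 on success, -1 if the buffer is too small. */
static int
format_volsnapshot_xml(char *buf, size_t buflen,
                       const char *name, const char *description)
{
    int n;

    assert(buf != NULL && name != NULL && description != NULL);

    n = snprintf(buf, buflen,
                 "<volsnapshot>\n"
                 "  <name>%s</name>\n"
                 "  <description>%s</description>\n"
                 "</volsnapshot>\n",
                 name, description);

    /* snprintf reports the length it wanted to write, so a value
     * at or beyond buflen means the output was truncated. */
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

The resulting string would be what a caller passes as the xml argument
of the proposed call; passing NULL instead would leave name and
description for libvirt to choose.
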
>> There are two types of snapshots that I am aware of:
>> - Base file is left unmodified after snapshot, snapshot file is created and
>> modified. e.g. qcow2 (I think)
> 
> More detail on this approach as implemented by QEMU's snapshot_blkdev:
> 
> Create snapshot.qcow2 with base.img as backing file.  base.img is now
> read-only and can be accessed as a "snapshot".  All writes go to
> snapshot.qcow2.
> 
> When the snapshot is no longer needed it is necessary to merge the COW
> data back into base.img before deleting snapshot.qcow2.

or, to merge all of base.img into snapshot.qcow2 then change
snapshot.qcow2 to no longer have a backing file, before deleting base.img.

As I understand it, either file can be deleted when a snapshot is no
longer needed, provided its data is first merged into the other.
Having the flexibility to decide which of the two files to delete would
be useful, and may require knowing how dirty the snapshot file is
relative to the original file.  If it is 95% dirty, it is faster to
pull the last few blocks from base into snapshot before deleting base;
if it is only 5% dirty, it is faster to sync the dirtied blocks from
snapshot back to base before deleting snapshot.  Even with control over
which of the two images to delete, you may also want control over the
final filename of the merged image.  In the 5% dirty case, that means
doing the snapshot->base merge followed by rename(base,snapshot), rather
than wasting time on the base->snapshot merge, so that the final
filename is still snapshot.
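
A rough sketch of that decision in C, assuming we can learn the number
of dirty blocks in the snapshot and that every block of the base image
is allocated.  All names here (plan_merge, struct merge_plan, the
enums) are hypothetical and exist only for illustration; nothing like
this is in libvirt or QEMU today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum merge_direction {
    MERGE_SNAPSHOT_INTO_BASE,   /* copy dirty blocks back, delete snapshot */
    MERGE_BASE_INTO_SNAPSHOT    /* pull clean blocks forward, delete base */
};

enum keep_name { KEEP_BASE_NAME, KEEP_SNAPSHOT_NAME };

struct merge_plan {
    enum merge_direction direction;
    uint64_t blocks_to_copy;    /* blocks that must move during the merge */
    bool rename_needed;         /* surviving file must be renamed afterwards */
};

/* Pick the direction that copies fewer blocks, then note whether the
 * surviving file has to be renamed to end up with the requested name. */
static struct merge_plan
plan_merge(uint64_t total_blocks, uint64_t dirty_blocks,
           enum keep_name final_name)
{
    struct merge_plan plan;
    uint64_t clean_blocks;

    assert(dirty_blocks <= total_blocks);
    clean_blocks = total_blocks - dirty_blocks;

    if (dirty_blocks <= clean_blocks) {
        /* Mostly clean: base survives after absorbing the dirty blocks. */
        plan.direction = MERGE_SNAPSHOT_INTO_BASE;
        plan.blocks_to_copy = dirty_blocks;
        plan.rename_needed = (final_name == KEEP_SNAPSHOT_NAME);
    } else {
        /* Mostly dirty: snapshot survives after pulling in clean blocks. */
        plan.direction = MERGE_BASE_INTO_SNAPSHOT;
        plan.blocks_to_copy = clean_blocks;
        plan.rename_needed = (final_name == KEEP_BASE_NAME);
    }
    return plan;
}
```

With 100 blocks of which 5 are dirty and a requested final name of
snapshot, this picks the snapshot->base merge and flags the rename,
which is the 5% case described above.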

>  This merge
> has not been implemented in QEMU yet.

Not to mention that it overlaps somewhat with the concept of live block
copying.

> 
>> - Base file continues to be modified. The snapshot file gets COW blocks
>> copied into it. e.g. LVM, Livebackup, etc.
>>
>> Can we enhance the libvirt API to indicate what type of snapshot is desired?
>> Also, when a snapshot is listed, can we try and describe it as one kind or
>> the other?
> 
> I think the snapshot mechanism will depend on your storage backend.
> If the disk image is an LVM volume, then it is natural for the
> snapshot to be an LVM snapshot.  If the disk image is a qcow2 file,
> then it is natural for the snapshot to be a QEMU snapshot_blkdev
> snapshot.

What if it is both at once?  That is, it is possible to create an LVM
partition whose contents are a qcow2 image.  In that case, it seems like
the user might want the flexibility to determine whether the snapshot is
done at the qcow2 level or at the LVM level.

> 
> Also, it is often not possible to mix these snapshot mechanisms.  For
> example, LVM snapshots don't work on qcow2 image files.

Why not?  They might not be as space-efficient (the whole idea of LVM
cloning is that each block of the original LVM partition is now
COW-shared between multiple partitions, and that the backup partition
only consumes additional space in proportion to the number of blocks that
get dirtied in the original partition), but I'm not seeing a technical
reason that would prohibit them (and I welcome evidence to the contrary,
so that I know more about what I am up against).

> 
> Does the application have to be aware of which snapshotting approach
> is used by the backend?  Perhaps there are a few cases where it is
> technically possible to mix-and-match but it just seems to expose
> complexity without much gain.
> 
> Put another way: "If a storage backend fundamentally doesn't support
> snapshotting the way you like, use a different backend".

So is this an accurate summary of your suggestion?

vir{StorageVol,Domain}SnapshotGetXMLDesc should have a sub-element or
attribute stating which method of snapshotting is in use, but other than
telling you about the method, libvirt doesn't expose any further control
over the matter (each disk gets snapshotted in the most efficient manner
for that disk, given the constraints of the storage pool [directory vs.
LVM partition] and image type [raw vs. qcow2 vs. qed] involved in that
storage volume).

> 
> Yes, dirty bitmap support is important.  This will make backup much
> more efficient on storage backends that support it.
> 
> For QEMU image files it will be possible to provide dirty block
> information in the future.  btrfs and a SAN appliance that I have
> looked at both have mechanisms that could be used to provide dirty block
> tracking.
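
For illustration, this is roughly why a dirty bitmap helps backup: only
the blocks whose bit is set need to be copied.  The sketch below works
on in-memory buffers and assumes one bit per block, least significant
bit first; BLOCK_SIZE, block_is_dirty and incremental_backup are
hypothetical names, not an existing QEMU or libvirt interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4    /* tiny block size to keep the example readable */

/* Return nonzero if the bit for `block` is set in the bitmap. */
static int
block_is_dirty(const uint8_t *bitmap, size_t block)
{
    return (bitmap[block / 8] >> (block % 8)) & 1;
}

/* Copy only the dirty blocks from disk to backup and clear their bits.
 * Returns the number of blocks copied. */
static size_t
incremental_backup(const uint8_t *disk, uint8_t *backup,
                   uint8_t *bitmap, size_t nblocks)
{
    size_t copied = 0;
    size_t i;

    assert(disk != NULL && backup != NULL && bitmap != NULL);

    for (i = 0; i < nblocks; i++) {
        if (!block_is_dirty(bitmap, i))
            continue;
        memcpy(backup + i * BLOCK_SIZE, disk + i * BLOCK_SIZE, BLOCK_SIZE);
        bitmap[i / 8] &= (uint8_t)~(1u << (i % 8));
        copied++;
    }
    return copied;
}
```

A real backend would read the bitmap from the storage layer rather than
from memory, but the cost argument is the same: the work scales with
the number of dirty blocks, not with the size of the disk.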

-- 
Eric Blake   eblake at redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
