[libvirt] Overview of libvirt incremental backup API, part 2 (incremental/differential pull mode)

Fri Oct 5 04:57:57 UTC 2018

On 10/4/18 12:05 AM, Eric Blake wrote:
> The following (long) email describes a portion of the work-flow of how 
> my proposed incremental backup APIs will work, along with the backend 
> QMP commands that each one executes.  I will reply to this thread with 
> further examples (the first example is long enough to be its own email). 
> This is an update to a thread last posted here:
> https://www.redhat.com/archives/libvir-list/2018-June/msg01066.html
> 

> More to come in part 2.
> 

- Second example: a sequence of incremental backups via pull model

In the first example, we did not create a checkpoint at the time of the 
full pull. That means we have no way to track a delta of changes since 
that point in time. Let's repeat the full backup (reusing the same 
backup.xml from before), but this time, we'll add a new parameter, a 
second XML file for describing the checkpoint we want to create.

Actually, it was easy enough to get virsh to write the XML for me 
(because it was very similar to existing code in virsh that creates XML 
for snapshot creation):

$ $virsh checkpoint-create-as --print-xml $dom check1 testing \
    --diskspec sdc --diskspec sdd | tee check1.xml
<domaincheckpoint>
   <name>check1</name>
   <description>testing</description>
   <disks>
     <disk name='sdc'/>
     <disk name='sdd'/>
   </disks>
</domaincheckpoint>

I had to supply two --diskspec arguments to virsh to select just the two 
qcow2 disks that I am using in my example (rather than every disk in the 
domain, which is the default when <disks> is not present). I also picked 
a name (mandatory) and description (optional) to be associated with the 
checkpoint.

The backup.xml file that we plan to reuse still mentions scratch1.img 
and scratch2.img as files needed for staging the pull request. However, 
any contents in those files could interfere with our second backup 
(after all, every cluster written into that file from the first backup 
represents a point in time that was frozen at the first backup; but our 
second backup will want to read the data as the guest sees it now rather 
than what it was at the first backup), so we MUST regenerate the scratch 
files. (Perhaps I should have just deleted them at the end of example 1 
in my previous email, had I remembered when typing that mail).

$ $qemu_img create -f qcow2 -b $orig1 -F qcow2 scratch1.img
$ $qemu_img create -f qcow2 -b $orig2 -F qcow2 scratch2.img

Now, to begin the full backup and create a checkpoint at the same time. 
Also, this time around, it would be nice if the guest had a chance to 
freeze I/O to the disks prior to the point chosen as the checkpoint. 
Assuming the guest is trusted, and running the qemu guest agent (qga), 
we can do that with:

$ $virsh domfsfreeze $dom
$ $virsh backup-begin $dom backup.xml check1.xml
Backup id 1 started
backup used description from 'backup.xml'
checkpoint used description from 'check1.xml'
$ $virsh domfsthaw $dom

and eventually, we may decide to add a VIR_DOMAIN_BACKUP_BEGIN_QUIESCE
flag to combine those three steps into a single API (matching what we've
done on some other existing API).  In other words, the sequence of QMP
operations performed during virDomainBackupBegin is quick enough that
it won't stall a freeze operation (at least Windows is picky if you
stall a freeze operation longer than 10 seconds).

The tweaked $virsh backup-begin now results in a call to:
  virDomainBackupBegin(dom, "<domainbackup ...>",
    "<domaincheckpoint ...>", 0)
and in turn libvirt makes a similar sequence of QMP calls as before, 
with a slight modification in the middle:
{"execute":"nbd-server-start",...
{"execute":"blockdev-add",...
{"execute":"transaction",
  "arguments":{"actions":[
   {"type":"blockdev-backup", "data":{
    "device":"$node1", "target":"backup-sdc", "sync":"none",
    "job-id":"backup-sdc" }},
   {"type":"blockdev-backup", "data":{
    "device":"$node2", "target":"backup-sdd", "sync":"none",
    "job-id":"backup-sdd" }},
   {"type":"block-dirty-bitmap-add", "data":{
    "node":"$node1", "name":"check1", "persistent":true}},
   {"type":"block-dirty-bitmap-add", "data":{
    "node":"$node2", "name":"check1", "persistent":true}}
  ]}}
{"execute":"nbd-server-add",...

The only change was adding more actions to the "transaction" command - 
in addition to kicking off the fleece image in the scratch nodes, it 
ALSO added a persistent bitmap to each of the original images, to track 
all changes made after the point of the transaction.  The bitmaps are 
persistent - at this point (well, it's better if you wait until after 
backup-end), you could shut the guest down and restart it, and libvirt 
will still remember that the checkpoint exists, and qemu will continue
to track guest writes via the bitmap. However, the backup job itself is
currently live-only, and shutting down the guest while a backup 
operation is in effect will lose track of the backup job.  What that 
really means is that if the guest shuts down, your current backup job is 
hosed (you cannot ever get back the point-in-time data from your API 
request - as your next API request will be a new point in time) - but 
you have not permanently ruined the guest, and your recovery is to just 
start a new backup.
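
As a rough illustration of how that transaction grows with the number of disks, here is a small shell sketch that prints one blockdev-backup action plus one persistent block-dirty-bitmap-add action per disk. The function name begin_transaction, the node:target argument format, and the node names node-sdc/node-sdd are made up for this example; libvirt builds the command internally. The sketch groups the two actions per disk rather than listing all backup jobs first, which should not matter since the whole transaction takes effect at a single point in time.

```shell
# Print a QMP transaction that starts a fleecing job and adds a persistent
# checkpoint bitmap for every disk. Usage (hypothetical helper):
#   begin_transaction CHECKPOINT NODE:TARGET [NODE:TARGET...]
begin_transaction() {
  local checkpoint="$1" pair node target first=1
  shift
  printf '{"execute":"transaction",\n  "arguments":{"actions":[\n'
  for pair in "$@"; do
    node="${pair%%:*}"      # text before the first colon
    target="${pair#*:}"     # text after the first colon
    # Separate the action groups with a comma, but not before the first.
    [ "$first" -eq 1 ] || printf ',\n'
    first=0
    printf '   {"type":"blockdev-backup", "data":{\n'
    printf '    "device":"%s", "target":"%s", "sync":"none",\n' "$node" "$target"
    printf '    "job-id":"%s" }},\n' "$target"
    printf '   {"type":"block-dirty-bitmap-add", "data":{\n'
    printf '    "node":"%s", "name":"%s", "persistent":true}}' "$node" "$checkpoint"
  done
  printf '\n  ]}}\n'
}

# Two disks, one checkpoint name, as in the example above.
begin_transaction check1 node-sdc:backup-sdc node-sdd:backup-sdd
```

Every disk that takes part in the checkpoint gets a bitmap with the same name (the checkpoint name), which is what lets a single libvirt checkpoint refer to several qemu bitmaps.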

Pulling the data out from the backup is unchanged from example 1; virsh 
backup-dumpxml will show details about the job (yes, the job id is still 
1 for now), and when ready, virsh backup-end will end the job and 
gracefully take down the NBD server with no difference in QMP commands 
from before.  Thus, the creation of a checkpoint didn't change any of 
the fundamentals of capturing the current backup, but rather is in 
preparation for the next step.

$ $virsh backup-end $dom 1
Backup id 1 completed
$ rm scratch1.img scratch2.img

[We have not yet designed how qemu bitmaps will interact with external 
snapshots - but I see two likely scenarios:
  1. Down the road, I add a virDomainSnapshotCheckpointCreateXML() API, 
which adds a checkpointXML parameter but otherwise behaves like the 
existing virDomainSnapshotCreateXML - if that API is added in a 
different release than my current API proposals, that's yet another 
libvirt.so rebase to pick up the new API.
  2. My current proposal of virDomainBackupBegin(dom, "<domainbackup>", 
"<domaincheckpoint>", flags) could instead be tweaked to a single XML 
parameter, virDomainBackupBegin(dom, "
<domainbackup>
   <domaincheckpoint> ... </domaincheckpoint>
</domainbackup>", flags) prior to adding my APIs to libvirt 4.9, then 
down the road, we also tweak <domainsnapshot> to take an optional 
<domaincheckpoint> sub-element, and thus reuse the existing 
virDomainSnapshotCreateXML() to now also create checkpoints without a 
further API addition.
Speak up now if you have a preference between the two ideas]

Now that we have concluded the full backup and created a checkpoint, we 
can do more things with the checkpoint (it is persistent, after all). 
For example:

$ $virsh checkpoint-list $dom
  Name                 Creation Time
--------------------------------------------
  check1               2018-10-04 15:02:24 -0500

called virDomainListCheckpoints(dom, &array, 0) under the hood to get a 
list of virDomainCheckpointPtr objects, then called 
virDomainCheckpointGetXMLDesc(array[0], 0) to scrape the XML describing 
that checkpoint in order to display information.  Or another approach, 
using virDomainCheckpointGetXMLDesc(virDomainCheckpointCurrent(dom, 0), 0):

$ $virsh checkpoint-current $dom | head
<domaincheckpoint>
   <name>check1</name>
   <description>testing</description>
   <creationTime>1538683344</creationTime>
   <disks>
     <disk name='vda' checkpoint='no'/>
     <disk name='sdc' checkpoint='bitmap' bitmap='check1'/>
     <disk name='sdd' checkpoint='bitmap' bitmap='check1'/>
   </disks>
   <domain type='kvm'>

which shows the current checkpoint (that is, the checkpoint owning the 
bitmap that is still receiving live updates), and which bitmap names in 
the qcow2 files are in use. For convenience, it also recorded the full 
<domain> description at the time the checkpoint was captured (I used 
head to limit the size of this email), so that if you later hot-plug 
things, you still have a record of what state the machine had at the 
time the checkpoint was created.

The XML output of a checkpoint description is normally static, but 
sometimes it is useful to know an approximate size of the guest data 
that has been dirtied since a checkpoint was created (a dynamic value 
that grows as a guest dirties more clusters).  For that, it makes sense 
to have a flag to request the dynamic data; it's also useful to have a 
flag that suppresses the (lengthy) <domain> output:

$ $virsh checkpoint-current $dom --size --no-domain
<domaincheckpoint>
   <name>check1</name>
   <description>testing</description>
   <creationTime>1538683344</creationTime>
   <disks>
     <disk name='vda' checkpoint='no'/>
     <disk name='sdc' checkpoint='bitmap' bitmap='check1' size='1048576'/>
     <disk name='sdd' checkpoint='bitmap' bitmap='check1' size='65536'/>
   </disks>
</domaincheckpoint>

This maps to virDomainCheckpointGetXMLDesc(chk, 
VIR_DOMAIN_CHECKPOINT_XML_NO_DOMAIN | VIR_DOMAIN_CHECKPOINT_XML_SIZE). 
Under the hood, libvirt calls
{"execute":"query-block"}
and converts the bitmap size reported by qemu into an estimate of the 
number of bytes that would be required if you were to start a backup 
from that checkpoint right now.  Note that the result is just an 
estimate of the storage taken by guest-visible data; you'll probably 
want to use 'qemu-img measure' to convert that into a size of how much a 
matching qcow2 image would require when metadata is added in; also 
remember that the number is constantly growing as the guest writes and 
causes more of the image to become dirty.  But having a feel for how 
much has changed can be useful for determining if continuing a chain of 
incremental backups still makes more sense, or if enough of the guest 
data has changed that doing a full backup is smarter; it is also useful 
for preallocating how much storage you will need for an incremental backup.
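
To make that decision concrete, here is a minimal shell sketch that sums the size attributes from the checkpoint XML and compares the total against the disk capacity. The helper names dirty_bytes and choose_backup, the 50% cutoff, and the 100 MiB capacity are all assumptions picked for illustration; libvirt does not provide such a policy, and the right threshold depends on your storage and backup schedule.

```shell
# Sum every size='N' attribute (bytes) in checkpoint XML read from stdin.
dirty_bytes() {
  local total=0 n
  for n in $(grep -o "size='[0-9]*'" | grep -o "[0-9][0-9]*"); do
    total=$(( total + n ))
  done
  echo "$total"
}

# Print "incremental" while the dirty data stays at or below max_percent
# of the capacity, else "full". The cutoff is a hypothetical policy knob.
choose_backup() {
  local dirty="$1" capacity="$2" max_percent="${3:-50}"
  if [ $(( dirty * 100 )) -le $(( capacity * max_percent )) ]; then
    echo incremental
  else
    echo full
  fi
}

# The two <disk> lines from the --size output shown earlier.
xml="<disk name='sdc' checkpoint='bitmap' bitmap='check1' size='1048576'/>
<disk name='sdd' checkpoint='bitmap' bitmap='check1' size='65536'/>"

dirty=$(printf '%s\n' "$xml" | dirty_bytes)
echo "$dirty"                                     # 1114112
choose_backup "$dirty" $(( 100 * 1024 * 1024 ))   # incremental
```

In real use the XML would come from the checkpoint-current --size call, and the estimate would be re-read shortly before starting the backup, since it keeps growing.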

Technically, libvirt mapping a checkpoint size request to a single
{"execute":"query-block"} works only when querying the size of the
current bitmap. The command also works when querying the cumulative size
since an older checkpoint, but under the hood, libvirt must juggle
things to create a temporary bitmap, call x-block-dirty-bitmap-merge a
few times, query the size of that temporary bitmap, then clean things
back up again (after all, size(A) + size(B) >= size(A|B), depending on
how many clusters were touched during both A and B's tracking of dirty
clusters).  Again, a nice benefit of having libvirt manage multiple
qemu bitmaps under a single libvirt API.
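
The size inequality is easy to see with a toy model. The sketch below treats each bitmap as an integer with one bit per cluster; the helper names, the 64 KiB granularity, and the two bit patterns are assumptions chosen for illustration, not values read from qemu.

```shell
# Count set bits in a non-negative integer (one bit per dirty cluster).
popcount() {
  local v="$1" count=0
  while [ "$v" -gt 0 ]; do
    count=$(( count + (v & 1) ))
    v=$(( v >> 1 ))
  done
  echo "$count"
}

# Bytes covered by bitmap $1 at $2 bytes per bit.
bitmap_bytes() {
  echo $(( $(popcount "$1") * $2 ))
}

granularity=65536   # assumed bytes per bit, for illustration only
a=15                # binary 001111: clusters 0-3 dirtied under checkpoint A
b=60                # binary 111100: clusters 2-5 dirtied under checkpoint B

echo "size(A)   = $(bitmap_bytes "$a" "$granularity")"            # 262144
echo "size(B)   = $(bitmap_bytes "$b" "$granularity")"            # 262144
echo "size(A|B) = $(bitmap_bytes $(( a | b )) "$granularity")"    # 393216
```

Clusters 2 and 3 were dirtied under both checkpoints, so adding the two sizes (524288) overcounts them; only the merged bitmap gives the number of bytes a backup since A would actually have to copy.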

Of course, the real reason we created a checkpoint with our full backup 
is that we want to take an incremental backup next, rather than 
repeatedly taking full backups. For this, we need a one-line 
modification to our backup XML to add an <incremental> element; we also 
want to update our checkpoint XML to start yet another checkpoint when 
we run our first incremental backup.

$ cat > backup.xml <<EOF
<domainbackup mode='pull'>
   <server transport='tcp' name='localhost' port='10809'/>
   <incremental>check1</incremental>
   <disks>
     <disk name='$orig1' type='file'>
       <scratch file='$PWD/scratch1.img'/>
     </disk>
     <disk name='sdd' type='file'>
       <scratch file='$PWD/scratch2.img'/>
     </disk>
   </disks>
</domainbackup>
EOF
$ $virsh checkpoint-create-as --print-xml $dom check2 \
    --diskspec sdc --diskspec sdd | tee check2.xml
<domaincheckpoint>
   <name>check2</name>
   <disks>
     <disk name='sdc'/>
     <disk name='sdd'/>
   </disks>
</domaincheckpoint>
$ $qemu_img create -f qcow2 -b $orig1 -F qcow2 scratch1.img
$ $qemu_img create -f qcow2 -b $orig2 -F qcow2 scratch2.img

And again, it's time to kick off the backup job:

$ $virsh backup-begin $dom backup.xml check2.xml
Backup id 1 started
backup used description from 'backup.xml'
checkpoint used description from 'check2.xml'

This time, the incremental backup causes libvirt to do a bit more work 
under the hood:

{"execute":"nbd-server-start",
  "arguments":{"addr":{"type":"inet",
   "data":{"host":"localhost", "port":"10809"}}}}
{"execute":"blockdev-add",
  "arguments":{"driver":"qcow2", "node-name":"backup-sdc",
   "file":{"driver":"file",
    "filename":"$PWD/scratch1.img"},
    "backing":"$node1"}}
{"execute":"blockdev-add",
  "arguments":{"driver":"qcow2", "node-name":"backup-sdd",
   "file":{"driver":"file",
    "filename":"$PWD/scratch2.img"},
    "backing":"$node2"}}
{"execute":"block-dirty-bitmap-add",
  "arguments":{"node":"$node1", "name":"backup-sdc"}}
{"execute":"x-block-dirty-bitmap-merge",
  "arguments":{"node":"$node1", "src_name":"check1",
  "dst_name":"backup-sdc"}}
{"execute":"block-dirty-bitmap-add",
  "arguments":{"node":"$node2", "name":"backup-sdd"}}
{"execute":"x-block-dirty-bitmap-merge",
  "arguments":{"node":"$node2", "src_name":"check1",
  "dst_name":"backup-sdd"}}
{"execute":"transaction",
  "arguments":{"actions":[
   {"type":"blockdev-backup", "data":{
    "device":"$node1", "target":"backup-sdc", "sync":"none",
    "job-id":"backup-sdc" }},
   {"type":"blockdev-backup", "data":{
    "device":"$node2", "target":"backup-sdd", "sync":"none",
    "job-id":"backup-sdd" }},
   {"type":"x-block-dirty-bitmap-disable", "data":{
    "node":"$node1", "name":"backup-sdc"}},
   {"type":"x-block-dirty-bitmap-disable", "data":{
    "node":"$node2", "name":"backup-sdd"}},
   {"type":"x-block-dirty-bitmap-disable", "data":{
    "node":"$node1", "name":"check1"}},
   {"type":"x-block-dirty-bitmap-disable", "data":{
    "node":"$node2", "name":"check1"}},
   {"type":"block-dirty-bitmap-add", "data":{
    "node":"$node1", "name":"check2", "persistent":true}},
   {"type":"block-dirty-bitmap-add", "data":{
    "node":"$node2", "name":"check2", "persistent":true}}
  ]}}
{"execute":"nbd-server-add",
  "arguments":{"device":"backup-sdc", "name":"sdc"}}
{"execute":"nbd-server-add",
  "arguments":{"device":"backup-sdd", "name":"sdd"}}
{"execute":"x-nbd-server-add-bitmap",
  "arguments":{"name":"sdc", "bitmap":"backup-sdc"}}
{"execute":"x-nbd-server-add-bitmap",
  "arguments":{"name":"sdd", "bitmap":"backup-sdd"}}

Two things stand out here, different from the earlier full backup. First
is that libvirt is now creating a temporary non-persistent bitmap,
merging all data from check1 into the temporary, then freezing writes
into the temporary bitmap during the transaction, and telling NBD to
expose the bitmap to clients. The second is that since we want this
backup to start a new checkpoint, we disable the old bitmap and create a
new one. The two additions are independent - it is possible to create an
incremental backup [<incremental> in backup XML] without triggering a
new checkpoint [presence of non-null checkpoint XML].  In fact, taking
an incremental backup without creating a checkpoint is effectively doing
differential backups, where multiple backups started at different times
each contain all cumulative changes since the same original point in
time, such that later backups are larger than earlier backups, but you
no longer have to chain those backups to one another to reconstruct the
state in any one of the backups.

Now that the pull-model backup job is running, we want to scrape the 
data off the NBD server.  Merely reading nbd://localhost:10809/sdc will 
read the full contents of the disk - but that defeats the purpose of 
using the checkpoint in the first place to reduce the amount of data to 
be backed up. So, let's modify our image-scraping loop from the first 
example, to now have one client utilizing the x-dirty-bitmap command 
line extension to drive other clients.  Note: that extension is marked 
experimental in part because it has screwy semantics: if you use it, you 
can't reliably read any data from the NBD server, but instead can 
interpret 'qemu-img map' output by treating any "data":false lines as 
dirty, and "data":true entries as unchanged.

$ image_opts=driver=nbd,export=sdc,server.type=inet,
$ image_opts+=server.host=localhost,server.port=10809,
$ image_opts+=x-dirty-bitmap=qemu:dirty-bitmap:backup-sdc
$ $qemu_img create -f qcow2 inc12.img $size_of_orig1
$ $qemu_img rebase -u -f qcow2 -F raw -b nbd://localhost:10809/sdc \
   inc12.img
$ while read line; do
   [[ $line =~ .*start.:.([0-9]*).*length.:.([0-9]*).*data.:.false.* ]] ||
     continue
   start=${BASH_REMATCH[1]} len=${BASH_REMATCH[2]}
   qemu-io -C -c "r $start $len" -f qcow2 inc12.img
done < <($qemu_img map --output=json --image-opts "$image_opts")
$ $qemu_img rebase -u -f qcow2 -b '' inc12.img
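
The inverted reading of the map output can be tried without a running NBD server. The following is a minimal sketch on hand-written sample lines in the general shape of 'qemu-img map --output=json' output; the helper names dirty_extents and dirty_total and the sample extents are assumptions for illustration, and real output may carry additional fields.

```shell
# Print "start length" for each extent reported as "data": false, which
# under x-dirty-bitmap semantics means the extent is dirty.
dirty_extents() {
  sed -n 's/.*"start": *\([0-9][0-9]*\).*"length": *\([0-9][0-9]*\).*"data": *false.*/\1 \2/p'
}

# Sum the length column of "start length" pairs read from stdin.
dirty_total() {
  local total=0 start len
  while read -r start len; do
    total=$(( total + len ))
  done
  echo "$total"
}

# Hand-written sample for a 1 MiB export with two dirty 64 KiB extents.
sample='[{ "start": 0, "length": 131072, "depth": 0, "zero": false, "data": true},
{ "start": 131072, "length": 65536, "depth": 0, "zero": false, "data": false},
{ "start": 196608, "length": 786432, "depth": 0, "zero": false, "data": true},
{ "start": 983040, "length": 65536, "depth": 0, "zero": false, "data": false}]'

printf '%s\n' "$sample" | dirty_extents                 # two "start length" lines
printf '%s\n' "$sample" | dirty_extents | dirty_total   # 131072
```

Only the two "data": false extents are reported, so a client following this listing would copy 128 KiB into inc12.img rather than the full 1 MiB.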

As captured, inc12.img is an incomplete qcow2 file (it only includes 
clusters touched by the guest since the last incremental or full 
backup); but since we output into a qcow2 file, we can easily repair the 
damage:

$ $qemu_img rebase -u -f qcow2 -F qcow2 -b full1.img inc12.img

creating the qcow2 chain 'full1.img <- inc12.img' that contains the
same guest-visible contents as would be present in a full backup
done at the same moment.

Of course, with the backups now captured, we clean up:

$ $virsh backup-end $dom 1
Backup id 1 completed
$ rm scratch1.img scratch2.img

and this time, virDomainBackupEnd() had to do one additional bit of work 
to delete the temporary bitmaps:

{"execute":"nbd-server-remove",
  "arguments":{"name":"sdc"}}
{"execute":"nbd-server-remove",
  "arguments":{"name":"sdd"}}
{"execute":"nbd-server-stop"}
{"execute":"block-job-cancel",
  "arguments":{"device":"backup-sdc"}}
{"execute":"block-job-cancel",
  "arguments":{"device":"backup-sdd"}}
{"execute":"blockdev-del",
  "arguments":{"node-name":"backup-sdc"}}
{"execute":"blockdev-del",
  "arguments":{"node-name":"backup-sdd"}}
{"execute":"block-dirty-bitmap-remove",
  "arguments":{"node":"$node1", "name":"backup-sdc"}}
{"execute":"block-dirty-bitmap-remove",
  "arguments":{"node":"$node2", "name":"backup-sdd"}}

At this point, it should be fairly obvious that you can create more 
incremental backups, by repeatedly updating the <incremental> line in 
backup.xml, and adjusting the checkpoint XML to move on to a successive 
name.  And while incremental backups are the most common (using the 
current active checkpoint as the <incremental> when starting the next), 
the scheme is also set up to permit differential backups from any 
existing checkpoint to the current point in time (since libvirt is 
already creating a temporary bitmap as its basis for the 
x-nbd-server-add-bitmap, all it has to do is just add an appropriate 
number of x-block-dirty-bitmap-merge calls to collect all bitmaps in the 
chain from the requested checkpoint to the current checkpoint).
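
As a sketch of that merge sequence, the shell function below prints one x-block-dirty-bitmap-merge command per checkpoint, starting at the requested checkpoint and ending at the newest one. The function name merge_chain, its argument order, the node name node-sdc, and the third checkpoint check3 are hypothetical; libvirt tracks the checkpoint order itself, and this only shows which bitmaps end up in the temporary bitmap.

```shell
# Usage (hypothetical helper):
#   merge_chain NODE DST FROM CHECKPOINT...
# CHECKPOINT names are listed oldest first. Prints the merges that fold
# FROM and every later checkpoint bitmap into the temporary bitmap DST.
# Returns non-zero if FROM is not in the list.
merge_chain() {
  local node="$1" dst="$2" from="$3" name found=0
  shift 3
  for name in "$@"; do
    [ "$name" = "$from" ] && found=1
    [ "$found" -eq 1 ] || continue    # skip checkpoints older than FROM
    printf '{"execute":"x-block-dirty-bitmap-merge", '
    printf '"arguments":{"node":"%s", "src_name":"%s", "dst_name":"%s"}}\n' \
      "$node" "$name" "$dst"
  done
  [ "$found" -eq 1 ]
}

# Differential backup since check2 while check3 is the current checkpoint:
# merges check2 and check3, but not check1.
merge_chain node-sdc backup-sdc check2 check1 check2 check3
```

Passing the current checkpoint as FROM yields a single merge, which is the plain incremental case shown in the QMP sequence earlier.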

More to come in part 3.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org