[libvirt] [RFC v3] external (pull) backup API

Mon May 21 15:52:16 UTC 2018

18.05.2018 01:43, Eric Blake wrote:
> Here's my updated counterproposal for a backup API.
>

[...]

>
> Representing things on a timeline, when a guest is first created,
> there is no dirty bitmap; later, the checkpoint "check1" is created,
> which in turn creates "bitmap1" in the qcow2 image for all changes
> past that point; when a second checkpoint "check2" is created, a qemu
> transaction is used to create and enable the new "bitmap2" bitmap at
> the same time as disabling "bitmap1" bitmap.  (Actually, it's probably
> easier to name the bitmap in the qcow2 file with the same name as the
> Checkpoint object being tracked in libvirt, but for discussion
> purposes, it's less confusing if I use separate names for now.)
>
> creation ....... check1 ....... check2 ....... active
>         no bitmap       bitmap1        bitmap2
>
> When a user wants to create a backup, they select which point in time
> the backup starts from; the default value NULL represents a full
> backup (all content since disk creation to the point in time of the
> backup call, no bitmap is needed, use sync=full for push model or
> sync=none for the pull model); any other value represents the name of
> a checkpoint to use as an incremental backup (all content from the
> checkpoint to the point in time of the backup call; libvirt forms a
> temporary bitmap as needed, then uses sync=incremental for push model
> or sync=none plus exporting the bitmap for the pull model).  For
> example, requesting an incremental backup from "check2" can just reuse
> "bitmap2", but requesting an incremental backup from "check1" requires
> the computation of the bitmap containing the union of "bitmap1" and
> "bitmap2".
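
To make the quoted scheme concrete, here is a minimal Python sketch of
how the bitmap for an incremental backup could be selected. It is only
a model under my own assumptions: the function name, the timeline
layout and the cluster numbers are hypothetical, and nothing here is
the libvirt or Qemu API.

```python
# Hypothetical model: each checkpoint owns the set of clusters dirtied
# between its creation and the next checkpoint (or "now" for the last one).
def incremental_bitmap(timeline, since):
    """Return the dirty clusters to back up, starting from checkpoint `since`.

    `timeline` is an ordered list of (checkpoint_name, dirty_clusters)
    pairs, oldest first. `since=None` means a full backup, for which no
    bitmap is needed, so None is returned.
    """
    if since is None:
        return None
    names = [name for name, _ in timeline]
    if since not in names:
        raise ValueError(f"unknown checkpoint: {since}")
    merged = set()
    for _, dirty in timeline[names.index(since):]:
        merged |= dirty  # union of every bitmap from `since` to the active one
    return merged


timeline = [
    ("check1", {1, 2, 5}),  # plays the role of "bitmap1" (disabled)
    ("check2", {2, 7}),     # plays the role of "bitmap2" (active)
]
from_check2 = incremental_bitmap(timeline, "check2")
from_check1 = incremental_bitmap(timeline, "check1")
print(sorted(from_check2), sorted(from_check1))
```

A backup from "check2" reuses that checkpoint's bitmap as is, while a
backup from "check1" needs the union of both, which is the case the
rest of this mail is about.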

I have some criticism of this part, specifically of the ability to
create a backup not from the last checkpoint but from any earlier one.
For the sake of this ability we are implementing the whole API with
checkpoints, and we are going to store several bitmaps in Qemu (and
possibly implement checkpoints in Qemu in the future). But personally,
I don't know of any real and adequate use case for this ability.

I heard about the following cases:
1. Incremental restore: we want to roll back to some point in time (some
element in the incremental backup chain), and we don't want to copy all
the data, only what has changed.
- This is not a real case, because the information about dirtiness is
already in the backup chain: we just need to find the allocated areas
and copy them, and we should also copy the areas corresponding to dirty
bits in the active dirty bitmap in Qemu.
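
A rough Python sketch of this argument, under my reading that the
"allocated areas" are the clusters present in the increments made after
the restore point. The chain layout (one dict per backup) and all names
are hypothetical, just for illustration:

```python
def clusters_to_restore(chain, target, active_dirty):
    """Clusters whose live content may differ from backup `chain[target]`.

    `chain` is a list of dicts {cluster_index: data}, oldest first;
    chain[0] is the full backup, later entries are increments holding
    only the clusters that changed (their "allocated areas").
    """
    changed = set(active_dirty)  # written since the last backup
    for increment in chain[target + 1:]:
        changed.update(increment.keys())  # written between later backups
    return changed


def read_cluster(chain, target, cluster):
    """Read `cluster` as it was at backup `target`, walking the chain back."""
    for layer in reversed(chain[:target + 1]):
        if cluster in layer:
            return layer[cluster]
    return None  # never written up to `target`


chain = [
    {0: "a0", 1: "b0", 2: "c0"},  # full backup
    {1: "b1"},                    # increment 1
    {2: "c2"},                    # increment 2
]
active_dirty = {0}  # stands in for the active dirty bitmap in Qemu
todo = clusters_to_restore(chain, 1, active_dirty)
restored = {c: read_cluster(chain, 1, c) for c in sorted(todo)}
print(restored)
```

Rolling back to increment 1 touches only clusters 0 and 2 here; no
disabled bitmap stored in the vm is consulted.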

2. Several backup solutions backing up the same vm
- OK, if we implement checkpoints, then instead of maintaining several
active dirty bitmaps we can have only one active bitmap and the others
disabled, which leads to a performance gain and the possibility of
saving RAM (if we unload disabled bitmaps from RAM to qcow2). But what
are the real cases? What is the real benefit? I doubt that anybody will
use more than 2-3 different backup providers on the same vm, so is it
worth implementing such a big feature for this? Of course it is worth
doing if we have 100 independent backup providers.
Note: the word "independent" is important here. For example, it may be
two external backup tools managed by different subsystems or different
people, or something like that. If we are just doing weekly + daily
backups, we can actually synchronize them, so that the weekly backup is
a merge of the last 7 daily backups; the weekly backup then needs
neither its own active dirty bitmap nor even a backup operation.
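
The weekly-from-daily idea can be sketched like this in Python. The
dict-per-increment layout and the function name are my own assumptions
for illustration, not an existing tool's format:

```python
def merge_increments(increments):
    """Merge incremental backups (oldest first) into a single increment.

    Each increment is a dict {cluster_index: data}. For a cluster written
    on several days, the newest data wins.
    """
    merged = {}
    for inc in increments:
        merged.update(inc)  # later increments override earlier ones
    return merged


daily = [
    {0: "mon"}, {1: "tue"}, {0: "wed"}, {}, {2: "fri"}, {1: "sat"}, {3: "sun"},
]
weekly = merge_increments(daily)
print(weekly)
```

The weekly increment is computed purely from the stored daily backups,
so no second dirty bitmap has to be kept for it.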

3. Some of the backups in an incremental backup chain are lost, and we
want to recreate part of the chain as a new backup, instead of just
dropping the whole chain and creating a full backup.
In this case, I can say the following:
disabled bitmaps (~ all checkpoints except the last one) are constant
metadata, related to the backup chain, not to the vm. They should be
stored as constant data: maybe on the same server as the backup chain,
maybe on another one, maybe in some database, but not in the vm. A VM is
a dynamic structure, and I don't see any reason to store (almost)
unrelated constant metadata in it. Also, saving this constant
backup-related metadata separately from the vm will allow checking its
consistency with the help of checksums or something like that. Finally,
I'm not a specialist in storing constant data, but I think that the vm
is not the best place.
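
As an illustration of the checksum point, a minimal Python sketch of
keeping a disabled bitmap as a constant record next to the backup
chain. The record layout and function names are hypothetical; SHA-256
is just one possible choice of checksum:

```python
import hashlib
import json


def bitmap_record(name, dirty_clusters):
    """Serialize a disabled bitmap as constant metadata with a checksum."""
    clusters = sorted(dirty_clusters)
    payload = json.dumps(clusters).encode()
    return {
        "name": name,
        "clusters": clusters,
        "sha256": hashlib.sha256(payload).hexdigest(),
    }


def verify(record):
    """Recompute the checksum and compare it with the stored one."""
    payload = json.dumps(record["clusters"]).encode()
    return hashlib.sha256(payload).hexdigest() == record["sha256"]


record = bitmap_record("check1", {5, 1, 2})
ok_before = verify(record)
record["clusters"].append(9)  # simulate corruption of the stored metadata
ok_after = verify(record)
print(ok_before, ok_after)
```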

Note: Hmm, does anyone have real examples of such use cases? Why are
backups lost, and does it happen often? (I heard a suggestion that there
may be a tool running in the background that checks backups, for example
by creating a vm on top of the backup and checking that it can at least
start. But I'm not sure that we must drop a backup if it fails such a
check; maybe it's enough to merge it up.)

3.1 About external backup: we have already exported this metadata to
the third-party backup tool. So this tool should store the information
for future use, instead of exporting it from Qemu again.

To summarize:
1. I doubt that the discussed ability is really needed.
2. If it is needed, I doubt that storing the related disabled bitmaps
(or checkpoints) in Qemu is the right way to do it.

-- 
Best regards,
Vladimir