[libvirt] [PATCH 1/8] snapshots: Avoid term 'checkpoint' for full system snapshot

Tue Jun 26 12:12:25 UTC 2018

On 06/25/2018 06:27 PM, Eric Blake wrote:
> On 06/22/2018 04:16 PM, John Ferlan wrote:
>>
>>
>> On 06/13/2018 12:42 PM, Eric Blake wrote:
>>> Upcoming patches plan to introduce virDomainCheckpointPtr as a new
>>> object for use in incremental backups, along with documentation
>>> how incremental backups differ from snapshots.  But first, we need
>>> to rename any existing mention of a 'system checkpoint' to instead
>>> be a 'full system state snapshot', so that we aren't overloading
>>> the term checkpoint.
>>>
>>> Signed-off-by: Eric Blake <eblake at redhat.com>
>>>
>>> ---
>>> Bikeshed suggestions on what to name the new object for use in
>>> backups is welcome, if we would rather keep the term 'checkpoint'
>>> for a disk+memory snapshot.
>>> ---
>>
>> "Naming is hard" and opinions can vary greatly - be careful for what you
>> ask in case you receive something not wanted ;-).
>>
>> I haven't followed the discussions thus far all that closely, but I'll
>> give this a go anyway since it's languishing and saying nothing is akin
>> to implicitly agreeing everything is fine. Fair warning, I'm not all
>> that familiar with snapshot algorithms having largely tried to ignore it
>> since others (Eric and Peter) have far more in depth knowledge.
>>
>> In any case, another option for the proposed "checkpoint" could be a
>> "snapshot reference". One can start or end a reference period and then
>> set or clear a reference point.
>>
>> What I'm not clear on yet is whether the intention is to have this
>> checkpoint (and backup) be integrated in any way to the existing
>> snapshot algorithms. I guess part of me thinks that if I take a full
>> system snapshot, then any backup/checkpoint data should be included so
>> that if/when I go back to that point in time I can start from whence I
>> left as it relates to my backup. Kind of a superset and/or integrated
>> model rather than something bolted onto the side to resolve a specific
>> need.
> 
> That's a tough call.  My current design has incremental backups
> completely separate from the existing checkpoint code for several reasons:
> - the snapshot code is already confusing with lots of flags
> (internal/external, disk/memory, etc)
> - snapshots can be reverted to (well, in theory - we STILL can't revert
> to an external snapshot cleanly, even though the design supports it)
> - incremental backups are not direct revert points
> 
> so, rather than bolt something on to the existing design, I went with a
> new concept. As you found later in the series, I then tried to provide a
> good summary page describing the different pieces, and what tradeoffs
> are involved in order to know which approach will work for a given need.
> 

Understood - since I've made it further now. Domain state is somewhat of
a tangled or interwoven part of the domain XML lifecycle fabric, so I
perhaps process it that way when reading new code.

It seems the design is that XML for domain{checkpoint|backup} will be
similar to domainstatus, but far more restrictive to the specific needs
for each since you're saving the original config in the output.

>>
>> I suppose a reservation I have about separate virDomainCheckpoint* and
>> virDomainBackup* API's is understanding the relationship between the two
>> naming spaces. IIUC though a Checkpoint would be reference point in time
>> within a Backup period.
> 
> A sequence of snapshots are different points in time you can revert to.
> A sequence of checkpoints are different points in time you can use as
> the reference point for starting an incremental backup.
> 
> So if we don't like the term 'checkpoint', maybe
> virDomainBlockBackupReference would work.  But it is longer, and would
> make for some mouthful API names.

I saw things as more of "{Create|Start|Set|Clear|End|Destroy}" type
operations initially, but I've gone through my viewpoint has changed.
I'm still concerned with over complicating with too many flags which is
where we seem to have gotten ourselves in trouble with snapshot as time
went on and more or less functionality points were desired.

Checkpoint/backup is complex enough without adding levels of features
for consumers that may or may not be used. In the long run, it's where
do you give up control and how much can/should be assumed to be
libvirt's "job".

If a consumer wants to do something particularly tricky, then maybe we
should just hand them the gun with one bullet in the chamber (so to
speak). Provide an API that "locks out" other threads instead of doing
that for them ;-) - someone always thinks they can do it better!

In the long run, we don't necessarily know how all consumers would like
to use this and so far there's been mixed opinions on what should be
done. At some level the operation of starting checkpoints, setting
checkpoints, and allowing/performing backups from a specific checkpoint
are fairly straightforward type operations. It's the additional cruft
and flags to do "fancy" stuff or the desire to worry about every
possible operation option someone could possibly want that I think
causes over complication. Let the up-the-stack app keep track of things
if that's the complicated level they desire. Keep the libvirt side simple.

> 
> Also, you commented elsewhere that 'virDomainBackupBegin' misses out on
> the fact that under the hood, it is a block operation (only disk state);
> would 'virDomainBlockBackupBegin' be any better?  There are fewer APIs
> with the term 'Backup' than with 'Checkpoint', if we do want go with
> that particular rename.
> 

I think so, but I haven't reached the Backup API's yet. There's the
interaction with NBD that's going to keep things "interesting".
Reminding myself that under the covers, migrations start/stop an NBD
server is I assume the impetus behind [re]using NBD as the 3rd party
integration point. Perhaps something that becomes part of the
documentation description for describing the difference between
pull/push backup models.

>>
>> I do have more comments in patch2, but I want to make them coherent
>> before posting. Still I wanted to be sure you got at least "some"
>> feedback for this and well of course an opinion on checkpoint ;-)
>>
>>>   docs/formatsnapshot.html.in               | 14 +++++++-------
>>>   include/libvirt/libvirt-domain-snapshot.h |  2 +-
>>>   src/conf/snapshot_conf.c                  |  2 +-
>>>   src/libvirt-domain-snapshot.c             |  4 ++--
>>>   src/qemu/qemu_driver.c                    | 12 ++++++------
>>>   tools/virsh-snapshot.c                    |  2 +-
>>>   tools/virsh.pod                           | 14 +++++++-------
>>>   7 files changed, 25 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/docs/formatsnapshot.html.in b/docs/formatsnapshot.html.in
>>> index fbbecfd242..f2e51df5ab 100644
>>> --- a/docs/formatsnapshot.html.in
>>> +++ b/docs/formatsnapshot.html.in
>>> @@ -33,7 +33,7 @@
>>>           resume in a consistent state; but if the disks are modified
>>>           externally in the meantime, this is likely to lead to data
>>>           corruption.</dd>
>>> -      <dt>system checkpoint</dt>
>>> +      <dt>full system state</dt>
>>
>> Is "state" superfluous in this context?  IOW: Everywhere that "full
>> system state" exists, it seems "full system" could be used.
>>
>> Other synonyms that came up are complete, entire, integrated, or
>> thorough (hah!). But I think "Full System" conveys enough meaning even
>> though it could convey more meaning than intended.
> 
> Okay, I can live with shortening the replacement to 'full system'. Don't
> know if it will happen in the v2 series that I hope to post later
> tonight, or if it would be done on top (my immediate short-term goal is
> to get a demo of incremental backups working, to show the API is usable;
> although the demo depends on unreleased qemu code so only the API would
> actually go in this month's libvirt release, while the underlying
> src/qemu/* changes can be delayed and polished to be better than the
> demo for the time when qemu 3.0 releases the needed bitmap/NBD features).
> 

OK - it's not 100% clear in my mind all the various pieces needed for
this to all work properly. I understand the desire/need for inclusion;
however, we do need to consider that in more recent times "waiting" for
a more functional solution to be "ready" in QEMU before pushing libvirt
changes - especially ones that bake an API...

>>
>>
>>>         <dd>A combination of disk snapshots for all disks as well as VM
>>>           memory state, which can be used to resume the guest from
>>> where it
>>>           left off with symptoms similar to hibernation (that is, TCP
>>> @@ -55,7 +55,7 @@
>>>         as <code>virDomainSaveImageGetXMLDesc()</code> to work with
>>>         those files.
>>>       </p>
>>> -    <p>System checkpoints are created
>>> +    <p>Full system state snapshots are created
>>>         by <code>virDomainSnapshotCreateXML()</code> with no flags, and
>>>         disk snapshots are created by the same function with
>>>         the <code>VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY</code> flag; in
>>
>> BTW: Existing and maybe it's just me, but when I read a conjunctive
>> sentence with only two parts I don't expect to see ", and" or ", or" -
>> it's just "and" or "or" without the comma....
> 
> Thanks for the careful grammar/legibility review.  I'll try to fold in
> those suggestions (again, might not make it into v2).
> 

Understood.

John