[libvirt] [PATCH v2 5/8] Added job type VIR_DOMAIN_JOB_PHASE1_COMPLETED

Jiri Denemark jdenemar at redhat.com
Thu Oct 9 13:09:32 UTC 2014


On Wed, Oct 08, 2014 at 23:45:17 +0900, Cristian Klein wrote:
> 
> On 07 Oct 2014, at 22:08 , Jiri Denemark <jdenemar at redhat.com> wrote:
> 
> > On Tue, Sep 30, 2014 at 16:39:26 +0200, Cristian Klein wrote:
> >> Some jobs feature several phases. For example, post-copy migration is
> >> composed of a first phase, from migration start to switching to
> >> post-copy, and a second phase, to migration completion. This
> >> job type allows to flag that the job has completed the first phase,
> >> but is not yet fully completed.
> >> 
> >> Signed-off-by: Cristian Klein <cristian.klein at cs.umu.se>
> >> ---
> >> include/libvirt/libvirt.h.in | 1 +
> >> tools/virsh-domain.c         | 3 ++-
> >> 2 files changed, 3 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in
> >> index 84cd5a4..81044f0 100644
> >> --- a/include/libvirt/libvirt.h.in
> >> +++ b/include/libvirt/libvirt.h.in
> >> @@ -4307,6 +4307,7 @@ typedef enum {
> >>     VIR_DOMAIN_JOB_COMPLETED = 3, /* Job has finished, but isn't cleaned up */
> >>     VIR_DOMAIN_JOB_FAILED    = 4, /* Job hit error, but isn't cleaned up */
> >>     VIR_DOMAIN_JOB_CANCELLED = 5, /* Job was aborted, but isn't cleaned up */
> >> +    VIR_DOMAIN_JOB_PHASE1_COMPLETED = 6, /* Job completed first phase, e.g., post-copy activation */
> > 
> > This is not a job type. If we need to advertise this to libvirt clients,
> > we may introduce a new job statistics typed parameter but we should not
> > misuse job type. And I'm not entirely convinced we need to advertise
> > this. To me it seems the only interested thing is whether a domain is
> > still running on the source host or it was already resumed on the
> > destination host. And it's easy to get this kind of information via
> > existing ways, e.g., listening to domain life cycle events or by
> > checking domain's status.
> 
> My rationale for introducing this flag was to prevent a future
> breakage of an internal API. I felt like none of the previous flags
> where accurately representing the status of this job after post-copy
> has started.

On the other hand, you changed something that is externally visible.
Apps would be pretty confused by job type changing from UNBOUNDED to
PHASE1_COMPLETED during migration.

> Sure, I could use `job->status.status` in
> `qemuMigrationWaitForCompletion`, but I was concerned that a future
> `libvirt` addition might not take this subtly into account and might
> break post-copy.

Well, we have to be careful :-) Also I'm not sure how the new
VIR_DOMAIN_JOB_PHASE1_COMPLETED would help...

> If you think I should go ahead, suppress this job type and use
> `job->status.status` instead, what should I use instead,
> `VIR_DOMAIN_JOB_UNBOUNDED` or `VIR_DOMAIN_JOB_BOUNDED`? To be honest,
> I’m not sure to have completely understood the difference between
> them, except that `VIR_DOMAIN_JOB_BOUNDED` seems to be used to signal
> that a job is almost complete.

virDomainJobType is a bit overloaded since it is mostly used to describe
what kind of job is running and also about its state.

- VIR_DOMAIN_JOB_NONE - no job is currently running
- VIR_DOMAIN_JOB_BOUNDED - there is a job running and we know how much
  data we will need to process during the job. That is, apps may easily
  indicate its progress because we report the total amount of data and
  how much we already processed. Currently we don't have such jobs in
  libvirt.
- VIR_DOMAIN_JOB_UNBOUNDED - there is a job running and we have no idea
  how much data we will need to process. That is, while we are
  processing data, something we already processed gets changed and we
  need to process it again. This is a live migration, once all memory is
  processed, we need to process the memory that changed in the
  meantime...
- VIR_DOMAIN_JOB_COMPLETED - there was a job running and it has just
  finished (very hard to be seen externally until recently when we added
  support for explicitly querying statistics about a completed job)
- VIR_DOMAIN_JOB_FAILED - there was a job running and it failed
- VIR_DOMAIN_JOB_CANCELLED - there was a job running and it was
  cancelled either by libvirt itself or by a user/app

Apps know VIR_DOMAIN_{UN,}BOUNDED job mean there is a job running. We
can't change that without breaking them.

Jirka




More information about the libvir-list mailing list