[libvirt] [PATCH 1/6] Refactor job statistics

Jiri Denemark jdenemar at redhat.com
Mon Sep 8 14:46:55 UTC 2014


On Fri, Sep 05, 2014 at 14:41:21 -0400, John Ferlan wrote:
> 
> 
> On 09/01/2014 11:05 AM, Jiri Denemark wrote:
> > Job statistics data were tracked in several structures and variables.
> > Let's make a new qemuDomainJobInfo structure which can be used as a
> > single source of statistics data as a preparation for storing data about
> > completed a job.
> > 
> > Signed-off-by: Jiri Denemark <jdenemar at redhat.com>
> > ---
> >  src/qemu/qemu_domain.c    | 157 ++++++++++++++++++++++++++++++++++++++++++++--
> >  src/qemu/qemu_domain.h    |  24 ++++++-
> >  src/qemu/qemu_driver.c    | 119 ++++-------------------------------
> >  src/qemu/qemu_migration.c |  72 ++++++++-------------
> >  4 files changed, 213 insertions(+), 159 deletions(-)
> > 
> 
> Curious why the decision to use "Ptr" instead of inline struct for
> current & completed?

It's just easier to check whether completed is filled in or not and
current has the same type for consistency.

> When I saw an alloc in the middle of a structure
> and then a few free's along the way I began to wonder and attempt to
> research the various paths to make sure all accesses knew/checked that
> the impending deref would be OK - I marked those I found... I think one
> or two may need double check since I assume you know the overall code
> paths better.

Basically, current is allocated iff there is an async job running so
any code that knows an async job exists may access current without
having to worry about it too much.

> I see there's a path from qemuProcessReconnect() - that's
> the libvirtd restart path right? So when the VIR_ALLOC() is done for
> current/completed - who's address space is that? Wouldn't that be
> address space from the "old" libvirtd?

I'm not quite sure what you are talking about here :-) However,
qemuProcessReconnect calls qemuProcessRecoverJob and that one does not
care about any statistics.

> > diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
> > index e9506e0..2c709e9 100644
> > --- a/src/qemu/qemu_domain.c
> > +++ b/src/qemu/qemu_domain.c
> > @@ -163,11 +163,9 @@ qemuDomainObjResetAsyncJob(qemuDomainObjPrivatePtr priv)
> >      job->asyncOwner = 0;
> >      job->phase = 0;
> >      job->mask = DEFAULT_JOB_MASK;
> > -    job->start = 0;
> >      job->dump_memory_only = false;
> >      job->asyncAbort = false;
> > -    memset(&job->status, 0, sizeof(job->status));
> > -    memset(&job->info, 0, sizeof(job->info));
> > +    VIR_FREE(job->current);
> 
> This was one of those free paths where there were multiple callers and I
> wasn't 100% other accesses needed to be checked w/r/t to this occurring
> before an unchecked access in some other path...

Once an async job goes away so does current info about it.

> >  }
> >  
> >  void
...
> > @@ -211,6 +210,150 @@ qemuDomainTrackJob(qemuDomainJob job)
> >  }
> >  
> >  
> > +int
> > +qemuDomainJobInfoUpdateTime(qemuDomainJobInfoPtr jobInfo)
> > +{
> > +    unsigned long long now;
> > +
> > +    if (!jobInfo->started)
> > +        return 0;
> 
> any chance jobInfo == NULL?  I think the only path where the caller
> doesn't check for current (or completed) is qemuMigrationUpdateJobStatus

Shouldn't be. And this is definitely a good candidate for
ATTRIBUTE_NONNULL. qemuMigrationUpdateJobStatus is always called within
an async job so current is guaranteed to be non-NULL.

...
> > +    if (status->ram_duplicate_set) {
> > +        if (virTypedParamsAddULLong(&par, &npar, &maxpar,
> > +                                    VIR_DOMAIN_JOB_MEMORY_CONSTANT,
> > +                                    status->ram_duplicate) < 0 ||
> > +            virTypedParamsAddULLong(&par, &npar, &maxpar,
> > +                                    VIR_DOMAIN_JOB_MEMORY_NORMAL,
> > +                                    status->ram_normal) < 0 ||
> > +            virTypedParamsAddULLong(&par, &npar, &maxpar,
> > +                                    VIR_DOMAIN_JOB_MEMORY_NORMAL_BYTES,
> > +                                    status->ram_normal_bytes) < 0)
> 
> I have a note in patch 2 about ram_normal and ram_normal_bytes vis-a-vis
> qemu_monitor_json.c not checking status on their fetch and my latest
> coverity run complaining about it.

Not sure what you're talking about but I'll in patch 2 :-)

...
> > @@ -1071,7 +1214,7 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver,
> >      unsigned long long then;
> >      bool nested = job == QEMU_JOB_ASYNC_NESTED;
> >      virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
> > -    int ret;
> > +    int ret = -1;
> >  
> >      VIR_DEBUG("Starting %s: %s (async=%s vm=%p name=%s)",
> >                job == QEMU_JOB_ASYNC ? "async job" : "job",
> 
> Just a note after this there's a
> 
>     priv->jobs_queued++;
> 
> which can be decremented at cleanup; however, if virTimeMillisNow()
> fails, then there's a return -1 *and* a virObjectUnref(obj) before the
> virObjectRef(obj)...

This is unrelated to this patch, but you are right jobs_queued++ should
be done after virTimeMillisNow(). The rest is correct because it's
virObjectUnref(cfg) that is called before return -1 (note obj vs. cfg).

> > @@ -1127,9 +1270,11 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver,
> >                    qemuDomainAsyncJobTypeToString(asyncJob),
> >                    obj, obj->def->name);
> >          qemuDomainObjResetAsyncJob(priv);
> > +        if (VIR_ALLOC(priv->job.current) < 0)
> > +            goto cleanup;
> 
> If this fails - what are the chances some other path will check/access
> current without checking for it being NULL?  Perhaps low and there's
> going to be other problems anyway in a low memory situation...

If this fails then the caller will not get an async job and thus it
shouldn't do anything and especially it shouldn't touch job.current.

> >          priv->job.asyncJob = asyncJob;
> >          priv->job.asyncOwner = virThreadSelfID();
> > -        priv->job.start = now;
> > +        priv->job.current->started = now;
> >      }
> >  
> >      if (qemuDomainTrackJob(job))
...
> > diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
> > index 8736889..119590e 100644
> > --- a/src/qemu/qemu_domain.h
> > +++ b/src/qemu/qemu_domain.h
...
> > @@ -378,4 +388,12 @@ bool qemuDomainDefCheckABIStability(virQEMUDriverPtr driver,
> >  bool qemuDomainAgentAvailable(qemuDomainObjPrivatePtr priv,
> >                                bool reportError);
> >  
> > +int qemuDomainJobInfoUpdateTime(qemuDomainJobInfoPtr jobInfo);
> > +int qemuDomainJobInfoToInfo(qemuDomainJobInfoPtr jobInfo,
> > +                            virDomainJobInfoPtr info);
> > +int qemuDomainJobInfoToParams(qemuDomainJobInfoPtr jobInfo,
> > +                              int *type,
> > +                              virTypedParameterPtr *params,
> > +                              int *nparams);
> > +
> 
> Should any of these new defs need to have/use the ATTRIBUTE_NONNULL()
> for params?

Yeah, all of them I guess.

> > diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> > index 239a300..265315d 100644
> > --- a/src/qemu/qemu_driver.c
> > +++ b/src/qemu/qemu_driver.c
...
> > @@ -3110,8 +3108,7 @@ qemuDomainSaveInternal(virQEMUDriverPtr driver, virDomainPtr dom,
> >          goto endjob;
> >      }
> >  
> > -    memset(&priv->job.info, 0, sizeof(priv->job.info));
> > -    priv->job.info.type = VIR_DOMAIN_JOB_UNBOUNDED;
> > +    priv->job.current->type = VIR_DOMAIN_JOB_UNBOUNDED;
> 
> Assumes current-> exists, but occurs after BeginAsyncJob() is
> successful, so it seems safe.

Exactly.

> 
> >  
> >      /* Pause */
> >      if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) {
...
> > diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
> > index 9cfb77e..64adab4 100644
> > --- a/src/qemu/qemu_migration.c
> > +++ b/src/qemu/qemu_migration.c
...
> > @@ -1738,62 +1739,40 @@ qemuMigrationUpdateJobStatus(virQEMUDriverPtr driver,
> >  
> >      qemuDomainObjExitMonitor(driver, vm);
> >  
> > -    priv->job.status = status;
> > -
> > -    if (ret < 0 || virTimeMillisNow(&priv->job.info.timeElapsed) < 0)
> > +    if (ret < 0 ||
> > +        qemuDomainJobInfoUpdateTime(priv->job.current) < 0)
> 
> Here (and below) it's assumed job.current is valid.

qemuMigrationUpdateJobStatus is only called with an async job started.

...
> > @@ -1803,11 +1782,14 @@ qemuMigrationUpdateJobStatus(virQEMUDriverPtr driver,
> >   * QEMU reports failed migration.
> >   */
> >  static int
> > -qemuMigrationWaitForCompletion(virQEMUDriverPtr driver, virDomainObjPtr vm,
> > +qemuMigrationWaitForCompletion(virQEMUDriverPtr driver,
> > +                               virDomainObjPtr vm,
> >                                 qemuDomainAsyncJob asyncJob,
> > -                               virConnectPtr dconn, bool abort_on_error)
> > +                               virConnectPtr dconn,
> > +                               bool abort_on_error)
> >  {
> >      qemuDomainObjPrivatePtr priv = vm->privateData;
> > +    qemuDomainJobInfoPtr jobInfo = priv->job.current;
> 
> Hopefully nothing has cleared or could clear this before this point...

qemuMigrationWaitForCompletion is also called with async job set.

...

Jirka




More information about the libvir-list mailing list