[libvirt] Job Control API [RFC]

Mon May 26 14:31:30 UTC 2014

On Wed, May 21, 2014 at 11:13:06AM -0400, Tucker DiNapoli wrote:
>My name is Tucker DiNapoli and I am working on implementing job control for
>the storage driver for the Google Summer of Code. The first step in doing
>this is creating and implementing a unified API for job control.
>
>Currently there are several places where various aspects of job control are
>implemented. The qemu and libxl drivers both contain internal implementations
>of job control for domain-level jobs, with the qemu driver also supporting
>asynchronous jobs. There is also code in the libvirt.c file for running
>block jobs and for querying domain jobs for information.
>
>I would like the job control API to be as independent of individual drivers
>as possible, since it will need to be used with storage drivers as well as
>different virtualization drivers.
>

This definitely has to be driver-independent in the code.  The less anyone
suffers when adding job control to other drivers, the better.

>I imagine most of the API will revolve around a job object, and I think it's
>important to decide what exactly should go in this job object.
>
>This is a response to my first post on the mailing list, and I think it is
>a good idea.
>
>>>I'd _really_ like to see a common notion of a 'job id' that EVERY job
>>>(whether domain-level, like migration; or block-level, like
>>>commit/pull/rebase; or storage-level, like your new proposed storage
>>>jobs) shares a common job namespace.  The job id is a positive integer.
>>>Existing APIs will have to be retrofitted into the new job id notion;
>>>any action that starts a long-running job that currently returns 0 on
>>>success could be changed to return a positive job id; or we may need a
>>>new API that queries the notion of the 'current job' (the job most
>>>recently started) or even to set the 'current job' to a different job
>>>id.  We'll need new API for querying a job by id, and to be most
>>>portable, we should do job reporting via virTypedParameter
>>>(virDomainGetJobInfo and virDomainGetBlockJobInfo are hardcoded into
>>>returning a struct, so they are non-extensible; virDomainGetJobStats
>>>almost did it right, except that the user has to call it twice, once to
>>>learn how large to allocate, and again to pass in pre-allocated memory -
>>>the ideal API would allocate the memory on a single call).
>
>Currently there are separate types for block job info and job info. If
>possible, I would like to merge these into a common job info type, and
>perhaps make this a part of the job object itself.
>

Anything that *can* be part of the job object itself *should* be part
of it.  Some things might require duplicating some info, though, in
which case applying common sense should suffice.

>Currently (in libxl and qemu) jobs are a part of the domain struct. I think
>that jobs should be moved out of the domain struct, with domains instead
>using job ids to keep track of their currently running jobs. I'm still new
>to libvirt, so if this doesn't make sense and keeping job objects attached
>to domains does, that's fine.
>
>I think at the minimum each job object should contain: the id of the thread
>running the job, the type of job, the job id, a condition variable to
>coordinate jobs, and information about the job, either as a separate job
>info object or as part of the job object itself. The job should also
>contain a reference to the domain or storage it is associated with.
>

I had an idea that a job could have a list of domains/volumes/etc., but
those could belong to different (even not remotely connected)
drivers.  Would this be solved with a simple "unknown job id" error
when the job is looked up while connected to another driver?
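
To illustrate the question, here is a rough sketch of a shared job id
namespace in which each job remembers its owning driver and a lookup
through any other driver simply fails.  Every name in it (SketchJobEntry,
sketchJobLookup, the driver strings) is made up for this mail and nothing
like it exists in libvirt; the fixed-size table is only there to keep the
example short.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_JOBS 16

/* Hypothetical job record: not a libvirt type. */
typedef struct {
    int id;              /* positive job id; 0 marks a free slot */
    const char *driver;  /* name of the driver that owns the job */
    void *target;        /* opaque domain/volume/... reference */
} SketchJobEntry;

typedef struct {
    SketchJobEntry jobs[SKETCH_MAX_JOBS];
    int nextId;          /* next id to hand out, shared by all drivers */
} SketchJobTable;

void sketchJobTableInit(SketchJobTable *t)
{
    memset(t, 0, sizeof(*t));
    t->nextId = 1;
}

/* Register a job; returns its positive id, or -1 if the table is full. */
int sketchJobAdd(SketchJobTable *t, const char *driver, void *target)
{
    for (size_t i = 0; i < SKETCH_MAX_JOBS; i++) {
        if (t->jobs[i].id == 0) {
            t->jobs[i].id = t->nextId++;
            t->jobs[i].driver = driver;
            t->jobs[i].target = target;
            return t->jobs[i].id;
        }
    }
    return -1;
}

/* Returns the job only if it exists AND belongs to the asking driver.
 * NULL is what the caller would report as "unknown job id". */
const SketchJobEntry *sketchJobLookup(const SketchJobTable *t,
                                      const char *driver, int id)
{
    if (id <= 0)
        return NULL;
    for (size_t i = 0; i < SKETCH_MAX_JOBS; i++) {
        if (t->jobs[i].id == id && strcmp(t->jobs[i].driver, driver) == 0)
            return &t->jobs[i];
    }
    return NULL;
}
```

With this, a job started by a "qemu" driver and queried through a
"storage" driver looks exactly like a job id that was never handed out.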

>There are a few basic functions that should definitely be part of the API:
>initialize a job, free a job, start a job, end a job, abort a job and get
>info on a job. It would be nice to be able to suspend a job and to change
>the currently running job as well. That's what I can come up with, but I
>don't have much experience in libvirt, so if there are other features that
>make sense they can be added as well.
>

All the features may make sense, but lots of them might not be
available when the underlying tool doesn't support them.  If it's a
simple qemu-img process, you can suspend it, you can even kill it, but
how graceful is that when it is handling images read-write?  That's a
question...  Anyway, these things should probably be callbacks that
will be added by the particular driver when initializing the job and
handled there.
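
As a rough sketch of the callback idea: the driver hands over a table of
function pointers when it initializes the job, and the generic code
reports "unsupported" for any operation the driver left out.  All the
names below (SketchJob, SketchJobOps, the error codes) are hypothetical
and only meant to show the shape, not an existing or proposed libvirt
interface.

```c
#include <stddef.h>

typedef struct SketchJob SketchJob;

/* Per-driver callbacks; a NULL entry means "this driver cannot do it". */
typedef struct {
    int (*abort)(SketchJob *job);
    int (*suspend)(SketchJob *job);
} SketchJobOps;

struct SketchJob {
    int id;
    int aborted;
    int suspended;
    const SketchJobOps *ops;  /* supplied by the driver at init time */
};

enum { SKETCH_OK = 0, SKETCH_ERR_UNSUPPORTED = -2 };

void sketchJobInit(SketchJob *job, int id, const SketchJobOps *ops)
{
    job->id = id;
    job->aborted = 0;
    job->suspended = 0;
    job->ops = ops;
}

/* Generic entry points: dispatch to the driver, or refuse cleanly. */
int sketchJobAbort(SketchJob *job)
{
    if (!job || !job->ops || !job->ops->abort)
        return SKETCH_ERR_UNSUPPORTED;
    return job->ops->abort(job);
}

int sketchJobSuspend(SketchJob *job)
{
    if (!job || !job->ops || !job->ops->suspend)
        return SKETCH_ERR_UNSUPPORTED;
    return job->ops->suspend(job);
}

/* Example driver that can abort its jobs but has no safe way to suspend. */
static int exampleAbort(SketchJob *job)
{
    job->aborted = 1;
    return SKETCH_OK;
}

const SketchJobOps sketchAbortOnlyOps = {
    .abort = exampleAbort,
    .suspend = NULL,
};
```

The point of the NULL check is that the decision whether an operation is
safe stays with the driver that knows the underlying tool.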

>Finally (as far as I can think of right now) there is the idea of parallel
>jobs. Currently the qemu driver allows some jobs to be run in parallel by
>allowing a job to be run asynchronously; this async job has a mask of job
>types associated with it that determines what types of regular jobs can be
>run during it. However, I would like to allow an arbitrary number of jobs
>to be run at once (I'm not sure how useful this would be, but it seems best
>not to impose hard limits on things). The easiest way to deal with this is
>to just ignore it and put the burden of synchronizing jobs on the drivers.
>This is obviously a bad solution. Another way would be the way it is
>currently done in the qemu driver: have a mask of job types associated with
>each domain/storage, updated when a job is started or ended, which dictates
>what types of jobs can be started. Regardless of how this is done, it will
>require support from the driver/domain/storage that each job is associated
>with.
>

And again, this can be decided by a mask or even a callback to the
driver as well.
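
A minimal sketch of the mask variant, loosely following the description
above rather than the actual qemu driver code: the object that owns the
jobs (domain, storage pool, ...) carries a bitmask of job types that may
start right now, and a long-running async job narrows it for its
duration.  The job types and function names are invented for the
example.

```c
/* Hypothetical job types; values start at 1 so each maps to one bit. */
typedef enum {
    SKETCH_JOB_QUERY = 1,
    SKETCH_JOB_MODIFY,
    SKETCH_JOB_DESTROY,
} SketchJobType;

#define SKETCH_JOB_MASK(type) (1u << ((type) - 1))
#define SKETCH_JOB_MASK_ALL   (SKETCH_JOB_MASK(SKETCH_JOB_QUERY) |  \
                               SKETCH_JOB_MASK(SKETCH_JOB_MODIFY) | \
                               SKETCH_JOB_MASK(SKETCH_JOB_DESTROY))

/* Stand-in for whatever the jobs are attached to (domain, storage, ...). */
typedef struct {
    unsigned int allowed;  /* job types that may be started right now */
} SketchJobOwner;

void sketchOwnerInit(SketchJobOwner *o)
{
    o->allowed = SKETCH_JOB_MASK_ALL;
}

/* Nonzero if a job of the given type may start now. */
int sketchJobAllowed(const SketchJobOwner *o, SketchJobType type)
{
    return (o->allowed & SKETCH_JOB_MASK(type)) != 0;
}

/* An async job starts: only the types in allowedDuring may run meanwhile. */
void sketchAsyncJobBegin(SketchJobOwner *o, unsigned int allowedDuring)
{
    o->allowed = allowedDuring & SKETCH_JOB_MASK_ALL;
}

/* The async job ends: everything is allowed again. */
void sketchAsyncJobEnd(SketchJobOwner *o)
{
    o->allowed = SKETCH_JOB_MASK_ALL;
}
```

A callback variant would replace sketchJobAllowed() with a function
pointer supplied by the driver, so the driver can apply rules that do
not fit into a mask.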

Martin

>Tucker DiNapoli