[libvirt] [PATCH RFC 0/5] drop nested job concept

Nikolay Shirokovskiy nshirokovskiy at virtuozzo.com
Mon Nov 28 09:21:37 UTC 2016



On 28.11.2016 11:56, Peter Krempa wrote:
> On Mon, Nov 28, 2016 at 11:15:41 +0300, Nikolay Shirokovskiy wrote:
>> Instead I suggest to serialize qemu monitor commands.
> 
> That won't be possible. There are a few APIs that call multiple commands
> and expect the VM not to change in between. Thus they rely on nested
> jobs. See migration and snapshot code (btw, the two most complex parts
> of libvirt).

But I suggest to drop only nested jobs and use other means to accomplish
the same goal. The jobs in general stays as they are after this series. Sync jobs are mutually
exclusive, async jobs are mutually exclusive, query (and some other "safe") sync jobs
can run in parallel with async etc. The only thing is change how the last
are executed in parallels.

Probably I'd better use 'replace' instead of 'drop'.

> 
>> My original motivation is to prepare job exclusion code to add agent jobs. This
>> job is a special job to operate on guest agent only so first it can be run in
>> parallel with sync, async and nested jobs and second it should not block any
> 
> That is possible. The async jobs allow for query jobs to be run while
> they are executed you need to properly configure them.

This is not enough. I don't want agent jobs to block any current async or
sync jobs from executing. This is not possible with current code.

> 
>> qemu monitor job at all. Eventually I want to move from a lot of different
>> types of jobs to just a job with some kind of matrix which describes what jobs
>> can run in a parallel. As a first step I want to drop nested jobs.
> 
> I don't think you will be able to do this in steps. Nested jobs have a
> vital functions and unless you replace the mechanism or overhaul all the
> code that is doing nested jobs you won't be able to guarantee that the
> state of the objects are as the code expects.

Gosh, why i used 'drop', it is 'replace'. I replace serializing thru nested
jobs and sync jobs with just serializing at monitor level.

> 
> Doin a ful N*N matrix of possible parallel operations will be very hard
> to maintain. You will need to group them together in a way so that we
> don't have to deal with the quadratic complexity. How are you expecting
> to do this?
> 

This matrix exists already and it based of job types not every individual
job itsel. It is just not apparent in code. Here it's sample:

  Q M A
Q x x o
M x x x
A x x x

Q - query job
M - modify job
A - async job

x - inpossible
o - possible 

This matrix shows can jobs run in parallel. It is assymetrical now, you can
run query while async job is executing but can not start async job while
in query. This assymetric is probably current implementation restrictment.

>> AFAIU *main* nested job purpuse is to serialize requests to monitor from async
>> jobs and allowed sync jobs as monitor does not do it by himself now. Strictly
> 
> No, the main purpose is to allow to call multiple monitor commands while
> you are guaranteed that something did not change the VM state and thus
> invalidate the complex operation you are trying to execute.

For this reason in matrix above AM, AX are impossible. AQ are possible and
it is decided by matrix not nested jobs.

> 
>> speaking this assertion is rather rough because nested jobs and sync jobs
>> serialized at job level and thus chunks of code that serialized differ from
>> case when only monitor requests are serialized. Nevertheless IFAIU this part of
>> libvirt is not strict too. We have a lot of query commands and async commands
>> that can execute in parallel and have no definition of what query job or async
>> job are. Thus it is not clear is this code correct or not. So the same argument
> 
> Certainly it's more correct this way than dropping it without
> replacement.
> 
>> is applicable to new approach: query jobs are safe to be called during async
>> jobs.  (Except that new approach opens situation when async job interrupts
> 
> Some of qemu monitor commands block on execution. The job mechanism
> prevents the second query call to get stuck waiting forever for the job
> to finish by doing a timed wait.

This will not change. Monitor serialization has timeout too.

> 
>> query job itself but it generally not clear how this can break query job).


>> On this path job became high level concept which I think it should be.  Also we
>> drop boring passing of current async job which we need only for safety check
>> deep down below the stack.  We drop driver passing for nested jobs which
>> I guess we really don't need to pass at all as we need it only to check queue
>> size and nested job probably should skip this check.
> 
> I did not get what you wanted to say here.

Code becomes more simple and clear.

> 
>> The diff stat looks large but most of code is dumb renaming or dropping unused
>> parameter in patches 4 and 5. To reduce risk of introducing an error on this
>> tedious task I used sed and perl where I can.
>>
>> I ran only a simple test with this series of quering a domain during its
>> migration.
> 
> That is a weak test for doing such a complex change. I'd suggest you run
> parallel migrations and snapshot to see if it works. You need to run
> operations that change the VM configuration.

I don't need event test it as I don't touch code that make them mutually exclusive.

> 
> I don't think we can do this with a complete overhaul of snapshots,
> migration and a few others that actually have an expectation of the VM
> object not changing while we are in an async job.
> 

Please, reevalute this series, I don't unrerstand how you see this series that
radical. Thing stay the same, async jobs are still mutually exclusive with everything
except queries basically. I don't change this.


Nikolay




More information about the libvir-list mailing list