[libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)

Avi Kivity avi at redhat.com
Thu Apr 9 16:30:19 UTC 2009


Anthony Liguori wrote:
> Avi Kivity wrote:
>> Suppose you have a command which changes the meaning of a 
>> notification.  If a notification arrives before the command 
>> completion, then it happened before the command was executed.
>
> If you want to make that reliable, you cannot have multiple monitors.  

Right.

> Since you can mask notifications, there can be an arbitrarily long 
> time between notification and the event happening.  Socket buffering 
> presents the same problem.  Imagine:
>
> Monitor 1:
> time 0: (qemu) hotadd_cpu 2
> time 1: (qemu) hello world <no new line>
> time 5: <new line>
> time 6: notification: cpu 2 added
> time 6: (qemu)
>
> Monitor 2:
> time 3: (qemu) hotremove_cpu 2
> time 4: (qemu)
> time 5: notification: cpu 2 removed
> time 6: (qemu)
>
> So to eliminate this, you have to ban multiple monitors.  

Well, not ban multiple monitors, but require that for non-racy operation 
commands and notifications be on the same session.

We can still debug on our dev-only monitor.

> Fine, let's say we did that, it's *still* racy because at time 3, the 
> guest may hot remove cpu 2 on its own since the guest's VCPUs get to 
> run in parallel to the monitor.

A guest can't hotremove a vcpu.  It may offline a vcpu, but that's not 
the same.

Obviously, if both the guest and the management application can initiate 
the same action, then there will be races.  But I don't think that's how 
things should be -- the guest should request a vcpu to be removed (or 
added), management thinks and files forms in triplicate, then hotadds or 
hotremoves the vcpu (most likely after it is no longer needed).

With the proper bureaucracy, there is no race.

>
> And even if you somehow eliminate the issue around masking 
> notifications, you still have socket buffering that introduces the 
> same problem.

If you have one monitor, the problem is much simpler, since events 
travelling in the same direction (command acknowledge and a 
notification) cannot be reordered.  With a command+wait, the problem is 
inherent.
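
To make the single-session argument concrete, here is a minimal sketch in Python.  The tuple message format and the MonitorSession class are made up for illustration; they are not qemu's monitor protocol.  The only property it relies on is the one above: acknowledgements and notifications arrive on one stream, in the order qemu emitted them.

```python
from dataclasses import dataclass, field


@dataclass
class MonitorSession:
    """Client-side view of one monitor stream (hypothetical message format).

    Messages are ("notify", (op, cpu)) or ("ack", command_text) and are
    assumed to arrive in the order qemu wrote them to the socket.
    """
    present: set = field(default_factory=set)
    completed: list = field(default_factory=list)

    def feed(self, msg):
        kind, payload = msg
        if kind == "notify":
            op, cpu = payload
            if op == "added":
                self.present.add(cpu)
            elif op == "removed":
                self.present.discard(cpu)
        elif kind == "ack":
            # Everything seen so far happened before this command completed,
            # so the snapshot is the state the command ran against.
            self.completed.append((payload, frozenset(self.present)))


session = MonitorSession()
for msg in [
    ("notify", ("added", 2)),
    ("ack", "hotremove_cpu 2"),
    ("notify", ("removed", 2)),
]:
    session.feed(msg)
```

Because both kinds of message travel in the same direction on one connection, the client never has to guess whether "cpu 2 added" refers to the state before or after its own hotremove.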

>
> The best you can do is stick a time stamp on a notification and make 
> sure the management tool understands that the notification is 
> reflective of the state when the event happened, not of the current 
> state.  

Timestamps are really bad.   They don't work at all if the management 
application is not on the same host.  They work badly if it is on the 
same host, since commands and events will be timestamped by different 
processes.
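
A small sketch of the cross-host case, again in Python.  The times, the 5-second clock offset and the stamp() helper are invented numbers and names for illustration only.

```python
def stamp(true_time, clock_offset):
    """Timestamp as read from a local clock that is off by clock_offset seconds."""
    return true_time + clock_offset


# The management host issues the command at real time 8.0; its clock is exact.
command = ("hotremove_cpu 2 issued", stamp(8.0, 0.0))
# qemu emits the event at real time 10.0, but its host clock runs 5s behind.
event = ("cpu 2 added", stamp(10.0, -5.0))

# Ordering by timestamp puts the event first, although it happened later.
by_stamp = [name for name, _ in sorted([command, event], key=lambda m: m[1])]
```

Here the event really happened two seconds after the command, yet sorting by timestamp reports the opposite order, so the management application would draw the wrong conclusion about the current state.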

> FWIW, this problem is not at all unique to QEMU and is generally true 
> of most protocols that support an out-of-band notification mechanism.
>

command+wait makes it worse.  Let's stick with established practice.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



