[libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)

Thu Apr 9 17:00:53 UTC 2009

Anthony Liguori wrote:
> Avi Kivity wrote:
>>> Fine, let's say we did that, it's *still* racy because at time 3, 
>>> the guest may hot remove cpu 2 on its own since the guest's VCPUs 
>>> get to run in parallel to the monitor.
>>
>> A guest can't hotremove a vcpu.  It may offline a vcpu, but that's 
>> not the same.
>>
>> Obviously, if both the guest and the management application can 
>> initiate the same action, then there will be races.  But I don't 
>> think that's how things should be -- the guest should request a vcpu 
>> to be removed (or added), management thinks and files forms in 
>> triplicate, then hotadds or hotremoves the vcpu (most likely after it 
>> is no longer needed).
>>
>> With the proper bureaucracy, there is no race.
>
> You still have the same basic problem:
>
> time 0: (qemu) notify-enable vnc-events
> time 1: (qemu) foo <no newline>
> time 4: <newline>
> time 5: notification: client connected
>
> time 0: vnc client connects
> time 2: vnc client disconnects
>
> At time 5, when the management app gets the notification, it cannot 
> make any assumptions about the state of the system.  You still need 
> timestamps.

You don't even need the foo <no newline> to trigger this; qemu->user 
traffic can be arbitrarily delayed (I don't think we should hold 
notifications on partial input anyway).  But there's no race here.

The notification at time 5 means that the connect happened sometime 
before time 5, and that it may not be true now.  The user cannot assume 
anything.  A race can only happen against something the user initiated.

Suppose we're implementing some kind of single sign on:

(qemu) notify vnc on

... time passes, we want to allow members of group x to log in

(qemu) vnc_set_acl group:x
OK
(qemu)
notification: vnc connect aliguori
(qemu)

With a single monitor, we can be sure that the connect happened after the 
vnc_set_acl.  If the notification arrives on a different session, we 
have no way of knowing that.
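
A minimal sketch of the single-stream argument, assuming the transcript 
format used in the example above ("(qemu)" prompt, "OK" acknowledgement, 
"notification:" prefix).  The function name events_after_command is 
hypothetical and not part of any monitor API; it only illustrates that on 
one ordered stream, a notification read after the acknowledgement can be 
attributed to the post-command state.

```python
def events_after_command(transcript, command):
    """Return notifications that follow the acknowledgement of `command`.

    Assumes one in-order stream (a single monitor session) in which
    "(qemu) <cmd>" echoes a command, "OK" acknowledges it, and
    "notification: <text>" reports an event.
    """
    pending = False   # command issued, acknowledgement not yet seen
    acked = False     # acknowledgement seen on this same stream
    after = []
    for raw in transcript.splitlines():
        line = raw.strip()
        if line.startswith("(qemu)"):
            issued = line[len("(qemu)"):].strip()
            if issued == command:
                pending = True
        elif line == "OK" and pending:
            pending = False
            acked = True
        elif line.startswith("notification:") and acked:
            after.append(line[len("notification:"):].strip())
    return after


transcript = """\
(qemu) notify vnc on
notification: vnc connect early
(qemu) vnc_set_acl group:x
OK
(qemu)
notification: vnc connect aliguori
(qemu)
"""

result = events_after_command(transcript, "vnc_set_acl group:x")
print(result)
```

If the notifications were read from a second session instead, nothing in 
either stream would say on which side of the "OK" they belong.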
>
>>>
>>> And even if you somehow eliminate the issue around masking 
>>> notifications, you still have socket buffering that introduces the 
>>> same problem.
>>
>> If you have one monitor, the problem is much simpler, since events 
>> travelling in the same direction (command acknowledge and a 
>> notification) cannot be reordered.  With a command+wait, the problem 
>> is inherent.
>
> Command acknowledge is not an event.  Events are out-of-band.  Command 
> completions are in-band.  Right now, they are synchronous and

That's all correct, but I don't see how that changes anything.

> I expect that in the short term future, we'll have a non-human monitor 
> mode that allows commands to be asynchronous.

Then let's defer this until then?  'wait' is not useful for humans, they 
won't be retyping 'wait' every time something happens.

>
> However, it's a mistake to muddle the distinction between an in-band 
> completion and an out-of-band event.  You cannot relate the 
> out-of-band events to commands.

I can, if one happens before the other, and I have a single stream of 
command completions and event notifications.

>
>>>
>>> The best you can do is stick a time stamp on a notification and make 
>>> sure the management tool understands that the notification is 
>>> reflective of the state when the event happened, not of the 
>>> current state.  
>>
>> Timestamps are really bad.   They don't work at all if the management 
>> application is not on the same host.  They work badly if it is on the 
>> same host, since commands and events will be timestamped in different 
>> processes.
>
> Timestamps are relative, not absolute.  They should not be used to 
> associate anything with the outside world.  In fact, I have no problem 
> making the timestamps relative to QEMU startup just to ensure that 
> no one tries to do something silly like associate notification 
> timestamps with system time.

Dunno, seems totally artificial to me to have to introduce timestamps to 
compensate for different delays in multiple sockets that we introduced 
five patches earlier.
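
To illustrate the delay problem, here is a toy simulation, not QEMU code.  
The channel names, delays and message texts are made up for the example; 
the only assumption is that each socket delivers its own messages in order 
but with its own latency.

```python
def arrival_order(messages, delay_by_channel):
    """Return message texts in the order the receiver sees them.

    messages: list of (send_time, channel, text) as emitted by the sender.
    delay_by_channel: hypothetical per-socket delivery latency.
    """
    arrivals = [
        (send_time + delay_by_channel[channel], index, text)
        for index, (send_time, channel, text) in enumerate(messages)
    ]
    # Sort by arrival time; the index keeps ties in emission order.
    return [text for _, _, text in sorted(arrivals)]


# The connect happens at time 0, the ACL change is acknowledged at time 1.
one_monitor = [
    (0, "monitor", "notification: vnc connect aliguori"),
    (1, "monitor", "OK vnc_set_acl"),
]
two_monitors = [
    (0, "events", "notification: vnc connect aliguori"),
    (1, "commands", "OK vnc_set_acl"),
]

single = arrival_order(one_monitor, {"monitor": 5})
split = arrival_order(two_monitors, {"events": 5, "commands": 0})
print(single)
print(split)
```

On the single stream the notification always precedes the "OK", whatever 
the delay.  With two sockets the slower one lets the "OK" overtake it, so 
the receiver can no longer tell that the connect predates the ACL change.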

Please, let's keep this simple.

>
>>> FWIW, this problem is not at all unique to QEMU and is generally 
>>> true of most protocols that support an out-of-band notification 
>>> mechanism.
>>>
>>
>> command+wait makes it worse.  Let's stick with established practice.
>
> What's the established practice?  Do you know of any protocol that is 
> line based that does notifications like this?

I guess most MUDs?

>
> IMAP IDLE is pretty close to "wait-forever".

IMAP IDLE can be terminated by the client, and so does not require 
multiple sessions (though IMAP supports them).

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.