[libvirt] Libvirt domain event usage and consistency

Roman Mohr rmohr at redhat.com
Fri Nov 25 16:54:05 UTC 2016


On Fri, Nov 25, 2016 at 4:34 PM, Michal Privoznik <mprivozn at redhat.com>
wrote:

> On 25.11.2016 14:38, Roman Mohr wrote:
> > Hi,
> >
> > I recently started to use the libvirt domain events. With them I increase
> > the responsiveness of my VM state watchers.
> > In general it works pretty well. I just listen to the events and do a
> > periodic resync to cope with missed events.
> >
> > While watching the events I ran into a few interesting situations I wanted
> > to share. Points 1-3 describe some minor issues or irregularities.
> > Point 4 is about the fact that domain and state updates are not versioned,
> > which makes it very hard to stay in sync with libvirt when using events.
> >
> > My libvirt version is 1.2.18.4.
>
> This might be the root cause. I'm unable to see some of the scenarios
> you're seeing. Have you tried the latest release (or even git HEAD) to
> check whether all the scenarios you are describing still stand?
>

It is definitely better with the latest HEAD, but it still does not look
completely right.

>
> >
> > 1) Event order seems to be weird on startup:
> >
> > When listening for VM lifecycle events I get this order:
> >
> > {"event_type": "Started", "timestamp": "2016-11-25T11:59:53.209326Z",
> > "reason": "Booted", "domain_name": "generic", "domain_id":
> > "8ff7047b-fb46-44ff-a4c6-7c20c73ab86e"}
> > {"event_type": "Defined", "timestamp": "2016-11-25T11:59:53.435530Z",
> > "reason": "Added", "domain_name": "generic", "domain_id":
> > "8ff7047b-fb46-44ff-a4c6-7c20c73ab86e"}
> >
> > It is strange that a VM already boots before it is defined. Is this the
> > intended order?
>
> I don't see this order, so this is probably fixed upstream.
>

On latest master a normal creation emits these events:

event 'lifecycle' for domain testvm: Resumed Unpaused
event 'lifecycle' for domain testvm: Started Booted

The Resumed event looks wrong. Furthermore, I no longer get any
Defined/Undefined events. Maybe they were removed?
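
For reference, lifecycle events like the ones above can be logged from Python
roughly as follows. This is only a sketch, assuming the libvirt-python
bindings and a `qemu:///system` URI; `to_record`, `lifecycle_cb` and `watch`
are names made up for this example, and the numeric event types are the values
of libvirt's virDomainEventType enum.

```python
import json
from datetime import datetime, timezone

# Lifecycle event types as numbered in libvirt's virDomainEventType enum.
EVENT_NAMES = {
    0: "Defined", 1: "Undefined", 2: "Started", 3: "Suspended",
    4: "Resumed", 5: "Stopped", 6: "Shutdown", 7: "PMSuspended", 8: "Crashed",
}


def to_record(name, uuid, event, detail):
    """Build one log record; 'detail' stays numeric because its meaning
    depends on the event type."""
    return {
        "event_type": EVENT_NAMES.get(event, "Unknown(%d)" % event),
        "detail": detail,
        "domain_name": name,
        "domain_id": uuid,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def lifecycle_cb(conn, dom, event, detail, opaque):
    # Assumption: name() and UUIDString() only read fields held in the
    # domain object and do not query the daemon.
    print(json.dumps(to_record(dom.name(), dom.UUIDString(), event, detail)))


def watch(uri="qemu:///system"):
    # Imported lazily so the helpers above work without libvirt-python.
    import libvirt

    # The default event loop must be registered before opening the connection.
    libvirt.virEventRegisterDefaultImpl()
    conn = libvirt.openReadOnly(uri)
    conn.domainEventRegisterAny(
        None, libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE, lifecycle_cb, None)
    while True:
        libvirt.virEventRunDefaultImpl()
```

Calling `watch()` blocks and prints one JSON line per lifecycle event; the
timestamp here is taken on the client side, not by libvirt.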


>
> >
> > 2) Defining a VM with VIR_DOMAIN_START_PAUSED gives me this event order
>
> I don't think you can define a domain with that flag. What's the actual
> action?
>

That is the flag for the API; with virsh, passing `--paused` does the same.


>
> >
> > {"event_type": "Defined", "timestamp": "2016-11-25T12:02:44.037817Z",
> > "reason": "Added", "domain_name": "core_node", "domain_id":
> > "b9906489-6d5b-40f8-a742-ca71b2b84277"}
> > {"event_type": "Resumed", "timestamp": "2016-11-25T12:02:44.813104Z",
> > "reason": "Unpaused", "domain_name": "core_node", "domain_id":
> > "b9906489-6d5b-40f8-a742-ca71b2b84277"}
> > {"event_type": "Started", "timestamp": "2016-11-25T12:02:44.813733Z",
> > "reason": "Booted", "domain_name": "core_node", "domain_id":
> > "b9906489-6d5b-40f8-a742-ca71b2b84277"}
>
>
> Interesting, so here the "defined" event is delivered before the "started"
> event. Also, where is the "suspended" event?
>


With the latest master the situation looks better. Now I see:

event 'lifecycle' for domain testvm: Started Booted
event 'lifecycle' for domain testvm: Suspended Paused


>
> >
> > This boot-order makes it hard to track active domains by listening to
> > life-cycle events. One could theoretically still always fetch the VM state
> > in the event callback and check the state, but if the state is not
> > immediately transferred with the event itself, it can already be outdated,
> > so this might be racy (and not transparent to users of the libvirt
> > bindings), and as described in (3) it is currently not even possible. In
> > general the events actually emitted seem to differ quite significantly from
> > the life-cycle described in [1].
>
> Again, in the upstream I see something different:
> event 'lifecycle' for domain $domain: Started Booted
> event 'lifecycle' for domain $domain: Suspended Paused
>
>
On master I see that too when I start the VM with `virsh create --paused`.
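
Until the ordering is settled, a tracker for active domains can be written so
that it does not depend on the order of Started, Suspended and Resumed. A
minimal sketch, assuming events arrive as (uuid, event name) pairs with the
names printed above; `ActiveDomains` is a made-up helper, and events it does
not list (Defined, Undefined, Shutdown, PMSuspended) are deliberately ignored.

```python
class ActiveDomains:
    """Keeps a uuid -> "running" / "paused" map from lifecycle event names."""

    def __init__(self):
        self._states = {}

    def apply(self, uuid, event_type):
        if event_type in ("Started", "Resumed"):
            # Either event means the domain is active and its CPUs run,
            # no matter which of the two is delivered first.
            self._states[uuid] = "running"
        elif event_type == "Suspended":
            self._states[uuid] = "paused"
        elif event_type in ("Stopped", "Crashed"):
            # The domain is no longer active; forget it.
            self._states.pop(uuid, None)
        # Anything else does not change whether the domain is active.

    def state(self, uuid):
        return self._states.get(uuid)
```

Both the old order (Resumed, then Started) and the order seen on master for a
paused start (Started, then Suspended) end in the expected state. It still
cannot detect missed events, so a periodic resync remains necessary.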


>
> >
> > 3) "Defined" event is triggered before the domain is completely defined
> >
> > {"event_type": "Defined", "timestamp": "2016-11-25T12:02:44.037817Z",
> > "reason": "Added", "domain_name": "core_node", "domain_id":
> > "b9906489-6d5b-40f8-a742-ca71b2b84277"}
> > {"event_type": "Resumed", "timestamp": "2016-11-25T12:02:44.813104Z",
> > "reason": "Unpaused", "domain_name": "core_node", "domain_id":
> > "b9906489-6d5b-40f8-a742-ca71b2b84277"}
> > {"event_type": "Started", "timestamp": "2016-11-25T12:02:44.813733Z",
> > "reason": "Booted", "domain_name": "core_node", "domain_id":
> > "b9906489-6d5b-40f8-a742-ca71b2b84277"}
> >
> > When I try to process the first event and do a xmldump I get:
> >
> >    Event: [Code-42] [Domain-10] Domain not found: no domain with matching
> > uuid 'b9906489-6d5b-40f8-a742-ca71b2b84277' (core_node)
> >
> > So it seems like I get the event before the domain is completely ready.
>
> You know that you shouldn't be calling libvirt APIs from event callbacks?


No, that is good to know. Anyway, I was just trying to work around the
problems above.
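
One way to follow that advice is to let the callback only enqueue the event
and do the real work (for example fetching the XML) on another thread. A
minimal sketch, assuming libvirt-python's lifecycle callback signature
(conn, dom, event, detail, opaque); `EventPump` and `handle` are names made up
for this example.

```python
import queue
import threading


class EventPump:
    """Hands lifecycle events from the event-loop thread to a worker thread."""

    def __init__(self, handle):
        self._queue = queue.Queue()
        self._handle = handle
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def callback(self, conn, dom, event, detail, opaque):
        # Runs on the event-loop thread: record what happened and return.
        self._queue.put((dom.UUIDString(), event, detail))

    def close(self):
        # A None item tells the worker to stop after draining earlier events.
        self._queue.put(None)
        self._thread.join()

    def _run(self):
        # Runs on the worker thread, outside of the event callback.
        while True:
            item = self._queue.get()
            if item is None:
                return
            self._handle(*item)
```

`pump.callback` would be the function passed to `domainEventRegisterAny`;
`handle(uuid, event, detail)` is then free to look up the domain again. The
race described in (3) remains: the state fetched there may already be newer
than the event.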


So if the Defined/Undefined events were removed deliberately, then only the
problem with the 'Resumed' event on a normal VM start remains.


> >
> > 4) The libvirt domain description is not versioned
> >
> > I would expect that every time I update a domainxml (update from third
> > party entity), or an event is generated (update from libvirt), that the
> > resource version of a Domain is increased and that I get this resource
> > version when I do a xmldump or when I get an event. Without this there is
> > afaik no way to stay in sync with libvirt, even if you do regular polling
> > of all domains. The main issue here is that I can never know if events in
> > the queue arrived before my latest domain resync or after it.
> >
> > Also note that this is not about delivery guarantees of events. It is just
> > about having a consistent view of a VM and the individual event. If I have
> > resource versions, I can decide if an event is still interesting for me or
> > not, which is exactly what I need to solve the syncing problem above.
> > When I do a complete relisting of all domains to sync, I know which version
> > I got and I can then see on every event if it is newer or older.
> >
> > If, alongside the event, the domain xml, the VM state, and the
> > resource version were sent to a client, it would be even better. Then,
> > whenever there is a new event for a VM in the queue, I can be sure that
> > the domainxml I see is the one which triggered the event. This xml is then
> > a complete representation for this revision number.
>
> I recall some people asking for this. Basically, they were worried that
> somebody from outside could manipulate their XMLs without them knowing.
> Frankly, I don't recall what our answer to that was.


> Having a version number in live XML makes sense. However, it makes less
> sense for config XML - there would be no way to start with version
> #0 once I've edited the file.
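
To make the request in point 4 concrete: if every listing entry and every
event carried a per-domain resource version, a client could drop queued events
that are older than its last resync. The sketch below is purely hypothetical;
libvirt provides no such version today, and `VersionedCache` and the `version`
argument are assumptions made only for illustration.

```python
class VersionedCache:
    """Tracks the newest known (version, state) per domain UUID."""

    def __init__(self):
        self._known = {}  # uuid -> (version, state)

    def resync(self, listing):
        # A full relist replaces everything; each entry carries its version.
        self._known = {
            uuid: (version, state) for uuid, version, state in listing
        }

    def on_event(self, uuid, version, state):
        # Drop events describing a revision we already have, or an older one.
        known = self._known.get(uuid)
        if known is not None and version <= known[0]:
            return False
        self._known[uuid] = (version, state)
        return True

    def state(self, uuid):
        known = self._known.get(uuid)
        return known[1] if known else None
```

An event that was queued before the resync then carries a version that is not
newer than the listed one and is ignored, while a later event updates the
cache. Without the version there is nothing to compare against.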


> Michal
>