[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [libvirt] [PATCH] events: Fix domain event race on client disconnect



On Thu, Sep 06, 2012 at 03:06:41PM +0200, Christophe Fergeau wrote:
> GNOME Boxes sometimes stops getting domain events from libvirtd, even
> after restarting it. Further investigation in libvirtd shows that
> events are properly queued with virDomainEventStateQueue, but the
> timer virDomainEventTimer which flushes the events and sends them to
> the clients never gets called. Looking at the event queue in gdb
> shows that it's non-empty and that its size increases with each new
> events.
> 
> virDomainEventTimer is set up in virDomainEventStateRegister[ID]
> when going from 0 client connecte to 1 client connected, but is
> initially disabled. The timer is removed in
> virDomainEventStateRegister[ID] when the last client is disconnected
> (going from 1 client connected to 0).
> 
> This timer (which handles sending the events to the clients) is
> enabled in virDomainEventStateQueue when queueing an event on an
> empty queue (queue containing 0 events). It's disabled in
> virDomainEventStateFlush after flushing the queue (ie removing all
> the elements from it). This way, no extra work is done when the queue
> is empty, and when the next event comes up, the timer will get
> reenabled because the queue will go from 0 event to 1 event, which
> triggers enabling the timer.
> 
> However, with this Boxes bug, we have a client connected (Boxes), a
> non-empty queue (there are events waiting to be sent), but a disabled
> timer, so something went wrong.
> 
> When Boxes connects (it's the only client connecting to the libvirtd
> instance I used for debugging), the event timer is not set as expected
> (state->timer == -1 when virDomainEventStateRegisterID is called),
> but at the same time the event queue is not empty. In other words,
> we had no clients connected, but pending events. This also explains
> why the timer never gets enabled as this is only done when an event
> is queued on an empty queue.
> 
> I think this can happen if an event gets queued using
> virDomainEventStateQueue and the client disconnection happens before
> the event timer virDomainEventTimer gets a chance to run and flush
> the event. In this situation, virDomainEventStateDeregister[ID] will
> get called with a non-empty event queue, the timer will be destroyed
> if this was the only client connected. Then, when other clients connect
> at a later time, they will never get notified about domain events as
> the event timer will never get enabled because the timer is only
> enabled if the event queue is empty when virDomainEventStateRegister[ID]
> gets called, which will is no longer the case.

A nice long detailed explanation. I agree that this scenario you
outline is plausible as an explanation for why Boxes sometimes
stops getting events from libvirtd. It also explains why we don't
see it with VDSM - since they're a long running service, not a
frequently stoppped/started desktop app, they're much less likely
to hit the race.


> To avoid this issue, this commit makes sure to remove all events from
> the event queue when the last client in unregistered. As there is
> no longer anyone interested in receiving these events, these events
> are stale so there is no need to keep them around. A client connecting
> later will have no interest in getting events that happened before it
> got connected.

Yep, makes sense.

> ---
>  src/conf/domain_event.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/src/conf/domain_event.c b/src/conf/domain_event.c
> index 43ecdcf..7ab973b 100644
> --- a/src/conf/domain_event.c
> +++ b/src/conf/domain_event.c
> @@ -525,13 +525,13 @@ void virDomainEventFree(virDomainEventPtr event)
>  }
>  
>  /**
> - * virDomainEventQueueFree:
> + * virDomainEventQueueClear:
>   * @queue: pointer to the queue
>   *
> - * Free the memory in the queue. We process this like a list here
> + * Removes all elements from the queue
>   */
>  static void
> -virDomainEventQueueFree(virDomainEventQueuePtr queue)
> +virDomainEventQueueClear(virDomainEventQueuePtr queue)
>  {
>      int i;
>      if (!queue)
> @@ -541,6 +541,22 @@ virDomainEventQueueFree(virDomainEventQueuePtr queue)
>          virDomainEventFree(queue->events[i]);
>      }
>      VIR_FREE(queue->events);
> +    queue->count = 0;
> +}
> +
> +/**
> + * virDomainEventQueueFree:
> + * @queue: pointer to the queue
> + *
> + * Free the memory in the queue. We process this like a list here
> + */
> +static void
> +virDomainEventQueueFree(virDomainEventQueuePtr queue)
> +{
> +    if (!queue)
> +        return;
> +
> +    virDomainEventQueueClear(queue);
>      VIR_FREE(queue);
>  }
>  
> @@ -1559,6 +1575,7 @@ virDomainEventStateDeregister(virConnectPtr conn,
>          state->timer != -1) {
>          virEventRemoveTimeout(state->timer);
>          state->timer = -1;
> +        virDomainEventQueueClear(state->queue);
>      }
>  
>      virDomainEventStateUnlock(state);
> @@ -1596,6 +1613,7 @@ virDomainEventStateDeregisterID(virConnectPtr conn,
>          state->timer != -1) {
>          virEventRemoveTimeout(state->timer);
>          state->timer = -1;
> +        virDomainEventQueueClear(state->queue);
>      }
>  
>      virDomainEventStateUnlock(state);

ACK


Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]