[libvirt] [PATCH] Fix unwanted closing of libvirt client connection

Daniel P. Berrange berrange at redhat.com
Mon Sep 10 10:36:27 UTC 2012


On Mon, Sep 10, 2012 at 12:30:09PM +0200, Christophe Fergeau wrote:
> e5a1bee07 introduced a regression in Boxes: when Boxes is left idle
> (it's still doing some libvirt calls in the background), the
> libvirt connection gets closed after a few minutes. What happens is
> that this code in virNetClientIOHandleOutput gets triggered:
> 
> if (!thecall)
>     return -1; /* Shouldn't happen, but you never know... */
> 
> and after the changes in e5a1bee07, this causes the libvirt connection
> to be closed.
> 
> Upon further investigation, what happens is that
> virNetClientIOHandleOutput is called from gvir_event_handle_dispatch
> in libvirt-glib, which is triggered because the client fd became
> writable. However, between the times gvir_event_handle_dispatch
> is called, and the time the client lock is grabbed and
> virNetClientIOHandleOutput is called, another thread runs and
> completes the current call. 'thecall' is then NULL when the first
> thread gets to run virNetClientIOHandleOutput.
> 
> After describing this situation on IRC, danpb suggested this:
> 
> 11:37 < danpb> In that case I think the correct thing would be to change
>                'return -1' above to 'return 0' since that's not actually an
>                error - its a rare, but expected event
> 
> which is what this patch is doing. I've tested it against master
> libvirt, and I didn't get disconnected in ~10 minutes while this
> happens in less than 5 minutes without this patch.
> ---
>  src/rpc/virnetclient.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/src/rpc/virnetclient.c b/src/rpc/virnetclient.c
> index 43a9814..727ed67 100644
> --- a/src/rpc/virnetclient.c
> +++ b/src/rpc/virnetclient.c
> @@ -1205,7 +1205,10 @@ virNetClientIOHandleOutput(virNetClientPtr client)
>          thecall = thecall->next;
>  
>      if (!thecall)
> -        return -1; /* Shouldn't happen, but you never know... */
> +        return 0; /* This can happen if another thread raced with us and
> +                   * completed the call between the time this thread woke
> +                   * up from poll()ing and the time we locked the client
> +                   */
>  
>      while (thecall) {
>          ssize_t ret = virNetClientIOWriteMessage(client, thecall);

Considering how the code worked in the past, the comment was actually
correct originally. We never invoked virNetClientIOHandleOutput from
the event handler callback - we only used the callback for dealing
with *incoming* I/O. Only when we recently changed the keepalive
code to make use of the event handler callbacks did we now start
doing outgoiing I/O. This invalidated the comment & return code we
used here, but introducing the scenario described above.

ACK

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|




More information about the libvir-list mailing list