[libvirt] [PATCH] Fix unwanted closing of libvirt client connection
Daniel P. Berrange
berrange at redhat.com
Mon Sep 10 10:36:27 UTC 2012
On Mon, Sep 10, 2012 at 12:30:09PM +0200, Christophe Fergeau wrote:
> e5a1bee07 introduced a regression in Boxes: when Boxes is left idle
> (it's still doing some libvirt calls in the background), the
> libvirt connection gets closed after a few minutes. What happens is
> that this code in virNetClientIOHandleOutput gets triggered:
>
> if (!thecall)
> return -1; /* Shouldn't happen, but you never know... */
>
> and after the changes in e5a1bee07, this causes the libvirt connection
> to be closed.
>
> Upon further investigation, what happens is that
> virNetClientIOHandleOutput is called from gvir_event_handle_dispatch
> in libvirt-glib, which is triggered because the client fd became
> writable. However, between the times gvir_event_handle_dispatch
> is called, and the time the client lock is grabbed and
> virNetClientIOHandleOutput is called, another thread runs and
> completes the current call. 'thecall' is then NULL when the first
> thread gets to run virNetClientIOHandleOutput.
>
> After describing this situation on IRC, danpb suggested this:
>
> 11:37 < danpb> In that case I think the correct thing would be to change
> 'return -1' above to 'return 0' since that's not actually an
> error - its a rare, but expected event
>
> which is what this patch is doing. I've tested it against master
> libvirt, and I didn't get disconnected in ~10 minutes while this
> happens in less than 5 minutes without this patch.
> ---
> src/rpc/virnetclient.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/src/rpc/virnetclient.c b/src/rpc/virnetclient.c
> index 43a9814..727ed67 100644
> --- a/src/rpc/virnetclient.c
> +++ b/src/rpc/virnetclient.c
> @@ -1205,7 +1205,10 @@ virNetClientIOHandleOutput(virNetClientPtr client)
> thecall = thecall->next;
>
> if (!thecall)
> - return -1; /* Shouldn't happen, but you never know... */
> + return 0; /* This can happen if another thread raced with us and
> + * completed the call between the time this thread woke
> + * up from poll()ing and the time we locked the client
> + */
>
> while (thecall) {
> ssize_t ret = virNetClientIOWriteMessage(client, thecall);
Considering how the code worked in the past, the comment was actually
correct originally. We never invoked virNetClientIOHandleOutput from
the event handler callback - we only used the callback for dealing
with *incoming* I/O. Only when we recently changed the keepalive
code to make use of the event handler callbacks did we now start
doing outgoiing I/O. This invalidated the comment & return code we
used here, but introducing the scenario described above.
ACK
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
More information about the libvir-list
mailing list