[libvirt] [PATCH 1/2] qemu: Add support for changing timeout value to open unix monitor socket

Daniel P. Berrange berrange at redhat.com
Tue Jan 28 16:48:28 UTC 2014


On Mon, Jan 27, 2014 at 11:28:31AM +0000, Daniel P. Berrange wrote:
> On Fri, Jan 24, 2014 at 05:17:02PM +0100, Martin Kletzander wrote:
> > On Fri, Jan 24, 2014 at 12:56:43PM +0000, Daniel P. Berrange wrote:
> > > On Thu, Jan 23, 2014 at 07:47:54PM +0200, Pavel Fux wrote:
> > > > There are 8 servers with 8 VMs on each server. All the qcow images are on
> > > > the NFS share on the same external server.
> > > > We are starting all 64 VMs at the same time.
> > > > Each VM is 2.5GB x 64 VMs = 160GB = 1280Gb.
> > > > Reading all of the data over a 1GbE interface would take 1280 sec = 21.3 min.
> > > > Not all of the image is read on boot, so it takes only 5 min.
> > >
> > > That's interesting, but it still doesn't explain the failures. QEMU will
> > > start listening on its monitor socket before it even opens any of the
> > > disk images. So the time it takes to read disk images on boot should have
> > > no relevance to timeouts waiting for the monitor socket. All it does between
> > > exec of the QEMU binary and listening on the monitor socket is to load the
> > > libraries QEMU is linked against and load a few misc pieces like BIOS
> > > firmware blobs. I just can't see a reason why this would take anywhere
> > > near 5 minutes - it should be a matter of a few seconds at worst.
> > >
> > 
> > I think it does a little bit more than that, but I have no proof for
> > it.  If you search for occurrences of this error with virt-manager
> > (I'm not sure why; maybe because people using virsh deal with it
> > themselves), you'll find that most of them are caused by a managed
> > save.  When QEMU is loading, it takes longer than the 3 seconds we
> > allowed before, and the machine fails to start.  The thing is that
> > there is nothing else unusual on those machines; removing the managed
> > save solves everything.  That's why I think it at least loads some
> > initialization values (in some special cases), although I haven't been
> > able to reproduce that.
> 
> Hmm, I was thinking it might be something related to socket connect/accept
> synchronization. QEMU will listen() very early, but won't accept() until
> very late in startup. I've just confirmed in a test though that connect()
> will succeed even if the app doesn't call accept(), since the kernel will
> complete the connection at the protocol level and just queue the client.
> So that doesn't explain it yet.

I did a test with QEMU by adding a 'sleep(20)' into the QEMU main()
function in vl.c. It only causes QEMU startup failures if we put the
sleep right after parsing command line args, i.e. before QEMU has
called listen() on the monitor socket. Once QEMU has done a listen()
on the socket, libvirt handles arbitrary delays without issue.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|




More information about the libvir-list mailing list