[libvirt] race between qemu monitor startup and blocking migrate -incoming

Charles Duffy charles at dyfis.net
Fri Aug 28 04:01:24 UTC 2009


Howdy, y'all.

I'm seeing virDomainRestore fail, particularly under load, even in
0.7.0, where libvirt no longer needs to parse qemu's output to find the
monitor PTY.

Digging through strace output of libvirtd and the qemu processes it
spawns, I find that this happens when qemu blocks on the migrate
-incoming and stops responding on the monitor socket. Some versions of
qemu can enter this state before the monitor socket is even opened. In
both cases libvirt times out (while opening the monitor socket or while
reading from it) and then kills the qemu instance it spawned while that
instance is still migrating in its old saved state.
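To make the timeout half of this concrete: a client that polls for the
monitor socket against a fixed deadline will give up whenever a loaded
host delays qemu past that deadline. The following is a minimal sketch,
assuming a hypothetical helper named wait_for_monitor_socket and a
monitor exposed as a UNIX socket at a caller-chosen path; it is not
libvirt's code, which is C. Note also that the socket existing only
means qemu has created it; a qemu already blocked in -incoming may still
never answer on it.

```shell
# Hypothetical helper (not part of libvirt): poll once per second until
# PATH exists as a UNIX socket, giving up after TIMEOUT seconds.
# Returns 0 if the socket appeared, 1 on timeout.
#   usage: wait_for_monitor_socket PATH TIMEOUT
wait_for_monitor_socket() {
    _path=$1
    _left=$2
    while [ "$_left" -gt 0 ]; do
        [ -S "$_path" ] && return 0
        sleep 1
        _left=$((_left - 1))
    done
    # One last check after the final sleep.
    if [ -S "$_path" ]; then
        return 0
    fi
    # A loaded host can miss this fixed deadline even though qemu is
    # still starting normally; a caller that then kills qemu turns a
    # slow start into a failed restore.
    return 1
}
```

For example, `wait_for_monitor_socket /tmp/guest-mon.sock 3 || kill "$qemu_pid"`
(path and variable hypothetical) reproduces the pattern described above:
the kill fires on a slow start just as readily as on a genuine failure.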

Both qemu-0.11.0-rc1 and qemu-kvm master have some form of blocking in
-incoming exec: that can prevent libvirt from completing a resume. I
have reproduced the issue (and have strace logs, available on request)
both with and without Chris Lalancette's "Fix detached migration with
exec" and "Allow monitor interaction when using migrate -exec" patches.
The qemu binaries in use _appear_ to correctly allow monitor interaction
before -incoming exec:... completes when invoked interactively in the
trivial case shown below:

   $ qemu-system-x86_64 \
       -monitor stdio \
       -nographic \
       -serial file:/dev/null \
       -incoming 'exec:sleep 5; echo DONE >&2; kill $PPID' \
       /dev/null
   QEMU 0.10.91 monitor - type 'help' for more information
   (qemu) DONE
   $

...however, whether these same binaries behave as expected when invoked
from libvirt by our automated test system under load is
nondeterministic. (I have yet to reproduce the issue in a low-load
environment using "virsh restore".)

Is someone else working on this? Is a known-good (or believed-good) 
libvirt/qemu pair available? What can I do to help in getting this issue 
resolved?

Thanks!

---

libvirt-0.7.0 + qemu-kvm-0.11.0-rc1
qemudReadMonitorOutput:728 : internal error Timed out while reading monitor startup output

libvirt-0.6.5 + qemu-kvm-0.11.0-rc1
error : qemudReadMonitorOutput:705 : internal error Timed out while reading monitor startup output
error : qemudWaitForMonitor:1003 : internal error unable to start guest: char device redirected to /dev/pts/9
libvir: QEMU error : internal error unable to start guest: char device redirected to /dev/pts/9
^^ Particularly interesting, since the line above should have been
consumed by qemudExtractMonitorPath rather than emitted as error text.
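For reference, the line being mis-reported is the one qemu prints on
stderr for a PTY-backed character device. A minimal sketch of pulling
the path out of that text, assuming a hypothetical shell helper named
extract_pty_path that simply takes the first match (libvirt's actual
qemudExtractMonitorPath is C and is not reproduced here):

```shell
# Hypothetical helper (not libvirt's qemudExtractMonitorPath): read
# qemu's startup output on stdin and print the path from the first
# "char device redirected to /dev/pts/N" line; prints nothing if absent.
extract_pty_path() {
    sed -n 's|^char device redirected to \(/dev/pts/[0-9][0-9]*\).*$|\1|p' |
        head -n 1
}
```

With the text from the 0.6.5 log above,
`printf 'char device redirected to /dev/pts/9\n' | extract_pty_path`
prints /dev/pts/9; a parser that recognizes the line this way would not
hand it back to the caller as error text.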

---

<aliguori> -incoming is blocking
<aliguori> you cannot interact with the monitor during -incoming
<mDuff> ...shouldn't we always be opening the monitor before starting the blocking -incoming bits, though? I don't always see that happening (and have an strace handy where it certainly doesn't).
<aliguori> no
<aliguori> well, i think they added some patches for that
<aliguori> but originally, that's not how it worked
<aliguori> and i think it's silly to work that way
<aliguori> -incoming should mean, wait patiently for an incoming migration
<aliguori> there's no point in interfacing with the monitor in the interim
<mDuff> I agree that interacting may not be called for, but at least connect()ing -- if it's a UNIX socket, the other side won't be able to connect at all until qemu goes first...
<aliguori> heh, well....
<aliguori> that particular race condition is addressed by -daemonize
<aliguori> because that's generally true
<aliguori> you don't know how long qemu will take to open the monitor
<aliguori> but -daemonize gives you notification, because it doesn't daemonize the process until you've gotten to the point where all sockets are open
<aliguori> but IIRC, libvirt doesn't use -daemonize
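The notification aliguori describes is a readiness handshake: the
foreground process does not return until its background child reports
that setup (for qemu, opening its sockets) is done. A minimal shell
sketch of that pattern, using a FIFO and a hypothetical helper named
start_and_wait_ready; it illustrates the idea and makes no claim about
qemu's internals:

```shell
# Hypothetical helper: run SETUP_CMD in a background worker and return
# only after the worker reports whether setup succeeded.
#   usage: start_and_wait_ready FIFO SETUP_CMD [ARGS...]
start_and_wait_ready() {
    _fifo=$1
    shift
    mkfifo "$_fifo" || return 1
    (
        # Worker: run the setup step, then report the outcome. A real
        # daemon would go on to its blocking work after this point
        # (for qemu, the -incoming migration).
        if "$@"; then
            echo ready >"$_fifo"
        else
            echo failed >"$_fifo"
        fi
    ) &
    # Opening a FIFO for reading blocks until the worker opens it for
    # writing, so the caller cannot race ahead of the setup step.
    read -r _status <"$_fifo"
    rm -f "$_fifo"
    [ "$_status" = ready ]
}
```

With qemu itself, the analogous approach would be to start it with
-daemonize and a socket-backed monitor (for instance
-monitor unix:/path/to/mon.sock,server,nowait, path hypothetical) and
to connect only once the qemu command has returned, instead of polling
against a deadline.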