
[libvirt] race between qemu monitor startup and blocking migrate -incoming



Howdy, y'all.

I'm having issues with virDomainRestore failing, particularly under load -- even in 0.7.0, when there's no need to parse through qemu's output to find the monitor PTY.

Digging through strace output of libvirtd and the qemu processes it spawns shows that this happens when qemu blocks on the migrate -incoming and stops responding on the monitor socket. Some versions of qemu can enter this state before the monitor socket is even opened. Either way, libvirt times out (while attempting to open the monitor socket, or while trying to read from it) and then kills the qemu instance it spawned while that instance is still attempting to migrate in its old saved state.
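
To make the client side of this race concrete, here is a minimal Python sketch of connecting to a UNIX monitor socket that may not exist yet. It is not libvirt's code; the function name, the path argument, and the timeout values are all made up for illustration. It only shows why a fixed deadline loses when the other side is slow to bind() and listen().

```python
import errno
import socket
import time


def connect_monitor(path, timeout=3.0, interval=0.05):
    """Retry connect() on a UNIX socket until it succeeds or the deadline passes.

    `path`, `timeout` and `interval` are illustrative names, not libvirt's.
    """
    deadline = time.monotonic() + timeout
    while True:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return sock
        except OSError as exc:
            sock.close()
            # ENOENT: the socket file does not exist yet (no bind() so far).
            # ECONNREFUSED: the file exists but nobody is listening yet.
            if exc.errno not in (errno.ENOENT, errno.ECONNREFUSED):
                raise
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"monitor socket {path!r} not ready after {timeout}s"
                )
        time.sleep(interval)
```

Under load, the time the server side needs before it listens is unbounded from the client's point of view, so any value chosen for the timeout is a guess.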

Both qemu-0.11.0-rc1 and qemu-kvm master have some form of blocking in -incoming exec: which can prevent libvirt from successfully carrying through a resume; I have reproduced the issue (and maintain logs from strace, available on request) irrespective of the state of Chris Lalancette's "Fix detached migration with exec" and "Allow monitor interaction when using migrate -exec" patches. The qemu binaries being used _appear_ to correctly allow monitor interaction prior to -incoming exec:... completion when interactively invoked in the trivial case shown below:

  $ qemu-system-x86_64 \
      -monitor stdio \
      -nographic \
      -serial file:/dev/null \
      -incoming 'exec:sleep 5; echo DONE >&2; kill $PPID' \
      /dev/null
  QEMU 0.10.91 monitor - type 'help' for more information
  (qemu) DONE
  $

...however, whether these same binaries work as expected when invoked from libvirt by our automated test system under load is nondeterministic. (I have yet to reproduce the issue in a low-load environment using "virsh restore".)

Is someone else working on this? Is a known-good (or believed-good) libvirt/qemu pair available? What can I do to help in getting this issue resolved?

Thanks!

---

libvirt-0.7.0 + qemu-kvm-0.11.0-rc1
qemudReadMonitorOutput:728 : internal error Timed out while reading monitor startup output

libvirt-0.6.5 + qemu-kvm-0.11.0-rc1
error : qemudReadMonitorOutput:705 : internal error Timed out while reading monitor startup output
error : qemudWaitForMonitor:1003 : internal error unable to start guest: char device redirected to /dev/pts/9
libvir: QEMU error : internal error unable to start guest: char device redirected to /dev/pts/9
^^ particularly interesting, as the above line should have been eaten by qemudExtractMonitorPath rather than emitted as error text
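
For reference, the kind of parsing the 0.6.5 path depends on can be sketched in a few lines of Python. This is not how qemudExtractMonitorPath is implemented; the function name and regular expression are my own, and the only thing taken from the logs is the "char device redirected to /dev/pts/N" message text.

```python
import re

# Pattern based solely on the message text seen in the log above.
_PTY_RE = re.compile(r"char device redirected to (/dev/pts/\d+)")


def extract_pty_paths(startup_output):
    """Return every PTY path announced in qemu's startup output, in order."""
    return _PTY_RE.findall(startup_output)
```

If the startup output is read only partially before the timeout fires, a parser like this never sees the complete line, which would be consistent with the line surfacing as error text instead.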

---

<aliguori> -incoming is blocking
<aliguori> you cannot interact with the monitor during -incoming
<mDuff> ...shouldn't we always be opening the monitor before starting the blocking -incoming bits, though? I don't always see that happening (and have an strace handy where it certainly doesn't).
<aliguori> no
<aliguori> well, i think they added some patches for that
<aliguori> but originally, that's not how it worked
<aliguori> and i think it's silly to work that way
<aliguori> -incoming should mean, wait patiently for an incoming migration
<aliguori> there's no point in interfacing with the monitor in the interim
<mDuff> I agree that interacting may not be called for, but at least connect()ing -- if it's a UNIX socket, the other side won't be able to connect at all until qemu goes first...
<aliguori> heh, well....
<aliguori> that particular race condition is addressed by -daemonize
<aliguori> because that's generally true
<aliguori> you don't know how long qemu will take to open the monitor
<aliguori> but -daemonize gives you notification because it doesn't daemonize the process until you've gotten to the point where all sockets are open
<aliguori> but IIRC, libvirt doesn't use -daemonize
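
The idea behind -daemonize as described in the chat above (the launcher gets control back only once all sockets are open) can be illustrated with a small stand-alone handshake. This is a sketch of the general technique using a pipe between a forked child and its parent, not qemu's implementation; the function name and the fake "(qemu) " prompt are placeholders.

```python
import os
import socket


def spawn_with_ready_socket(path):
    """Fork a child that opens a listening UNIX socket; return once it is ready.

    The child is a stand-in for qemu. All names here are illustrative.
    """
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            os.close(ready_r)
            # Open the monitor socket first...
            srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            srv.bind(path)
            srv.listen(1)
            # ...then notify the parent, and only afterwards block.
            os.write(ready_w, b"1")
            os.close(ready_w)
            conn, _ = srv.accept()  # stand-in for the blocking -incoming phase
            conn.sendall(b"(qemu) ")
            conn.close()
            status = 0
        finally:
            os._exit(status)  # never return into the caller's code in the child
    # Parent: block until the child reports readiness, or EOF if it died.
    os.close(ready_w)
    ok = os.read(ready_r, 1) == b"1"
    os.close(ready_r)
    if not ok:
        os.waitpid(pid, 0)
        raise RuntimeError("child exited before opening its socket")
    return pid
```

With this ordering the caller needs no timeout for the connect step: by the time spawn_with_ready_socket() returns, the socket is already listening, so the first connect() succeeds however long the child blocks afterwards.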

