[libvirt] Race between monitor startup and incoming migration impacting libvirt

There appears to be a race condition wherein a 'cont' command sent immediately on qemu startup can prevent an inbound migration specified via -incoming from occurring. libvirt's process for starting up qemu domains with an incoming migration issues a 'cont' command at the end of qemudInitCpus, shortly after a successful connection with the monitor is made. The monitor is generally unresponsive while an inbound migration is ongoing, which forces the 'cont' to occur only after the migration has completed. However, this isn't always true (as will be demonstrated below).

I suspect strongly that this is responsible for an occasional failure I'm seeing when loading libvirt domains from file.

This is highly reproducible using qemu-kvm-0.11.0-rc2, and straightforward to demonstrate by the following means:

- Build an appropriate ramsave file by migrating a stopped guest to disk.
- Mark any backing store used by this guest read-only.
- Create an empty qcow2 file backed by the read-only store, if your guest has any disks.
- Invoke qemu with arguments appropriate to the VM being resumed, and also the following:
    -S -monitor stdio -incoming 'exec:echo START_DELAY >&2 && sleep 5 && echo END_DELAY >&2 && cat <ramsave.raw && echo LOAD_DONE >&2'

- To see a correct resume:
    - Wait until 'LOAD_DONE' is displayed, and run 'cont'.
    - The VM will correctly resume.

- To see the failure:
    - Run 'cont' after START_DELAY is displayed, but before END_DELAY.
    - 'cat: write error: Broken pipe' will be displayed.
    - The guest VM will reboot, enter a catatonic state, or otherwise fail to load correctly.
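
For anyone wanting to script the steps above, here is a rough bash sketch. It only prints the commands rather than running them. The file names (base.img, overlay.qcow2, ramsave.raw), the QEMU_BIN variable, its default value and the two helper functions are placeholders of mine, not anything libvirt or qemu provides. Newer qemu-img releases may also want the backing format given explicitly.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, file names and QEMU_BIN are illustrative
# assumptions, not part of libvirt or qemu.

# Build the -incoming argument from the steps above.
#   $1: ramsave file, $2: artificial delay in seconds
incoming_arg() {
    local ramsave=$1 delay=$2
    printf 'exec:echo START_DELAY >&2 && sleep %s && echo END_DELAY >&2 && cat <%s && echo LOAD_DONE >&2' \
        "$delay" "$ramsave"
}

# Print (rather than run) the commands for one reproduction attempt.
#   $1: backing store, $2: new qcow2 overlay, $3: ramsave file
print_repro_commands() {
    local base=$1 overlay=$2 ramsave=$3
    # Mark the backing store read-only.
    echo "chmod a-w $base"
    # Create an empty qcow2 file backed by the read-only store.
    echo "qemu-img create -f qcow2 -b $base $overlay"
    # "<guest args>" stands for the arguments appropriate to the VM being resumed.
    echo "${QEMU_BIN:-qemu-system-x86_64} <guest args> -S -monitor stdio -incoming '$(incoming_arg "$ramsave" 5)'"
}
```

Running `print_repro_commands base.img overlay.qcow2 ramsave.raw` prints the three commands in order. Typing 'cont' at the monitor prompt between START_DELAY and END_DELAY is still a manual step.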

As the 'sleep 5' used above may be considered cheating, this issue may also be reproduced without any delay: remove the 'sleep', and append <<<$'cont\n' to the end of the shell command used to invoke qemu, so that 'cont' is already waiting on the monitor's stdin when qemu starts.

The same can be done over a UNIX socket. This is included for completeness, as libvirt 0.7.x uses UNIX sockets here:

    - Invoke the following in a separate window:
      socat - UNIX-LISTEN:/tmp/test.monitor <<<$'cont\n'
    - Invoke qemu as above, but with -monitor unix:/tmp/test.monitor

I have a work-in-progress patch which modifies libvirt to use -daemonize for startup; waiting for the guest to detach before attempting to interact with the monitor may avoid this issue. However, as this patch is against libvirt master, and the master branch has other issues which expose themselves on virDomainRestore, I am unable to test it here.

Thoughts (and workarounds) welcome.

