[libvirt] [PATCH 1/2] Timeout QEMU monitor replies after 30 seconds

Eric Blake eblake at redhat.com
Wed Jun 22 17:26:27 UTC 2011


On 06/22/2011 11:05 AM, Jiri Denemark wrote:
> On Wed, Jun 22, 2011 at 16:47:18 +0100, Daniel P. Berrange wrote:
>> If the QEMU process has been stopped (kill -STOP/gdb), or the
>> QEMU process has live-locked itself, then we will never get a
>> reply from the monitor. We should not wait forever in this
>> case, but instead timeout after a reasonable amount of time.
>>
>> NB if the host has high CPU load, or a single monitor command
>> intentionally takes a long time, then this will cause bogus
>> failures. In the case of high CPU load, arguably the guest
>> should have been migrated elsewhere, since you can't effectively
>> manage guests on a host if QEMU is taking > 30 seconds to reply
>> to simple commands. Since we use background migration, there
>> should not be any commands which take significant time to
>> execute any more.
> 
> The thing I'm most concerned about is that it is far too easy to get into such
> situations, especially since the disk cache subsystem in the Linux kernel is not
> the best thing in the world. While I agree that running guests on a loaded host
> is not very clever and guests should rather be migrated elsewhere, such a
> situation doesn't have to be intentional. In other words, in case of a
> malfunction of some kind (some processes go crazy, network disruptions, ...)
> QEMU may require more than the timeout period to respond, and we will penalize
> an innocent QEMU process because we won't be able to control it anymore even
> after the issues get fixed.

Is there any way to measure time spent by the child process, rather than
just relying on elapsed wall time?  That is, when libvirt hits 30
seconds of wall time waiting for a monitor reply, can it then check whether
the child process has accumulated any execution time (likely hung) vs.
no execution time (likely a starved system situation), and only give up
in the former case?
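
To make the idea concrete, here is a rough sketch of how such a check
might look on Linux. It is not taken from the patch or from libvirt. It
assumes CPU time is sampled from fields 14 and 15 (utime, stime) of
/proc/<pid>/stat once before sending the command and once at the timeout;
the names parse_cpu_ticks, read_cpu_ticks, classify_timeout and the two
enum values are all made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical verdicts; the mapping follows the heuristic proposed above. */
enum timeout_verdict { MONITOR_GIVE_UP, MONITOR_KEEP_WAITING };

/* Extract utime + stime (fields 14 and 15, in clock ticks) from one
 * /proc/<pid>/stat line. Returns 0 on success, -1 on malformed input. */
int parse_cpu_ticks(const char *stat_line, unsigned long *ticks)
{
    unsigned long utime, stime;
    const char *p;

    assert(stat_line != NULL && ticks != NULL);

    /* Field 2 (comm) is parenthesised and may itself contain spaces or
     * ')' characters, so resume parsing after the last ')'. */
    p = strrchr(stat_line, ')');
    if (p == NULL)
        return -1;

    /* Skip state (3) and fields 4-13, then read utime (14) and stime (15). */
    if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lu %lu",
               &utime, &stime) != 2)
        return -1;

    *ticks = utime + stime;
    return 0;
}

/* Read the CPU ticks consumed so far by process `pid` (Linux only). */
int read_cpu_ticks(pid_t pid, unsigned long *ticks)
{
    char path[64];
    char line[1024];
    FILE *fp;
    int ret = -1;

    snprintf(path, sizeof(path), "/proc/%ld/stat", (long)pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fgets(line, sizeof(line), fp) != NULL)
        ret = parse_cpu_ticks(line, ticks);
    fclose(fp);
    return ret;
}

/* `before` is sampled when the monitor command is sent, `after` when the
 * wall-clock timeout fires. CPU time consumed without a reply suggests a
 * hang; no CPU time consumed suggests the child never got to run. */
enum timeout_verdict classify_timeout(unsigned long before, unsigned long after)
{
    return after > before ? MONITOR_GIVE_UP : MONITOR_KEEP_WAITING;
}
```

The caller would invoke read_cpu_ticks() on the QEMU pid at both points
and pass the two samples to classify_timeout(). One caveat with this
sketch: a process stopped with kill -STOP also accumulates no CPU time,
so on its own it cannot tell that case apart from a starved host.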

-- 
Eric Blake   eblake at redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
