
Re: [libvirt] [PATCH 1/2] Timeout QEMU monitor replies after 30 seconds

On 06/22/2011 11:05 AM, Jiri Denemark wrote:
> On Wed, Jun 22, 2011 at 16:47:18 +0100, Daniel P. Berrange wrote:
>> If the QEMU process has been stopped (kill -STOP/gdb), or the
>> QEMU process has live-locked itself, then we will never get a
>> reply from the monitor. We should not wait forever in this
>> case, but instead time out after a reasonable amount of time.
>> NB if the host has high CPU load, or a single monitor command
>> intentionally takes a long time, then this will cause bogus
>> failures. In the case of high CPU load, arguably the guest
>> should have been migrated elsewhere, since you can't effectively
>> manage guests on a host if QEMU is taking > 30 seconds to reply
>> to simple commands. Since we use background migration, there
>> should not be any commands which take significant time to
>> execute any more.
> The thing I'm most concerned about is that it is far too easy to get into such
> situations, especially since the disk cache subsystem in the Linux kernel is not
> the best thing in the world. While I agree that running guests on a loaded host
> is not very clever and guests should rather be migrated elsewhere, such a
> situation doesn't have to be intentional. In other words, in case of a
> malfunction of some kind (some processes go crazy, network disruptions, ...)
> QEMU may require more than the timeout to respond, and we will penalize an
> innocent QEMU process because we won't be able to control it anymore, even
> after the issues get fixed.

Is there any way to measure time spent by the child process, rather than
just relying on wall-time elapsed?  That is, when libvirt hits 30
seconds of wall time in waiting for a monitor, can it then check whether
the child process has accumulated any execution time (likely hung) vs.
no execution time (likely a starved system situation), and only give up
in the former case?
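
A minimal sketch of that check, assuming a Linux host where the child's
accumulated CPU time can be read from /proc/<pid>/stat (fields 14 and 15,
utime and stime, in clock ticks). This is not libvirt code; the function
names cpu_ticks_from_stat, read_cpu_ticks and classify_wait are hypothetical
and exist only for illustration. One caveat: a process stopped with
kill -STOP also accumulates no CPU time, so this heuristic alone would put
it in the "starved" bucket.

```python
import os


def cpu_ticks_from_stat(stat_text):
    """Return utime + stime (in clock ticks) from the text of /proc/<pid>/stat."""
    # The second field (comm) is parenthesised and may itself contain spaces
    # or parentheses, so split only the part after the last ')'.
    fields = stat_text[stat_text.rindex(")") + 1:].split()
    # fields[0] is field 3 (state), so field 14 (utime) is index 11
    # and field 15 (stime) is index 12.
    return int(fields[11]) + int(fields[12])


def read_cpu_ticks(pid):
    """Read the current CPU tick count for a process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return cpu_ticks_from_stat(f.read())


def classify_wait(ticks_at_send, ticks_at_timeout):
    """Classify an unanswered monitor command once the wall-clock timeout hits.

    "hung":    the child consumed CPU time but never replied.
    "starved": the child got no CPU time at all during the wait.
    """
    if ticks_at_timeout > ticks_at_send:
        return "hung"
    return "starved"


def ticks_to_seconds(ticks):
    """Convert clock ticks to seconds using the system tick rate."""
    return ticks / os.sysconf("SC_CLK_TCK")
```

The caller would sample read_cpu_ticks(pid) when sending the monitor
command and again when the 30-second wall-clock timer fires, and give up
only if classify_wait() reports "hung".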

Eric Blake   eblake redhat com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

