
Re: [libvirt] [PATCH 1/2] Timeout QEMU monitor replies after 30 seconds



On Wed, Jun 22, 2011 at 07:05:08PM +0200, Jiri Denemark wrote:
> On Wed, Jun 22, 2011 at 16:47:18 +0100, Daniel P. Berrange wrote:
> > If the QEMU process has been stopped (kill -STOP/gdb), or the
> > QEMU process has live-locked itself, then we will never get a
> > reply from the monitor. We should not wait forever in this
> > case, but instead timeout after a reasonable amount of time.
> > 
> > NB if the host has high CPU load, or a single monitor command
> > intentionally takes a long time, then this will cause bogus
> > failures. In the case of high CPU load, arguably the guest
> > should have been migrated elsewhere, since you can't effectively
> > manage guests on a host if QEMU is taking > 30 seconds to reply
> > to simple commands. Since we use background migration, there
> > should not be any commands which take significant time to
> > execute any more.
> 
> The thing I'm most concerned about is that it is far too easy to get into such
> situations, especially since the disk cache subsystem in the Linux kernel is not
> the best thing in the world. While I agree that running guests on a loaded host
> is not very clever and guests should rather be migrated elsewhere, such a
> situation doesn't have to be intentional. In other words, in case of a
> malfunction of some kind (some processes go crazy, network disruptions, ...)
> QEMU may require more than the timeout to respond, and we will penalize an
> innocent QEMU process because we won't be able to control it anymore even
> after the issues get fixed.

  It's clearly a trade-off, and that is the reason why it must be
configurable. 30s is a lot already. It's a first shot, and I'm sure
feedback will suggest adding more logic around that basic timeout-based
error detection. Right now the problem is that never failing the call
is a serious issue, and can block the whole process too (like on daemon
restart when trying to reconnect to a stuck guest).
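
  To make the idea concrete, here is a minimal sketch of a reply wait
with a configurable timeout. It is Python rather than the C used in the
patch, and every name in it (ReplyWaiter, MonitorTimeout, timeout_s,
deliver) is hypothetical and chosen for illustration only; it is not
the libvirt monitor code. It only shows the general shape: block on a
condition variable until a reply arrives, and fail the call once the
configured limit has passed instead of waiting forever.

```python
import threading


class MonitorTimeout(Exception):
    """Raised when no reply arrives within the configured timeout."""


class ReplyWaiter:
    """Hypothetical helper: wait for one reply, but never forever."""

    def __init__(self, timeout_s=30.0):
        # The limit is a parameter so it can be tuned per deployment.
        self.timeout_s = timeout_s
        self._cond = threading.Condition()
        self._reply = None
        self._have_reply = False

    def deliver(self, reply):
        """Called by the I/O thread when a reply has been read."""
        with self._cond:
            self._reply = reply
            self._have_reply = True
            self._cond.notify_all()

    def wait(self):
        """Block until a reply is delivered or the timeout expires."""
        with self._cond:
            # wait_for() returns False if the predicate is still false
            # when the timeout expires.
            got_reply = self._cond.wait_for(
                lambda: self._have_reply, timeout=self.timeout_s
            )
            if not got_reply:
                raise MonitorTimeout(
                    "no monitor reply after %.1f seconds" % self.timeout_s
                )
            return self._reply
```

A caller would create the waiter with whatever limit the configuration
provides, send the command, and call wait(). A stopped or live-locked
process then produces a MonitorTimeout error instead of a call that
never returns. Whether the guest should still be considered manageable
after such an error is the open question raised above, and this sketch
does not address it.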

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

