[libvirt] [RFC 0/2] Fix detection of slow guest shutdown

Christian Ehrhardt christian.ehrhardt at canonical.com
Mon Aug 6 10:57:41 UTC 2018


On Mon, Aug 6, 2018 at 10:47 AM Daniel P. Berrangé <berrange at redhat.com>
wrote:

> On Mon, Aug 06, 2018 at 07:20:10AM +0200, Christian Ehrhardt wrote:
> > In that case I wonder what the libvirt community thinks of the proposed
> > general "Pid is gone means we can assume it is dead" approach?
>
> The key thing with the shutdown process is that we use the disappearance of
> the PID as the flag to indicate that it is safe to release any resources
> that the PID was using, e.g. the hostdevs are now available for another
> guest to use.
>
> I'd be concerned that if we look for /proc/$PID going away as the flag, then
> we would be releasing the hostdevs for reuse before the kernel has cleaned
> them up. In the best case this would result in a 2nd guest failing to start
> because the device was still in use; in the worst case we could crash
> the entire host (though I'd be hopeful vfio prevents that).
>

Yeah, I agree that resources still being in use could lead to bad and rather
hard-to-debug problems.
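
To make the "PID is gone" check concrete, here is a minimal sketch of a
poll-with-timeout loop of the kind under discussion. It is not libvirt's
actual code: wait_for_pid_gone() and the 200ms poll interval are hypothetical
names and values chosen for illustration. It relies only on the POSIX rule
that kill(pid, 0) fails with ESRCH once the process no longer exists.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only.
 * Polls until `pid` no longer exists or `timeout_ms` has elapsed.
 * Returns 0 if the PID is gone, -1 if it is still there at the timeout. */
static int
wait_for_pid_gone(pid_t pid, unsigned int timeout_ms)
{
    const unsigned int step_ms = 200;                 /* assumed poll interval */
    const struct timespec step = { 0, 200 * 1000 * 1000L };
    unsigned int waited_ms = 0;

    for (;;) {
        /* Signal 0 performs only the existence/permission check.
         * Note that an unreaped zombie still counts as existing. */
        if (kill(pid, 0) < 0 && errno == ESRCH)
            return 0;
        if (waited_ms >= timeout_ms)
            return -1;
        nanosleep(&step, NULL);
        waited_ms += step_ms;
    }
}
```

A return value of 0 only says that the PID has disappeared. Whether the
kernel has finished cleaning up the hostdevs by that point is exactly the
open question in this thread.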

> > An alternative would be to understand on the kernel side why the PID is
> > gone "too early" and fix that so it stays until fully cleaned up.
> > But even then on the libvirt side we would need the extended timeout
> > values.
>
> Yeah, looks like extended timeouts are unavoidable. The only real
> optimization would be to pass an explicit timeout to the kill method,
> increasing it by 2 seconds for each hostdev that is assigned. That way
> we'll scale the timeout up as we need, so we don't have to predict the
> worst case number of assigned devices.
>

I'd do both:
- extend the KILL path timeout (if force is set) in general, to give bad
  systems a chance
- extend the maximum by 2s per hostdev
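
As a rough sketch of the arithmetic, assuming a fixed baseline plus the
per-hostdev increment: the 15 second baseline and the names below are
placeholders for illustration, not libvirt's actual constants or API. Only
the 2 seconds per hostdev comes from this thread.

```c
#include <stddef.h>

/* Hypothetical values for illustration; not libvirt's actual constants. */
#define BASE_KILL_TIMEOUT_SEC 15   /* assumed baseline wait on the KILL path */
#define EXTRA_SEC_PER_HOSTDEV 2    /* the "2s per hostdev" proposed above */

/* Scale the shutdown timeout with the number of assigned host devices,
 * so the worst case number of devices does not have to be predicted. */
static unsigned int
kill_timeout_sec(size_t nhostdevs)
{
    return BASE_KILL_TIMEOUT_SEC
           + EXTRA_SEC_PER_HOSTDEV * (unsigned int)nhostdevs;
}
```

With these placeholder numbers, a guest without hostdevs keeps the baseline
and a guest with 4 hostdevs gets 8 seconds on top of it.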

I'll submit that in a few minutes as a reply.




-- 
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd