[libvirt] Making panic great again

Fri Apr 28 06:43:30 UTC 2017

On Thu, Apr 27, 2017 at 05:34:21PM -0700, Ed Swierk wrote:
>The panic device is currently documented as a way for "libvirt to receive
>panic notification from a QEMU guest".
>
>This is true, but not the whole story. When a guest triggers the panic
>device, QEMU pauses the guest, and libvirt takes the action specified by
>on_crash. This can interfere with the guest's own crash handling actions
>(e.g. writing a dump file and rebooting itself) if the guest triggers the
>panic device first (as Windows does).
>
>None of this is an obvious side effect of a notification mechanism, so the
>panic device documentation should mention it. (I'll send a documentation
>patch shortly.)
>
>Nor is this a desirable side effect, for guests that are configured to deal
>with crashes themselves. Sure, you can avoid using the panic device with
>such guests, but then virsh list or another application using the libvirt
>API to monitor domain state won't notice guest crashes. And if you still
>want libvirt to take action on guests that don't do it themselves, then you
>have to be careful to include the panic device only for those domains.
>
>Ideally libvirt would offer (1) a state indicating "this guest crashed and
>needs help" independently of triggering an action, and (2) a way to trigger
>an action only when needed to recover from the crash, excluding guests that
>deal with their own crashes.
>
>Sadly pvpanic and the HyperV crash MSR convey only that the guest crashed,
>not whether the guest is configured to take some action on its own. So
>there's no way to know precisely that a crashed (and not paused) guest is
>in need of assistance.
>
>But a state indicating "this guest crashed N minutes ago and hasn't
>rebooted itself" would be a useful approximation. And triggering an action
>N minutes after a guest crash if it hasn't rebooted itself in the meantime
>would make it easy to cap the downtime of crashed domains. Both could be
>implemented without changing either QEMU or panic device semantics.
>
>Does this seem useful to anyone else?
>

I'm trying to understand the situation.  So you have a guest that
handles crashes itself (with kdump, let's say), but none of the existing
on_crash options is good enough for you:

  - preserve is bad because the guest is not available until someone
    restarts it

  - restart is bad because it doesn't keep the dump anywhere?

  - coredump-restart is bad because it doesn't keep the internal dump?

I have no use for this currently, so I'm not the right person to discuss
it with, but it sounds like you want the guest-handled crash dump to be
uploaded or saved somewhere and then have libvirt just restart the guest.
Alternatively, configure the guest not to handle crashes itself and set
on_crash to coredump-restart.

If none of those works for you and you really need a special case,
it is doable with a short script on top of libvirt.
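
To make that a bit more concrete, here is a rough sketch of the
bookkeeping such a script would need for the "crashed N minutes ago and
hasn't rebooted itself" idea.  It is only the timing logic, not a tested
tool: the class and method names (CrashWatchdog, on_crashed, on_rebooted,
overdue) are made up for illustration, and hooking them up to libvirt's
domain event callbacks and choosing the recovery action are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CrashWatchdog:
    """Track crashed guests and report those that did not recover in time.

    Hypothetical helper: the caller is assumed to feed it crash and
    reboot notifications (for example from libvirt domain events).
    """

    grace_seconds: float
    # domain name -> timestamp (seconds) of the first unrecovered crash
    _crashed_at: Dict[str, float] = field(default_factory=dict, init=False)

    def on_crashed(self, domain: str, now: float) -> None:
        # Keep the earliest crash time if the guest panics repeatedly.
        self._crashed_at.setdefault(domain, now)

    def on_rebooted(self, domain: str) -> None:
        # The guest recovered on its own; stop tracking it.
        self._crashed_at.pop(domain, None)

    def overdue(self, now: float) -> List[str]:
        # Guests that crashed at least grace_seconds ago and never rebooted.
        return sorted(
            name
            for name, crashed in self._crashed_at.items()
            if now - crashed >= self.grace_seconds
        )


if __name__ == "__main__":
    wd = CrashWatchdog(grace_seconds=300)
    wd.on_crashed("guest-a", now=0.0)
    wd.on_crashed("guest-b", now=10.0)
    wd.on_rebooted("guest-a")      # guest-a handled its own crash
    print(wd.overdue(now=400.0))   # only guest-b still needs help
```

A real script would call overdue() periodically and apply whatever
action it wants (reset, coredump, notify someone) to the domains it
returns.  Keeping the timing separate from the libvirt calls makes it
easy to test without a running hypervisor.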

>--Ed
