
Re: [libvirt] [RFC PATCH v2 0/3] Start fixing the pvpanic mess



On Wed, Aug 21, 2013 at 11:02:56AM -0600, Eric Blake wrote:
> On 08/21/2013 10:51 AM, Paolo Bonzini wrote:
> > Il 21/08/2013 18:48, Daniel P. Berrange ha scritto:
> >> No, <on_crash> is the right thing to be using for this from
> >> libvirt's pov & I don't think we should invent something new.
> >> The <on_crash> element has always been intended to represent
> >> handling of guest panics, not qemu internal errors.
> > 
> > Actually for Xen HVM guests, it mostly traps things such as failed
> > vmentries.  The Xen PV-on-HVM drivers do not register a panic notifier
> > that moves the guest to the "crashed" state.
> > 
> > <on_crash> cannot be salvaged, in my opinion, because all domain XMLs in
> > the wild will have a setting that causes libvirt to add "-device
> > isa-pvpanic".  Thus changing libvirt versions will change guest
> > hardware, which is _very_ bad.
> 
> Let's expand on that statement:
> 
> Libvirt's default for <on_crash> is 'destroy'.  But virt-install (and
> thus virt-manager) have been setting explicit 'restart' for AGES now.
> 
> Arguably, this is YET ANOTHER reason why virt-manager should be using
> libosinfo to make sane choices about new guest XML, based on known
> capabilities of the guest it will be installing.  But that only affects
> newly created guests after we fix the virt stack.
> 
> In the meantime, you have a point that we have a back-compat mess - we
> promise ABI stability (guests shall not see hardware changes when
> upgrading versions of libvirtd but leaving the XML unchanged - the only
> way to change hardware seen by an existing guest is to explicitly modify
> XML).
> 
> > 
> > In addition, Windows XP and 2003 will show the annoying device wizard
> > upon a libvirt upgrade, and fixing this is what surfaced all the mess.
> 
> Yes, so we need the back-compat code to leave pvpanic out of
> pre-existing guests, if we can find a way to sensibly do that.
> 
> So, this boils down to a question of what SHOULD the valid states for
> <on_crash> be?  Generically, we want <on_crash>destroy</on_crash> to not
> invalidate a guest, but also to not instantiate a pvpanic device; since
> that covers the libvirt defaults.  We also want
> <on_crash>restart</on_crash> to not invalidate a guest, but also to not
> instantiate a pvpanic device, since so many existing guests have that
> setting thanks to virt-install.
> 
> Maybe that means we add attributes/sub-elements to <on_crash> that
> express whether pvpanic device is permitted; and the absence of that
> attribute means the status quo (the <on_crash> tag is effectively
> ignored because without pvpanic device, there is no way for libvirt to
> learn if a guest panicked).  Or does it mean we expose a new sub-element
> of <devices>, similar to how we have a <memballoon> subelement that
> controls whether the memballoon device is shown to the guest, and just
> document that for qemu, <on_crash> is a no-op without the <pvpanic>
> subelement?


This is a QEMU bug that you happened to be Cc'd on.
So you started to worry about supporting a buggy QEMU.
This is generally futile.

There are countless bugs that we have silently fixed.
Many of them are far more serious than this silly reversibility bug.
Some BIOS versions have racy hotplug support, so
hotplug events can be missed.
Should libvirt warn the user that the BIOS is broken
and suggest restarting the guest to see the device?
Some QEMU versions had a racy implementation of virtio
that would corrupt guest memory.
Should libvirt warn the user that virtio is broken
and suggest switching to e1000 or upgrading QEMU?
Some QEMU versions have a buggy qcow2 implementation that would corrupt the disk.
Should libvirt warn the user that qcow2 is broken
and suggest switching to raw?
Some kernels have buggy vhost drivers that would crash the host.
Should libvirt detect these and tell the user to upgrade the kernel
or switch to userspace virtio?
Some kernels have NIC drivers that brick hardware.
Should libvirt detect these and tell the user to upgrade the kernel
or switch to a different NIC?
There are libc bugs, glib bugs ....

Let's fix the bug in QEMU and move on.
Working around it in libvirt is unnecessary.


-- 
MST

