libvirt-6.5.0 breaks host passthrough migration

Fri Jul 10 11:48:26 UTC 2020

On Fri, Jul 10, 2020 at 7:14 AM Jiri Denemark <jdenemar at redhat.com> wrote:

> On Sun, Jul 05, 2020 at 12:45:55 -0400, Mark Mielke wrote:
> > With 6.4.0, live migration was working fine with Qemu 5.0. After trying
> > out 6.5.0, migration broke with the following error:
> >
> > libvirt.libvirtError: internal error: unable to execute QEMU command
> > 'migrate': State blocked by non-migratable CPU device (invtsc flag)
>
> Could you please describe the reproducer steps? For example, was the
> domain you're trying to migrate already running when you upgrade libvirt
> or is it freshly started by the new libvirt?
>

The original case was:

1) Machine X running libvirt 6.4.0 + qemu 5.0
2) Machine Y running libvirt 6.5.0 + qemu 5.0
3) Live migration from X to Y works. Guest appears fine.
4) Upgrade Machine X from libvirt 6.4.0 to 6.5.0 and reboot.
5) Live migration from Y to X fails with the message shown.

In each case, live migration was done with OpenStack Train directing
libvirt + qemu.

> And it would be helpful to see the <cpu> element as shown by virsh
> dumpxml before you try to start the domain as well as the QEMU command
> line libvirt used to start the domain (in
> /var/log/libvirt/qemu/$VM.log).
>

The <cpu> element looks like this:

  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' dies='1' cores='4' threads='2'/>
  </cpu>
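
For anyone comparing guests, here is a minimal sketch (not part of the original exchange) that reads the <cpu> element from dumpxml output and reports whether the migratable attribute from the commit discussed below has been recorded. The helper name describe_cpu is hypothetical, and the XML is the element shown above.

```python
import xml.etree.ElementTree as ET

# The <cpu> element exactly as reported above.
CPU_XML = """
<cpu mode='host-passthrough' check='none'>
  <topology sockets='1' dies='1' cores='4' threads='2'/>
</cpu>
"""


def describe_cpu(xml_text):
    """Return the attributes of a <cpu> element that matter for this report.

    Hypothetical helper: 'migratable' is None when the attribute is absent.
    """
    cpu = ET.fromstring(xml_text)
    return {
        "mode": cpu.get("mode"),
        "check": cpu.get("check"),
        "migratable": cpu.get("migratable"),
    }


info = describe_cpu(CPU_XML)
print(info)
```

With the element above, the migratable entry is None, meaning no explicit value was present in this dump.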

The QEMU command line is very long, and includes details I would rather not
publish unless you need them. The "-cpu" portion is just:

    -cpu host

The QEMU command line itself is generated from libvirt, which is directed
by OpenStack Train.
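
A rough sketch of how the "-cpu" portion could be pulled out of a long command line without sharing the rest. It assumes the command line is available as a single-line string; the example command and the helper names cpu_option and migratable_setting are made up for illustration and are not taken from the actual log.

```python
import shlex


def cpu_option(command_line):
    """Return the argument that follows -cpu, or None if there is none."""
    args = shlex.split(command_line)
    for i, arg in enumerate(args[:-1]):
        if arg == "-cpu":
            return args[i + 1]
    return None


def migratable_setting(cpu_arg):
    """Return 'on'/'off' if the -cpu argument sets migratable=..., else None."""
    # The first comma-separated field is the CPU model; the rest are properties.
    for prop in cpu_arg.split(",")[1:]:
        key, _, value = prop.partition("=")
        if key == "migratable":
            return value
    return None


# Hypothetical, shortened command line used only to exercise the helpers.
example = "/usr/bin/qemu-system-x86_64 -name guest=demo -cpu host -m 4096"
print(cpu_option(example), migratable_setting(cpu_option(example)))
```

For a plain "-cpu host" the second helper returns None, which means QEMU's own default for migratable applies.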

> > commit 201bd5db639c063862b0c1b1abfab9a9a7c92591
> > Author: Jiri Denemark <jdenemar at redhat.com>
> > Date:   Tue Jun 2 15:34:07 2020 +0200
> >
> >     qemu: Fill default value in //cpu/@migratable attribute
> >
> >     Before QEMU introduced migratable CPU property, "-cpu host" included all
> >     features that could be enabled on the host, even those which would block
> >     migration. In other words, the default was equivalent to migratable=off.
> >     When the migratable property was introduced, the default changed to
> >     migratable=on. Let's record the default in domain XML.
> >
> >     Signed-off-by: Jiri Denemark <jdenemar at redhat.com>
> >     Reviewed-by: Michal Privoznik <mprivozn at redhat.com>
> >
> > Before this change, qemu was still being launched with "-cpu host", which
> > for any somewhat modern version of qemu, defaults to migratable=on. The
> > above comment acknowledges this, however, the implementation chooses the
> > pessimistic and ancient (and no longer applicable!) value of migratable=off:
> >
> > +    if (qemuCaps &&
> > +        def->cpu->mode == VIR_CPU_MODE_HOST_PASSTHROUGH &&
> > +        !def->cpu->migratable) {
> > +        if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_CPU_MIGRATABLE))
> > +            def->cpu->migratable = VIR_TRISTATE_SWITCH_ON;
> > +        else if (ARCH_IS_X86(def->os.arch))
> > +            def->cpu->migratable = VIR_TRISTATE_SWITCH_OFF;
> > +    }
>
> The implementation seems to be doing exactly what the commit message
> says. The migratable=off default should be used only when QEMU does not
> support -cpu host,migratable=on|off, that is only when QEMU is very old.
> Every non-ancient version of libvirt should have the
> QEMU_CAPS_CPU_MIGRATABLE set and thus this code should choose
> migratable=on default.
>

I wasn't sure what QEMU_CAPS_CPU_MIGRATABLE represents. I initially
suspected what you are saying, but since it apparently did not work the way
I expected, I then presumed it does not work the way I expected. :-)

Is QEMU_CAPS_CPU_MIGRATABLE derived only from the <cpu> element? If so, since
it is not explicitly listed for host-passthrough, doesn't that mean the check
cannot properly detect whether it is enabled?

> > I think it is not a requirement for "migratable=XXX" to be explicit in
> > libvirt. However, if there is some reason I am unaware of, and it is
> > important for libvirt to know, then I think it is important for libvirt to
> > find out the authoritative state rather than guessing.
>
> Explicit defaults are always better for two reasons: they are visible to
> users and they don't silently change.
>

I think it can go either way. There is also convention over configuration
as a competing principle. However, I also prefer explicit. It just needs
to be correct; otherwise, explicit can be very bad, as it seems in my case.
:-)

Thanks,

-- 
Mark Mielke <mark.mielke at gmail.com>