[vfio-users] Bisected an issue with OVMF

Laszlo Ersek lersek at redhat.com
Fri Jul 28 14:58:49 UTC 2017


On 07/28/17 08:19, Samuel Holland wrote:
> Hello,
>
> I've been using vfio to pass through a GPU[1] to a QEMU virtual
> machine for several years. I've been using OVMF since some time in
> 2015. Initially, I updated my OVMF binary from Gerd's repo on a
> regular basis, but some time around February 2016, the OVMF binaries
> stopped booting. I could pass through other devices (EHCI controllers,
> audio) fine, but if I attached the GPU, I would have one core stuck at
> 100% usage indefinitely, and a blank screen and no serial console
> output from OVMF.
>
> The OVMF hang would happen regardless of the number of cores, or the
> amount of RAM given to the VM, or whether or not the GPU was attached
> to a root port.
>
> By the time I got around to investigating the issue, I'd forgotten the
> date of the last good revision (I only had the fd images on hand, not
> the RPM), and nobody else had heard of my issue. This spring, I tried
> compiling various revisions of the edk2 source repo, but none of them
> booted.
>
> Thanks to the recent thread where a couple of old OVMF builds were
> posted, I was finally able to bisect the issue!

Great; thank you for the effort!

> The build[2] from 2015 booted! The one from mid-2016 did not, as
> expected. Anyway, I tracked the breakage down to this commit:
>
> commit 7daf2401d420573f50e3d00ae3a89e54914ef056
> Author: Laszlo Ersek <lersek at redhat.com>
> Date:   Fri Mar 4 01:49:54 2016 +0100
>
>     OvmfPkg: PciHostBridgeLib: permit access to the full extended
>     config space
>
>     By now OVMF makes MdeModulePkg/Bus/Pci/PciHostBridgeDxe go through
>     MMCONFIG (when running on Q35). Enable the driver to address each
>     B/D/F's config space up to and including offset 0xFFF.
>
> With that commit reverted, current edk2 master (2c2c68b9d3e8) also
> boots successfully. I don't know if there's a bug somewhere in the
> software stack, or if my graphics card has a bug and needs a quirk, or
> what the issue is. But I'd love to know why this doesn't work, and get
> whatever's broken fixed.

Adding Alex -- I *vaguely* recall that, at some point in the Q35
discussions, Alex warned us that exposing the full 4KB extended config
space to the guest, for assigned PCI Express devices, might bring
unexpected behavior to light, as (do I remember correctly?) VFIO might
not contain any necessary quirks for patching registers in extended
config space.

Alex, do I remember (halfway) correctly?

Hmm, the following messages look relevant (see under "5. Device
assignment", near the end of each message):

  http://mid.mail-archive.com/20160906093809.27c93074@t450s.home
  http://mid.mail-archive.com/678a7c06-b49c-6e0a-104b-ca339f79246b@redhat.com
  http://mid.mail-archive.com/20160906123258.3a8c63be@t450s.home
  http://mid.mail-archive.com/4b64780d-64fd-19a9-724c-e0a52d93d0c0@redhat.com
  http://mid.mail-archive.com/ad82ade2-6d85-f55a-adeb-147c5a932352@redhat.com
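For context on the "full 4KB extended config space" above: with MMCONFIG (the PCI Express ECAM mechanism), each bus/device/function gets its own 4 KiB config window, so offsets 0x000..0xFFF become reachable. The legacy 0xCF8/0xCFC port I/O mechanism only reaches the first 256 bytes. The Python sketch below shows the standard ECAM address decoding. The helper names are hypothetical, and the base address 0xB0000000 used in the checks is only an illustrative value.

```python
# Sketch of PCI Express ECAM (MMCONFIG) address decoding.
# Each bus/device/function owns a 4 KiB window inside the MMCONFIG region.

CONVENTIONAL_CFG_SIZE = 0x100   # legacy PCI config space: 256 bytes
EXTENDED_CFG_SIZE = 0x1000      # PCIe extended config space: 4 KiB


def ecam_address(base: int, bus: int, device: int, function: int, offset: int) -> int:
    """Return the address of a config-space register accessed through ECAM."""
    if not 0 <= bus <= 0xFF:
        raise ValueError("bus must be 0..255")
    if not 0 <= device <= 0x1F:
        raise ValueError("device must be 0..31")
    if not 0 <= function <= 0x7:
        raise ValueError("function must be 0..7")
    if not 0 <= offset < EXTENDED_CFG_SIZE:
        raise ValueError("offset must be 0x000..0xFFF")
    # Bus in bits 27:20, device in 19:15, function in 14:12, offset in 11:0.
    return base + (bus << 20) + (device << 15) + (function << 12) + offset


def is_extended_offset(offset: int) -> bool:
    """True for offsets reachable only through MMCONFIG (0x100..0xFFF)."""
    return CONVENTIONAL_CFG_SIZE <= offset < EXTENDED_CFG_SIZE
```

Offsets for which is_extended_offset() returns True are exactly the region that became visible to the guest once PciHostBridgeDxe was allowed to address up to 0xFFF.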

> For what it's worth, back when I first started using this GPU and
> motherboard[3] combination, I had to manually apply this patch[4] to
> QEMU for assignment to work, until it was included in a released
> version of QEMU.

Ahh! Very interesting. I'm straining my brain quite a bit to guess the root
cause here, but it looks like:
- your host system has "broken MMCONFIG regions",
- VFIO translates only a subset of the full 4KB config space "actively",
  and also restricts the "quirks" to this subset,
- VFIO lets the rest of the config space through transparently,
- once OVMF allowed that transparent config space to grow to 4KB, it
  exposed bugs... somewhere.

Two workarounds: use either an i440fx machine type, or else move your
hostdev in the domain XML to a legacy PCI (not PCI Express) virtual
slot. (Basically, locate the <controller> with "pci-bridge" model in
your domain XML, and take note of its @index attribute. Then locate your
hostdev, and change the *non-source* <address>, so that its @bus
attribute points to the above-identified @index value. Remove the @slot
and @function attributes; libvirt should fill those in for you correctly
(it should find the first nonzero free slot).)
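The second workaround can also be scripted. The following is a minimal Python sketch, assuming a domain XML with exactly one pci-bridge controller (the first one found is used) and using only the standard library's xml.etree.ElementTree. The function name is hypothetical. It changes only the guest-side <address> that is a direct child of <hostdev>, never the host-side <address> under <source>.

```python
import xml.etree.ElementTree as ET


def move_hostdevs_to_pci_bridge(domain_xml: str) -> str:
    """Hypothetical helper: put every PCI hostdev behind the pci-bridge.

    Sets the guest-side <address> @bus to the bridge's @index and drops
    @slot and @function so that libvirt can pick a free slot itself.
    """
    root = ET.fromstring(domain_xml)

    # Locate the first <controller type='pci' model='pci-bridge'>.
    bridge = None
    for ctrl in root.iter("controller"):
        if ctrl.get("type") == "pci" and ctrl.get("model") == "pci-bridge":
            bridge = ctrl
            break
    if bridge is None:
        raise ValueError("no pci-bridge controller in domain XML")
    bridge_index = int(bridge.get("index"))

    for hostdev in root.findall("./devices/hostdev"):
        if hostdev.get("type") != "pci":
            continue
        # The non-source (guest-side) address is a direct child of <hostdev>;
        # the host-side address under <source> is left untouched.
        addr = hostdev.find("address")
        if addr is None:
            addr = ET.SubElement(hostdev, "address", {"type": "pci"})
        addr.set("bus", f"0x{bridge_index:02x}")
        for attr in ("slot", "function"):
            addr.attrib.pop(attr, None)

    return ET.tostring(root, encoding="unicode")
```

One way to use it: feed it the output of `virsh dumpxml <domain>`, write the result to a file, and load that file with `virsh define`.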

Thanks,
Laszlo

> [1]: https://www.gigabyte.com/Graphics-Card/GV-N660OC-2GD
> [2]: https://github.com/jkoelndorfer/local-tools/blob/master/workstation/vfio/edk2.git-ovmf-x64-0-20150804.b1143.g8ca1489.noarch.rpm
>
> [3]: http://www.asrock.com/mb/Intel/Z97E-ITXac/
> [4]: https://github.com/qemu/qemu/commit/f5793fd9e1fd89808f4adbfe690235b094176a37
>