[vfio-users] Intel IGD passthrough and OpRegion

Aa Aa jimbothom at yandex.com
Thu Jun 29 01:19:18 UTC 2017


On Tue, Jun 27, 2017 at 03:52:57PM -0600, Alex wrote:
> On Fri, 23 Jun 2017 21:01:19 +1000
> Aa Aa <jimbothom at yandex.com> wrote:
> 
> > I have run into a few problems with IGD passthrough with a Linux guest. I am not running in legacy mode, so I guess my setup might not be supported.
> 
> Theoretically UPT mode is the one Intel supports, but issues abound
> since there's no stable hardware spec and the "universal"-ness of UPT
> isn't shaping up to be what was expected.
> 
> > The first thing I noticed was that on some Intel machines, when the VFIO IOMMU module was loaded from qemu, I was getting a whole lot of DMAR faults. The faulting address was the same as the one being set by the kernel here:
> >  
> > Jun 23 10:21:35 phys kernel: DMAR: Setting RMRR:
> > Jun 23 10:21:35 phys kernel: DMAR: Setting identity map for device 0000:00:02.0 [0xcb000000 - 0xcf1fffff]
> >  
> > This happened on two machines at different memory locations. I was able to fix this by hardcoding an entry thus:
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index a8a079ba9477..3c0f134c1669 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -1270,6 +1270,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> >                         goto out_domain;
> >         }
> >  
> > +       ret = iommu_map(domain->domain, 0xcb000000u, 0xcb000000u, 0x4200000u, IOMMU_READ | IOMMU_WRITE);
> >  
> > and the machine became usable again. As this fixed my problem, I didn't bother checking what the RMRR does, but should this be handled, or should entries that don't appear in the PCI configuration space not be removed from the DMAR?
> 
> In general, RMRRs exclude devices from being eligible for user
> assignment, see justification here:
> 
> https://access.redhat.com/sites/default/files/attachments/rmrr-wp1.pdf
> 
> IGD and USB are special cases to this though as the RMRR defined
> regions are for use by the device rather than for platform monitoring
> and back channel communications to access the device.  These cases are
> therefore allowed, but the RMRR is not honored and no attempt is made
> by anyone to map these regions.  In the case of IGD, the RMRR is
> typically describing the stolen memory of the device, and the code
> above is identity mapping that massive region into the user address
> space, regardless of the VM memory layout.  Not good. You've avoided
> some DMAR faults, likely at the expense of a stable VM.
> 
Yes, understood, but doesn't Intel also steal 66MB from my physical machine's address space?
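
(For my own reference, here is a rough sketch, with made-up fds and addresses rather than anything from QEMU's actual code, of how I understand guest RAM is normally mapped through the vfio container, so the IOVA follows the VM's memory layout instead of the host physical range my identity-map hack uses.)

/* Sketch only: map a chunk of guest RAM through the vfio container so
 * the IOMMU translates guest-physical IOVAs to the userspace buffer. */
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int map_guest_ram(int container_fd, void *vaddr, uint64_t iova,
                         uint64_t size)
{
    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .vaddr = (uint64_t)(uintptr_t)vaddr, /* userspace mapping of guest RAM */
        .iova  = iova,                       /* guest-physical address */
        .size  = size,
    };

    /* Unlike the kernel identity map above, the IOVA here is chosen to
     * match the VM memory layout, not the host's RMRR range. */
    return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}
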
> Originally we were told that UPT mode devices don't need stolen memory
> (legacy mode uses a partially successful hack to re-allocate stolen
> memory from the guest address space), but Intel seems to be going back
> on that as UPT becomes less and less universal.
> 
> > Anyhow, I tried to get the OpRegion working, adding the x-igd-opregion option and overriding the VGA check
> 
> There's no VGA check in x-igd-opregion,
> 
> > in the vfio kernel module. But I noticed in the VM the OpRegion isn't mapped:
> > On the guest the register at 0xFC is zero, whereas on the host it contains a 32-bit address:
> > lspci -xxxx -s 00:2
> > 00:02.0 Display controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
> > 00: 86 80 12 04 07 04 90 00 06 00 80 03 00 00 80 00
> > ....
> > f0: 00 00 00 00 00 00 00 00 00 00 06 00 00 00 00 00
> >  
> >  lspci -xxxx -s 00:2
> > 00:02.0 Display controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
> > 00: 86 80 12 04 07 04 90 00 06 00 80 03 00 00 00 00
> > ...
> > f0: 00 00 00 00 00 00 00 00 00 00 06 00 18 c0 d5 c8
> >  
> > How is the OpRegion mapped in the guest?
>
> Unfortunately you don't say what kernel you're using as QEMU's
> x-igd-opregion depends on a version of vfio-pci that exposes this for
> IGD devices.  This was added in v4.6.  You also don't indicate which VM
> BIOS you're using, but only SeaBIOS supports the necessary fw_cfg
> interfaces for reserving memory for the OpRegion.  The way this is
> supposed to work is that QEMU reads the OpRegion from a vfio region on
> the vfio device file descriptor and stores that into fw_cfg for
> SeaBIOS.  SeaBIOS finds the fw_cfg tag, allocates the necessary memory,
> creating a reserved memory area in the VM, copies the OpRegion data
> into that reserved memory, then writes the address to the 0xFC register
> on the device.  Since you're not getting that, clearly something is
> broken in this chain.  Thanks,
>
I am sorry if I wasn't clear about what I was asking here. I just didn't understand what was going on with pci_bar, PCI_BAR_UNMAPPED and the surrounding code, and I still don't get how the mapping takes place with QEMU's MemoryRegions. Below is a rough sketch of how I read the vfio side of the chain you describe, and after that I'll explain what I was doing, which might help clear things up.
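
This is only my own reconstruction of the first half of that chain (locating the Intel IGD OpRegion device-specific region on the vfio device fd and reading it out), not QEMU's actual code, and error handling is trimmed:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

#define PCI_VENDOR_ID_INTEL 0x8086

/* Returns a malloc()ed copy of the OpRegion, or NULL if not found. */
static void *read_igd_opregion(int device_fd, size_t *len)
{
    struct vfio_device_info dev = { .argsz = sizeof(dev) };
    unsigned int i;

    if (ioctl(device_fd, VFIO_DEVICE_GET_INFO, &dev))
        return NULL;

    /* Device-specific regions follow the fixed BAR/ROM/config regions. */
    for (i = VFIO_PCI_NUM_REGIONS; i < dev.num_regions; i++) {
        struct vfio_region_info *info = calloc(1, sizeof(*info));
        struct vfio_info_cap_header *hdr;
        void *buf = NULL;

        info->argsz = sizeof(*info);
        info->index = i;
        ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, info);

        if (info->argsz > sizeof(*info)) {
            /* The kernel wants more room for the capability chain. */
            uint32_t argsz = info->argsz;

            info = realloc(info, argsz);
            memset(info, 0, argsz);
            info->argsz = argsz;
            info->index = i;
            ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, info);
        }

        if (!(info->flags & VFIO_REGION_INFO_FLAG_CAPS))
            goto next;

        /* Walk the capability chain looking for the region type cap. */
        for (hdr = (void *)((char *)info + info->cap_offset); ;
             hdr = (void *)((char *)info + hdr->next)) {
            if (hdr->id == VFIO_REGION_INFO_CAP_TYPE) {
                struct vfio_region_info_cap_type *type = (void *)hdr;

                if (type->type == (VFIO_REGION_TYPE_PCI_VENDOR_TYPE |
                                   PCI_VENDOR_ID_INTEL) &&
                    type->subtype ==
                        VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION) {
                    buf = malloc(info->size);
                    if (pread(device_fd, buf, info->size,
                              info->offset) != (ssize_t)info->size) {
                        free(buf);
                        buf = NULL;
                    }
                    *len = info->size;
                }
            }
            if (!hdr->next)
                break;
        }
next:
        free(info);
        if (buf)
            return buf;
    }

    return NULL;
}
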

I had been using a Linux VM for some time with IGD passthrough and OVMF - no SeaBIOS and no legacy mode. Anyhow, all I wanted was to see if I could get Windows to boot with IGD support, and I found that the driver was loading but no information on connectors was appearing. It turned out that the VBT lives in the OpRegion. It also seems the OpRegion needs to be writable, as it contains mailboxes. On top of that, I couldn't mmap that memory because the kernel's PAT handling said it was write-through, so I had to hack the kernel (https://pastebin.com/rc96YMgQ) and qemu (https://pastebin.com/2ATig3Dt) to support this. Basically, I hard-coded the OpRegion in the VM's address space at the end of where the BIOS extensions are located (because there is no BIOS in this case) and emulated write-back support by re-reading the whole OpRegion (3 pages, because of the alignment on my hardware) after each write to it. With this I was able to boot Windows with IGD on q35 (not machine=pc), though only without KVM (KVM causes an exception).
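
As a sanity check from inside a Linux guest (just my own throwaway test, run as root because sysfs only exposes the first 64 bytes of config space to normal users, and the device address is my guest's), I read the 32-bit scratch register at 0xFC to see whether an OpRegion address had been programmed at all:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/sys/bus/pci/devices/0000:00:02.0/config";
    uint32_t asls = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0 || pread(fd, &asls, sizeof(asls), 0xfc) != sizeof(asls)) {
        perror(path);
        return 1;
    }

    /* Non-zero means something (firmware, or my QEMU hack) wrote an
     * OpRegion address into register 0xFC. */
    printf("OpRegion address register (0xFC) = 0x%08x\n", asls);
    return 0;
}
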

I noticed in the OpRegion docs (actually, more likely in the i915 code) that the OpRegion size can exceed 8192 bytes and that it should be writable.
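
Based on that reading of the i915 code (so treat the offsets as my assumption rather than gospel), the size check I ended up doing looks roughly like this:

#include <stdint.h>
#include <string.h>

#define OPREGION_SIGNATURE "IntelGraphicsMem"   /* 16 bytes, no NUL */

/* 'base' points at a copy of the OpRegion, e.g. read out as in the
 * earlier sketch.  Returns the size in bytes, or -1 if the signature
 * doesn't match. */
static long opregion_size_bytes(const void *base)
{
    uint32_t size_kb;

    if (memcmp(base, OPREGION_SIGNATURE, 16) != 0)
        return -1;

    memcpy(&size_kb, (const uint8_t *)base + 16, sizeof(size_kb));
    return (long)size_kb * 1024;   /* the size field is in KB */
}
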

Hope someone else will find this of use.

> Alex

Cheers

JT



