[vfio-users] Infiniband card shares iommu group with pci bridge

Maksym Planeta mplaneta at os.inf.tu-dresden.de
Tue Jul 11 15:53:45 UTC 2017


Hello,

I'm trying to configure my Infiniband cards so that I can pass them to VMs using SR-IOV. Unfortunately, my only PCIe x16 slot seems to share an IOMMU group with a PCI bridge, and this stops the kernel from letting me pass the IB virtual functions to a VM. I tried many options, but none of them worked. Let me describe what I did; perhaps you can advise me on how to work around the issue.

Here is how the problem manifests in the first place:

    $ sudo virsh start mvapichVM.2.1
    error: Failed to start domain mvapichVM.2.1
    error: internal error: qemu unexpectedly closed the monitor: ... qemu-system-x86_64: -device vfio-pci,host=01:00.1,id=hostdev0,bus=pci.0,addr=0xa: vfio error: 0000:01:00.1: group 1 is not viable

The key error is this:

    vfio error: 0000:01:00.1: group 1 is not viable

I checked which devices are in group 1:

    $ find /sys/kernel/iommu_groups/ -type l | grep $(lspci | grep Mellanox | tail -n 1 | cut -c1-2)
    /sys/kernel/iommu_groups/1/devices/0000:01:00.3
    /sys/kernel/iommu_groups/1/devices/0000:01:00.1
    /sys/kernel/iommu_groups/1/devices/0000:01:00.4
    /sys/kernel/iommu_groups/1/devices/0000:00:01.0
    /sys/kernel/iommu_groups/1/devices/0000:01:00.2
    /sys/kernel/iommu_groups/1/devices/0000:01:00.0

What we see here is an IB card with 1 physical function and 4 virtual functions, plus a PCI bridge.
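The `find ... | grep` check above matches any path containing the bus number, so a more targeted variant may be useful. The following is a minimal sketch, assuming the standard sysfs layout in which each PCI device directory has an `iommu_group` symlink and, when bound, a `driver` symlink. The helper name `group_members` and its optional second argument (an alternative sysfs root, useful only for testing) are hypothetical. It prints every device in the group together with the driver it is bound to, which is relevant because, as far as I understand, VFIO checks the driver binding of all group members when it decides whether a group is viable.

```shell
#!/usr/bin/env bash
# List every device in the IOMMU group of a PCI device, with its bound driver.
# Usage: group_members 0000:01:00.1 [sysfs_root]
# sysfs_root defaults to /sys; the parameter exists only to allow testing.
group_members() {
    local bdf="$1" sysfs="${2:-/sys}"
    local group_link="$sysfs/bus/pci/devices/$bdf/iommu_group"
    local dev drv

    # No iommu_group symlink means the device is unknown or the IOMMU is off.
    [ -e "$group_link" ] || { echo "no IOMMU group for $bdf" >&2; return 1; }

    for dev in "$group_link"/devices/*; do
        if [ -e "$dev/driver" ]; then
            # The driver symlink points at the driver's sysfs directory.
            drv=$(basename "$(readlink -f "$dev/driver")")
        else
            drv="(none)"
        fi
        printf '%s %s\n' "$(basename "$dev")" "$drv"
    done
}
```

After sourcing the snippet, `group_members 0000:01:00.1` would print one `address driver` pair per line for the group that contains the virtual function.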

Here is an excerpt from lspci:

    # lspci -s 00:01 -vnn
    00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller [8086:0c01] (rev 06) (prog-if 00 [Normal decode])
    	Flags: bus master, fast devsel, latency 0, IRQ 26
    	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
    	Memory behind bridge: f0000000-f09fffff
    	Prefetchable memory behind bridge: 00000000c0000000-00000000c3ffffff
    	Capabilities: [88] Subsystem: Gigabyte Technology Co., Ltd Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller [1458:5000]
    	Capabilities: [80] Power Management version 3
    	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
    	Capabilities: [a0] Express Root Port (Slot+), MSI 00
    	Capabilities: [100] Virtual Channel
    	Capabilities: [140] Root Complex Link
    	Capabilities: [d94] #19
    	Kernel driver in use: pcieport
    	Kernel modules: shpchp
    
    
    # lspci -s 01:00.0 -vnn
    01:00.0 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3] [15b3:1003]
    	Subsystem: Mellanox Technologies MT27500 Family [ConnectX-3] [15b3:0050]
    	Flags: bus master, fast devsel, latency 0, IRQ 16
    	Memory at f0900000 (64-bit, non-prefetchable) [size=1M]
    	Memory at f0000000 (64-bit, prefetchable) [size=8M]
    	Expansion ROM at f0800000 [disabled] [size=1M]
    	Capabilities: [40] Power Management version 3
    	Capabilities: [48] Vital Product Data
    	Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
    	Capabilities: [60] Express Endpoint, MSI 00
    	Capabilities: [c0] Vendor Specific Information: Len=18 <?>
    	Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
    	Capabilities: [148] Device Serial Number f4-52-14-03-00-10-a4-e0
    	Capabilities: [154] Advanced Error Reporting
    	Capabilities: [18c] #19
    	Capabilities: [108] Single Root I/O Virtualization (SR-IOV)
    	Kernel driver in use: mlx4_core
    	Kernel modules: mlx4_core
    
    # lspci -s 01:00.1 -vnn
    01:00.1 Network controller [0280]: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] [15b3:1004]
    	Subsystem: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] [15b3:61b0]
    	Flags: fast devsel
    	[virtual] Memory at c0000000 (64-bit, prefetchable) [size=8M]
    	Capabilities: [60] Express Endpoint, MSI 00
    	Capabilities: [9c] MSI-X: Enable- Count=220 Masked-
    	Capabilities: [40] Power Management version 0
    	Kernel driver in use: vfio-pci
    	Kernel modules: mlx4_core
    
So I tried to circumvent this by compiling the kernel with the VFIO_NOIOMMU option (see this patch: https://lkml.org/lkml/2015/12/22/541). I also applied the pcie_acs_override patch and booted the kernel with pcie_acs_override=downstream,multifunction.

But neither changed the IOMMU group assignment.
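One thing worth ruling out is that the option never reached the running kernel. The sketch below checks a kernel command line for a parameter; `cmdline_has` is a hypothetical helper name, and the optional second argument (a file to read instead of `/proc/cmdline`) is there only for testing. Note the limits of this check: `pcie_acs_override` comes from an out-of-tree patch, so finding it on the command line shows only that it was passed, not that the booted kernel actually contains the patch and acts on it.

```shell
#!/usr/bin/env bash
# Succeed if the kernel command line contains the given parameter,
# either as a bare word ("quiet") or as "name=value".
# Usage: cmdline_has pcie_acs_override [cmdline_file]
cmdline_has() {
    local name="$1" file="${2:-/proc/cmdline}"
    # Put each parameter on its own line, then match the whole name.
    tr -s ' ' '\n' < "$file" | grep -q -e "^${name}\$" -e "^${name}="
}
```

For example, `cmdline_has pcie_acs_override && echo passed` would print `passed` only if the parameter is present on the command line of the running kernel.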

Some diagnostics from dmesg. These two lines appear during boot, but nothing similar appears for the 0000:00:01 device.

    [    0.692871] pci 0000:00:1c.2: Intel PCH root port ACS workaround enabled
    [    0.692567] pci 0000:00:1c.0: Intel PCH root port ACS workaround enabled
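To see whether a port advertises ACS at all, its capability list can be searched for the string that lspci prints for that capability, "Access Control Services". The sketch below is a minimal filter for that; `has_acs` is a hypothetical helper name, and it assumes lspci is run as root with `-vvv`, since extended capabilities are not shown to unprivileged users. The capability list of 00:01.0 quoted earlier contains no such entry.

```shell
#!/usr/bin/env bash
# Read `lspci -vvv -s <address>` output on stdin and succeed if the
# device lists the Access Control Services (ACS) capability.
has_acs() {
    grep -q 'Access Control Services'
}
```

A possible invocation would be `sudo lspci -vvv -s 00:01.0 | has_acs && echo "ACS present" || echo "no ACS"`.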

Hardware details:

    # lspci -t
    -[0000:00]-+-00.0
               +-01.0-[01]--+-00.0
               |            +-00.1
               |            +-00.2
               |            +-00.3
               |            \-00.4
               +-02.0
    ...

Motherboard:

    Base Board Information
    	Manufacturer: Gigabyte Technology Co., Ltd.
    	Product Name: Z87-HD3

CPU:

    Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

Infiniband card: 

    Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

Full dmesg: https://pastebin.com/c3XrJ7Vu
Very verbose lspci: https://pastebin.com/FgiDJ9M3

Could you tell me whether it is possible at all to break up this group? If so, how can I do it?

-- 
Regards,
Maksym Planeta
