[vfio-users] Infiniband card shares iommu group with pci bridge
Maksym Planeta
mplaneta at os.inf.tu-dresden.de
Tue Jul 11 15:53:45 UTC 2017
Hello,
I'm trying to configure my InfiniBand cards so that I can pass them to VMs using SR-IOV. Unfortunately, the card in my only PCIe x16 slot seems to share an IOMMU group with a PCI bridge, and this stops the kernel from letting me pass the IB virtual functions to a VM. I tried many options, but none of them worked. Let me describe what I did; perhaps you can advise me on how to work around the issue.
Here is how the problem manifests in the first place:
$ sudo virsh start mvapichVM.2.1
error: Failed to start domain mvapichVM.2.1
error: internal error: qemu unexpectedly closed the monitor: ... qemu-system-x86_64: -device vfio-pci,host=01:00.1,id=hostdev0,bus=pci.0,addr=0xa: vfio error: 0000:01:00.1: group 1 is not viable
The key error is this:
vfio error: 0000:01:00.1: group 1 is not viable
I checked which devices are in group 1:
$ find /sys/kernel/iommu_groups/ -type l | grep $(lspci | grep Mellanox | tail -n 1 | cut -c1-2)
/sys/kernel/iommu_groups/1/devices/0000:01:00.3
/sys/kernel/iommu_groups/1/devices/0000:01:00.1
/sys/kernel/iommu_groups/1/devices/0000:01:00.4
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.2
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
What we see here is an IB card with one physical function and four virtual functions, plus a PCI bridge.
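To see why the group is rejected, it can help to list the driver bound to each member of the group. VFIO generally treats a group as viable only when every endpoint in it is bound to vfio-pci (or a stub driver, or no driver at all), so a member still owned by a host driver would be enough to trigger the error. The following is a minimal sketch, not a definitive tool: the function name and the SYSFS_ROOT override are made up for illustration, and it assumes the usual sysfs layout.

```shell
# Print "<pci-address> <driver>" for every device that shares an IOMMU
# group with the PCI address given as $1.
# SYSFS_ROOT is a hypothetical override (handy for testing against a fake
# tree); on a real system it simply defaults to /sys.
iommu_group_drivers() {
    dev="$1"
    root="${SYSFS_ROOT:-/sys}"
    group="$root/bus/pci/devices/$dev/iommu_group"

    # Without an iommu_group link the device has no group (IOMMU disabled?).
    if [ ! -d "$group/devices" ]; then
        echo "no IOMMU group for $dev" >&2
        return 1
    fi

    for member in "$group"/devices/*; do
        addr=$(basename "$member")
        drvlink="$root/bus/pci/devices/$addr/driver"
        if [ -L "$drvlink" ]; then
            # The driver link points at the driver directory; keep its name.
            drv=$(basename "$(readlink "$drvlink")")
        else
            drv="(none)"
        fi
        printf '%s %s\n' "$addr" "$drv"
    done
}
```

Calling `iommu_group_drivers 0000:01:00.1` on the host would then print one line per group member, which makes it easy to spot members that are not bound to vfio-pci.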
Here is an excerpt from lspci:
# lspci -s 00:01 -vnn
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller [8086:0c01] (rev 06) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 26
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        Memory behind bridge: f0000000-f09fffff
        Prefetchable memory behind bridge: 00000000c0000000-00000000c3ffffff
        Capabilities: [88] Subsystem: Gigabyte Technology Co., Ltd Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller [1458:5000]
        Capabilities: [80] Power Management version 3
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [a0] Express Root Port (Slot+), MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [140] Root Complex Link
        Capabilities: [d94] #19
        Kernel driver in use: pcieport
        Kernel modules: shpchp
# lspci -s 01:00.0 -vnn
01:00.0 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3] [15b3:1003]
        Subsystem: Mellanox Technologies MT27500 Family [ConnectX-3] [15b3:0050]
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at f0900000 (64-bit, non-prefetchable) [size=1M]
        Memory at f0000000 (64-bit, prefetchable) [size=8M]
        Expansion ROM at f0800000 [disabled] [size=1M]
        Capabilities: [40] Power Management version 3
        Capabilities: [48] Vital Product Data
        Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [c0] Vendor Specific Information: Len=18 <?>
        Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [148] Device Serial Number f4-52-14-03-00-10-a4-e0
        Capabilities: [154] Advanced Error Reporting
        Capabilities: [18c] #19
        Capabilities: [108] Single Root I/O Virtualization (SR-IOV)
        Kernel driver in use: mlx4_core
        Kernel modules: mlx4_core
# lspci -s 01:00.1 -vnn
01:00.1 Network controller [0280]: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] [15b3:1004]
        Subsystem: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] [15b3:61b0]
        Flags: fast devsel
        [virtual] Memory at c0000000 (64-bit, prefetchable) [size=8M]
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [9c] MSI-X: Enable- Count=220 Masked-
        Capabilities: [40] Power Management version 0
        Kernel driver in use: vfio-pci
        Kernel modules: mlx4_core
So I tried to circumvent this by compiling the kernel with the VFIO_NOIOMMU option (see this patch: https://lkml.org/lkml/2015/12/22/541). I also tried applying the pcie_acs_override patch and booting the kernel with pcie_acs_override=downstream,multifunction.
But neither of these changed the IOMMU group assignment.
Some diagnostics from dmesg. These two lines appear during boot, but nothing similar appears for the 0000:00:01 device.
[ 0.692871] pci 0000:00:1c.2: Intel PCH root port ACS workaround enabled
[ 0.692567] pci 0000:00:1c.0: Intel PCH root port ACS workaround enabled
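Two quick checks may help narrow this down: whether the running kernel was really booted with the pcie_acs_override parameter, and whether the root port advertises an ACS capability at all (the lspci excerpt above lists none for 00:01.0). The helpers below are only a sketch; their names and the optional file argument are made up for illustration.

```shell
# Print the value of a kernel command-line parameter, or nothing if absent.
# $2 is a hypothetical override for testing; it defaults to /proc/cmdline.
kernel_param_value() {
    name="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # Put each parameter on its own line, then print the part after "name=".
    tr ' ' '\n' < "$cmdline_file" | sed -n "s/^${name}=//p"
}

# Read `lspci -vvv` output on stdin and report whether an ACS capability
# ("Access Control Services") is listed for the device.
has_acs_capability() {
    if grep -q 'Access Control Services'; then
        echo yes
    else
        echo no
    fi
}
```

For example, `kernel_param_value pcie_acs_override` shows what the running kernel was booted with, and `sudo lspci -vvv -s 00:01.0 | has_acs_capability` reports whether the root port exposes ACS. Note that lspci needs root privileges to read extended capabilities.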
Hardware details:
# lspci -t
-[0000:00]-+-00.0
           +-01.0-[01]--+-00.0
           |            +-00.1
           |            +-00.2
           |            +-00.3
           |            \-00.4
           +-02.0
           ...
Motherboard:
Base Board Information
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: Z87-HD3
CPU:
Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
Infiniband card:
Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
Full dmesg: https://pastebin.com/c3XrJ7Vu
Very verbose lspci: https://pastebin.com/FgiDJ9M3
Could you tell me whether it is possible at all to break up this group? If so, how can I do this?
--
Regards,
Maksym Planeta