[vfio-users] Linux iommu issue with peer-to-peer dma transfers between NVidia GTX 1080s

Sat Sep 23 16:00:37 UTC 2017

Hi,
I would like to draw upon the list participants' know-how and
experience in trying to resolve the following issue. I tried in vain
to get support from NVidia in the past and then gave up for quite a
long time, hoping it would get fixed as a matter of course. Coming
back to it half a year (and multiple kernel and driver versions)
later, I see it still persists. The original post is at
https://devtalk.nvidia.com/default/topic/996091/peer-to-peer-dma-issue-/
and I am copying it below.

There is a bug that makes using multiple GTX 1080s impossible when I
turn on the IOMMU in Linux on an X99 board (I tried kernels 4.8 and
4.13, with the standard iommu=on, with iommu=on,igfx_off, or with
iommu=pt for passthrough mode).

The bug can be triggered by any peer-to-peer memory transfer. For
example, running the CUDA 8.0 Samples program
1_Utilities/p2pBandwidthLatencyTest from a terminal triggers the
problem: the video driver (and as a result the X server) crashes
immediately, and after multiple Ctrl-Cs and tens of seconds of waiting
the server eventually restarts and I am presented with an X login
prompt.

The relevant kernel error messages follow (there are thousands of these
lines; this is just a snippet):

[   51.691440] DMAR: DRHD: handling fault status reg 2
[   51.691450] DMAR: [DMA Write] Request device [04:00.0] fault addr f8139000 [fault reason 05] PTE Write access is not set
[   51.691457] DMAR: [DMA Write] Request device [04:00.0] fault addr f8139000 [fault reason 05] PTE Write access is not set
[   51.691462] DMAR: [DMA Write] Request device [04:00.0] fault addr f8139000 [fault reason 05] PTE Write access is not set
[   51.691465] DMAR: [DMA Write] Request device [04:00.0] fault addr f8139000 [fault reason 05] PTE Write access is not set
[   51.691470] DMAR: DRHD: handling fault status reg 400
[   51.740674] DMAR: DRHD: handling fault status reg 402
[   51.740683] DMAR: [DMA Write] Request device [04:00.0] fault addr f8139000 [fault reason 05] PTE Write access is not set
[   51.740688] DMAR: [DMA Write] Request device [04:00.0] fault addr f8139000 [fault reason 05] PTE Write access is not set
[   51.740693] DMAR: [DMA Write] Request device [04:00.0] fault addr f8139000 [fault reason 05] PTE Write access is not set
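Since the full log runs to thousands of lines, a quick way to confirm that every fault comes from the same device and hits the same address is to tally the entries. Below is a minimal Python sketch, assuming the one-entry-per-line format above (4.x kernels print the device as bus:dev.fn without a PCI domain; other formats would need a different regex). summarize_faults is a hypothetical helper of mine, not part of any existing tool.

```python
import re
from collections import Counter

# One DMAR fault entry, in the single-line format printed by 4.x kernels, e.g.
# "DMAR: [DMA Write] Request device [04:00.0] fault addr f8139000 [fault reason 05] ..."
# (the regex is an assumption based on the log above, not an official format spec)
FAULT_RE = re.compile(
    r"DMAR: \[DMA (?P<op>Read|Write)\] Request device "
    r"\[(?P<dev>[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\] "
    r"fault addr (?P<addr>[0-9a-fA-F]+) \[fault reason (?P<reason>\d+)\]"
)


def summarize_faults(lines):
    """Count DMAR faults per (operation, device, address, reason) tuple.

    Lines that are not fault entries (e.g. "DRHD: handling fault status reg")
    are skipped.
    """
    counts = Counter()
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            key = (
                m.group("op"),
                m.group("dev"),
                m.group("addr").lower(),
                int(m.group("reason")),
            )
            counts[key] += 1
    return counts
```

For example, summarize_faults(open("dmesg.txt")).most_common() on a saved dmesg output (the file name is hypothetical) lists the dominant fault first; in my case every entry collapses to a single (Write, 04:00.0, f8139000, 5) key.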

Clearly the above suggests that the CUDA driver is attempting DMA to an
address whose IOMMU page table entry does not have the write flag set,
presumably because the driver has not properly registered/requested
access via the generic DMA mapping API (the dma_map_*() family, see
https://www.kernel.org/doc/Documentation/DMA-API-HOWTO.txt).
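To make the reasoning concrete, here is a toy Python model of the permission check (not real VT-d or kernel code; ToyIommu and its methods are hypothetical). As I understand VT-d's second-level page tables, an entry with both the read and write bits clear is treated as not present, so a write to an address the driver never mapped is reported with the same reason 05 as a write to a read-only mapping. The reason codes below are the ones the Linux intel-iommu driver prints.

```python
PAGE_SHIFT = 12  # 4 KiB pages

# Fault reason codes as printed by the Linux intel-iommu driver.
FAULT_WRITE_NOT_SET = 5  # "PTE Write access is not set"
FAULT_READ_NOT_SET = 6   # "PTE Read access is not set"


class ToyIommu:
    """Toy IOMMU: each IOVA page maps to (host page number, readable, writable)."""

    def __init__(self):
        self.ptes = {}  # IOVA page number -> (host page number, R, W)

    def map(self, iova, host_addr, read=True, write=True):
        """Roughly what a dma_map_*() call arranges: install a PTE for one page."""
        self.ptes[iova >> PAGE_SHIFT] = (host_addr >> PAGE_SHIFT, read, write)

    def translate(self, iova, is_write):
        """Return the host address, or raise PermissionError(reason code)."""
        # A missing entry behaves like one with R=W=0 (not present).
        host_pfn, r, w = self.ptes.get(iova >> PAGE_SHIFT, (None, False, False))
        if is_write and not w:
            raise PermissionError(FAULT_WRITE_NOT_SET)
        if not is_write and not r:
            raise PermissionError(FAULT_READ_NOT_SET)
        offset = iova & ((1 << PAGE_SHIFT) - 1)
        return (host_pfn << PAGE_SHIFT) | offset
```

In this model, a device write to 0xf8139000 that no one ever mapped raises reason 5, which matches the log regardless of whether the driver skipped the mapping entirely or mapped the page read-only.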

Scouting the net reveals a bug report
(https://bugzilla.kernel.org/show_bug.cgi?id=188271) for exactly the
same issue on totally different hardware (a Supermicro dual-socket
board) using Pascal Titan Xs, i.e. cards of the same architecture as
mine. Interestingly enough, the kernel error messages in that report
show a faulting access at *exactly* the same memory address (f8139000):

[16193.666976] DMAR: [DMA Write] Request device [82:00.0] fault addr f8139000 [fault reason 05] PTE Write access is not set

So this looks like a red flag that the indirection afforded by the
IOMMU is somehow bypassed and the driver is using hardcoded DMA
addresses. Please note that the author of the bug report claims that
setting iommu=igfx_off somehow solves this, but igfx_off per se should
be irrelevant here unless IOMMU support is turned on first, with
something like iommu=on,igfx_off. What most likely happens instead is
that iommu=igfx_off, as opposed to iommu=on, simply turns off the IOMMU
altogether, allowing the DMA to succeed. This is exactly what happens
on my system too. In other words, the bug report merely states that
turning off the IOMMU allows peer-to-peer transfers to work. Still, his
detailed log files should be very useful as an independent
manifestation of the same issue. My log files are attached to the
original thread linked at the start of this post.
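One way to confirm which of these cases applies on a given boot, rather than inferring it from whether P2P succeeds, is to check whether the kernel created any IOMMU groups: with the IOMMU disabled, /sys/kernel/iommu_groups stays empty. Below is a rough Python heuristic, assuming that sysfs layout. iommu_status is a hypothetical helper, and treating "at least one group exists" as "IOMMU active" is my own assumption rather than an official test (groups also exist in iommu=pt passthrough mode, where the IOMMU is on but devices are identity-mapped).

```python
from pathlib import Path


def iommu_status(sysfs_root="/sys", cmdline_path="/proc/cmdline"):
    """Heuristic: report iommu-related boot options and whether any IOMMU groups exist."""
    groups_dir = Path(sysfs_root) / "kernel" / "iommu_groups"
    # Each subdirectory (0, 1, 2, ...) is one IOMMU group; none means no active IOMMU.
    groups = sorted(p.name for p in groups_dir.iterdir()) if groups_dir.is_dir() else []
    try:
        cmdline = Path(cmdline_path).read_text().split()
    except OSError:
        cmdline = []
    # Keep only tokens such as "iommu=pt" or "intel_iommu=on".
    opts = [tok for tok in cmdline if "iommu" in tok]
    return {"boot_options": opts, "group_count": len(groups), "active": bool(groups)}
```

On a boot with only iommu=igfx_off, I would expect "active" to come back False, which would support the reading above that the bug reporter's workaround simply disables the IOMMU.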

I am using an ASRock X99 board (X99E-ITX/ac) with the latest firmware,
an Intel i7-6800K, dual Asus GTX 1080 Founders Edition cards, 32GB RAM
and Ubuntu 16.10 (17.10 now) with all updates applied (kernel 4.8.0-37,
now 4.13) and driver 378.13 (now 384.69).

Have you come across this while trying to virtualize NVidia GPUs? Given
that NVidia's Linux driver forum refuses to display bug posts by users
(they remain "hidden"), and given that NVidia would much rather have
you buy Quadros and Teslas, the conspiracy theorist in me is more
inclined to believe that VT-d is intentionally disabled in consumer
versions of the hardware...

Thanks for any input/solutions!