[vfio-users] Kernel panic at vfio_intx_handler leads to low performance in guest VM

Hu Zhifeng zhifeng.hu at hotmail.com
Thu May 25 10:53:29 UTC 2017


Dear all,

I am running a fresh install of Fedora 23 and want to use KVM/QEMU to run a Windows VM with GPU passthrough.

My setup is as follows:
Host OS: Fedora 23 (Workstation x86_64)
Kernel: 4.2.3-300.fc23.x86_64
QEMU version: qemu-2.4.0.1-1.fc23
Guest VM: Windows 7
CPU: Intel i7-6700K
Motherboard: Gigabyte B150-HD3
IGD: Intel® HD Graphics 530 (used by the host)
Graphics Card: GT710 (used by the VM)

First, enable the IOMMU by appending the `intel_iommu=on` parameter to the kernel command line in GRUB.
Next, prevent the kernel modules i915, nouveau and snd_hda_intel from being loaded, both in the initramfs and on the running system.
Then, load vfio-pci with the device IDs (`modprobe vfio-pci ids=10de:128b,10de:0e0f`).
Last, run QEMU like this:
qemu-system-x86_64 -enable-kvm -m 4G -cpu host,kvm=off -smp 4,sockets=1,cores=2,threads=2 -hda ~/win7.img -usbdevice host:093a:2510 -usbdevice host:0c45:7603 -device vfio-pci,host=01:00.0,x-vga=on -device vfio-pci,host=01:00.1 -vga none
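
For anyone reproducing these steps, here is a minimal sketch of two read-only checks: whether `intel_iommu=on` reached the running kernel, and which driver is bound to the card. The helper names `has_param` and `driver_in_use` are hypothetical and only for illustration; the device address 01:00.0 is the one from the QEMU command above.

```shell
#!/bin/sh
# Sketch: read-only checks for the passthrough setup described above.

# has_param CMDLINE PARAM: succeed if PARAM is one of the space-separated
# words in CMDLINE (for example the contents of /proc/cmdline).
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# driver_in_use: read `lspci -k` output on stdin and print the driver name
# from each "Kernel driver in use:" line.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}

# Example usage on the host:
#   has_param "$(cat /proc/cmdline)" intel_iommu=on && echo "IOMMU requested"
#   lspci -nnk -s 01:00.0 | driver_in_use    # vfio-pci is expected once bound
```

Both helpers only parse text, so they can be run as a normal user and do not change the host configuration.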

Everything looks good and the dedicated GPU is detected by the guest VM (N.B. the GPU driver `378.92-desktop-win8-win7-64bit-international-whql.exe` was ready),
but the guest VM runs very slowly, and I observed what looks like a kernel panic in dmesg, with the vfio_pci handler listed.

Here's the log from dmesg:
[  737.317946] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=io+mem:owns=none
[  737.356996] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=io+mem:owns=none
[  737.367606] vfio_pci: add [10de:128b[ffff:ffff]] class 0x000000/00000000
[  737.378437] vfio_pci: add [10de:0e0f[ffff:ffff]] class 0x000000/00000000
[  738.233680] vfio-pci 0000:01:00.0: enabling device (0000 -> 0003)
[  739.755715] kvm: zapping shadow pages for mmio generation wraparound
[  739.874265] irq 16: nobody cared (try booting with the "irqpoll" option)
[  739.874269] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.3-300.fc23.x86_64 #1
[  739.874270] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./B150-HD3-CF, BIOS F5 03/11/2016
[  739.874271]  0000000000000000 e5300c14e6af3df1 ffff880470c03e28 ffffffff81771fca
[  739.874272]  0000000000000000 ffff88045b2844a4 ffff880470c03e58 ffffffff810f88a5
[  739.874273]  ffff880081f42e50 ffff88045b284400 0000000000000000 0000000000000010
[  739.874275] Call Trace:
[  739.874276]  <IRQ>  [<ffffffff81771fca>] dump_stack+0x45/0x57
[  739.874281]  [<ffffffff810f88a5>] __report_bad_irq+0x35/0xd0
[  739.874282]  [<ffffffff810f8c44>] note_interrupt+0x244/0x290
[  739.874284]  [<ffffffff810f607c>] handle_irq_event_percpu+0x11c/0x180
[  739.874285]  [<ffffffff810f6110>] handle_irq_event+0x30/0x60
[  739.874286]  [<ffffffff810f91f4>] handle_fasteoi_irq+0x84/0x150
[  739.874287]  [<ffffffff81016e42>] handle_irq+0x72/0x120
[  739.874289]  [<ffffffff810bd66a>] ? atomic_notifier_call_chain+0x1a/0x20
[  739.874291]  [<ffffffff8177b5df>] do_IRQ+0x4f/0xe0
[  739.874292]  [<ffffffff817794eb>] common_interrupt+0x6b/0x6b
[  739.874292]  <EOI>  [<ffffffff81108a4f>] ? hrtimer_start_range_ns+0x1bf/0x3b0
[  739.874296]  [<ffffffff816160c0>] ? cpuidle_enter_state+0x130/0x270
[  739.874297]  [<ffffffff8161609b>] ? cpuidle_enter_state+0x10b/0x270
[  739.874298]  [<ffffffff81616237>] cpuidle_enter+0x17/0x20
[  739.874300]  [<ffffffff810dfcc2>] call_cpuidle+0x32/0x60
[  739.874301]  [<ffffffff81616213>] ? cpuidle_select+0x13/0x20
[  739.874302]  [<ffffffff810dff58>] cpu_startup_entry+0x268/0x320
[  739.874304]  [<ffffffff8176870c>] rest_init+0x7c/0x80
[  739.874305]  [<ffffffff81d5702d>] start_kernel+0x49d/0x4be
[  739.874307]  [<ffffffff81d56120>] ? early_idt_handler_array+0x120/0x120
[  739.874308]  [<ffffffff81d56339>] x86_64_start_reservations+0x2a/0x2c
[  739.874309]  [<ffffffff81d56485>] x86_64_start_kernel+0x14a/0x16d
[  739.874309] handlers:
[  739.874313] [<ffffffffa05172d0>] vfio_intx_handler [vfio_pci]
[  739.874313] Disabling IRQ #16
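
The "nobody cared" line means the kernel saw interrupts on IRQ 16 that no registered handler claimed, after which it disabled the line (the last log entry). A minimal sketch for pulling that information out of the log and seeing what is attached to the line follows; the helper names `spurious_irqs` and `irq_users` are hypothetical.

```shell
#!/bin/sh
# Sketch: locate "nobody cared" reports and list what is attached to an IRQ.

# spurious_irqs: read kernel log text on stdin and print the number of every
# interrupt line that the kernel reported as "nobody cared".
spurious_irqs() {
    sed -n 's/.*irq \([0-9][0-9]*\): nobody cared.*/\1/p'
}

# irq_users IRQ: read /proc/interrupts-style text on stdin and print the row
# for IRQ; the trailing columns of that row name the attached handlers.
irq_users() {
    awk -v irq="$1:" '$1 == irq { print }'
}

# Example usage on the host:
#   dmesg | spurious_irqs
#   irq_users 16 < /proc/interrupts
```

If the row for IRQ 16 lists more than one handler, the line is shared with another device, which is worth mentioning when reporting the problem.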

What I've tried so far:
1. A different graphics card (GTX750Ti), with the same results
2. A different host OS (Fedora 24: Kernel 4.5.5-300.fc24.x86_64 + qemu-2.6.2-8.fc24), without any issues
3. Loading vfio-pci with `nointxmask=1`, without any issues
4. Removing `-hda ~/win7.img` from the QEMU command (SeaBIOS only), still with the same crash
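
For item 3, the parameter can be passed in the same modprobe call as the device IDs. The sketch below shows that call in a comment and adds a small check of the resulting state. The helper name `intx_mask_state` is hypothetical, and the sysfs path is an assumption: it is where the kernel normally exposes module parameters once vfio_pci is loaded.

```shell
#!/bin/sh
# Sketch: confirm whether vfio-pci was loaded with nointxmask=1.
#
# Loading the module with the parameter (as root), using the IDs from above:
#   modprobe vfio-pci ids=10de:128b,10de:0e0f nointxmask=1

# intx_mask_state [FILE]: print the INTx masking state read from FILE.
# FILE defaults to the assumed sysfs location of the module parameter.
intx_mask_state() {
    param_file="${1:-/sys/module/vfio_pci/parameters/nointxmask}"
    if [ ! -r "$param_file" ]; then
        echo "unknown"
        return 1
    fi
    case "$(cat "$param_file")" in
        Y|1) echo "INTx masking disabled (nointxmask=1)" ;;
        *)   echo "INTx masking enabled (default)" ;;
    esac
}

# Example usage on the host:
#   intx_mask_state
```

This only reports the current setting; it does not say whether disabling INTx masking is the right long-term fix.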

So I have some questions now:
1. Is this a known issue? What is the root cause?
2. Why does Fedora 24 not have this issue? Is it related to the kernel, QEMU, or other components?
3. Is `nointxmask=1` the right way to avoid the crash?

Thank you in advance for any guidance.

Best regards,
Zhifeng