[vfio-users] Kernel panic at vfio_intx_handler leads to low performance in guest VM
Hu Zhifeng
zhifeng.hu at hotmail.com
Thu May 25 10:53:29 UTC 2017
Dear all,
I am running a fresh Fedora 23 install and want to use KVM/QEMU to run a Windows VM with GPU passthrough.
My setup is as follows:
Host OS: Fedora 23 (Workstation x86_64)
Kernel: 4.2.3-300.fc23.x86_64
QEMU version: qemu-2.4.0.1-1.fc23
Guest VM: Windows 7
CPU: Intel i7-6700K
Motherboard: Gigabyte B150-HD3
IGD: Intel® HD Graphics 530 (used by the host)
Graphics Card: GT710 (used by the VM)
First, enable the IOMMU by appending the `intel_iommu=on` parameter to the kernel command line in GRUB.
Next, prevent the kernel modules i915, nouveau and snd_hda_intel from being loaded, both in the initramfs and in the running system.
Then, load vfio-pci with the device IDs (`modprobe vfio-pci ids=10de:128b,10de:0e0f`).
Last, run QEMU like this:
qemu-system-x86_64 -enable-kvm -m 4G -cpu host,kvm=off \
  -smp 4,sockets=1,cores=2,threads=2 \
  -hda ~/win7.img \
  -usbdevice host:093a:2510 -usbdevice host:0c45:7603 \
  -device vfio-pci,host=01:00.0,x-vga=on \
  -device vfio-pci,host=01:00.1 \
  -vga none
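The host-side steps above can be double-checked before starting QEMU. The following is a minimal sketch, not part of the original setup: `has_intel_iommu` and `bound_driver` are hypothetical helper names, and the optional sysfs-root argument exists only so the function can be pointed at a test directory. It assumes the usual sysfs layout, where `/sys/bus/pci/devices/<address>/driver` is a symlink to the bound driver.

```shell
# Hypothetical helpers for checking the passthrough prerequisites.

# Succeeds if the given kernel command line contains intel_iommu=on.
has_intel_iommu() {
    case " $1 " in
        *" intel_iommu=on "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints the name of the driver bound to a PCI device, or "none".
# $1: full PCI address, e.g. 0000:01:00.0
# $2: sysfs root (assumption: defaults to /sys; overridable for testing)
bound_driver() {
    dev=$1
    root=${2:-/sys}
    link="$root/bus/pci/devices/$dev/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

On the host, `has_intel_iommu "$(cat /proc/cmdline)"` checks the boot parameter, and `bound_driver 0000:01:00.0` and `bound_driver 0000:01:00.1` should both print `vfio-pci` if the binding from the `modprobe` step took effect.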
Everything looks good, and the dedicated GPU is detected by the guest VM (N.B. the GPU driver `378.92-desktop-win8-win7-64bit-international-whql.exe` was ready).
However, the guest VM runs very slowly, and I observed a kernel backtrace ("irq 16: nobody cared") whose only listed handler is vfio_intx_handler from vfio_pci.
Here's the log from dmesg:
[ 737.317946] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=io+mem:owns=none
[ 737.356996] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=io+mem:owns=none
[ 737.367606] vfio_pci: add [10de:128b[ffff:ffff]] class 0x000000/00000000
[ 737.378437] vfio_pci: add [10de:0e0f[ffff:ffff]] class 0x000000/00000000
[ 738.233680] vfio-pci 0000:01:00.0: enabling device (0000 -> 0003)
[ 739.755715] kvm: zapping shadow pages for mmio generation wraparound
[ 739.874265] irq 16: nobody cared (try booting with the "irqpoll" option)
[ 739.874269] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.3-300.fc23.x86_64 #1
[ 739.874270] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./B150-HD3-CF, BIOS F5 03/11/2016
[ 739.874271] 0000000000000000 e5300c14e6af3df1 ffff880470c03e28 ffffffff81771fca
[ 739.874272] 0000000000000000 ffff88045b2844a4 ffff880470c03e58 ffffffff810f88a5
[ 739.874273] ffff880081f42e50 ffff88045b284400 0000000000000000 0000000000000010
[ 739.874275] Call Trace:
[ 739.874276] <IRQ> [<ffffffff81771fca>] dump_stack+0x45/0x57
[ 739.874281] [<ffffffff810f88a5>] __report_bad_irq+0x35/0xd0
[ 739.874282] [<ffffffff810f8c44>] note_interrupt+0x244/0x290
[ 739.874284] [<ffffffff810f607c>] handle_irq_event_percpu+0x11c/0x180
[ 739.874285] [<ffffffff810f6110>] handle_irq_event+0x30/0x60
[ 739.874286] [<ffffffff810f91f4>] handle_fasteoi_irq+0x84/0x150
[ 739.874287] [<ffffffff81016e42>] handle_irq+0x72/0x120
[ 739.874289] [<ffffffff810bd66a>] ? atomic_notifier_call_chain+0x1a/0x20
[ 739.874291] [<ffffffff8177b5df>] do_IRQ+0x4f/0xe0
[ 739.874292] [<ffffffff817794eb>] common_interrupt+0x6b/0x6b
[ 739.874292] <EOI> [<ffffffff81108a4f>] ? hrtimer_start_range_ns+0x1bf/0x3b0
[ 739.874296] [<ffffffff816160c0>] ? cpuidle_enter_state+0x130/0x270
[ 739.874297] [<ffffffff8161609b>] ? cpuidle_enter_state+0x10b/0x270
[ 739.874298] [<ffffffff81616237>] cpuidle_enter+0x17/0x20
[ 739.874300] [<ffffffff810dfcc2>] call_cpuidle+0x32/0x60
[ 739.874301] [<ffffffff81616213>] ? cpuidle_select+0x13/0x20
[ 739.874302] [<ffffffff810dff58>] cpu_startup_entry+0x268/0x320
[ 739.874304] [<ffffffff8176870c>] rest_init+0x7c/0x80
[ 739.874305] [<ffffffff81d5702d>] start_kernel+0x49d/0x4be
[ 739.874307] [<ffffffff81d56120>] ? early_idt_handler_array+0x120/0x120
[ 739.874308] [<ffffffff81d56339>] x86_64_start_reservations+0x2a/0x2c
[ 739.874309] [<ffffffff81d56485>] x86_64_start_kernel+0x14a/0x16d
[ 739.874309] handlers:
[ 739.874313] [<ffffffffa05172d0>] vfio_intx_handler [vfio_pci]
[ 739.874313] Disabling IRQ #16
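The `irq 16: nobody cared` message means that interrupts kept arriving on IRQ 16 without any registered handler claiming them, so the kernel disabled the line; the only handler listed is `vfio_intx_handler`. To see what is attached to that line, the matching row of `/proc/interrupts` can be pulled out. A minimal sketch, where `irq_line` is a hypothetical helper name:

```shell
# Print the row for one IRQ number from interrupt-table text on stdin.
# Reading stdin (rather than /proc/interrupts directly) lets it run
# against a saved copy as well. Exits non-zero if the IRQ is not listed.
irq_line() {
    awk -v irq="$1" '
        $1 == (irq ":") { print; found = 1 }
        END { exit !found }
    '
}
```

Typical use on the host would be `irq_line 16 < /proc/interrupts`. If the row names more than one device, the legacy INTx line is presumably shared, which may be relevant to how the interrupt is masked.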
What I've tried so far:
1. A different graphics card (GTX 750 Ti): same results
2. A different host OS (Fedora 24: kernel 4.5.5-300.fc24.x86_64 + qemu-2.6.2-8.fc24): no issues
3. Loading vfio-pci with `nointxmask=1`: no issues
4. Removing `-hda ~/win7.img` from the QEMU command (SeaBIOS only): still the same crash
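Item 3 can be made persistent with a modprobe options line instead of passing the parameter by hand. The sketch below only prints such a line; `vfio_options_line` is a hypothetical helper, and the target file name is an assumption (modprobe reads files ending in `.conf` under `/etc/modprobe.d/`). As far as the module's parameter description goes, `nointxmask` disables support for PCI 2.3 style INTx masking in vfio-pci.

```shell
# Emit a modprobe.d "options" line for vfio-pci.
# $1: comma-separated vendor:device IDs, as passed to ids=
# $2: value for nointxmask (0 or 1)
vfio_options_line() {
    printf 'options vfio-pci ids=%s nointxmask=%s\n' "$1" "$2"
}
```

For example, the output of `vfio_options_line 10de:128b,10de:0e0f 1` could be written as root to a file such as `/etc/modprobe.d/vfio.conf` (hypothetical name). After the module is loaded, `cat /sys/module/vfio_pci/parameters/nointxmask` should show whether the setting is active.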
So I have some questions:
1. Is this a known issue? What is the root cause?
2. Why does Fedora 24 not have this issue? Is the difference in the kernel, QEMU, or some other component?
3. Is `nointxmask=1` the right way to avoid the crash?
Thank you in advance for any guidance.
Best regards,
Zhifeng