[vfio-users] GPU driver crashes when running a second VM if either VM has a virtual disk stored on physical media other than the root disk. Tested on three X58 chipset MBs

Peter Maloney peter.maloney at brockmann-consult.de
Thu Nov 16 09:59:22 UTC 2017


Oh, I saw your other thread but missed this one, where you show usb
passthrough. I've seen usb passthrough fail with an i7 5820K and X99
chipset on some kernels between 4.9 and 4.13 (but 4.13.11 works fine).
Your 4.10 is in that range, but I didn't test that one on the X99, and I
don't have anything with X58.
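
A quick way to see whether a host falls in that range is sketched below.
This is only a rough check under my own assumptions: the range is
approximate (I only saw failures on some kernels in it), the function
names version_ge and in_suspect_range are made up for illustration, and
it relies on "sort -V" from GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 >= version $2.
# Relies on "sort -V" (version sort, GNU coreutils).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeeds when the kernel version is at least 4.9 and below 4.13.11,
# the approximate range where usb passthrough sometimes failed for me.
in_suspect_range() {
    v=${1%%-*}    # drop a distro suffix such as "-3-pve"
    version_ge "$v" 4.9 && ! version_ge "$v" 4.13.11
}

if in_suspect_range "$(uname -r)"; then
    echo "kernel $(uname -r) is in the range where usb passthrough failed for me"
else
    echo "kernel $(uname -r) is outside that range"
fi
```

Being inside the range does not prove the kernel is the cause; it only
says that a newer kernel is worth a try.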

When usb passthrough fails this way, the host dmesg shows things like
this:
https://gist.github.com/anonymous/d1cfe9ef5ebef0f96bb8d9ac451140f0
and then none of my usb devices work, such as the keyboard and mouse on
the host (can't even REISUB :D and I disabled the power button due to
toddler fascination with such buttons).

So I suggest:
- test without usb passthrough... try synergy if you need mouse/keyboard
control, or pci passthrough of a usb controller
- use kernel 4.13.11, which worked for me on the X99, or 4.13.12, which
is the latest but which I have only tested once so far.
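
For the pci passthrough option, the first step is finding the usb
controllers and the IOMMU groups they sit in. A minimal sketch, assuming
the usual sysfs layout under /sys/bus/pci/devices and that usb
controllers report PCI class 0x0c03xx; the function name and its
optional directory argument are made up for illustration (the argument
only exists so it can be pointed at a test tree):

```shell
#!/bin/sh
# List PCI usb controllers (class 0x0c03xx) with their IOMMU group.
# $1 (optional): directory holding the PCI device entries.
list_usb_controllers() {
    root=${1:-/sys/bus/pci/devices}
    for dev in "$root"/*; do
        [ -r "$dev/class" ] || continue
        class=$(cat "$dev/class")
        case $class in
            0x0c03*)
                if [ -e "$dev/iommu_group" ]; then
                    # iommu_group is a symlink; its target name is the group number
                    group=$(basename "$(readlink -f "$dev/iommu_group")")
                else
                    group=none    # IOMMU disabled or not available
                fi
                echo "${dev##*/} class=$class iommu_group=$group"
                ;;
        esac
    done
}

list_usb_controllers
```

A controller is only a good candidate if nothing the host needs shares
its IOMMU group, and the host keyboard and mouse should hang off a
different controller.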


My setup:
vm 1
    second seat with another monitor. On the i7, with usb passthrough
      for keyboard, mouse, usb sound card. On the ryzen 7, pci
      passthrough of the usb controller.
    r7 260X hangs the host (on VM start with ryzen 7 1700X
      GA-AX370-Gaming-5, on VM reboot on i7 5820K GA-X99-UD3)... so I
      use a HD 6770
    it sorta dual boots... pick disks to attach, rather than controlled
      via bootloader
    first set of disks: linux, on LVM on the host's SSD
    second set of disks: windows 10, on LVM on a separate mdadm raid1,
      and with bcache with cache on the host's SSD
vm 2
    same seat using 2nd input on main monitor, synergy for keyboard and
      mouse, usb passthrough for sound and joystick
    r7 360
    windows 10
    disks with separate mdadm raid1, and with bcache with cache on the
      host's SSD

The X99 is borrowed since my motherboard died. My main machine is the
ryzen 7. (thanks Geoff and Paolo for fixing the NPT bug!)

On 11/14/17 09:10, Brian Yglesias wrote:
> To put it another way, running concurrent VMs when at least one VM has an assigned GPU will always result in a GPU driver crash, unless all VMs and all their attached media reside on the root disk.  I've been able to replicate this consistently across three motherboards, all with the X58 chipset (I don't have anything else on hand to test with, and at this point I suspect it's a problem with the chipset).
>
> Example:
> -The OS is on /dev/sda
> -VM1's root disk is also on /dev/sda, and is the only disk
> -VM2's root disk is also on /dev/sda, and is the only disk
>
> *This works.
>
> -Now, add a second physical drive to the server - /dev/sdb
> -Attach a virtual disk to VM2 which is stored on /dev/sdb
>
> *Any and all VMs with assigned GPUs will now eventually crash.
>
>
> If they are at near idle it may take a while.  If their GPUs are being utilized somewhat, it may take only seconds.  The only solution I've found is to stop all VMs and use VMs with assigned GPU(s) by themselves.
>
> This has been the case since early 2016 when I began testing.  I've tried various invocations of kvm, as well as countless disk configurations, and I've upgraded the kernel, kvm, and the rest of the OS several times.  I'm not sure that this is a vfio problem, although the fact that the problem only occurs when an assigned GPU is involved is suggestive.  In any case, I also reported this to the qemu bugtracker (last year), but have not heard back.
>
> I currently start the VMs as follows:
>
>
> /usr/bin/kvm \
> -id 110 \
> -chardev 'socket,id=qmp,path=/var/run/qemu-server/110.qmp,server,nowait' \
> -mon 'chardev=qmp,mode=control' \
> -pidfile /var/run/qemu-server/110.pid \
> -daemonize \
> -smbios 'type=1,uuid=a4419ef3-5aef-4978-8849-d9d010e26e27' \
> -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/kvm/OVMF_CODE-pure-efi.fd' \
> -drive 'if=pflash,unit=1,format=raw,file=/root/sbin/110-ovmf.fd' \
> -name Brian-PC \
> -smp '12,sockets=1,cores=12,maxcpus=12' \
> -nodefaults \
> -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
> -vga none \
> -nographic \
> -no-hpet \
> -cpu 'host,hv_vendor_id=Nvidia43FIX,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,+kvm_pv_unhalt,+kvm_pv_eoi,kvm=off' \
> -m 8192 \
> -object 'memory-backend-ram,id=ram-node0,size=8192M' \
> -numa 'node,nodeid=0,cpus=0-11,memdev=ram-node0' \
> -k en-us \
> -readconfig /usr/share/qemu-server/pve-q35.cfg \
> -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' \
> -device vfio-pci,host=04:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0 \
> -device vfio-pci,host=04:00.1,id=hostpci1,bus=ich9-pcie-port-2,addr=0x0 \
> -device 'usb-host,hostbus=9,hostport=1.1,id=usb0' \
> -device 'usb-host,hostbus=9,hostport=1.2,id=usb1' \
> -device 'usb-host,hostbus=9,hostport=1.3,id=usb2' \
> -device 'usb-host,hostbus=9,hostport=1.4,id=usb3' \
> -device 'usb-host,hostbus=9,hostport=1.5,id=usb4' \
> -device 'usb-host,hostbus=9,hostport=1.1.1,id=usb5' \
> -device 'usb-host,hostbus=9,hostport=1.1.2,id=usb6' \
> -device 'usb-host,hostbus=9,hostport=1.1.3,id=usb7' \
> -device 'usb-host,hostbus=9,hostport=1.1.4,id=usb8' \
> -device 'usb-host,hostbus=9,hostport=1.1.5,id=usb9' \
> -device 'usb-host,hostbus=9,hostport=1.2.1,id=usb10' \
> -device 'usb-host,hostbus=9,hostport=1.2.2,id=usb11' \
> -device 'usb-host,hostbus=9,hostport=1.2.3,id=usb12' \
> -device 'usb-host,hostbus=9,hostport=1.2.4,id=usb13' \
> -device 'usb-host,hostbus=9,hostport=1.2.5,id=usb14' \
> -device 'usb-host,hostbus=9,hostport=1.3.1,id=usb15' \
> -device 'usb-host,hostbus=9,hostport=1.3.2,id=usb16' \
> -device 'usb-host,hostbus=9,hostport=1.3.3,id=usb17' \
> -device 'usb-host,hostbus=9,hostport=1.3.4,id=usb19' \
> -device 'usb-host,hostbus=9,hostport=1.4.1,id=usb21' \
> -device 'usb-host,hostbus=9,hostport=1.4.2,id=usb22' \
> -device 'usb-host,hostbus=9,hostport=1.4.3,id=usb23' \
> -device 'usb-host,hostbus=9,hostport=1.4.4,id=usb24' \
> -device 'usb-host,hostbus=9,hostport=1.4.5,id=usb25' \
> -device 'usb-host,hostbus=9,hostport=1.5.1,id=usb26' \
> -device 'usb-host,hostbus=9,hostport=1.5.2,id=usb27' \
> -device 'usb-host,hostbus=9,hostport=1.5.3,id=usb28' \
> -device 'usb-host,hostbus=9,hostport=1.5.4,id=usb29' \
> -device 'usb-host,hostbus=9,hostport=1.5.5,id=usb30' \
> -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
> -iscsi 'initiator-name=iqn.1993-08.org.debian:01:209b855ce18e' \
> -drive 'file=/dev/zvol/fastpool/vm-110-disk-1,if=none,id=drive-virtio0,cache=writeback,format=raw,aio=threads,detect-zeroes=on' \
> -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' \
> -drive 'file=/dev/zvol/rpool/data/vm-110-disk-1,if=none,id=drive-virtio1,cache=writeback,format=raw,aio=threads,detect-zeroes=on' \
> -device 'virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb' \
> -netdev 'type=tap,id=net0,ifname=tap110i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
> -device 'virtio-net-pci,mac=92:7F:88:0F:73:8D,netdev=net0,bus=pci.0,addr=0x12,id=net0' \
> -rtc 'driftfix=slew,base=localtime' \
> -machine 'type=q35' \
> -global 'kvm-pit.lost_tick_policy=discard'
>
>
> As you can see, this VM has two virtual disks which are stored on separate physical disks.  As such, this VM will crash unrecoverably if another VM is started on the same machine.  All VMs must have all of their disks stored on OS root to avoid a crash.  Obviously this is hardly an ideal setup.
>
> I wanted to give this one more shot before I gave up on the platform (which is unfortunate because I bought two of them), and I was hoping someone could help me out.
>
> Thanks,
> Brian
>
>
>
>
> # kvm --version
> QEMU emulator version 2.9.0 pve-qemu-kvm_2.9.0-5
>
> # uname -a
> Linux proxmox-1 4.10.17-3-pve #1 SMP PVE 4.10.17-21 (Thu, 31 Aug 2017 14:57:17 +0200) x86_64 GNU/Linux
>


-- 

--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney at brockmann-consult.de
Internet: http://www.brockmann-consult.de
--------------------------------------------



