[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

[vfio-users] Radeon 5770 Passthrough works in Chipset 440FX but not in Q35



Finally I had both the time and the will to start migrating from Xen to standalone QEMU. I want to install everything (both host OS and guest OS) from scratch on my already-not-so-new SSD. Sadly, I hit a hard wall, and after many hours of testing different options I was out of ideas. Mere seconds before I clicked "Send" on this mail, I found this:
Then I decided to try booting with 440FX instead of Q35. Success...
...but it would be a waste if I didn't send this mail at all, at least so you get my experience first hand.


The problem was that no matter what I did, installing the Drivers for my Radeon 5770 caused Windows 10 to black screen; some seconds later the Monitor entered Standby mode, and there were no signs of life coming from the VM anymore. This issue surprised me, since I expected to have enough know-how to pull it off by myself, yet I didn't manage to figure out what it was until the very last moment.


Relevant Hardware
Processor: Xeon E3-1245V3 (Haswell)
RAM: 32 GiB
Motherboard: Supermicro X10SAT with BIOS R2.0
Video Card (Host): Intel HD Graphics P4600
Video Card (Guest): Sapphire Radeon 5770 Flex (In its own IOMMU Group, since the Processor PCIe Controller is ignored)
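As an aside, the grouping claim above can be confirmed with the usual sysfs walk. This is a generic sketch (it prints bare PCI addresses; people commonly pipe each address through lspci -nns to get device names) and only shows output where intel_iommu=on is active:

```shell
#!/bin/bash
# Print every IOMMU group and the PCI devices it contains, straight from
# sysfs. Pass an alternate root as $1 (handy for testing); the default is
# the real kernel path, which is only populated with intel_iommu=on.
list_iommu_groups() {
    local base=${1:-/sys/kernel/iommu_groups} dev group
    shopt -s nullglob
    for dev in "$base"/*/devices/*; do
        group=${dev%/devices/*}    # strip "/devices/<addr>"
        group=${group##*/}         # keep only the group number
        echo "IOMMU group ${group}: ${dev##*/}"
    done | sort -V
}

list_iommu_groups
```

On this board, 01:00.0 and 01:00.1 should be the only entries in their group.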

Relevant Software (Host)
OS: Arch Linux
Linux Kernel: 4.5.4
Hypervisor: QEMU 2.5.1
Boot Mode: UEFI (systemd-boot)

Relevant Software (Guest)
OS: Windows 10 Enterprise Build 10240 (RTM)
Video Card Drivers: Catalyst 15.7 / Catalyst 15.7.1 / Crimson Edition 16.2.1 Beta
Boot Mode: UEFI

VM Firmware: OVMF binaries obtained from here: https://www.kraxel.org/repos/jenkins/edk2/

Note that I briefly used the latest (now outdated...) b1856, dated 31-05-2016, but it had an annoying bug where I couldn't enter the Firmware setup by pressing Del while the VM POSTed, so I pulled an older backup, b1737, dated 19-04-2016, where I can enter the Firmware setup as usual.

VBIOS: Modded my Radeon VBIOS to add a UEFI GOP with updGOP 1.9.3, obtained from here: http://www.win-raid.com/t892f16-AMD-and-Nvidia-GOP-update-No-requests-DIY.html

There is also a newer version that makes a specific improvement to my Radeon 5770 VBIOS, but I didn't test it.


Host side configuration

Kernel Parameters:

options root=PARTUUID=<PARTUUID> rw intel_iommu=on hugepagesz=1G hugepages=20 default_hugepagesz=2M modprobe.blacklist=radeon

Basically, I turned the Intel IOMMU on, reserved 20 GiB worth of 1 GiB Huge Pages, and blacklisted the radeon Kernel Module with a Kernel Parameter. I tested it and it works, since lspci -s 01:00.0 -vvv | grep Kernel doesn't return the "Kernel driver in use:" line. The Huge Pages work, too.
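One caveat when verifying: /proc/meminfo's HugePages_Total only counts the *default* huge page size, so with default_hugepagesz=2M the 1 GiB pool has to be read from sysfs instead. A sketch of both checks (the sysfs path is the standard kernel one; the expected count of 20 is specific to this setup):

```shell
#!/bin/bash
# Read the number of reserved 1 GiB huge pages from sysfs (the $1 override
# exists only so the function is easy to test against a plain file).
nr_1g_pages() {
    cat "${1:-/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages}"
}

nr_1g_pages 2>/dev/null || true     # expect 20 with hugepagesz=1G hugepages=20
# Confirm the blacklist worked: this grep should print nothing for 01:00.0.
{ command -v lspci >/dev/null && lspci -s 01:00.0 -vvv | grep Kernel; } || true
```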


VFIO Binding Script:

#!/bin/bash

modprobe vfio-pci
sleep 1s
echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
sleep 1s
echo 0000:01:00.1 > /sys/bus/pci/devices/0000:01:00.1/driver/unbind
sleep 1s
echo 1002 68b8 > /sys/bus/pci/drivers/vfio-pci/new_id
sleep 1s
echo 1002 aa58 > /sys/bus/pci/drivers/vfio-pci/new_id
sleep 1s

I run that on every boot. The first echo line fails since the Video Card has no Driver bound to it, but it doesn't hurt to have it there, either. lspci -s 01:00.0 -vvv | grep Kernel returns "Kernel driver in use: vfio-pci" as expected, and the 01:00.1 device returns that too.
I didn't manage to get vfio-pci working with a .conf file in /etc/modprobe.d/. As far as I know, it seems that I need to customize mkinitcpio to do so, since modprobe.d just configures Kernel Modules that it already knows about (anything related to vfio threw a parser error saying it was being ignored). However, manually starting vfio-pci with modprobe and ids= bound the Radeon (which is Driverless) but not the HDMI Audio function, which had already been grabbed by another Driver, yet it didn't produce any error informing me that it had failed. I suppose it might work in the early environment if vfio-pci grabs the device first. Regardless, this way is "good enough" for me.
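For reference, the modprobe.d route is normally done by pairing an options line with pulling vfio-pci into the initramfs, so it claims both functions before any other Driver loads. A sketch of the standard Arch recipe (file contents only; untested on this exact setup, and followed by mkinitcpio -p linux to rebuild):

```shell
# /etc/modprobe.d/vfio.conf -- bind both functions of the Radeon by ID
options vfio-pci ids=1002:68b8,1002:aa58

# /etc/mkinitcpio.conf -- load the vfio stack in early userspace
MODULES="vfio vfio_iommu_type1 vfio_pci vfio_virqfd"
```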


QEMU Launch Script:

Working 440FX version:

#!/bin/bash

qemu-system-x86_64 \
-nodefconfig \
-name "Windows 10 x64" \
-display sdl \
-monitor stdio \
-machine pc-i440fx-2.5,accel=kvm \
-nodefaults \
-cpu host,hv-spinlocks=8191,hv-relaxed,hv-vapic,hv-time,hv-crash,hv-reset,hv-vpindex,hv-runtime \
-smp cpus=8,sockets=1,cores=4,threads=2 \
-m size=20G \
-object memory-backend-file,id=mem0,size=20G,prealloc=yes,mem-path=/root/vms/hugepages1G \
-numa node,nodeid=0,cpus=0-7,memdev=mem0 \
-rtc base=localtime,driftfix=slew \
-drive id=pflash0,if=pflash,format=raw,readonly=on,file=/root/vms/firmware/OVMF_CODE-efi-vm0.fd \
-drive id=pflash1,if=pflash,format=raw,file=/root/vms/firmware/OVMF_VARS-efi-vm0.fd \
-drive id=cdrom0,if=none,format=raw,readonly=on,file=/root/storage/Win10x64/Win10x64_Enterprise.iso \
-drive id=cdrom1,if=none,format=raw,readonly=on,file=/root/storage/iso/virtio-win-0.1.117.iso \
-drive id=drive0,if=none,format=raw,cache=directsync,aio=native,file=/dev/vg0/vm0 \
-netdev tap,id=net0,vhost=on,fd=3 3<>/dev/tap$(< /sys/class/net/macvtap0/ifindex) \
-object iothread,id=iothread0 \
-device ioh3420,multifunction=true,chassis=0,bus=pci.0,addr=1e.0,id=pcie.1 \
-device ioh3420,chassis=1,bus=pci.0,addr=1e.1,id=pcie.2 \
-device ioh3420,chassis=2,bus=pci.0,addr=1e.2,id=pcie.3 \
-device ioh3420,chassis=3,bus=pci.0,addr=1e.3,id=pcie.4 \
-device virtio-net-pci,netdev=net0,bus=pci.0,addr=10.0,id=network0,mac=$(< /sys/class/net/macvtap0/address) \
-device virtio-scsi-pci,iothread=iothread0,bus=pci.0,addr=11.0,id=scsi0 \
-device scsi-hd,bus=scsi0.0,drive=drive0 \
-device ide-cd,bus=ide.0,drive=cdrom0 \
-device ide-cd,bus=ide.1,drive=cdrom1 \
-device qxl \
-device vfio-pci,host=1:00.0,multifunction=true,bus=pcie.1,addr=00.0,romfile=/root/vms/firmware/Juniper_updGOP.rom \
-device vfio-pci,host=1:00.1,bus=pcie.1,addr=00.1 \


Failing Q35 version:

#!/bin/bash

qemu-system-x86_64 \
-nodefconfig \
-name "Windows 10 x64" \
-display sdl \
-monitor stdio \
-machine pc-q35-2.5,accel=kvm \
-nodefaults \
-cpu host,hv-spinlocks=8191,hv-relaxed,hv-vapic,hv-time,hv-crash,hv-reset,hv-vpindex,hv-runtime \
-smp cpus=8,sockets=1,cores=4,threads=2 \
-m size=20G \
-object memory-backend-file,id=mem0,size=20G,prealloc=yes,mem-path=/root/vms/hugepages1G \
-numa node,nodeid=0,cpus=0-7,memdev=mem0 \
-rtc base=localtime,driftfix=slew \
-drive id=pflash0,if=pflash,format=raw,readonly=on,file=/root/vms/firmware/OVMF_CODE-efi-vm0.fd \
-drive id=pflash1,if=pflash,format=raw,file=/root/vms/firmware/OVMF_VARS-efi-vm0.fd \
-drive id=cdrom0,if=none,format=raw,readonly=on,file=/root/storage/Win10x64/Win10x64_Enterprise.iso \
-drive id=cdrom1,if=none,format=raw,readonly=on,file=/root/storage/iso/virtio-win-0.1.117.iso \
-drive id=drive0,if=none,format=raw,cache=directsync,aio=native,file=/dev/vg0/vm0 \
-netdev tap,id=net0,vhost=on,fd=3 3<>/dev/tap$(< /sys/class/net/macvtap0/ifindex) \
-object iothread,id=iothread0 \
-device ioh3420,multifunction=true,chassis=0,bus=pcie.0,addr=1e.0,id=pcie.1 \
-device ioh3420,chassis=1,bus=pcie.0,addr=1e.1,id=pcie.2 \
-device ioh3420,chassis=2,bus=pcie.0,addr=1e.2,id=pcie.3 \
-device ioh3420,chassis=3,bus=pcie.0,addr=1e.3,id=pcie.4 \
-device virtio-net-pci,netdev=net0,bus=pcie.0,addr=10.0,id=network0,mac=$(< /sys/class/net/macvtap0/address) \
-device virtio-scsi-pci,iothread=iothread0,bus=pcie.0,addr=11.0,id=scsi0 \
-device scsi-hd,bus=scsi0.0,drive=drive0 \
-device ide-cd,bus=ide.0,drive=cdrom0 \
-device ide-cd,bus=ide.1,drive=cdrom1 \
-device qxl \
-device vfio-pci,host=1:00.0,multifunction=true,bus=pcie.1,addr=00.0,romfile=/root/vms/firmware/Juniper_updGOP.rom \
-device vfio-pci,host=1:00.1,bus=pcie.1,addr=00.1 \


Yes, it is an ugly mammoth script, and I hope I haven't inserted any typos while writing it here. Summary: a 4C/8T Processor, 20 GiB RAM with Huge Pages (hacked in via a NUMA Node), nearly all Hyper-V Enlightenments enabled, Q35 or 440FX Chipset, OVMF Firmware (split CODE + VARS version), a VirtIO Network Controller using MACVTAP (I still haven't mastered it, since when I open the VM, I instantly lose Network connectivity on the host, which uses a MACVLAN. Will have to check that again...), and a VirtIO SCSI Controller with IOThread/x-data-plane. There are also a few PCIe Root Ports, one of which is where I attach the Video Card. The only missing thing is intel-hda and hda-duplex for sound, since I still haven't managed to configure ALSA properly.
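A note on the strangest-looking part of the script: fd=3 3<>/dev/tap$(< /sys/class/net/macvtap0/ifindex) is plain bash file descriptor redirection. It opens the macvtap character device read-write on descriptor 3 for the duration of the qemu command, and QEMU simply inherits that open descriptor. The mechanism in isolation, with a temp file standing in for /dev/tapN:

```shell
#!/bin/bash
# The per-command 3<> redirection: fd 3 exists only while the command runs,
# exactly like the tap descriptor handed to qemu via fd=3.
tmp=$(mktemp)
echo "hello from fd 3" > "$tmp"

# Open $tmp read-write on fd 3 just for this one command:
read -r -u 3 line 3<>"$tmp"
echo "$line"   # prints "hello from fd 3"
```

The macvtap0 interface itself would have been created beforehand with something like ip link add link <uplink> name macvtap0 type macvtap mode bridge (uplink name hypothetical); bridge mode is also what typically isolates the host's own traffic on the same uplink, which may relate to the lost host connectivity mentioned above.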

The action happens mostly in the last three lines, plus OVMF:

OVMF pure EFI (CODE + VARS)
OR
OVMF with CSM (CODE + VARS)

-device VGA
OR
-device qxl-vga
OR
-device qxl
OR
none

-device vfio-pci,host=1:00.0,multifunction=true,bus=pcie.1,addr=00.0
OR
-device vfio-pci,host=1:00.0,multifunction=true,bus=pcie.1,addr=00.0,x-vga=on
OR
-device vfio-pci,host=1:00.0,multifunction=true,bus=pcie.1,addr=00.0,romfile=/root/vms/firmware/Juniper_updGOP.rom
OR
-device vfio-pci,host=1:00.0,multifunction=true,bus=pcie.1,addr=00.0,x-vga=on,romfile=/root/vms/firmware/Juniper_updGOP.rom
OR
none

What I consider a miracle is that Primary VGA Passthrough is working, even on the broken Q35. I was able to consistently see OVMF POSTing on the Monitor attached to the Radeon, and Windows 10 also works in basic VGA Modes, so it's possible to perform a clean install and use the system afterwards. It even survives many guest reboots (on Q35; I still haven't tested this in depth on 440FX), but after a few ungraceful forced guest shutdowns, it requires a host reboot. So far, Primary VGA Passthrough works either if I use OVMF pure EFI with the modded Option ROM, or if I use OVMF with CSM and x-vga=on. Note that I suffered palette corruption issues after using x-vga=on, since I don't have the Intel IGP VGA Arbitration patch, but a reboot fixed it.
Where I had absolutely no success with Q35 was trying to install the Radeon Drivers, and without them, the VM is worthless for gaming. I tested the three versions available for the Radeon 5xxx series on Windows 10 (Catalyst 15.7 and 15.7.1, which I have in my working Xen install, and Crimson Edition 16.2.1 Beta). Every time I get to the Display Driver install stage, be it when installing the full Catalyst package or by using Update Driver in the Device Manager, the screen turns black, after a few seconds the Monitor enters Standby, and the VM shows no signs of life any longer, so I have to forcefully close it. There is no useful dmesg message besides something like kvm [1968]: vcpu0 unhandled rdmsr: 0x641 (I don't have the precise dmesg output at hand), but it seems to be unrelated to this.
I tested a lot of combinations and the results were always the same. Attempting Secondary VGA Passthrough (booting with VGA or qxl-vga, and the first Radeon variation with no x-vga/romfile, which is pretty much how Xen does it) produced the same results. I also consistently rebooted, or shut down and then turned the computer on again, to make sure that the Radeon doesn't get tainted between guest reboots. No matter what I tried, the result was always the same.
The only time I had something that looked like success was when I had the idea of installing the Display Driver in Safe Mode (I had to enter msconfig to enable it for the next reboot), and at that point I could see the Radeon 5700 Series in the Device Manager. But rebooting in normal mode was impossible, since it black screened during the Windows logo splash screen, until I removed the Radeon from the VM and booted with VGA/qxl-vga to uninstall the Drivers.

A very important thing is that in order to control the guest without using Synergy, QEMU needs an emulated Video Card of some sort, since that allows the otherwise empty SDL window to grab the keyboard and mouse when you click on it, so they work inside the VM - this is identical behavior to what I do in Xen. If there is no emulated Video Card, the empty QEMU window doesn't grab the input at all, so I can see OVMF POSTing on the Monitor attached to the Radeon but I can't interact with it. The minimalistic workaround is to have at minimum qxl, since it doesn't have an Option ROM so the Firmware ignores it, and if you don't install the QXL Drivers, Windows ignores it too, so it serves as a dummy.


So far, just by changing to 440FX plus minor adjustments to the device buses, the Drivers kicked in instantly, since they were installed previously (I had left the VM in an unbootable state with the Drivers installed, so they had wrecked it). It also works in full Primary VGA Passthrough mode with the dummy qxl.
My next test will be a Windows 10 clean install with the latest Crimson Edition Drivers, using qxl to work around the input grab and just Primary VGA Passthrough from start to finish.


Even though I'm happy because I finally got it working, I would like to know why it doesn't work in Q35, as I would prefer its realistic PCIe topology over the PCIe-Root-Port-on-a-440FX-PCI-Host-Bridge hack.
