[vfio-users] An epic wall of text with questions and comments from a Xen user

Zir Blazer zir_blazer at hotmail.com
Sun Nov 22 13:36:03 UTC 2015


During the last two years I have been a happy Xen user with a decently working VGA Passthrough setup. I have also been closely following KVM and QEMU development, but didn't have a real reason to test them. Since I now have a new SSD, I can do some experiments without compromising my current setup (I have a single HD, and everything is there), as one of the things that worries me the most is being unable to get my current setup working again if I start to change config files, so a new disk always gives me breathing room since I can start from scratch. Anyways, get ready for massive walls of text, since two years of experiences are worth a lot of text...

So far, these are the reasons why I'm interested in testing KVM at this point:

1) KVM-VFIO is usually ahead of Xen in features since it can use them straight from standalone QEMU, while Xen has to adapt the features for use with its toolstack. There is a specific niche where KVM-VFIO is very far ahead: PCI/VGA Passthrough, where Xen is missing a critical feature: GeForce Passthrough. It is very important since the GeForce 9xx generation is extremely good and usually the main choice for gaming, so let's say that Xen is missing most of this generation's potential market share by not supporting it. KVM-VFIO got GeForce Passthrough consistently working via workarounds, and it seems that you don't have a lot of trouble staying ahead of whatever nVidia decides to throw into its Drivers so they can detect whether it's a VM environment and refuse to work if so.
Xen could work around the GeForce Passthrough limitations if you were able to do a GeForce @ Quadro hard mod of your Video Card, which is technically better than GeForce Passthrough since the Quadro Drivers apparently enable some GPU soft reset optimizations (Quadro's MultiOS feature), so VM reboots didn't present issues at all. The big problem is that the solder mod requires that someone with electronics knowledge has previously tinkered with your specific Video Card model to confirm that it works and give you precise instructions to mod it yourself, otherwise it is impossible for a standard end user to do. Also, I recently heard that Xen can do GeForce Passthrough by playing with CPUID flags, but only a developer managed to do it and there are no end-user-usable instructions.
I think that Xen does have a specific advantage, which is VGA Passthrough of the Intel processors' integrated GPU. It should be better or easier based on info I saw on the developer Mailing List, since there were a lot of patches from Intel to specifically make it work. They were related to reserving the 00:02.0 PCI Address in the guest.
While I don't have a GeForce, I have a friend who does, with a shiny new Skylake system and a GeForce 980 (I promise to send an lspci and IOMMU Groups breakdown if I get to touch the system with a Linux LiveCD), and he is interested in VGA Passthrough, so having some previous experience to help him would be useful.

2) Xen used to work very well with my Radeon 5770 Juniper, but after switching my main VM to Windows 10, I got annoyed by the Windows Update forced restarts after applying hotfixes (You can delay them like 10 days, but that's not enough for someone who is used to having his machine on for months, and at the absolute deadline when you can't delay it any longer, it restarts regardless of what you were doing). The Radeon has soft reset issues, which are far worse in Windows 10 than in my previous main OS, Windows XP x64. WXP x64 just required two consecutive VM reboots for the Radeon to work properly; in W10 it only works properly on the very first boot. VM reboots caused by Windows forced restarts leave the Radeon in W10 unable to use PowerPlay Power States, thus being noisier since it never enters Idle power modes, and to make it worse, a few times after VM reboots I had bizarre flickering issues that ended up in BSODs until I rebooted the computer itself. Here are my comments about the current Radeon 5770 behavior in W10 on the Xen Mailing List:
http://lists.xenproject.org/archives/html/xen-users/2015-07/msg00160.html
Since VFIO had some patches for Radeon soft reset (Not sure if they apply to the Juniper GPU, also known as Radeon 5750/5770/6750/6770), I want to check how my 5770 behaves with it. My objective would be to make sure that I can reboot the same VM and get the Radeon in a fully working state, be it due to the VFIO reset patches, or due to something else (VBIOS sideloading?). It would be even more wonderful if I could switch the Radeon from a VM with W10 to WXP x64 or vice versa without having to reboot the computer (At least once it worked, but results are totally inconsistent and a BSOD is what happens most often). This would make comparing OSes far easier.
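For what it's worth, a sketch of what I understand VBIOS sideloading to mean on the QEMU/VFIO side - dump the ROM through sysfs while the card is still clean, then hand that copy to the guest instead of the on-card ROM (paths and the assumption that the ROM is readable this way are mine, not something I have tested):

  # dump the ROM from the card, ideally before anything has initialized it
  cd /sys/bus/pci/devices/0000:01:00.0
  echo 1 > rom
  cat rom > /root/radeon5770.rom
  echo 0 > rom

  # later, feed the dumped copy to the guest
  qemu-system-x86_64 ... -device vfio-pci,host=01:00.0,romfile=/root/radeon5770.rom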

3) Xen has OVMF support to create UEFI VMs, but it is not as good as standalone QEMU. Xen does not use a standalone OVMF binary that I can specify in a config file; it needs OVMF to be built into it at compile time, so it is not easy to test other OVMF versions. The NVRAM is entirely volatile, you have no way to save it. Worst of all, it doesn't seem to work with PCI Passthrough, since every time I tried to create an OVMF VM with a PCI Device assigned, OVMF didn't POST, nor do I know of anyone that got it working with PCI/VGA Passthrough on the Xen side of things on more recent versions.
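For comparison, my understanding is that with standalone QEMU you just point it at whichever OVMF build you want, with a writable pflash copy giving you persistent NVRAM; a sketch, with the paths being assumptions that depend on how the distro packages edk2:

  cp /usr/share/ovmf/x64/OVMF_VARS.fd /home/user/vms/win10_VARS.fd
  qemu-system-x86_64 ... \
      -drive if=pflash,format=raw,readonly=on,file=/usr/share/ovmf/x64/OVMF_CODE.fd \
      -drive if=pflash,format=raw,file=/home/user/vms/win10_VARS.fd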
Regardless, I can't try to imitate the VFIO proposed legacy-free VM setup (OVMF Firmware, Video Card with UEFI GOP and a UEFI Boot capable OS) since my Radeon 5770 does not support UEFI GOP, and while I know that some users managed to mod their VBIOS to add it, I haven't found anyone with a Juniper that managed to do so. The only Juniper based cards that I'm aware of that have UEFI GOP support are two models coming from Apple itself, but I heard some comments about the Juniper Flash ROM capacity being too small to fit the required code.

4) Since Xen is more enterprise oriented than KVM-VFIO, they usually don't spend development time on certain consumer-type features. Compared to standard VMMs, Xen is missing sound support, which is rather sad. It seems to be able to use QEMU to emulate Sound Cards like the Intel HD Audio, but it has no way to return the produced sound to the host. This means that in order to have sound on my main VM, the easiest way is to do PCI Passthrough of the Motherboard integrated Sound Card, and screw all the other VMs. I can't create a simple VM to safely browse the Internet or play old games that don't need a 3D GPU (Like if I wanted to use a MS-DOS/W98 QEMU VM as a faster DOSBOX replacement) merely because of the lack of sound.
There were some plans for paravirtualized sound support, but no idea what happened to them. I see this as Xen's biggest drawback for a consumer setup, as you can't get out-of-the-box sound from multiple VMs with the host acting as the sound mixer.
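With standalone QEMU this should just be a matter of picking an audio backend and an emulated Sound Card, something along these lines (a sketch, assuming PulseAudio on the host; the backend is still selected through an environment variable in the QEMU versions of this era):

  QEMU_AUDIO_DRV=pa qemu-system-x86_64 ... -device intel-hda -device hda-duplex

The guest then sees a normal Intel HD Audio device, and the host mixes the output of every VM like any other PulseAudio client.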

5) Being able to work with either Xen or KVM would mean that I can test whether any specific issue I have is reproducible on the other VMM, as I almost always meet quirks every time I want to do something. Performance benchmarks of the same setup under different VMMs would also be very interesting. Phoronix occasionally does benchmarks of different VMMs, but they don't use VGA Passthrough or any other advanced features to see if there are important differences. So far, Xen seems to have less CPU overhead and KVM more IO performance.
How viable that is to test would depend on whether one can launch the VM created under one VMM from the other and switch between them in a seamless way. Since both Xen and KVM-VFIO use QEMU for device emulation, I suppose that there should be a basic layer that is 100% compatible (Big exception is that Xen does not support the QEMU Q35 Chipset, only i440FX). I think that the only issues I'm aware of are the installed Xen or QEMU paravirtualized Drivers (Xen GPLPV or Windows PV Drivers, and QEMU VirtIO); those may need to be uninstalled first. There was a project to use VirtIO in Xen, but no idea if it's still in development.

6) The Xen package maintainer for Arch Linux is usually very busy, so sometimes the package can get outdated for long periods, which is its current status. When Xen 4.5 was released, his Xen package was actually the best one after checking other distributions that include Xen, since its default compile options included the patches required to build OVMF, SPICE, and even an EFI binary for UEFI Boot. However, Xen has since released both the 4.5.2 hotfix and the newer 4.6 version, and the package is still behind. Not only that, since Arch Linux is a rolling release, you usually have a lot of new libraries and stuff that can break using or building older things. In Xen's case, both OVMF and SPICE got broken by newer libraries (Or was it a newer GCC version?), so either you are a developer that can fix the code by yourself, or you need to wait for someone else to do it. Effectively, this means that to build Arch Linux Xen right now, you have to disable some of the modern features, which places it below what it was able to do 6 months or so ago when the package was fresh.
For me, this rules out a fresh install of Arch Linux onto the SSD since I would lose some Xen features which I use; the only option to keep using Xen is to clone my host partition to the SSD. The other option, as you can guess, is to do a fresh install and go straight for KVM, which is what I'm interested in doing.


Also, I made an installation guide for Arch Linux + Xen, last updated for the Xen 4.5 release, which is the exact same setup that I'm using on the host right now. You can't reproduce it due to what I stated earlier, but I think that someone may want to read it to see my experience with Xen and how I like to do things. Reproducible results are very important! And chances are that except for KVM-VFIO specific things, the rest will be done as close to identical as possible.
http://pastebin.com/hTJ0EgNZ
And here are some pics. I got very lazy at the end since there weren't a lot of people interested in using Xen for VGA Passthrough, so I didn't migrate everything to a properly formatted wiki guide as I wanted:
http://lists.xen.org/archives/html/xen-users/2015-04/msg00071.html



While I could get most things working in a usable state, the main issue I currently have is that after two years using this setup, I couldn't figure out how to actually get a productive system out of all this complex virtualization setup. In some way, I feel like I'm losing performance due to overhead but didn't gain anything in particular compared to merely running the occasional VM inside a native Windows, and worse, the complexity makes it overall much harder to use, as it is anything but comfortable. This is because I'm mostly an everyday Windows user that doesn't have a real reason to use Linux, but I won't seriously consider going back to native (Except for benchmarking) since I'm bent on Windows not having direct Hardware access (Windows 10 telemetry and UEFI 2.5 universal Firmware flashing capabilities are reinforcing my theory that it could be risky if a huge exploit appears). Anyways, I think that nearly everything needs a revisit to try to get the most out of it. The goal should be to have VMs that can give an end user experience as close to native as possible. Let's begin with my computer:
	HARDWARE
		Processor: Intel Xeon E3-1245V3 Haswell (4C/8T, 3.4 GHz, VT-d)		http://ark.intel.com/products/75462/
		Cooling: Cooler Master Hyper 212 EVO w/its thermal grease			http://www.coolermaster.com/cooling/cpu-air-cooler/hyper-212-evo/
		Motherboard: Supermicro X10SAT w/Chipset C226 Rev.C2, BIOS R2.0		http://www.supermicro.com/products/motherboard/Xeon/C220/X10SAT.cfm
		RAM Memory: 2x AMD Radeon RP1866 2 * 8 GB (32 GiB Total) Unbuff./No-ECC	http://www.radeonmemory.com/performance_series.php
		Storage SSD: Samsung 850 EVO 250 GB						http://www.samsung.com/global/business/semiconductor/minisite/SSD/global/html/ssd850evo/overview.html
		Storage HD: Seagate HDD.15 4 TB ST4000DM000					http://www.seagate.com/internal-hard-drives/desktop-hard-drives/desktop-hdd/
		Video Card: Intel HD Graphics P4600 (Integrated)
		Video Card: Sapphire Radeon 5770 FleX 1 GB GDDR5 (No UEFI GOP)		http://www.sapphiretech.com/productdetial.asp?Pid=BE168B2D-E158-4507-9F22-3ED4792082B8&lang=eng
		Monitor: Samsung SyncMaster P2370H (Connected via DVI to Radeon 5770)	http://www.samsung.com/us/business/support/owners/product/P2370H
		Monitor: Samsung SyncMaster 932N+ (Connected via VGA to Intel IGP)		Absolutely NO INFO in Samsung website, as if it never existed...
		Power Supply: Seasonic S12II Bronze 520W					http://www.seasonicusa.com/S12II-350-430-520-620%20Bronze.htm
	SOFTWARE
		Boot Loader: Gummiboot (UEFI Boot)
		Hypervisor: Xen 4.5
		Host (Dom 0): Arch Linux
		Linux Kernel: 4.0.1

My Motherboard has a very nice System Block Diagram on Page 1-8 of the Manual:
http://www.supermicro.com/manuals/motherboard/C226/MNL-1544.pdf 
This is my system IOMMU Groups, lspci and lspci tree:

/sys/kernel/iommu_groups

/sys/kernel/iommu_groups/0/devices/0000:00:00.0

/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.1

/sys/kernel/iommu_groups/2/devices/0000:00:02.0

/sys/kernel/iommu_groups/3/devices/0000:00:03.0

/sys/kernel/iommu_groups/4/devices/0000:00:14.0

/sys/kernel/iommu_groups/5/devices/0000:00:16.0

/sys/kernel/iommu_groups/6/devices/0000:00:19.0

/sys/kernel/iommu_groups/7/devices/0000:00:1a.0

/sys/kernel/iommu_groups/8/devices/0000:00:1b.0

/sys/kernel/iommu_groups/9/devices/0000:00:1c.0

/sys/kernel/iommu_groups/10/devices/0000:00:1c.1

/sys/kernel/iommu_groups/11/devices/0000:00:1c.3

/sys/kernel/iommu_groups/12/devices/0000:00:1c.4

/sys/kernel/iommu_groups/13/devices/0000:00:1d.0

/sys/kernel/iommu_groups/14/devices/0000:00:1f.0
/sys/kernel/iommu_groups/14/devices/0000:00:1f.2
/sys/kernel/iommu_groups/14/devices/0000:00:1f.3
/sys/kernel/iommu_groups/14/devices/0000:00:1f.6

/sys/kernel/iommu_groups/15/devices/0000:02:00.0

/sys/kernel/iommu_groups/16/devices/0000:03:00.0

/sys/kernel/iommu_groups/17/devices/0000:04:01.0

/sys/kernel/iommu_groups/18/devices/0000:04:04.0

/sys/kernel/iommu_groups/19/devices/0000:04:05.0

/sys/kernel/iommu_groups/20/devices/0000:04:07.0

/sys/kernel/iommu_groups/21/devices/0000:04:09.0

/sys/kernel/iommu_groups/22/devices/0000:08:00.0

/sys/kernel/iommu_groups/23/devices/0000:09:00.0
/sys/kernel/iommu_groups/23/devices/0000:0a:00.0

/sys/kernel/iommu_groups/24/devices/0000:0b:00.0


00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v3 Processor DRAM Controller (rev 06)					-- GROUP 0
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06)		-- GROUP 1
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3 Processor Integrated Graphics Controller (rev 06)	-- GROUP 2
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)			-- GROUP 3
00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 05)				-- GROUP 4
00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)		-- GROUP 5
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-LM (rev 05)						-- GROUP 6
00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 05)				-- GROUP 7
00:1b.0 Audio device: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller (rev 05)			-- GROUP 8
00:1c.0 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d5)			-- GROUP 9
00:1c.1 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #2 (rev d5)			-- GROUP 10
00:1c.3 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #4 (rev d5)			-- GROUP 11
00:1c.4 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #5 (rev d5)			-- GROUP 12
00:1d.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 05)				-- GROUP 13
00:1f.0 ISA bridge: Intel Corporation C226 Series Chipset Family Server Advanced SKU LPC Controller (rev 05)			-- GROUP 14
00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 05)	-- GROUP 14
00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 05)				-- GROUP 14
00:1f.6 Signal processing controller: Intel Corporation 8 Series Chipset Family Thermal Management Controller (rev 05)	-- GROUP 14
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Juniper XT [Radeon HD 5770]			-- GROUP 1
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Juniper HDMI Audio [Radeon HD 5700 Series]			-- GROUP 1
02:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 01)					-- GROUP 15
03:00.0 PCI bridge: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)			-- GROUP 16
04:01.0 PCI bridge: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)			-- GROUP 17
04:04.0 PCI bridge: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)			-- GROUP 18
04:05.0 PCI bridge: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)			-- GROUP 19
04:07.0 PCI bridge: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)			-- GROUP 20
04:09.0 PCI bridge: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)			-- GROUP 21
08:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03)					-- GROUP 22
09:00.0 PCI bridge: Texas Instruments XIO2213A/B/XIO2221 PCI Express to PCI Bridge [Cheetah Express] (rev 01)		-- GROUP 23
0a:00.0 FireWire (IEEE 1394): Texas Instruments XIO2213A/B/XIO2221 IEEE-1394b OHCI Controller [Cheetah Express] (rev 01)	-- GROUP 23
0b:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)					-- GROUP 24


-[0000:00]-+-00.0
           +-01.0-[01]--+-00.0
           |            \-00.1
           +-02.0
           +-03.0
           +-14.0
           +-16.0
           +-19.0
           +-1a.0
           +-1b.0
           +-1c.0-[02]----00.0
           +-1c.1-[03-0a]----00.0-[04-0a]--+-01.0-[05]--
           |                               +-04.0-[06]--
           |                               +-05.0-[07]--
           |                               +-07.0-[08]----00.0
           |                               \-09.0-[09-0a]----00.0-[0a]----00.0
           +-1c.3-[0b]----00.0
           +-1c.4-[0c-44]--
           +-1d.0
           +-1f.0
           +-1f.2
           +-1f.3
           \-1f.6
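
For reference, the listings above can be regenerated with straightforward commands, something like (the sort is just cosmetic):

  find /sys/kernel/iommu_groups/ -type l | sort -V
  lspci
  lspci -t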


Here are the issues that I can think about, many of which should apply regardless of whether I use Xen or KVM. This must be part of the planning of anyone attempting to be a Virtualization Overlord:


DEALING WITH MULTIPLE MONITORS AND THE PHYSICAL DESKTOP
I have two Monitors. Before my current VM-based setup, on my previous system I used them simultaneously as Dual Monitors for a native Windows XP. Currently, I have the 23'' Monitor connected to the Radeon for VGA Passthrough, and the 19'' is connected to the IGP for the host (Dom0) to use and the occasional secondary VM (Like when I want to open my old main VM to get some data out of it, or a VM specifically for production applications, since I don't want to pollute those with gaming and browsing). This essentially means that I'm losing an entire Monitor the vast majority of the time, unless I'm using another VM, which I usually don't.
A thing which I have been thinking about is connecting both Monitors to the Radeon, then using SSH to access Dom0, and if I want to launch another VM, I could use VNC or SPICE to access it from inside the main VM, so it would be a normal window that I can drop and maximize on the 19'' Monitor as I do currently. This approach has two cons: Compared to the standard QEMU SDL window, the Mouse cursor is laggy in SPICE, and it's EXTREMELY noticeable - I didn't test with VNC as SPICE is supposed to be better, nor did I test the SPICE mouse "client mode", since my work VM uses WXP x64 and the SPICE guest drivers installer complains that the OS is unsupported (I recall that I asked in the SPICE IRC and got told to try to uncompress the installer and manually install the individual components, but I didn't test it back at that time and I don't recall the instructions any more. Worst case scenario, it could be worked around by installing the more resource heavy W7 if the SPICE guest drivers and client mouse mode are worth it). The other con is that since the Radeon should be grabbed by VFIO or xen-pciback before Linux finishes booting, the system would be effectively headless until I start the main VM (I suppose this last thing could be automated via a script that starts the main VM after every boot).
Relevant to the last con (And VGA Arbitration) is that I have the IGP configured as Primary VGA in the BIOS. The Monitor attached to the Radeon doesn't get a signal at all unless it comes from the main VM, since xen-pciback grabs the card first (If I don't use xen-pciback, the Monitor turns on during Linux boot since the Driver initializes the Radeon, if I recall correctly). The Motherboard even has an option to optionally load or not load a PCI Option ROM (Either in Legacy or UEFI Mode) if found on a specific slot. I think that I have it set to load as normal for the slot where the Radeon is, but the BIOS doesn't use it for video output during POST anyways, nor do I hear its Fan spinning, I suppose because the IGP gets priority. As such, the Radeon should be as fresh and uninitialized as it gets, since neither the BIOS nor the Linux host seems to bother with it. I don't know if, in the previous approach, it would work the same if I had to set the Radeon as Primary VGA, whether I could see the POST and enter the BIOS, then do VGA Passthrough of the Radeon during the same session with no extra issues (Like the broken PowerPlay I get on VM reboots). Otherwise, I would be forced to replug a Monitor to the IGP if I want to enter the BIOS.
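On the KVM side, my understanding is that the equivalent of the xen-pciback grab is binding the card to vfio-pci early, either via the Kernel command line or via modprobe options; a sketch of both (the PCI IDs are the ones lspci -nn reports for my Juniper card and its HDMI audio function, so treat them as an example and check your own):

  # Xen style, on the Kernel command line:
  xen-pciback.hide=(01:00.0)(01:00.1)

  # KVM-VFIO style, in /etc/modprobe.d/vfio.conf:
  options vfio-pci ids=1002:68b8,1002:aa58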
I also heard about Synergy to seamlessly move the Keyboard and Mouse between several computers (Or in our case, VMs), and it also has Clipboard Sharing capabilities (Which SPICE also does, not sure if it requires the guest drivers installed too), but I didn't test it to see how it works in terms of general usefulness and perceived lag or delay, like when using SPICE to see the other VMs. I also considered that some people recommend a KVM (Keyboard Video Mouse) switch, but I never saw one where I live, nor do I think they would be cheap, most likely they will cost a kidney. So I prefer a 100% Software approach.

A powerful solution to all the previous issues would be having a Video Card that the Linux host can boot with, then release and reset just before launching the VM that is going to get it via Passthrough, then automatically grab and reinitialize when the host detects that the VM closed. Even if the guest doesn't support hotplug (I doubt Windows does), if the host can retake it, having both Monitors connected to the Radeon wouldn't be an issue at all as long as the host grabs it and shows the screen there after the VM closes. I don't know how doable it is, since you would need both a Video Card with working reset capabilities and a Linux host that does well with GPU and display hotplug, but I suppose that this would be the closest thing to native feel since I can have the POST / BIOS, the host, then the main VM, each perfectly using both Monitors when their turn comes. The only thing missing would be being able to Ctrl + Alt to the host, but that shouldn't be needed thanks to remote SSH; a possible problem is that it would be quite hard to retake control of the computer if the main VM crashes, since I would have to figure out how to blindly Alt + Tab to the Terminal and use a command to force QEMU to destroy the VM, otherwise I would be forced to reboot.
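Mechanically, the host side of that dance should at least be possible through sysfs; a sketch of the rebind, assuming the open source radeon driver on the host, and leaving aside whether the card actually survives the reset (which is the real question):

  # give the card to vfio-pci before starting the VM
  echo 0000:01:00.0 > /sys/bus/pci/drivers/radeon/unbind
  echo vfio-pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
  echo 0000:01:00.0 > /sys/bus/pci/drivers_probe

  # after the VM closes, hand it back to the host driver
  echo 0000:01:00.0 > /sys/bus/pci/drivers/vfio-pci/unbind
  echo radeon > /sys/bus/pci/devices/0000:01:00.0/driver_override
  echo 0000:01:00.0 > /sys/bus/pci/drivers_probe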

Another thing to note is that the position of the Monitors on the physical Desktop is extremely important, too. I have the 23'' one right in front of me, since that's where I'm looking the vast majority of the time. As I'm not used to turning my head/neck around to see the 19'', when I use the secondary VMs it's very uncomfortable to look at and stresses me. It's even worse because the 19'' Monitor is 1280*1024, which is a lot less screen area compared to the main 23'' Monitor that does 1920*1080, so there is substantially less room for things on screen. This essentially means that the secondary VMs are both uncomfortable to turn to see, and less productive than when I merely used native Windows with Dual Monitors to keep some stuff at hand on the other screen and move windows between screens depending on how much surface they needed.

Finally, I suppose that the ultimate solution would be if there were a way for the host to snoop and copy the guest GPU framebuffer, similar to how it works with an emulated GPU like the Cirrus, but with a Passthrough one. You could have a host-controlled Video Card with all the Monitors attached to it, then arrange VM windows and make a composite screen surface as you see convenient. I heard that Intel was planning for XenGT/KVMGT to have a snoopable guest framebuffer for this purpose, so it can integrate with VNC and SPICE:
https://github.com/01org/KVMGT-kernel/issues/13
This will absolutely be a killer feature when it hits...


SWITCHING ACTIVE VM
Another thing which is rather annoying in everyday usage is switching around VMs. My original idea was that using multiple VMs should be like Alt-Tabbing between them, and I got half decent results with this. It's also possible to make the VM window fit perfectly Fullscreen (If the emulated GPU is set to 1280*1024, that's my 19'' Monitor's max resolution) if I right click the top left icon on the status bar and untick Un/Decorate.
The way Arch Linux uses X.org in a minimalistic install is that, if you want, you can start one X.org per tty console (Virtual Terminal). So, for example, in my setup, a normal boot goes straight to a tty1 console with autologin, then I type startx to launch X.org, open a Terminal, create a VM, then use Ctrl + Alt + F2 to go to tty2, log in, use startx to get into X.org, open a Terminal, create a VM there, and repeat. Since Ctrl + Alt overrides the guest controls and goes straight to the host, if you do Ctrl + Alt + Fx you can jump straight to another tty from inside a VM (Which is why I consider it superior to multiple Workspaces in the same tty). This is effectively as close as possible to my original idea of Alt-Tabbing between VMs running in Fullscreen mode. However, it has a few shortcomings. Imagine the following situation (Remember that I have a Monitor attached to the IGP for the host and another exclusive to the Radeon):

tty1: X.org with a QEMU SDL window of the VM with VGA Passthrough
tty2: X.org with a QEMU SDL window with a secondary VM
tty3: X.org with a QEMU SDL window with another secondary VM
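
For reference, the per-tty workflow above boils down to giving each X.org instance its own display number and VT, something like this sketch:

  # on tty1, after autologin:
  startx -- :0 vt1
  # Ctrl + Alt + F2, log in on tty2, then:
  startx -- :1 vt2
  # Ctrl + Alt + F3, log in on tty3, then:
  startx -- :2 vt3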

The first one is that X.org doesn't seem to remember the active window of the other ttys. For example, if I'm on tty1 (Be it on the host or the VM) and do Ctrl + Alt + F2, I jump to tty2. However, while I can instantaneously see the VM in Fullscreen, the active window behavior is not consistent. Sometimes the VM is not active and what moves is the X.org black cursor instead, which means that I have to do an extra click before getting the VM window active. Other times the VM is active, but the X.org black cursor persists on top of it, so I have to Ctrl + Alt to the host, then click the window again for it to disappear. For quick Alt-Tabbing this is extremely annoying, as the behavior is inconsistent. Ideally, I would want it to be that when I do Ctrl + Alt + Fx, I get the VM in Fullscreen (Which it does, since I learned that the Status Bar can be disabled with Un/Decorate so the VM SDL window starts at 0,0), and already active.
Then you have the VM with VGA Passthrough, which is a very special case. When the VM gets created, QEMU launches an SDL window where you can see SeaBIOS POSTing on the emulated GPU screen. Then, after the Windows splash screen, the Monitor connected to the Radeon takes over, while the QEMU SDL window resizes and merely becomes an empty black window. On some occasions I tried to keep the emulated Cirrus enabled and use Extended Desktop, so I could use both Monitors for the main VM (Which would solve the previous Dual Monitor thing), but this proved rather BSOD prone (Didn't test it in depth. May be worth trying again with SPICE qxl).
The issue with this main VM is that, while it has nothing to show on tty1 besides the black window, I still have to Alt-Tab and click the black window to make it the active VM, and thus it effectively wastes the entire secondary Monitor. If I'm on tty1 using the main VM and I decide to do Ctrl + Alt + F2/F3, I can still see whatever action was going on in the main VM on its dedicated Monitor, then see and use the secondary VMs on the other one. But since I have to Ctrl + Alt + F1 to tty1 to actually use it, what's on the Monitor attached to the IGP gets switched, so I can't use the main VM and see a secondary VM simultaneously. Ideally, there should be a key combination with custom behavior to transfer control to the main VM without actually switching tty screens.
I have absolutely no idea how to do it; the best I could think of is something like using the main VM and a secondary VM on the same tty, then Alt-Tabbing to an invisible window with 100% transparency, or something like that. The ideal behavior would be that if I'm on tty2/tty3 and do Ctrl + Alt + F1, control transfers to that VM but without replacing the tty2/tty3 screen with tty1. I suppose that everyone should get the idea by now.

Another consideration would be for Linux guests. Using multiple ttys in them can't be done with Ctrl + Alt + Fx since control is relinquished to the host every time you do Ctrl + Alt, so the only way to change ttys inside a Linux guest is with the console command (chvt). I still haven't tried remapping the key combination because I don't know if there is a good one that doesn't conflict with something that already exists. I suppose that for a Linux guest, by default it would be better to use Workspaces instead of ttys.
Additionally, while doing my Xen Install Guide, I used Nested Virtualization to easily take Screenshots and test nearly the entire procedure from start to finish. However, if I'm in a VM inside a VM, doing Ctrl + Alt relinquishes control to the host, not to the nested host, so if you want to switch to the VM inside the nested host, you have to Ctrl + Alt + Fx, then click the VM window, then click the other VM window. Key combinations with a nested host will be a painful experience.
I'm not sure if someone has already thought of new, universal key combinations that are easy to use and don't conflict with the default config of the likely mainstream uses. I don't think that Ctrl + Alt + Fx is bad, as long as you use Windows guests, or single-tty Linux guests. But still, the behavior could be improved to always transfer control to the VM window.


VGA ARBITRATION WITH UEFI GOP FROM INTEL IGP
A thing which puzzles me is the entire VGA Arbitration issue. Since I just heard about how much of a problem it is from the VFIO side of things, I don't know how Xen deals with this at all, nor if I ever had issues related to it, so I have a lot of holes in this topic when picking up VFIO related info:
http://vfio.blogspot.com.ar/2014/08/whats-deal-with-vga-arbitration.html
What I understand about VGA Arbitration is that it's an issue when you have multiple Video Cards trying to use the VGA protocol simultaneously, something that is used mostly at either the host POST/boot sequence or the VM POST/boot sequence, since when the Video Card Drivers take over, they should aim to disable legacy VGA and use something else. Doesn't this effectively mean that I can freely have either the host or a single VM using a VGA Passthrough Video Card with legacy VGA full time with no issues, as long as all the others either use UEFI GOP, or wait for OS Drivers to initialize them? Intel provides UEFI GOP for their IGPs, so I suppose that VGA shouldn't need to be touched at all if I do host UEFI Boot with UEFI GOP (Like what you need for Windows 8 Ultra Fast Boot). What I don't get is that when you later claim that the IGP VGA disabling mechanism is broken, it applies even if the Intel IGP boots using UEFI GOP (Since it means that VGA gets enabled AFTER it boots), and the issue is entirely Driver related.
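A quick way I know of to see what the arbiter thinks, in case it helps the discussion - which card is the boot VGA device and which devices the arbiter registered:

  dmesg | grep -i vgaarb
  cat /sys/bus/pci/devices/0000:00:02.0/boot_vga
  cat /sys/bus/pci/devices/0000:01:00.0/boot_vga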

At the moment with my Xen setup, I use UEFI Boot for the host, and Arch Linux gets the Intel IGP. At some point I had BIOS Boot with Syslinux (Xen got full UEFI Boot capabilities with Linux Kernel 3.17, it didn't work for me before that), and it also worked in the same way - I could even test it if needed, since I have a hybrid BIOS/UEFI installation. I also have the X.org xf86-video-intel Driver installed, which I suppose is the famous i915. On the guest side, my Radeon 5770 doesn't have UEFI GOP support, nor did I manage to get OVMF to POST with it at all because PCI Passthrough in Xen OVMF seems to be broken, so it means that I have to use SeaBIOS, no exceptions. I installed the Arch Linux mesa-demos package to have the glxinfo utility, and when I do glxinfo | grep render as stated here:
http://dri.freedesktop.org/wiki/glxinfo/
...I get:
direct rendering: Yes
OpenGL renderer string: Mesa DRI Intel(R) Haswell Server
It looks to me like DRI is working even though I'm using SeaBIOS in the guest...

I suppose that this is because Xen is limited to doing Secondary VGA Passthrough, which as far as I know is identical to standard PCI Passthrough. The VM is created with an emulated GPU (Either Standard VGA or the Cirrus Logic), and you see SeaBIOS POSTing on the host screen, up to the initial WXP x64 or W10 splash screen. It's when the Radeon Drivers take over that the Monitor connected to it turns on, and at that moment Windows stops outputting video through the emulated GPU since it's disabled in the Device Manager, while on the host screen, the QEMU SDL window resizes and goes completely black.
Xen can also do Primary VGA Passthrough, which means that you actually see SeaBIOS POST on the Monitor connected to the Video Card, like it seems you do with VFIO and OVMF, but this mode is limited to some Quadros.
As far as I can conclude from all this info, the Passthrough mode that I am using does NOT load PCI Option ROMs during VM POST at all, which are the ones responsible for setting up legacy VGA mode; the Drivers are doing all the work of initializing the Radeon. I would say that in my system VGA isn't touched at all, and I would expect that what Xen does is doable in KVM-VFIO too, so I think that doing Secondary VGA Passthrough in guests is another way to work around VGA Arbitration. The difference would be that I would try to use OVMF in the guest so I can UEFI Boot, and since I can wait for the Drivers to initialize the Radeon, UEFI GOP support may not be needed at all (At the cost of the OVMF POST screen).

USING PERIPHERALS
At the moment, I'm using the QEMU emulated PS/2 Mouse and Keyboard. This is usually fine, but sometimes I miss the extra multimedia Keyboard buttons to do things like launching the Calculator (I also have a gaming Mouse with lots of buttons, but I never bother with those). I haven't really researched what the possible options are to get a fully functional Keyboard and Mouse in at least the main VM without losing functionality.
From a performance and latency perspective, sometimes I hear suggestions about doing PCI Passthrough of a USB Controller so I can have everything working as native, but then it means that the host wouldn't be able to use those devices. It would be acceptable if it were possible to get it working in a hotplug fashion, like I said could be useful to do with the Video Card, and access the other VMs via VNC or SPICE. I believe that USB Controller Passthrough for input would be a very good idea, but only in a multiseat environment, not to share it between the host and several VMs.
Another thing is that it's annoying to use removable devices like USB Flash Drives in VMs. Recent versions of Xen lost PVUSB capabilities, and I think that you currently can't hotplug USB devices, you can attach them only if they are available during VM creation. So if I want to move data from a USB Flash Drive to a VM, I have to use the host as an intermediary by mounting the Flash Drive in the Linux host, mounting an accessible Shared Folder from the Windows VM in the host, and only then am I actually able to move the data. Besides it being extremely annoying to set all that up, since Windows doesn't directly see the USB Flash Drive, I can't make use of tools to write bootable ISOs. I worked around this using SPICE USB Redirection, but it requires compiling Xen with SPICE support, enabling SPICE for the VM, and launching a SPICE client. It would be rather useful if I could permanently set one of the front USB ports exclusive to the main VM, so things get automatically redirected there - USB Controller Passthrough may do the trick for removable devices. This is very useful since, if I want to use Windows based tools to format a USB Flash Drive and make it bootable, I can do so as long as Windows sees it directly.
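On the QEMU side, the middle ground I know of is attaching a single USB device (rather than the whole controller) to the guest; a sketch, with the vendor/product IDs being placeholders you would read from lsusb:

  lsusb    # note the ID of the Flash Drive or Keyboard, e.g. 0951:1666
  qemu-system-x86_64 ... -usb -device usb-host,vendorid=0x0951,productid=0x1666

The host loses the device while the guest holds it, but at least the guest sees the real thing, so bootable-USB tools should work.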


PLANNING STORAGE
A problem when dealing with multiple VMs is that you eventually have several copies of the same things all over the place, and efficient organization is either hard or near impossible - it's a waste of time when you have to copy things from one VM to another, worse if you have to start other VMs to retrieve info. This is a rather complex topic since it involves both the partitioning stage and everyday usage planning. For my old Xen guide I had written a lot about how I believe partitioning could be done.

As a host, I like a minimalistic one with nothing but the bare Arch Linux install needed to start X.org (Or better, Wayland at some point in the future) with OpenBox and launch VMs, as its sole purpose is to be an administrative domain, not a full OS - the less it does, the more secure it will be. Chances are that the host partition itself could also be rather volatile, since for host maintenance I prefer to format and reinstall Arch Linux from scratch, as doing updates every few months usually breaks things, like it happened before with Xen (The "if it's working, don't fix it" philosophy). Its size requirements are small, as the host partition just needs room for itself and maybe system logs and such. For reference, my one year old host partition is using just 5.2 GiB, and that includes the Xen sources. Indeed, what usually makes storage requirements skyrocket is downloading source code and building it there; the Linux Kernel was around 1 GiB of sources.

Currently, I have a 4 TB HD, which is GPT formatted and has three physical partitions: A 1 GiB ESP (EFI System Partition), which is mandatory for UEFI Boot, a 15 GiB EXT4 partition for the minimalistic Arch Linux host (Xen Dom0), which today I believe is a bit oversized and could be 7-10 GiB, and a massive 3.x TiB LVM partition where I can create, expand, snapshot or delete volumes, either as storage for the host (Currently a single big EXT4 volume for OS installation ISOs and file-backed VMs, always mounted in the host via fstab), or for VMs (As raw volumes). At that time this was the best and easiest partitioning scheme I could come up with, as it gives me flexibility to allocate space as I need it and get very good performance out of it (File-backed VMs SUCK in disk IOPS, try them with tons of small files and you will know what I mean...).
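For reference, the per-VM raw volume part should translate to KVM pretty much unchanged; a sketch, with the volume group name vg0 and the sizes being made-up examples:

  lvcreate -L 100G -n win10-os vg0
  qemu-system-x86_64 ... -drive file=/dev/vg0/win10-os,format=raw,if=virtio,cache=none

(if=virtio assumes the VirtIO storage drivers are installed in the Windows guest; if=ide works without them, just slower.)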

Other options that I considered:

- Using another File System like BTRFS for the host partition, but I didn't feel there was any real advantage over EXT4's maturity, performance and compatibility.
- Using ZFS for the storage partition, with ZVols as an LVM volume replacement. While I love all the ZFS and BTRFS next-generation features for data reliability, I dropped the ZFS idea since it seems to be a pain in the arse to set up. It is harder than it should be due to licensing issues, as ZFS has to be used as an external Driver, not an integrated part of the Linux Kernel. ZFS also has a big overhead and is rather resource heavy, so you have to give it plenty of CPU and RAM resources for cache (Which I have to spare, but I don't know how many resources it actually needs and didn't feel like experimenting with how it scales). I don't know either if ZFS is worth using on a single HD, since the bitrot protection requires at least two HDs to be really effective. Nor do I know about ZFS general performance vs traditional File Systems like EXT4, or ZVol vs LVM overhead, since I think all the benchmarks I ever saw about it used RAID with multiple HDs and unknown cache configurations. Basically, I have a hard time deciding whether all the extra complexity and resources to get ZFS running are worth it or not in a small setup.
- Using two partitions instead of three, with the host partition being part of the LVM partition. This would allow me to use snapshots for the host, which is a rather useful feature considering Arch Linux's rolling release nature, and that every now and then a new library breaks something (Which is what happened with Xen). However, it requires some extra work to set up booting from an LVM partition. Also, it is vulnerable to possible LVM corruption issues, which a physical partition doesn't have to deal with, making recovery easier if needed (One less layer to deal with).
- Using a USB Flash Drive as the host, similar to how the unRAID OS works, and using the HD as whole-disk ZFS with no ESP or host partition at all. However, only recently did I acquire a big enough USB Flash Drive, previously I had only a 1 GiB one, but using it for the host install would make it bound to this computer. I think that having a USB Flash Drive as host would also allow me to do PCI Passthrough of the SATA Controller to a VM if I wanted to, but I'm not 100% sure since it shares its IOMMU Group with other very basic devices (SMBus and Thermal Management). I recall some people having a VM with a SATA Controller with multiple HDs specifically for a NAS type distribution, but I have no idea how they got the Passthrough working since the Intel Chipsets from consumer platforms usually have the SATA Controller with ugly companions.

Essentially, I chose what I described earlier since it was rather simple to do, but I'm sure that there are better choices. Those are what I want to research and implement.

Since I now have a new SSD (Which is still virgin), I have more choices than previously. For obvious reasons, the new host OS should be on the SSD (Unless for some reason I decide to go for the USB Flash Drive host), and chances are that I repeat the simple three physical partitions structure. As I wouldn't need the HD to be bootable (Which means a partition for the ESP), I was thinking that it could be made a whole-disk ZFS instead of merely a ZFS partition. I don't know if it is worth it for me to do so, all the previous ZFS statements apply. Regardless, doing whole-disk ZFS is easier said than done, since to repartition the HD I have to move a lot of data to several smaller HDs among my family members' computers, then back to it after formatting. I would do it if someone can convince me that going ZFS is a must, otherwise, the HD would stay as it is (Including my working host with Xen, in case I can't get something running with KVM).

The other thing is that having an SSD + HD complicates a few things compared to just a HD; it's more complex than my current arrangement, which was basically 1 volume = 1 VM. Since the SSD is mostly useful for the OSes and capacity comes at a premium, I expect to have secondary VMs that are just a single volume from either the SSD or HD as now, and others that will see two virtual disks, the OS part being installed on the SSD so they boot fast, and an HD volume for storage. This way, as long as I don't make volumes unnecessarily big, I can take advantage of SSD speeds for as many VM OSes as possible. I expect that at least half of the SSD will go straight to the main VM, as modern Windows OSes and games take up a lot of space.
Another thing which is critical, in order to not make partitions ridiculously big, is to have a practical way to resize both the LVM volumes and the File Systems contained within them. For example, my very first main VM was a file-backed VM with rather poor IOPS, and also undersized, but I didn't know how to expand it (Or migrate it to an LVM volume, which should be worth a dd command with proper parameters), so I created a few 80 GB or so files to act as raw images and attached them to the VM as they were getting filled. The end result was that the VM had 3 or 4 slow virtual HDs, which was impractical. After starting from scratch with LVM, I later learned how to expand an LVM volume, and also to boot the VM with the GParted ISO to expand the File Systems contained within it (NTFS for Windows VMs). This gave me a lot of breathing room since I could effectively resize, and didn't have to ridiculously overprovision.
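Translated to standalone QEMU, that resize recipe would look something like the following sketch (volume and ISO names are just examples):

  lvextend -L +20G /dev/vg0/win10-os
  # then boot the VM once from the GParted live ISO to grow the NTFS partition:
  qemu-system-x86_64 ... -drive file=/dev/vg0/win10-os,format=raw \
      -cdrom gparted-live.iso -boot d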
I recall some statements claiming that it was best to have a single partition per file or volume, even if you are able to create a full MBR or GPT structure in the virtual disk. I think that this is viable, except for VMs that are going to use UEFI, since I prefer the ESP (EFI System Partition) to be part of the main VM disk instead of a detached device, as they're too closely related to be separated.
Name planning is rather important, since the idea is that you can recognize which are the critical virtual disk components of a specific VM (Example: You have the OS in one volume and the storage partition in another, and the latter is worthless as a standalone device since modern applications like to install the application files in the folder you tell them, but a lot of user-specific files, settings and cache are part of the Windows partition. You need both to get it fully functional).
This pretty much sums up local VM storage...

...now we have shared storage. There is a whole bunch of static data, namely ISOs, music, videos, photos, and such stuff, that I have scattered around in an extremely disorderly way. Since I never thought about this earlier, I usually leave everything in its own VM, and if I need to retrieve it later, I have to launch the VM (I know that you can do this host side with a loopback device, but it requires more commands than I have memorized). Every VM has its Shared Folder that I use as a port for moving stuff between VMs, but this is merely to make data available, not to order it in any way. Static data should be centralized instead of sitting in whatever VM I used it in last, and even worse, I end up with different versions of files (Or redundant copies) depending on where I worked with them last time, since I copy things between VMs instead of moving them.
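For my own future reference, the host side trick is roughly this - a sketch, assuming a raw file-backed image and an LVM volume that each carry a partition table, and ntfs-3g installed for NTFS guests:

  # file-backed image:
  losetup -fP --show /storage/oldvm.img     # prints e.g. /dev/loop0; partitions appear as /dev/loop0p1...
  mount /dev/loop0p1 /mnt/oldvm

  # LVM volume:
  kpartx -av /dev/vg0/oldvm                 # creates /dev/mapper/vg0-oldvm1 and so on
  mount /dev/mapper/vg0-oldvm1 /mnt/oldvm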
Sometimes, even moving data around is painful: since I don't have USB Redirection ready (I usually don't enable SPICE for most VMs since I use them locally and SDL does that better), I also have to mount both the USB Flash Drive and the VM Shared Folder in the host and transfer files between them. That would also apply if I were to use loopback devices or mount VM LVM volumes directly on the host.
As you can see, my data management at this very moment is an utter disaster, and I need a solution for it. The best idea, based on how I see other guys doing it, is having a File Server that holds all the shareable data. Some people use an external NAS, but with virtualization you may as well do it on the same computer. Then there comes a whole bunch of questions regarding what is the best way to do it...
Who is best suited to be the File Server, and how should I operate it? Should the host do it directly, giving it more tasks than merely being the administrative domain as I said earlier? Should I create a VM specifically to use as the File Server?
If I create a VM, should I give it the HD for whole-disk ZFS? If I have the storage volumes of the other VMs inside ZVols on that HD, does that mean that this File Server VM creates a sort of dependency chain, since it is a prerequisite that I start it before the other ones?
Should I use any specific distro for a File Server, and is there any mandatory feature set, config or packages that are required? Can the access be granularized in some way so the File Server differentiates between other VMs (Which may have full r/w access) and other computers on the network, so I can whitelist/blacklist what I want to share and to whom?
If going for ZFS, how many resources would either the host or the VM need?
Those are lots of questions...
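On the granularity question, my impression is that plain Samba already covers it; a sketch of an smb.conf share, with the path, user and subnet being made-up examples:

  [shared]
      path = /srv/share
      read only = no
      valid users = zir
      # only reachable from the VMs' internal bridge subnet
      hosts allow = 192.168.100.

NFS with per-host exports would be the equivalent for Linux-only clients.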

And I'm not finished yet, there is more. We also have data that is usually bound to a specific VM, like the applications or games that you install, since those usually add stuff to the Windows Registry, Start Menu and Desktop Shortcuts, and lots of folders in different places (Program Files, Documents and Settings, and some also annoyingly put extra folders into My Documents). For things that require installation, you can't simply "move" some folders to another VM and expect them to work, you usually have to install them again. This isn't usually that hard, just time consuming, but many applications and games have cache files for things like window and menu positioning, last typed keywords, and similar settings, that aren't easy to migrate or synchronize between several VMs. You lose quite a lot of time reconfiguring the GUI of some programs or games that don't have a documented way to export settings, and sometimes you can't make things identical (Ever lost hours moving and resizing windows around so they are optimally sized, don't overlap on borders, etc?). And in many cases, this also adds redundant static data if I have to install a several-GiB game on another VM.
I have been thinking about how to deal with this, which is also much more complex than merely setting up a File Server. Some years ago there was a trend of "portable applications", which are those that have a lot of hacks or wrappers to use local folders for everything, without touching any of the Windows ones. These were popular since people liked to load their USB Flash Drives with applications or games ready to use, including cache and config, so they would look and act pretty much identical regardless of the computer (Or in our case, VM) they ran them from. Using portable applications stored on the File Server is a rather excellent idea, but the part that is a pain in the arse is that I would need to either download already portabilized software of dubious origin, or learn how to adapt them myself. It does look like the "portable application" concept may play extremely well with VMs, since it would ease testing multiple OSes if I don't have to reinstall and reconfigure everything in every one of them.
Another alternative is NTFS hardlinks, but on the Windows platform they're like an obscure feature. It would be possible to copy-paste the program folders to the File Server and hardlink them, but this doesn't cover the Windows Registry part.


RAMDISK WORTH IT?
3 years ago, before the Hynix plant fire, RAM was dirt cheap and I decided to purchase 32 GiB. Had I known that I would end up purchasing a Xeon and a true Workstation Motherboard just to make sure that VT-d worked, I would have purchased ECC RAM and sealed the deal with a true Workstation class system. Regardless, the point is that I usually assign my host 2 GiB of RAM and my main VM 20, and I leave 12 GiB to spare, since I do not like that in order to launch other VMs (Which are usually 3/4 GiB), I'm forced to reboot the main VM to downsize its assigned RAM, else I would have all the remaining 30 GB in it. Since I have tons of unused RAM, it means that I could make a RAMDisk big enough for several uses - possible video recording (The HD was too slow and dropped frames, the new SSD may not have that issue), copying a game there to load it at ultra fast speeds for bragging rights (With MMORPGs it's a rather decent idea), or even a very fast OS install in a volatile environment (Though since W7/8/10 take like 20 GiB, I would have to downsize the host for those). Some people also use it for temp files or browser cache with very good results, too.

On a native system, if you want to make a RAMDisk, you need third party Software that reserves RAM from the OS. However, if it's a VM we're talking about, you can do it from the host itself, then assign it as a virtual drive - I don't know the particularities of tmpfs, but it looks much more flexible. Most of the questions would be whether a host side RAMDisk would perform faster, and how it should be formatted. A host side RAMDisk has a really critical advantage, since you can restart the VM without losing the RAMDisk (Were it not for occasional brownouts, blackouts, and Video Card warm reset issues, I would never reboot my system). I think that it's mostly for bragging rights, but some people may figure out if there are better reasons to use one.
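A sketch of the host side version I have in mind, with the sizes being arbitrary; cache=unsafe seems fine here precisely because the data is volatile anyway:

  mount -t tmpfs -o size=10G tmpfs /mnt/ramdisk
  qemu-img create -f raw /mnt/ramdisk/scratch.img 9G
  qemu-system-x86_64 ... -drive file=/mnt/ramdisk/scratch.img,format=raw,if=virtio,cache=unsafe

The guest formats and uses it like any other disk, and it survives VM reboots as long as the host stays up.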


VM LAG DURING FULL LOAD
My other family members are using rather slow computers by today's standards (The next best one is my old 2009 Athlon II X4 620 with 4 GB RAM), and since sometimes they want to do heavy stuff like rendering, it's obvious that I would think about how I can put my powerful Workstation to use. I decided to make a VM with the applications that they use and enable SPICE, and install SPICE clients on the other machines so they can access it. The idea was rather workable, but I didn't get the intended results...

First, since I couldn't fix the mouse lag issue (The VM was WXP x64, I wasn't able to install the SPICE guest tools to test the mouse client mode, and didn't try uncompressing the installer and manually installing components), the VM was rather a last resort measure for doing the heavy lifting, since it was uncomfortable to use for anything else. Then, when that VM was put to use, my main VM lagged like hell. During the lag, attempting to use the main VM even for web browsing usually resulted in the virtual Mouse somehow ceasing to function if the VM lagged for more than a few seconds (Like a timeout), and I never found a way to soft reset the Mouse from inside the VM, having to reboot it (Maybe W10 behaves differently, but that's what WXP x64 did). Details of this are here:
http://lists.xenproject.org/archives/html/xen-users/2015-06/msg00058.html
The config at that time was that the host sees all 4 physical Cores (I had Hyper Threading disabled at that moment) with 2 GB RAM, my main VM had 4 vCPUs pinned to each of the 4 physical Cores with 20 or 24 GB RAM, and the guest VM had just 2 vCPUs pinned to Cores 2 and 3. When the guest VM was doing renders or artificial loads like the OCCT CPU test, there was an infernal lag on my main VM that made it totally unusable, mainly due to the Mouse issue. The suggestions that I received (But didn't take the time to apply, since after a few days no one used the guest VM anymore, so I had no urgent need to fix this) were that the issue may be related to QEMU device emulation, so I should let the host have an exclusive physical Core for itself, and the same with the VMs, as I shouldn't share physical Cores between them either.

The issue here is that even if the proposed solution fixed the load issue, I would have ridiculously limited VMs. Since most OSes usually don't support things like CPU hotplug, it means that in order to give exclusive resources to a VM, I must make each VM config take just the bare minimum resources, like one exclusive pinned vCPU, so its performance doesn't degrade when I open another one that goes to Full Load. Doing this also means that I would have to reboot often to resize a VM's resources, or waste tons of unassigned resources, so it's unviable. And giving my main VM more than 4 vCPUs would force me to reboot it to remove resources every time I intend to open another one, and reboot it again with max resources when it's alone. If you add in the Radeon warm reset issue, this is totally impractical, and would feel like a Dual Boot instead of a VM. I would prefer that the VMM have a sort of QoS (Quality of Service) to guarantee that even if VMs are sharing loaded CPU Cores, every VM is still usable. Otherwise I can't give my main VM all the CPU resources, I'm forced to leave some CPU leftovers (Like with RAM) that will be barely used just to not have to reboot it.

Worth noting is that I recently upgraded my Processor cooling and enabled Hyper Threading and Turbo Boost, which I previously had disabled since they increased temperatures by around 10-15°C and even reached the thermal throttling point a few times, with the Motherboard speaker going crazy because of the overheating. Since then, I think that even with my main VM's vCPUs pinned to the 8 logical Processors and no other VM running, I get rather common occurrences of temporary lag or stuttering that make the Mouse cursor stop responding for around a second when I'm at the Windows desktop, very similar to the previous scenario but not as intense (the good thing is that at least W10 never dropped it, or maybe it never reached the timeout). In FPS games, during the lag the cursor seems to fix itself to the place I was aiming at, and if I stop moving the Mouse, the camera turns by itself. It takes a while of forcefully moving the Mouse to the sides before I retake full control. I suppose this happens more often with HT because either Windows or Xen occasionally loads both the physical and the logical Core of a pair when it should aim to max out physical Cores first. However, I haven't tested enough to figure out what causes that occasional lag, but it's too similar to what happens when another VM is heavily loaded. QEMU supposedly allows you to craft your own Sockets/Cores/Threads structure, and in other Threads of this Mailing List I saw that some people greatly reduced their DPC Latency by playing with that, so it may be worth a try.
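From what I've read, the relevant knob would be -smp's topology parameters, so the guest knows which logical Processors share a physical Core instead of seeing 8 independent ones. An untested sketch matching my 4 Cores + Hyper Threading, with the rest of the command line made up:

    qemu-system-x86_64 -enable-kvm -cpu host \
        -smp 8,sockets=1,cores=4,threads=2 \
        -m 16384 -drive file=/vms/main.img,format=raw,if=virtio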


USING QEMU WITH -NODEFAULTS
It's very probable that instead of using virsh/libvirt, I will use standalone QEMU and its own commands, since that is what I do with Xen. The difference is that while Xen works with VM Config Files which I just have to point the Xen toolstack at to get all the relevant VM creation parameters, QEMU seems to need that whole ton of parameters on the command line. So at least for creating VMs with QEMU, I will have to rely heavily on scripts, which I believe, properly formatted, will be easier to read than libvirt's bloated XML. The bad thing is that simple scripting on Linux is harder than MS-DOS style BAT files, since you also have to make the text file executable, and you need to prepend ./ to execute it even if you're in the same folder, but well, that's the way it is.
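What I picture is a per-VM script standing in for the Xen config file, something like this untested sketch (all the names, paths and parameters are made up):

    #!/bin/sh
    # main-vm.sh - launch my main VM; chmod +x main-vm.sh once, then ./main-vm.sh
    exec qemu-system-x86_64 \
        -enable-kvm \
        -name main-vm \
        -cpu host -smp 4 -m 16384 \
        -drive file=/vms/main.img,format=raw,if=virtio \
        -vga qxl \
        -spice port=5900,disable-ticketing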
Since I'm going the hard way, I could make it a bit more interesting. Some guy on this Mailing List mentioned recently that instead of using the default i440FX or Q35 machines, which as far as I've read merely enable a set of predefined emulated PCI Devices at specific addresses, you could make your own tree by using -nodefaults. Besides the possible "for fun" factor, or allowing very specific optimizations, I believe it may be worth trying to make a working tree that does NOT use the 00:02.0 address (which both i440FX and Q35 use by default), since based on patches that I saw for Xen, that address seems reserved for Intel IGP Passthrough, which I think VFIO currently fails to do, and this could be related to that.

Anyways, I failed to find details or working examples of hand-made trees, or what i440FX and Q35 are equivalent to, and how much they can be reduced by removing unused emulated or legacy stuff while still having a complete, usable system. If anyone has links at hand about the possibilities that -nodefaults brings, drop them.
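For reference, this is the kind of thing I imagine: a completely untested sketch where every device gets an explicit address and nothing lands on 00:02.0 (the paths and slot numbers are made up, and a real guest would surely need more than this):

    qemu-system-x86_64 -enable-kvm -machine q35 -nodefaults \
        -cpu host -smp 4 -m 4096 \
        -device qxl-vga,bus=pcie.0,addr=0x3 \
        -drive id=disk0,file=/vms/test.img,format=raw,if=none \
        -device virtio-blk-pci,drive=disk0,bus=pcie.0,addr=0x4 \
        -netdev user,id=net0 \
        -device virtio-net-pci,netdev=net0,bus=pcie.0,addr=0x5 \
        -serial stdio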


SETTING A TFTP SERVER FOR PXE BOOT
This one is closely related to the File Server for shared storage, and maybe that specific VM could also do this job (or else, the host itself?). A TFTP Server provides the files for another system to boot from if it supports PXE Boot, which both VMs and computers from a decade ago or so do. For VMs, I don't see any real advantage - I can specify a CDROM with a mounted ISO file when I start the VM, which is much easier than setting up a TFTP Server. However, for the other computers in my network, being able to install an OS without needing to prepare a USB Flash Drive looks like a rather interesting proposal. I don't know if WiFi cards have PXE Boot ROMs, but I hope they do, since the times I needed this alternative boot method the most were with capricious Notebooks with broken Hardware or unfriendly BIOSes.
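From what I gathered so far, the least painful route seems to be dnsmasq running in proxy-DHCP mode, so my ISP Router keeps handing out the addresses. An untested sketch, with the subnet and paths made up:

    # /etc/dnsmasq.d/pxe.conf on the File Server VM (or the host)
    dhcp-range=192.168.1.0,proxy
    enable-tftp
    tftp-root=/srv/tftp
    dhcp-boot=pxelinux.0
    pxe-service=x86PC,"Network boot",pxelinux

pxelinux.0 and its pxelinux.cfg/default menu come from Syslinux and go under /srv/tftp; the catch is exactly what I describe below, that the installers generally want their kernel/initrd extracted rather than a raw ISO dropped into a folder.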

There are two things that don't convince me about setting up a TFTP Server. The first is that I can't find convincing info that I can merely copypaste ISOs into a shared folder and have the Server application take care of presenting them to a PXE client in a menu. Instead, everything points to me having to install and configure a Boot Loader to load the ISOs from a partition, or, instead of easily manageable ISOs, having to uncompress them, and if there is more than one OS installer it makes for a rather complex folder structure, etc. It doesn't look very practical to use.
The second thing is where the bootable ISOs should live. In order to take advantage of the TFTP Server, that VM should have them. Unless I make redundant copies, this means that the host would lose direct access to the ISOs, and if I just leave everything in the hands of the File Server and for some reason it stops working, I can't use the ISOs to install OSes in other VMs or boot from a rescue LiveCD, unless I keep a redundant duplicated copy on the host. It may not be that big of an issue if I can mount the File Server VM's partitions from within the host, but still, I have to think about this, too.


ISOLATE AN INTERNET-FACING VM FROM THE LOCAL NETWORK
Besides having tested how it feels to share a VM with family members, at some point I was also interested in letting a friend remotely access a VM on my system, as if I were a cloud provider. A VM has fairly better remote access capabilities than a native OS, which needs third party applications like TeamViewer to be viewed from outside, so using a VMM that provides remote capabilities built in looks like a clear winner to me. However, since it's a friend from the Internet and not a family member that is already on my home network, I wanted to do it as securely as possible. This means trying to isolate the VM from both the host computer and the home network, so possible Internet remote users of that VM shouldn't see the other computers of my home network, nor their Shared Folders. As far as I was able to research, this requires far more networking skill than someone like me, who is used to setup wizards, has.

The topology of my home network is a single hybrid ADSL Modem+WiFi Router (with a single dynamic public IP) with 3 wired computers attached to it (including mine), and a Notebook via WiFi. The Router sees as DHCP Clients the Notebook, the two normal computers, and, from mine, both the host and the individual VMs (it sees the MAC Addresses of both the real NIC and the VMs' emulated ones), and assigns a local IP to each of them. By default, Xen likes to make a bridge that the host and the VMs use to share the NIC and thus the Internet connection; I'm not sure how standalone QEMU does it, but chances are it is much the same.
What I think I need to figure out is how to isolate the VM at two key points: at the Router, and at my computer's virtual bridge, so it can't directly see the host or the other VMs. It still needs a working Internet connection for obvious reasons. So far, I think that what I want to do is called a VLAN, so both the Linux host and the Router would need to be aware that a specific MAC Address should be in a different, isolated VLAN. I don't know if I need a fancy Router for that; my ISP provided Router seems to have some options to configure VLANs, but I don't know how to use them. I suppose that I also need to do Port Forwarding so the VM can be accessed from outside, but that's the easier part of it.
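One alternative I've seen suggested, which would keep the Router mostly out of the equation, is to give that VM its own bridge with NAT instead of putting it on the shared LAN bridge, and then firewall it away from the home subnet. A rough, untested sketch with made up interface names and subnets:

    # Dedicated bridge for the isolated guest, NATed out through the real NIC
    ip link add isolbr0 type bridge
    ip addr add 10.0.100.1/24 dev isolbr0
    ip link set isolbr0 up
    echo 1 > /proc/sys/net/ipv4/ip_forward
    iptables -t nat -A POSTROUTING -s 10.0.100.0/24 -o eth0 -j MASQUERADE

    # Keep forwarded traffic from the isolated subnet away from the home LAN
    # (traffic aimed at the host itself would need similar rules on INPUT)
    iptables -I FORWARD -s 10.0.100.0/24 -d 192.168.1.0/24 -j DROP

    # Port Forwarding on the host side for remote access (SPICE on 5900 here);
    # the Router still forwards that port to the host first
    iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 5900 \
        -j DNAT --to-destination 10.0.100.2:5900

The guest's tap interface would then be attached to isolbr0 instead of the LAN bridge, with 10.0.100.2 configured statically inside the guest.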



Congratulations, you reached the end of my mail. This pretty much sums up my two years of experience, and all the things that I think need to be improved before a system centered around a Hypervisor or VMM can overtake native systems in ease of use.



