[vfio-users] Brutal DPC Latency - how is yours? check it please and report back

Quentin Deldycke quentindeldycke at gmail.com
Mon Jan 11 15:08:38 UTC 2016


Hello,

I use an Intel CPU (i7-4790K), but yes, I have an R9 290 as the GPU.
I try to offload everything to core 0, so I can keep threads 0 and 4 for Linux.
The rest of your summary is right.


I use the same program for the DPC check. LatencyMon is also available,
but I find dpclat more interesting:
http://www.resplendence.com/latencymon

Script for moving all threads to 0,4:
https://github.com/qdel/scripts/blob/master/vfio/shieldbuild
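The idea behind such a shield script can be sketched in a few lines. This is a dry-run illustration only (`plan_shield` is a hypothetical helper that just prints the taskset commands it would run; the linked script does the real cpuset moves and needs root):

```shell
#!/bin/sh
# Dry-run sketch of a CPU shield: confine existing tasks to host
# threads 0 and 4, leaving threads 1-3 and 5-7 free for the VM.
# It only prints the commands it would run, so nothing is changed.
HOST_CPUS="0,4"

plan_shield() {
    # One taskset invocation per PID passed in.
    for pid in "$@"; do
        echo "taskset -a -p -c $HOST_CPUS $pid"
    done
}

# Example with made-up PIDs; the real script iterates over the tasks
# file of a cpuset instead.
plan_shield 1 42 4242
```

The real script also pins memory placement (cpuset.mems) and catches future tasks, which a taskset-only sketch cannot do.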

XML file:
https://github.com/qdel/scripts/blob/master/vfio/win10.xml

Kernel command line:
intel_iommu=on iommu=pt vfio_iommu_type1.allow_unsafe_interrupts=1
kvm.ignore_msrs=1 drm.rnodes=1 i915.modeset=1
nohz_full=1,2,3,4,5,6,7 rcu_nocbs=1,2,3,4,5,6,7
default_hugepagesz=1G hugepagesz=1G hugepages=12
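As a sanity check on those numbers: hugepages=12 with a 1 GiB page size reserves 12 GiB at boot, which must be at least the guest's RAM. The round-up arithmetic can be sketched like this (`pages_needed` is an illustrative helper, not part of any tool):

```shell
#!/bin/sh
# How many 1 GiB hugepages a guest of a given size (in MiB) needs.
# Integer round-up: (mem + page - 1) / page.
pages_needed() {
    mem_mib=$1
    page_mib=1024
    echo $(( (mem_mib + page_mib - 1) / page_mib ))
}

pages_needed 12288   # a 12 GiB guest fits exactly in 12 pages
pages_needed 8500    # 8500 MiB rounds up to 9 pages
```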


Notes about my setup:
* I have 3 monitors, all connected to the Intel GPU.
  2 of them also have an input on the AMD card. With xrandr
  I can disable these screens and they switch source
  (at least one does; the other is buggy, and most times I need to push
  the source button).
* I also pass through an NVMe drive (this thing is actually BRUTAL!!!)
  - I can boot the same drive natively!
* I pass through my second network card.
* I pass through one of my SATA controllers (I have NTFS drives there).
* I pass through individual USB devices, not the whole controller.
  - With a little udev script, I plug new devices into the VM if it is
    running.
* Sound goes out over HDMI and back into the line-in of the PC. I can use
  the line-in control to adjust the volume of the whole VM. It actually
  works perfectly.
* The best setup for this is dual monitor + synergy :)
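The udev hot-plug trick mentioned above boils down to generating a libvirt <hostdev> snippet for the newly plugged device and handing it to virsh attach-device. A minimal sketch, with placeholder vendor/product IDs and domain name (the linked repo has the real udev side):

```shell
#!/bin/sh
# Emit the libvirt <hostdev> XML for a USB device given its
# vendor:product ID pair (as printed by lsusb, without the 0x prefix).
usb_hostdev_xml() {
    vendor=$1
    product=$2
    cat <<EOF
<hostdev mode='subsystem' type='usb' managed='yes'>
  <source>
    <vendor id='0x${vendor}'/>
    <product id='0x${product}'/>
  </source>
</hostdev>
EOF
}

# A udev RUN+= hook would then do something like this (domain name
# "win10" and the IDs are placeholders):
#   usb_hostdev_xml 046d c52b > /tmp/dev.xml
#   virsh attach-device win10 /tmp/dev.xml --live
usb_hostdev_xml 046d c52b
```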


--
Deldycke Quentin


On 11 January 2016 at 15:06, Milos Kaurin <milos.kaurin at gmail.com> wrote:

> Hello,
>
> Yes, I have a Core i7.
>
> I have to admit that Quentin's e-mail was the first time I'd heard
> about DPC latency. I'm taking a strictly empirical approach for now,
> but I'd like to dive deeper into this, at least to provide a reference
> point for you guys.
> The reason is that even though I'm familiar with Linux, I don't have
> the low-level familiarity you guys have (other than conceptual). I'm
> more than willing to learn given the opportunity, though.
>
> Quentin:
> From what I understand about your use:
> * You have an AMD CPU
> * In your kernel parameters, you are trying to offload your
> scheduling-clock interrupts to only thread(core?) 0.
> * Your script sets kernel memory management, future tasks and current
> tasks to be run at thread 0
> * Valley bench seems to be most sensitive to DPC latency issues (as
> well as "Heroes of the storm")
> * Pinning only 3 cores to the VM gives you best results, but seeing
> that newer games take advantage of multiple cores, you'd like to have
> an option to use more cores for winVirt
>
> What I'd like from you:
> * Can you provide me with the optimal (3 cores -> VM) settings,
> including kernel parameters, your updated script, and the XML of your
> virt in this mode of use?
> * Can you provide me with a method to keep track of DPC latency? I
> found this: http://www.thesycon.de/deu/latency_check.shtml , but I'd
> like us to use the same method.
>
> Why I'm asking all of this:
> I just ran Valley (HD extreme). These are the results:
>
> * Bare-metal:
> FPS: 48.7
> Score: 2036
> Min FPS: 23.4
> Max FPS: 90.6
>
> * hugepages, no pinning, 1x4x2, host-passthrough:
> FPS: 47.9
> Score: 2005
> Min FPS: 19.7
> Max FPS: 91.5
>
> The score is ~1.5% worse in the virt.
> The min FPS difference (which looks significant) might be negligible
> because I'm running Firefox in the host with a bunch of tabs open
> (idle, though)
>
> I have also been playing "Rocket League" in the virt, which is a very
> twitchy game, and I play it at an experienced level. I did not find
> any problems playing the game like this.
>
> My current XML: https://gist.github.com/Kaurin/0b6726e8a94084bd0b64
> PCI devices passed through: nvidia+HDMI audio, onboard sound, onboard
> XHCI USB controller
>
> Notes about my setup:
> * Both virt and host are hooked up to the same monitor (host-VGA / virt -
> DVI).
> * I also don't have any additional USB controllers, which means that
> when I turn on the virt, I lose my USB devices (mouse, keyboard) on the host.
> * The same goes for sound: when I turn on the virt, I lose sound on the host.
> * I just flip the monitor input and I'm good to go.
> * I have plans to set up new hardware so I can use both host/virt at
> the same time
>
> Let me know if my further input would be useful.
>
> Regards,
> Milos
>
>
>
> On Mon, Jan 11, 2016 at 9:19 AM, Quentin Deldycke
> <quentindeldycke at gmail.com> wrote:
> > In fact, some games react quite well to this latency. Fallout, for
> > example, doesn't show much difference between the host, a VM with
> > brutal DPC, and a VM with "good" DPC.
> >
> > I tested 3 modes:
> >
> > - All 8 cores to the VM without pinning: brutal DPC; I did not try to
> > play games on it. Only Unigine Valley => 2600 points
> > - 6 cores pinned to the VM + emulator on cores 0,1: correct latency.
> > Most games work flawlessly (BF4 / Battlefront / Diablo III) but some
> > are catastrophic: Heroes of the Storm. Valley => 2700
> > - 3 cores pinned to the VM: perfect latency, all games work OK. But I
> > am afraid 3 cores are a bit "not enough" for upcoming games. Valley =>
> > 3100 points
> >
> > I think that Valley is a good benchmark. It is free and small, and it
> > seems to be affected by this latency problem like most games.
> >
> >
> >
> >
> > --
> > Deldycke Quentin
> >
> >
> > On 11 January 2016 at 09:59, rndbit <rndbit at sysret.net> wrote:
> >>
> >> Tried Milos' config too - DPC latency got worse. I use an AMD CPU
> >> though, so it's hardly comparable.
> >> One thing to note is that both the VM and bare metal (same OS) score
> >> around 5k points in the 3DMark Fire Strike test (the VM about 300
> >> points less). That sounds not too bad, but in reality BF4 is pretty
> >> much unplayable in the VM due to bad performance and sound glitches,
> >> while playing it on bare metal is just fine. Again, DPC latency on
> >> bare metal is OK even under load - an occasional spike here and
> >> there, but mostly it's within the norm. Any kind of load on the VM
> >> makes DPC go nuts and performance is terrible. I even tried
> >> isolcpus=4,5,6,7 and binding the VM to those free cores - it's all
> >> the same.
> >>
> >> An interesting observation is that I used to play Titanfall without a
> >> hitch in the VM some time in the past, on kernel 3.10 or so (no
> >> patches). When I get a free moment I'll try downgrading the kernel;
> >> maybe the problem is there.
> >>
> >>
> >> On 2016.01.11 10:39, Quentin Deldycke wrote:
> >>
> >> Also, I just saw something:
> >>
> >> You use ultra (4K?) settings on a GTX 770. This is too heavy for it;
> >> you have less than 10 fps. So if you lose, let's say, 10% of
> >> performance, you will barely see it.
> >>
> >> What we are looking for is a very fast response time. Could you
> >> please compare your system with a less heavy benchmark? It is easier
> >> to see the difference at ~50-70 fps.
> >>
> >> In my case, this configuration works, but my fps fluctuates quite a
> >> lot. If you are at all a serious gamer, these drops are not an option
> >> during a game :)
> >>
> >> --
> >> Deldycke Quentin
> >>
> >>
> >> On 11 January 2016 at 08:54, Quentin Deldycke <
> quentindeldycke at gmail.com>
> >> wrote:
> >>>
> >>> Using this mode, DPC latency is hugely buggy.
> >>>
> >>> My fps also swing wildly: from 80 to 45 fps without moving in
> >>> Unigine Valley.
> >>>
> >>> Do you have anything running on your Linux side? (I have Plasma
> >>> doing nothing on another screen.)
> >>>
> >>> Unigine Heaven went back to 2600 points from 3100.
> >>> Cinebench R15: single core 124
> >>>
> >>>
> >>> Could you please send your whole XML file, QEMU version, and kernel
> >>> config / boot line?
> >>>
> >>> I will try to get 3DMark and verify the host / virtual comparison.
> >>>
> >>> --
> >>> Deldycke Quentin
> >>>
> >>>
> >>> On 9 January 2016 at 20:24, Milos Kaurin <milos.kaurin at gmail.com>
> wrote:
> >>>>
> >>>> My details:
> >>>> Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
> >>>> 32GB total ram
> >>>> hugepages at 16x1GB for the guest (they didn't have much to do with
> >>>> the 3DMark results)
> >>>>
> >>>> I have had the best performance with:
> >>>>
> >>>>   <vcpu placement='static'>8</vcpu>
> >>>>   <cpu mode='custom' match='exact'>
> >>>>     <model fallback='allow'>host-passthrough</model>
> >>>>     <topology sockets='1' cores='4' threads='2'/>
> >>>>   </cpu>
> >>>>
> >>>> No CPU pinning on either guest or host
> >>>>
> >>>> Benchmark example (Bare metal Win10 vs Fedora Guest Win10)
> >>>> http://www.3dmark.com/compare/fs/7076732/fs/7076627#
> >>>>
> >>>>
> >>>> Could you try my settings and report back?
> >>>>
> >>>> On Sat, Jan 9, 2016 at 3:14 PM, Quentin Deldycke
> >>>> <quentindeldycke at gmail.com> wrote:
> >>>> > I use virsh:
> >>>> >
> >>>> > ===SNIP===
> >>>> >   <vcpu placement='static'>3</vcpu>
> >>>> >   <cputune>
> >>>> >     <vcpupin vcpu='0' cpuset='1'/>
> >>>> >     <vcpupin vcpu='1' cpuset='2'/>
> >>>> >     <vcpupin vcpu='2' cpuset='3'/>
> >>>> >     <emulatorpin cpuset='6-7'/>
> >>>> >   </cputune>
> >>>> > ===SNAP===
> >>>> >
> >>>> > I have a prepare script running:
> >>>> >
> >>>> > ===SNIP===
> >>>> > sudo mkdir /cpuset
> >>>> > sudo mount -t cpuset none /cpuset/
> >>>> > cd /cpuset
> >>>> > echo 0 | sudo tee -a cpuset.cpu_exclusive
> >>>> > echo 0 | sudo tee -a cpuset.mem_exclusive
> >>>> >
> >>>> > sudo mkdir sys
> >>>> > echo 'Building shield for core system... threads 0 and 4, and we
> >>>> > place all running tasks there'
> >>>> > /bin/echo 0,4 | sudo tee -a sys/cpuset.cpus
> >>>> > /bin/echo 0 | sudo tee -a sys/cpuset.mems
> >>>> > /bin/echo 0 | sudo tee -a sys/cpuset.cpu_exclusive
> >>>> > /bin/echo 0 | sudo tee -a sys/cpuset.mem_exclusive
> >>>> > for T in `cat tasks`; do sudo bash -c "/bin/echo $T > sys/tasks" >/dev/null 2>&1 ; done
> >>>> > cd -
> >>>> > ===SNAP===
> >>>> >
> >>>> > Note that i use this command line for the kernel
> >>>> > nohz_full=1,2,3,4,5,6,7 rcu_nocbs=1,2,3,4,5,6,7
> default_hugepagesz=1G
> >>>> > hugepagesz=1G hugepages=12
> >>>> >
> >>>> >
> >>>> > --
> >>>> > Deldycke Quentin
> >>>> >
> >>>> >
> >>>> > On 9 January 2016 at 15:40, rndbit <rndbit at sysret.net> wrote:
> >>>> >>
> >>>> >> Mind posting actual commands how you achieved this?
> >>>> >>
> >>>> >> All im doing now is this:
> >>>> >>
> >>>> >> cset set -c 0-3 system
> >>>> >> cset proc -m -f root -t system -k
> >>>> >>
> >>>> >>   <vcpu placement='static'>4</vcpu>
> >>>> >>   <cputune>
> >>>> >>     <vcpupin vcpu='0' cpuset='4'/>
> >>>> >>     <vcpupin vcpu='1' cpuset='5'/>
> >>>> >>     <vcpupin vcpu='2' cpuset='6'/>
> >>>> >>     <vcpupin vcpu='3' cpuset='7'/>
> >>>> >>     <emulatorpin cpuset='0-3'/>
> >>>> >>   </cputune>
> >>>> >>
> >>>> >> Basically this puts most threads on cores 0-3, including the
> >>>> >> emulator threads. Some kernel threads can't be moved, though, so
> >>>> >> they remain on cores 4-7. The VM is given cores 4-7. It works
> >>>> >> better, but there is still much to be desired.
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >> On 2016.01.09 15:59, Quentin Deldycke wrote:
> >>>> >>
> >>>> >> Hello,
> >>>> >>
> >>>> >> Using cpuset, i was using the vm with:
> >>>> >>
> >>>> >> Core 0: threads 0 & 4: linux + emulator pin
> >>>> >> Core 1,2,3: threads 1,2,3,5,6,7: windows
> >>>> >>
> >>>> >> I tested with:
> >>>> >> Core 0: threads 0 & 4: linux
> >>>> >> Core 1,2,3: threads 1,2,3: windows
> >>>> >> Core 1,2,3: threads 5,6,7: emulator
> >>>> >>
> >>>> >> The difference between the two is huge (DPC latency is much
> >>>> >> more stable):
> >>>> >> Single-core performance went up ~50% (Cinebench score per core
> >>>> >> from 100 to 150 points)
> >>>> >> GPU performance went up ~20% (Cinebench from 80 fps to 100+)
> >>>> >> Performance in "Heroes of the Storm" went from 20~30 fps to a
> >>>> >> stable 60 (and much of the time more than 100)
> >>>> >>
> >>>> >> (Performance in Unigine Heaven went from 2700 points to 3100
> >>>> >> points)
> >>>> >>
> >>>> >> The only sad thing is that the 3 idle threads are barely
> >>>> >> used... Is there any way to give them back to Windows?
> >>>> >>
> >>>> >> --
> >>>> >> Deldycke Quentin
> >>>> >>
> >>>> >>
> >>>> >> On 29 December 2015 at 17:38, Michael Bauer <michael at m-bauer.org>
> >>>> >> wrote:
> >>>> >>>
> >>>> >>> I noticed that attaching a DVD drive from the host leads to
> >>>> >>> HUGE delays. I had attached my /dev/sr0 to the guest, and even
> >>>> >>> without a DVD in the drive it was causing huge lag about once
> >>>> >>> per second.
> >>>> >>>
> >>>> >>> Best regards
> >>>> >>> Michael
> >>>> >>>
> >>>> >>>
> >>>> >>> Am 28.12.2015 um 19:30 schrieb rndbit:
> >>>> >>>
> >>>> >>> 4000μs-16000μs here, it's terrible.
> >>>> >>> Tried what's described at
> >>>> >>> https://lime-technology.com/forum/index.php?topic=43126.15
> >>>> >>> It's a bit better with this:
> >>>> >>>
> >>>> >>>   <vcpu placement='static'>4</vcpu>
> >>>> >>>   <cputune>
> >>>> >>>     <vcpupin vcpu='0' cpuset='4'/>
> >>>> >>>     <vcpupin vcpu='1' cpuset='5'/>
> >>>> >>>     <vcpupin vcpu='2' cpuset='6'/>
> >>>> >>>     <vcpupin vcpu='3' cpuset='7'/>
> >>>> >>>     <emulatorpin cpuset='0-3'/>
> >>>> >>>   </cputune>
> >>>> >>>
> >>>> >>> I tried isolcpus but it did not yield visible benefits.
> >>>> >>> ndis.sys is the big offender here, but I don't really
> >>>> >>> understand why. Removing the network interface from the VM
> >>>> >>> makes usbport.sys take over as the biggest offender. All this
> >>>> >>> happens with the performance governor on all CPU cores:
> >>>> >>>
> >>>> >>> echo performance | tee
> >>>> >>> /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor >/dev/null
> >>>> >>>
> >>>> >>> The cores remain clocked at 4 GHz. I don't know what else I
> >>>> >>> could try. Does anyone have any ideas..?
> >>>> >>>
> >>>> >>> On 2015.10.29 08:03, Eddie Yen wrote:
> >>>> >>>
> >>>> >>> I tested again after a VM reboot; this time it is about
> >>>> >>> 1000~1500μs. I also found that it easily gets high while the
> >>>> >>> hard drive is loading, but only a few times.
> >>>> >>>
> >>>> >>> Which specs are you using? Maybe it depends on the CPU or
> >>>> >>> patches.
> >>>> >>>
> >>>> >>> 2015-10-29 13:44 GMT+08:00 Blank Field <ihatethisfield at gmail.com
> >:
> >>>> >>>>
> >>>> >>>> If I understand it right, this software has a fixed latency
> >>>> >>>> error of 1 ms (1000μs) on Windows 8-10 due to a different
> >>>> >>>> kernel timer implementation. So I guess your latency is very
> >>>> >>>> good.
> >>>> >>>>
> >>>> >>>> On Oct 29, 2015 8:40 AM, "Eddie Yen" <missile0407 at gmail.com>
> wrote:
> >>>> >>>>>
> >>>> >>>>> Thanks for the information! And sorry, I didn't read the
> >>>> >>>>> beginning of the thread carefully.
> >>>> >>>>>
> >>>> >>>>> For my result, I got about 1000μs or below, and only a few
> >>>> >>>>> times above 1000μs when idling.
> >>>> >>>>>
> >>>> >>>>> I'm using a 4820K and gave 4 threads to the VM; I also set
> >>>> >>>>> these 4 threads as 4 cores in the VM settings.
> >>>> >>>>> The OS is Windows 10.
> >>>> >>>>>
> >>>> >>>>> 2015-10-29 13:21 GMT+08:00 Blank Field <
> ihatethisfield at gmail.com>:
> >>>> >>>>>>
> >>>> >>>>>> I think they're using this:
> >>>> >>>>>> www.thesycon.de/deu/latency_check.shtml
> >>>> >>>>>>
> >>>> >>>>>> On Oct 29, 2015 6:11 AM, "Eddie Yen" <missile0407 at gmail.com>
> >>>> >>>>>> wrote:
> >>>> >>>>>>>
> >>>> >>>>>>> Sorry, but how to check DPC Latency?
> >>>> >>>>>>>
> >>>> >>>>>>> 2015-10-29 10:08 GMT+08:00 Nick Sukharev
> >>>> >>>>>>> <nicksukharev at gmail.com>:
> >>>> >>>>>>>>
> >>>> >>>>>>>> I just checked on W7 and I get 3000μs-4000μs on one of
> >>>> >>>>>>>> the guests when 3 guests are running.
> >>>> >>>>>>>>
> >>>> >>>>>>>> On Wed, Oct 28, 2015 at 4:52 AM, Sergey Vlasov
> >>>> >>>>>>>> <sergey at vlasov.me>
> >>>> >>>>>>>> wrote:
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> On 27 October 2015 at 18:38, LordZiru <lordziru at gmail.com>
> >>>> >>>>>>>>> wrote:
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> I have brutal DPC latency on qemu, no matter if using
> >>>> >>>>>>>>>> pci-assign or vfio-pci, or without any passthrough.
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> My DPC latency is like:
> >>>> >>>>>>>>>> 10000,500,8000,6000,800,300,12000,9000,700,2000,9000
> >>>> >>>>>>>>>> and on native Windows 7 it is like:
> >>>> >>>>>>>>>> 20,30,20,50,20,30,20,20,30
> >>>> >>>>>>>>>
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> In Windows 10 guest I constantly have red bars around 3000μs
> >>>> >>>>>>>>> (microseconds), spiking sometimes up to 10000μs.
> >>>> >>>>>>>>>
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> I don't know how to fix it.
> >>>> >>>>>>>>>> This matters for me because I am using a USB sound card
> >>>> >>>>>>>>>> for my VMs, and I get sound drop-outs every 0-4 seconds.
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> That bugs me a lot too. I also use an external USB card and
> my
> >>>> >>>>>>>>> DAW
> >>>> >>>>>>>>> periodically drops out :(
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> I haven't tried CPU pinning yet though. And perhaps I should
> >>>> >>>>>>>>> try
> >>>> >>>>>>>>> Windows 7.
> >>>> >>>>>>>>>
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> _______________________________________________
> >>>> >>>>>>>>> vfio-users mailing list
> >>>> >>>>>>>>> vfio-users at redhat.com
> >>>> >>>>>>>>> https://www.redhat.com/mailman/listinfo/vfio-users
> >>>> >>>>>>>>>
> >>>> >>>>>>>>
> >>>> >>>>>>>>
> >>>> >>>>>>>>
> >>>> >>>>>>>
> >>>> >>>>>>>
> >>>> >>>>>>>
> >>>> >>>>>
> >>>> >>>
> >>>> >>>
> >>>> >>>
> >>>> >>>
> >>>> >>>
> >>>> >>>
> >>>> >>>
> >>>> >>>
> >>>> >>>
> >>>> >>>
> >>>> >>>
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >
> >>>> >
> >>>> >
> >>>
> >>>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >
> >
> >
>

