[vfio-users] Brutal DPC Latency - how is yours? check it please and report back

Quentin Deldycke quentindeldycke at gmail.com
Mon Feb 29 10:16:55 UTC 2016


Nearly as efficient as isolcpus, but it can be applied dynamically, at runtime:

Use nohz_full / rcu_nocbs to offload all RCU work from your VM cores to your
OS-only cores.
Use cgroups: when you start the VM, you keep only x cores for the OS; when you
shut it down, you let the OS have all cores again.
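
As a rough sketch of the first point, assuming core 0 stays with the host and
cores 1-3 go to the VM (the split described further down), the boot parameters
could be generated and checked like this. The function names and the VM_CPUS
variable are made up for illustration; nohz_full= needs a kernel built with
CONFIG_NO_HZ_FULL and rcu_nocbs= one built with CONFIG_RCU_NOCB_CPU.

```shell
# Assumption: CPU 0 stays with the host, CPUs 1-3 are handed to the VM.
VM_CPUS="${VM_CPUS:-1-3}"

# Print the parameters to append to the kernel command line
# (for example in the bootloader configuration).
vm_bootargs() {
    printf 'nohz_full=%s rcu_nocbs=%s\n' "$VM_CPUS" "$VM_CPUS"
}

# After a reboot, show which of the two parameters the running kernel got.
show_bootargs() {
    tr ' ' '\n' < /proc/cmdline | grep -E '^(nohz_full|rcu_nocbs)=' \
        || echo "nohz_full / rcu_nocbs not set"
}
```

Run vm_bootargs once to get the string to paste into the bootloader entry, and
show_bootargs after rebooting to confirm it was picked up.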

If the VM is running and you need a power boost on Linux, just use
"echo $$ | sudo tee /cgroups/cgroup.procs", and you will have all cores for
programs run from this shell :)

Linux only: all cores (but cores 1,2,3 are in nohz mode, offloaded by core
0)
Linux + Windows: 1 core for Linux, 3 cores for Windows
Need a boost on Linux: the little command line above, for this shell
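
The three modes above could be switched with something like the following.
This is only a sketch and not the shieldbuild / shieldbreak scripts linked
below: it assumes a cgroup v1 cpuset hierarchy mounted at
/sys/fs/cgroup/cpuset with the default "cpuset." file prefix (the command
above uses /cgroups instead), and the group name "host" and the function
names are made up.

```shell
# Assumption: cgroup v1 cpuset controller mounted here; adjust to your system.
CPUSET_ROOT="${CPUSET_ROOT:-/sys/fs/cgroup/cpuset}"
HOST_GROUP="${HOST_GROUP:-host}"   # hypothetical group name

# Linux + Windows: confine host processes to the given CPUs,
# e.g. "shield_build 0".
shield_build() {
    group="$CPUSET_ROOT/$HOST_GROUP"
    mkdir -p "$group"
    # A cpuset needs both cpus and mems set before processes can join it.
    echo "$1" > "$group/cpuset.cpus"
    cat "$CPUSET_ROOT/cpuset.mems" > "$group/cpuset.mems"
    # Move every process from the root cpuset into the group. Per-CPU
    # kernel threads refuse to move, so write errors are ignored.
    while read -r pid; do
        echo "$pid" 2>/dev/null > "$group/cgroup.procs" || true
    done < "$CPUSET_ROOT/cgroup.procs"
}

# Linux only: move everything back to the root cpuset and drop the group.
shield_break() {
    group="$CPUSET_ROOT/$HOST_GROUP"
    [ -d "$group" ] || return 0
    while read -r pid; do
        echo "$pid" 2>/dev/null > "$CPUSET_ROOT/cgroup.procs" || true
    done < "$group/cgroup.procs"
    rmdir "$group"
}

# Need boost: give the calling shell (and its children) all cores back.
shield_boost() {
    echo "$$" | sudo tee "$CPUSET_ROOT/cgroup.procs" > /dev/null
}
```

shield_build and shield_break need root, so they would typically be called
from the VM start and stop hooks; shield_boost is the same one-liner as above
wrapped in a function.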


Example of cgroup usage:
https://github.com/qdel/scripts/tree/master/vfio/scripts => shieldbuild /
shieldbreak

These are called through qemu hooks:
https://github.com/qdel/scripts/tree/master/vfio/hooks

I do not configure my IO; I let qemu manage it.


Note one fun behavior:
While idle, I sit at a completely steady ~1000us.
If I run a game, it goes down to a completely steady 500us.

Example: http://b.qdel.fr/test.png

Sorry for the quality: VNC to a 4k screen from 1080p and all that...


--
Deldycke Quentin


On 29 February 2016 at 10:55, Rokas Kupstys <rokups at zoho.com> wrote:

> Yes, currently I am actually booted with the vanilla Arch Linux kernel, no NO_HZ
> and other stuff.
>
> Why is 2 cores for the host unacceptable? Do you plan to run
> heavy workloads on it while gaming?
>
> The problem with isolcpus is that it exempts cores from the Linux CPU scheduler.
> This means that even if the VM is offline they will sit idle. While I don't do
> anything on the host while gaming, I do plenty when not gaming, and just throwing
> away 6 cores of an already disadvantaged AMD CPU is a real waste.
>
> This config is not good actually.
>
> Well.. It indeed looks bad on paper, however it is the only one that
> yields bearable DPC latency. I tried what you mentioned, various
> combinations. Pinning 0,2,4,6 cores to vm, 1,3 to emulator, 5,7 for io /
> 1,3,5,7 cores to vm, 0,2 to emulator, 4,6 for io / 0,1,2,3 cores to vm, 4,5
> to emulator, 6,7 for io / 4,5,6,7 cores to vm, 0,1 to emulator, 2,3 for io.
> All of them yield terrible latency.
>
> It would be interesting to hear from someone with an AMD build how (if) they
> solved this.
>
>
> On 2016.02.29 11:10, Bronek Kozicki wrote:
>
> Two things you can improve, IMO
>
> * disable NO_HZ
>
> * use isolcpus to dedicate your pinned CPUs to guest only - this
> will also ensure they are not used for guest IO.
>
> B.
>
> On 29/02/2016 08:45, Rokas Kupstys wrote:
>
> Yesterday I figured out my latency problem. Everything listed
> all over the internet failed. The last thing I tried was pinning one
> vcpu to two physical cores, and it brought latency down. I have an
> FX-8350 CPU, which shares one FPU between each pair of cores, so maybe
> that's why. With just this pinning, latency is now just above 1000μs
> most of the time. However, under load latency increases. I threw out
> iothreads and emulator pinning and it did not change much.
> Better latency could be achieved using isolcpus=2-7, however
> leaving just two cores to the host is unacceptable. With that setting
> latency was around 500μs without load. The good part is that
> Battlefield 3 no longer lags, although I observed longer texture
> loading times compared to bare metal. The not so good part is that
> there is still minor sound skipping/crackling, since latency
> spikes under load. That is very disappointing. I also tried
> two VM cores pinned to 4 host cores: bf3 lagged
> enough to be unplayable. 3 VM cores pinned to 6 host cores was
> already playable, but sound was still crackling. I noticed little
> difference between that and 4 VM cores pinned to 8 host cores. It would
> be nice if the sound could be cleaned up. If anyone has any ideas, I'm
> all ears. The libvirt XML I use now:
>
>   <vcpu placement='static'>4</vcpu>
>   <cputune>
>     <vcpupin vcpu='0' cpuset='0-1'/>
>     <vcpupin vcpu='1' cpuset='2-3'/>
>     <vcpupin vcpu='2' cpuset='4-5'/>
>     <vcpupin vcpu='3' cpuset='6-7'/>
>   </cputune>
>   <features>
>     <acpi/>
>     <apic/>
>     <pae/>
>     <hap/>
>     <viridian/>
>     <hyperv>
>       <relaxed state='on'/>
>       <vapic state='on'/>
>       <spinlocks state='on' retries='8191'/>
>     </hyperv>
>     <kvm>
>       <hidden state='on'/>
>     </kvm>
>     <pvspinlock state='on'/>
>   </features>
>   <cpu mode='host-passthrough'>
>     <topology sockets='1' cores='4' threads='1'/>
>   </cpu>
>   <clock offset='utc'>
>     <timer name='rtc' tickpolicy='catchup'/>
>     <timer name='pit' tickpolicy='delay'/>
>     <timer name='hpet' present='no'/>
>     <timer name='hypervclock' present='yes'/>
>   </clock>
>
> Kernel configs:
>
> CONFIG_NO_HZ_FULL=y
> CONFIG_RCU_NOCB_CPU_ALL=y
> CONFIG_HZ_1000=y
> CONFIG_HZ=1000
>
> I am not convinced a 1000 Hz tick rate is needed. The default one (300)
> seems to perform equally well, judging by the latency charts.
> I did not get a chance to test it with bf3 yet, however.
>
> On 2016.01.12 11:12, thibaut noah wrote:
>
> [cut]
>