
Re: [vfio-users] cpu core pinning with multiple cpus



On Sat, 2015-09-12 at 00:23 +0200, Erik Adler wrote:
> Certain games are giving me terrible frame rates on my GTX 970.
> Generally these games are not very demanding when using bare metal.
> Unigine Valley Benchmark is doing fine at about 87% native speeds.
> Same with some other GPU intensive benchmarks. The games that have bad
> fps seem to be taxing the cpu heavily and having latency issues in
> passthrough.
> 
> I am not sure that I have paired my cores correctly. Using Alex's CPU
> latency script I get the following. Since this is a dual CPU system I
> need to keep everything on the same NUMA node.
> There is a definite pattern, but I am not 100% sure that I see the
> correlation with lstopo.

Looks pretty much like it's supposed to afaict.  Take the top row for
example, CPU0 has the best latency to CPUs 0-5 and 12-17.  These are
thread0 and thread1 of the cores on node0.  CPUs 6-11 and 18-23 are on
the remote socket, so they suffer a pretty big hit.  Move down to row 6,
CPU6 on node1 and we see that the latency has flipped, 6-11 and 18-23
are now closer.


>   |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
> --+------------------------------------------------------------------------
>  0| 10  8  7  8  7  8  4  4  4  4  4  4  6  7  7  7  7  7  4  4  4  4  4  4
>  1|  9 10  8  8  8  8  4  4  4  4  4  4  7  7  8  8  8  8  4  4  4  4  4  4
>  2|  8  8 10  8  8  7  4  4  4  4  5  4  7  8  7  8  8  8  4  4  4  4  4  4
>  3|  8  8  8 10  8  8  4  5  4  4  4  4  8  7  8  7  8  8  4  4  4  4  4  4
>  4|  8  8  7  8 10  7  4  4  4  4  4  4  7  8  8  8  7  8  3  4  4  4  4  4
>  5|  8  8  8  8  8 10  4  4  4  4  4  4  7  8  8  7  8  7  4  4  4  4  4  4
>  6|  4  4  4  4  4  4 10  6  6  6  6  6  4  4  3  4  4  4  5  7  6  6  6  6
>  7|  4  4  4  5  4  4  5 10  7  7  7  7  4  4  4  5  5  4  6  7  7  7  7  7
>  8|  4  5  4  5  4  5  6  7 10  7  7  6  4  4  5  4  4  5  5  6  6  8  8  7
>  9|  5  5  5  4  4  4  6  7  8 10  8  8  5  4  5  5  5  4  6  8  7  7  8  8
> 10|  4  4  4  4  4  3  5  6  6  6 10  6  4  4  4  4  4  4  5  6  6  6  5  6
> 11|  3  3  4  3  4  4  5  5  6  6  6 10  3  3  4  4  4  4  5  6  6  6  6  5
> 12|  7  8  8  8  8  7  4  5  4  4  4  4 10  8  8  8  8  8  4  5  4  4  5  5
> 13|  8  7  8  7  7  7  3  4  4  4  4  4  8 10  8  8  7  7  4  4  4  4  4  4
> 14|  8  8  7  8  8  8  4  4  4  4  4  4  7  8 10  8  7  8  4  5  4  4  4  4
> 15|  8  7  8  6  8  8  4  4  4  4  4  4  7  7  8 10  8  7  4  5  4  4  4  4
> 16|  9  8  9  9  8  9  4  5  5  4  5  5  8  8  9  9 10  9  4  5  5  4  5  5
> 17|  8  7  8  8  8  7  4  4  4  4  4  4  7  8  8  8  7 10  4  4  4  4  4  4
> 18|  4  4  4  4  4  4  5  7  6  7  7  7  4  4  4  4  4  4 10  6  7  6  7  7
> 19|  5  5  4  4  4  4  6  6  7  7  7  6  4  4  5  4  4  4  6 10  8  7  7  7
> 20|  4  5  4  4  4  4  6  8  7  8  8  8  4  4  5  3  4  4  6  8 10  8  8  8
> 21|  4  4  4  4  4  4  6  5  6  5  6  6  4  4  4  4  4  4  5  6  6 10  7  6
> 22|  5  4  4  5  4  5  6  7  7  7  7  8  4  5  5  5  5  5  6  8  8  8 10  8
> 23|  4  4  4  4  4  4  6  7  7  6  7  6  4  4  4  4  4  4  5  7  6  7  7 10
> 
> https://i.imgur.com/PQvT2oR.png

Nice, this makes it even more clear.

> Looking at lstopo (url) I have hopefully mapped out my hardware
> correctly. In NUMA node “0” I can see my GTX 970 on PCI 10de:13c2.
> 
> http://i.imgur.com/GBczQvi.png

Yep, you definitely want to use 0-5 and 12-17.
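
Since the goal is to keep everything on one NUMA node, guest memory can
be bound to node 0 as well. A minimal fragment, assuming node 0 has
enough free memory for the 16 GiB guest (this element is not in the
config quoted below, so treat it as an optional addition):

```xml
<!-- Fragment of the libvirt domain XML: allocate guest RAM from host node 0 only. -->
<!-- Assumption: node 0 can satisfy the full 16 GiB; 'strict' fails otherwise. -->
<numatune>
  <memory mode='strict' nodeset='0'/>
</numatune>
```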

> I am assuming that if I want to use HT on CPU1 my xml file should
> look like this? Have I done something wrong with how I have pinned
> cores?
> 
> <domain type='kvm'>
>   <name>Windows</name>
>   <uuid>cc52dc82-ce9a-45ff-99e6-a92ab0f42b59</uuid>
>   <memory unit='KiB'>16777216</memory>
>   <currentMemory unit='KiB'>16777216</currentMemory>
>   <vcpu placement='static'>8</vcpu>
>   <cputune>
>     <vcpupin vcpu='0' cpuset='2'/>
>     <vcpupin vcpu='1' cpuset='3'/>
>     <vcpupin vcpu='2' cpuset='4'/>
>     <vcpupin vcpu='3' cpuset='5'/>
>     <vcpupin vcpu='4' cpuset='14'/>
>     <vcpupin vcpu='5' cpuset='15'/>
>     <vcpupin vcpu='6' cpuset='16'/>
>     <vcpupin vcpu='7' cpuset='17'/>
>   </cputune>
>   <os>
>     <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
>     <loader type='rom'>/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
>   </os>
>   <features>
>     <acpi/>
>     <apic/>
>     <pae/>
>     <kvm>
>       <hidden state='on'/>
>     </kvm>
>     <vmport state='off'/>
>   </features>
>   <cpu mode='host-passthrough'>
>     <topology sockets='1' cores='4' threads='2'/>
>   </cpu>

Looks ok to me, but see my post from last week:

https://www.redhat.com/archives/vfio-users/2015-September/msg00041.html

You may get better latency without exposing threads to the guest,
leaving the other thread of each core either idle or reserved for
running the emulator.
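
As a rough sketch of that suggestion (not the exact config from the
linked post), the guest could get four single-threaded cores, with the
sibling threads kept for the emulator. This assumes host CPUs 14-17 are
the hyperthread siblings of 2-5, which is the usual Linux enumeration on
this kind of system but is worth confirming with lstopo or
/sys/devices/system/cpu/cpu2/topology/thread_siblings_list:

```xml
<!-- Fragment of the libvirt domain XML; only the pinning-related elements are shown. -->
<vcpu placement='static'>4</vcpu>
<cputune>
  <!-- One vCPU per physical core on node 0 -->
  <vcpupin vcpu='0' cpuset='2'/>
  <vcpupin vcpu='1' cpuset='3'/>
  <vcpupin vcpu='2' cpuset='4'/>
  <vcpupin vcpu='3' cpuset='5'/>
  <!-- Assumed sibling threads of cores 2-5, reserved for QEMU's emulator threads -->
  <emulatorpin cpuset='14-17'/>
</cputune>
<cpu mode='host-passthrough'>
  <!-- threads='1' so the guest sees no hyperthreads -->
  <topology sockets='1' cores='4' threads='1'/>
</cpu>
```

Whether this beats the 8-vCPU layout depends on the workload, so it is
worth benchmarking both.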

