[libvirt] [RFC PATCH] NUMA tuning support

Osier Yang jyang at redhat.com
Fri May 6 03:48:10 UTC 2011


On 05/06/2011 04:43, Bill Gray wrote:
>
> Thanks for the feedback Lee!
>
> One reason to use "membind" instead of "preferred" is that one can
> prefer only a single node. For large guests, you can specify multiple
> nodes with "membind". I think "preferred" would be preferred if it
> allowed multiple nodes.
>
> - Bill

Hi, Bill

Will "preferred" be still useful even if it only support single node?

Regards
Osier
>
>
> On 05/05/2011 10:33 AM, Lee Schermerhorn wrote:
>> On Thu, 2011-05-05 at 17:38 +0800, Osier Yang wrote:
>>> Hi, All,
>>>
>>> This is a simple implementation of NUMA tuning support based on the
>>> 'numactl' binary. Currently it only supports binding memory to
>>> specified nodes, using the "--membind" option; it probably needs to
>>> support more, but I'd like to send it early to make sure the principle
>>> is correct.
>>>
>>> Ideally, NUMA tuning support would be added to qemu-kvm first, so that
>>> it could provide command-line options and all libvirt would need to do
>>> is pass those options through to qemu-kvm. Unfortunately qemu-kvm
>>> doesn't support that yet, so currently the only thing we can do is use
>>> numactl. That forks an extra process, which is a bit more expensive
>>> than qemu-kvm supporting NUMA tuning internally via libnuma, but I
>>> guess it shouldn't affect things much.
>>>
>>> The NUMA tuning XML looks like this:
>>>
>>> <numatune>
>>> <membind nodeset='+0-4,8-12'/>
>>> </numatune>
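>>>
>>> For illustration, with the above nodeset the generated emulator command
>>> line would be wrapped roughly like this (the qemu-kvm arguments are just
>>> placeholders; the '+' prefix is numactl's relative-to-cpuset notation,
>>> if I read numactl(8) correctly):
>>>
>>>    numactl --membind=+0-4,8-12 -- /usr/bin/qemu-kvm -name guest ...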
>>>
>>> Any thoughts/feedback is appreciated.
>>
>> Osier:
>>
>> A couple of thoughts/observations:
>>
>> 1) you can accomplish the same thing -- restricting a domain's memory to
>> a specified set of nodes -- using the cpuset cgroup that is already
>> associated with each domain. E.g.,
>>
>> cgset -r cpuset.mems=<nodeset> /libvirt/qemu/<domain>
>>
>> Or the equivalent libcgroup call.
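>>
>> (Equivalently, one can write the nodeset into the cgroup filesystem
>> directly -- a sketch, assuming the cpuset controller is mounted at
>> /sys/fs/cgroup/cpuset; the mount point varies by setup:
>>
>>    echo <nodeset> > /sys/fs/cgroup/cpuset/libvirt/qemu/<domain>/cpuset.mems
>>
>> using the same /libvirt/qemu/<domain> hierarchy as the cgset command
>> above.)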
>>
>> However, numactl is more flexible, especially if you intend to support
>> more policies: preferred, interleave. Which leads to the question:
>>
>> 2) Do you really want the full "membind" semantics as opposed to
>> "preferred" by default? Membind policy will restrict the VM's pages to
>> the specified nodeset and will initiate reclaim/stealing and wait for
>> pages to become available, or the task will be OOM-killed because of
>> mempolicy when all of the nodes in the nodeset reach their minimum
>> watermark. Membind works the same as cpuset.mems in this respect.
>> Preferred policy will keep memory allocations [but not vcpu execution]
>> local to the specified set of nodes as long as there is sufficient
>> memory, and will silently "overflow" allocations to other nodes when
>> necessary. I.e., it's a little more forgiving under memory pressure.
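>>
>> To make the contrast concrete (a sketch; 'some_workload' and the node
>> numbers are placeholders):
>>
>>    # membind: allocations restricted to nodes 0-1; under sustained
>>    # memory pressure on both nodes the task can be OOM-killed
>>    numactl --membind=0,1 some_workload
>>
>>    # preferred: allocate from node 0 while it has free memory,
>>    # silently fall back to other nodes otherwise
>>    numactl --preferred=0 some_workload
>>
>> Note that --preferred accepts only a single node.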
>>
>> But then pinning a VM's vcpus to the physical cpus of a set of nodes and
>> retaining the default local allocation policy will have the same effect
>> as "preferred" while ensuring that the VM component tasks execute
>> locally to the memory footprint. Currently, I do this by looking up the
>> cpulist associated with the node[s] from e.g.,
>> /sys/devices/system/node/node<i>/cpulist and using that list with the
>> vcpu.cpuset attribute. Adding a 'nodeset' attribute to the
>> cputune.vcpupin element would simplify specifying that configuration.
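>>
>> For example (a sketch, with made-up cpu and vcpu numbers): if
>>
>>    $ cat /sys/devices/system/node/node1/cpulist
>>    4-7
>>
>> then the domain XML would pin its vcpus with something like
>>
>>    <vcpu cpuset='4-7'>2</vcpu>
>>
>> while leaving the default local allocation policy in place.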
>>
>> Regards,
>> Lee
>>
>>



