[libvirt] [RFC PATCH] NUMA tuning support
Daniel P. Berrange
berrange at redhat.com
Fri May 6 09:27:44 UTC 2011
On Thu, May 05, 2011 at 10:33:46AM -0400, Lee Schermerhorn wrote:
> On Thu, 2011-05-05 at 17:38 +0800, Osier Yang wrote:
> > Hi, All,
> >
> > This is a simple implementation of NUMA tuning support based on the
> > 'numactl' binary. Currently it only supports binding memory to specified
> > nodes, using the "--membind" option; it probably needs to support more,
> > but I'd like to send it early to make sure the principle is correct.
> >
> > Ideally, NUMA tuning support would be added to qemu-kvm first, so that
> > it could provide command-line options and all libvirt would need to do
> > is pass those options through. Unfortunately qemu-kvm doesn't support
> > this yet, so all we can do currently is use numactl. It forks a
> > process, which is a bit more expensive than qemu-kvm tuning NUMA
> > internally with libnuma, but I guess it shouldn't matter much.
> >
> > The NUMA tuning XML is like:
> >
> > <numatune>
> > <membind nodeset='+0-4,8-12'/>
> > </numatune>
> >
> > Any thoughts/feedback would be appreciated.
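For reference, the nodeset value above follows numactl's list syntax (ranges and comma-separated entries, with an optional leading '+' meaning "relative to the task's cpuset" per the numactl man page). A minimal sketch of a parser for that notation, as a hypothetical helper and not code from the patch:

```python
def parse_nodeset(spec):
    """Parse a numactl-style nodeset string such as '0-4,8-12' into a set
    of node IDs.

    A leading '+' is accepted and stripped; actually resolving the nodes
    relative to the task's cpuset is out of scope for this sketch.
    """
    if spec.startswith('+'):
        spec = spec[1:]
    nodes = set()
    for part in spec.split(','):
        if '-' in part:
            # A range like '8-12' is inclusive on both ends.
            lo, hi = part.split('-')
            nodes.update(range(int(lo), int(hi) + 1))
        else:
            nodes.add(int(part))
    return nodes

# Example: the nodeset from the XML above.
print(sorted(parse_nodeset('+0-4,8-12')))
```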
>
> Osier:
>
> A couple of thoughts/observations:
>
> 1) you can accomplish the same thing -- restricting a domain's memory to
> a specified set of nodes -- using the cpuset cgroup that is already
> associated with each domain. E.g.,
>
> cgset -r cpuset.mems=<nodeset> /libvirt/qemu/<domain>
>
> Or the equivalent libcgroup call.
>
> However, numactl is more flexible, especially if you intend to support
> more policies, such as preferred and interleave. Which leads to the question:
>
> 2) Do you really want the full "membind" semantics, as opposed to
> "preferred", by default? Membind policy will restrict the VM's pages to
> the specified nodeset; when all of the nodes in the nodeset reach their
> minimum watermark, it will initiate reclaim/stealing and wait for pages
> to become available, or the task will be OOM-killed because of the
> mempolicy. Membind works the same as cpuset.mems in this respect.
> Preferred policy will keep memory allocations [but not vcpu execution]
> local to the specified set of nodes as long as there is sufficient
> memory, and will silently "overflow" allocations to other nodes when
> necessary. I.e., it's a little more forgiving under memory pressure.
I think we need to make the choice of strict vs preferred binding an
XML tunable, since both options are valid.
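One hypothetical shape for such a tunable, extending the XML proposed above (the 'mode' attribute and its values are illustrative only, not an agreed-on schema):

```xml
<numatune>
  <!-- mode='strict' would request full membind semantics;
       mode='preferred' would allow silent overflow to other nodes -->
  <membind mode='preferred' nodeset='0-4,8-12'/>
</numatune>
```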
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|