[libvirt] [RFC PATCH] NUMA tuning support

Daniel P. Berrange berrange at redhat.com
Fri May 6 09:27:44 UTC 2011


On Thu, May 05, 2011 at 10:33:46AM -0400, Lee Schermerhorn wrote:
> On Thu, 2011-05-05 at 17:38 +0800, Osier Yang wrote:
> > Hi, All,
> > 
> > This is a simple implementation of NUMA tuning support based on the
> > binary program 'numactl'. Currently it only supports binding memory to
> > specified nodes, using the "--membind" option. It may need to support
> > more later, but I'd like to send it early to make sure the principle
> > is correct.
> > 
> > Ideally, NUMA tuning support would be added to qemu-kvm first, so that
> > it could expose command-line options and all libvirt would need to do
> > is pass those options through. Unfortunately qemu-kvm doesn't support
> > this yet, so all we can currently do is use numactl. That forks an
> > extra process, which is a bit more expensive than qemu-kvm applying
> > the NUMA policy internally with libnuma, but I guess it shouldn't
> > matter much.
> > 
> > The NUMA tuning XML is like:
> > 
> > <numatune>
> >   <membind nodeset='+0-4,8-12'/>
> > </numatune>
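> > 
> > For illustration, that nodeset would translate into wrapping the
> > qemu-kvm launch roughly like this (a sketch; the guest arguments
> > are elided):
> > 
> > 	numactl --membind=+0-4,8-12 /usr/bin/qemu-kvm ...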
> > 
> > Any thoughts/feedback is appreciated.
> 
> Osier:
> 
> A couple of thoughts/observations:
> 
> 1) you can accomplish the same thing -- restricting a domain's memory to
> a specified set of nodes -- using the cpuset cgroup that is already
> associated with each domain.  E.g.,
> 
> 	cgset -r cpuset.mems=<nodeset> /libvirt/qemu/<domain>
> 
> Or the equivalent libcgroup call.
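> 
> For completeness, the equivalent via the libcgroup API is only a
> handful of calls -- roughly like this (a sketch; error handling
> omitted, and the helper name is just for illustration):
> 
> 	#include <limits.h>
> 	#include <stdio.h>
> 	#include <libcgroup.h>
> 
> 	/* Restrict the memory of an existing libvirt domain cgroup
> 	 * to the given nodeset, e.g. "0-4,8-12". */
> 	int restrict_domain_mems(const char *domain, const char *nodeset)
> 	{
> 		struct cgroup *cg;
> 		struct cgroup_controller *cpuset;
> 		char path[PATH_MAX];
> 
> 		cgroup_init();
> 		snprintf(path, sizeof(path), "libvirt/qemu/%s", domain);
> 		cg = cgroup_new_cgroup(path);
> 		cpuset = cgroup_add_controller(cg, "cpuset");
> 		cgroup_set_value_string(cpuset, "cpuset.mems", nodeset);
> 		cgroup_modify_cgroup(cg);	/* apply to the live group */
> 		cgroup_free(&cg);
> 		return 0;
> 	}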
> 
> However, numactl is more flexible, especially if you intend to support
> more policies:  preferred, interleave.  Which leads to the question:
> 
> 2) Do you really want the full "membind" semantics as opposed to
> "preferred" by default?  Membind policy will restrict the VMs pages to
> the specified nodeset and will initiate reclaim/stealing and wait for
> pages to become available or the task is OOM-killed because of mempolicy
> when all of the nodes in nodeset reach their minimum watermark.  Membind
> works the same as cpuset.mems in this respect.  Preferred policy will
> keep memory allocations [but not vcpu execution] local to the specified
> set of nodes as long as there is sufficient memory, and will silently
> "overflow" allocations to other nodes when necessary.  I.e., it's a
> little more forgiving under memory pressure.
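> 
> In libnuma terms the difference is roughly the following (a sketch;
> note that the kernel's preferred policy takes a single node, not a
> nodemask):
> 
> 	#include <numa.h>
> 
> 	void apply_policy(int strict)
> 	{
> 		struct bitmask *nodes = numa_parse_nodestring("0-4,8-12");
> 
> 		if (strict)
> 			/* membind: never allocate outside 'nodes';
> 			 * reclaim, wait, or OOM-kill rather than
> 			 * overflow to other nodes */
> 			numa_set_membind(nodes);
> 		else
> 			/* preferred: favor one node, silently overflow
> 			 * elsewhere under memory pressure */
> 			numa_set_preferred(0);
> 
> 		numa_bitmask_free(nodes);
> 	}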

I think we need to make the choice of strict binding vs preferred
binding an XML tunable, since both options are valid.
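
For example, something along these lines (just a sketch, not a final
schema):

  <numatune>
    <membind mode='strict' nodeset='0-4,8-12'/>
  </numatune>

where mode would accept at least 'strict' and 'preferred', and perhaps
'interleave' as well.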

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|



