[libvirt-users] VMs fail to start with NUMA configuration

Osier Yang jyang at redhat.com
Thu Jan 31 04:30:23 UTC 2013


[ CC Peter ]

On 01/31/2013 06:01 AM, Doug Goldstein wrote:
> On Wed, Jan 30, 2013 at 1:21 AM, Wayne Sun<gsun at redhat.com>  wrote:
>> On 01/30/2013 01:25 PM, Doug Goldstein wrote:
>>>
>>> On Mon, Jan 28, 2013 at 10:23 AM, Osier Yang<jyang at redhat.com>   wrote:
>>>>
>>>> On 2013年01月29日 00:17, Doug Goldstein wrote:
>>>>>
>>>>> On Sun, Jan 27, 2013 at 10:46 PM, Osier Yang<jyang at redhat.com>    wrote:
>>>>>>
>>>>>> On 2013年01月28日 11:47, Osier Yang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2013年01月28日 11:44, Osier Yang wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2013年01月26日 01:07, Doug Goldstein wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jan 24, 2013 at 12:58 AM, Osier Yang<jyang at redhat.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2013年01月24日 14:26, Doug Goldstein wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jan 23, 2013 at 11:02 PM, Osier Yang<jyang at redhat.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2013年01月24日 12:11, Doug Goldstein wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jan 23, 2013 at 3:45 PM, Doug
>>>>>>>>>>>>> Goldstein<cardoe at gentoo.org>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am using libvirt 0.10.2.2 and qemu-kvm 1.2.2 (qemu-kvm 1.2.0 +
>>>>>>>>>>>>>> qemu 1.2.2 applied on top, plus a number of stability patches).
>>>>>>>>>>>>>> I'm having an issue where my VMs fail to start with the
>>>>>>>>>>>>>> following message:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> kvm_init_vcpu failed: Cannot allocate memory
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Smells like we have a problem setting the NUMA policy (perhaps
>>>>>>>>>>>> caused by an incorrect host NUMA topology), given that the system
>>>>>>>>>>>> still has enough memory. Or numad (if it's installed) is doing
>>>>>>>>>>>> something wrong.
>>>>>>>>>>>>
>>>>>>>>>>>> Can you see if there is something about the Nodeset used to set
>>>>>>>>>>>> the policy in the debug log?
>>>>>>>>>>>>
>>>>>>>>>>>> E.g.
>>>>>>>>>>>>
>>>>>>>>>>>> % cat libvirtd.debug | grep Nodeset
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Well, I don't see anything, but it's likely because I didn't do
>>>>>>>>>>> something correctly. I had LIBVIRT_DEBUG=1 exported and ran
>>>>>>>>>>> libvirtd --verbose from the command line.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If the process is in the background, it's expected that you can't
>>>>>>>>>> see anything.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> My /etc/libvirt/libvirtd.conf had:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> log_outputs="3:syslog:libvirtd 1:file:/tmp/libvirtd.log" But I
>>>>>>>>>>> didn't
>>>>>>>>>>> get any debug messages.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> log_level=1 has to be set.
>>>>>>>>>>
>>>>>>>>>> Anyway, let's simply do this:
>>>>>>>>>>
>>>>>>>>>> % service libvirtd stop
>>>>>>>>>> % LIBVIRT_DEBUG=1 /usr/sbin/libvirtd 2>&1 | tee -a libvirtd.debug
>>>>>>>>>>
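
[ For reference, a minimal sketch of the file-based debug logging setup
being discussed here, assuming the stock /etc/libvirt/libvirtd.conf path;
both settings need to be in place before restarting the daemon: ]

% grep -E '^log_(level|outputs)' /etc/libvirt/libvirtd.conf
log_level=1
log_outputs="1:file:/tmp/libvirtd.log"
% service libvirtd restart
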
>>>>>>>>> That's what I was doing, minus the tee, just to the console, and
>>>>>>>>> nothing was coming out. That's why I added 1:file:/tmp/libvirtd.log,
>>>>>>>>> which also didn't get any debug messages. Turns out this instance
>>>>>>>>> must have been built with --disable-debug.
>>>>>>>>>
>>>>>>>>> All I've got in the log is:
>>>>>>>>>
>>>>>>>>> # grep -i 'numa' libvirtd.debug
>>>>>>>>> 2013-01-25 16:50:15.287+0000: 417: debug : virCommandRunAsync:2200 :
>>>>>>>>> About to run /usr/bin/numad -w 2:2048
>>>>>>>>> 2013-01-25 16:50:17.295+0000: 417: debug : qemuProcessStart:3614 :
>>>>>>>>> Nodeset returned from numad: 1
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> This looks right.
>>>>>>>>
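
[ A quick way to sanity-check that numad suggestion by hand (a sketch; the
exact output formatting of numad may differ) is to repeat the advisory query
libvirt ran and compare it with what numactl reports for that node: ]

% /usr/bin/numad -w 2:2048
1
% numactl --hardware | grep 'node 1'
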
>>>>>>>>> Immediately below that is
>>>>>>>>>
>>>>>>>>> 2013-01-25 16:50:17.295+0000: 417: debug : qemuProcessStart:3622 :
>>>>>>>>> Setting up domain cgroup (if required)
>>>>>>>>> 2013-01-25 16:50:17.295+0000: 417: debug : virCgroupNew:619 : New
>>>>>>>>> group /libvirt/qemu/bb-2.6.35.9-i686
>>>>>>>>> 2013-01-25 16:50:17.295+0000: 417: debug : virCgroupDetect:273 :
>>>>>>>>> Detected mount/mapping 1:cpuacct at /sys/fs/cgroup/cpuacct in
>>>>>>>>> 2013-01-25 16:50:17.295+0000: 417: debug : virCgroupDetect:273 :
>>>>>>>>> Detected mount/mapping 2:cpuset at /sys/fs/cgroup/cpuset in
>>>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virCgroupMakeGroup:537 :
>>>>>>>>> Make group /libvirt/qemu/bb-2.6.35.9-i686
>>>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virCgroupMakeGroup:562 :
>>>>>>>>> Make controller
>>>>>>>>> /sys/fs/cgroup/cpuacct/libvirt/qemu/bb-2.6.35.9-i686/
>>>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virCgroupMakeGroup:562 :
>>>>>>>>> Make controller /sys/fs/cgroup/cpuset/libvirt/qemu/bb-2.6.35.9-i686/
>>>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug :
>>>>>>>>> virCgroupCpuSetInherit:469
>>>>>>>>> : Setting up inheritance /libvirt/qemu ->
>>>>>>>>> /libvirt/qemu/bb-2.6.35.9-i686
>>>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virCgroupGetValueStr:361
>>>>>>>>> :
>>>>>>>>> Get value /sys/fs/cgroup/cpuset/libvirt/qemu/cpuset.cpus
>>>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virFileClose:72 : Closed
>>>>>>>>> fd 39
>>>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug :
>>>>>>>>> virCgroupCpuSetInherit:482
>>>>>>>>> : Inherit cpuset.cpus = 0-63
>>>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331
>>>>>>>>> :
>>>>>>>>> Set value
>>>>>>>>> '/sys/fs/cgroup/cpuset/libvirt/qemu/bb-2.6.35.9-i686/cpuset.cpus'
>>>>>>>>> to '0-63'
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> This doesn't look right; it should be 0-7 instead.
>>>>>>>>
>>>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virFileClose:72 : Closed
>>>>>>>>> fd 39
>>>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virCgroupGetValueStr:361
>>>>>>>>> :
>>>>>>>>> Get value /sys/fs/cgroup/cpuset/libvirt/qemu/cpuset.mems
>>>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virFileClose:72 : Closed
>>>>>>>>> fd 39
>>>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug :
>>>>>>>>> virCgroupCpuSetInherit:482
>>>>>>>>> : Inherit cpuset.mems = 0-7
>>>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331
>>>>>>>>> :
>>>>>>>>> Set value
>>>>>>>>> '/sys/fs/cgroup/cpuset/libvirt/qemu/bb-2.6.35.9-i686/cpuset.mems'
>>>>>>>>> to '0-7'
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> This is right.
>>>>>>>>
>>>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virFileClose:72 : Closed
>>>>>>>>> fd 39
>>>>>>>>> 2013-01-25 16:50:17.296+0000: 417: warning : qemuSetupCgroup:388 :
>>>>>>>>> Could not autoset a RSS limit for domain bb-2.6.35.9-i686
>>>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331
>>>>>>>>> :
>>>>>>>>> Set value
>>>>>>>>> '/sys/fs/cgroup/cpuset/libvirt/qemu/bb-2.6.35.9-i686/cpuset.mems'
>>>>>>>>> to '1'
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> And it's strange that the cpuset.mems is changed to '1' here.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Oh, actually this is right, cpuset.mems is about the memory nodes.
>>>>>>
>>>>>>
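
[ A small sketch for re-checking what actually landed in the cgroup, using
the paths from the log above (assuming the group still exists after the
failed start): ]

% cat /sys/fs/cgroup/cpuset/libvirt/qemu/bb-2.6.35.9-i686/cpuset.cpus
% cat /sys/fs/cgroup/cpuset/libvirt/qemu/bb-2.6.35.9-i686/cpuset.mems
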
>>>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virFileClose:72 : Closed
>>>>>>>>> fd 39
>>>>>>>>>
>>>>>>>>> Could the RSS issue be related? Some kernel-related option not
>>>>>>>>> playing nice or not enabled?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Instead, I'm wondering if the problem is caused by the mismatch
>>>>>>> (from libvirt's point of view) between cpuset.cpus and cpuset.mems,
>>>>>>> which thus causes a problem for kernel memory management?
>>>>>>
>>>>>>
>>>>>>
>>>>>> So, the simple method to prove the guess is to use static placement
>>>>>> like:
>>>>>>
>>>>>> <vcpu placement='static' cpuset='0-63'>2</vcpu>
>>>>>> <numatune>
>>>>>>      <memory placement='static' nodeset='1'/>
>>>>>> </numatune>
>>>>>>
>>>>>> Osier
>>>>>
>>>>>
>>>>> Same error, which I don't know if you expected or not.
>>>>>
>>>> It's expected, as "0-63" is the final result when using "auto"
>>>> placement.
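
[ For anyone retrying the static placement above, a hedged sketch for
confirming the edit was actually picked up in the defined XML before the
next start attempt (domain name taken from the log above): ]

% virsh dumpxml bb-2.6.35.9-i686 | grep -A 2 -E '<vcpu|<numatune'
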
>>>
>>> Since there's another user on the libvirt-list asking about the exact
>>> same CPU I've got, I figured I'd do some poking. Oddly enough, he and
>>> I had different outputs from virsh nodeinfo. Just as background, it's
>>> AMD 6272 CPUs. I've got 4 of them in the box, and they're organized as
>>> follows:
>>>
>>> Sockets: 4
>>> Cores: 16
>>> Threads: 1 per core (16)
>>> NUMA nodes: 8
>>> Mem per node: 16GB
>>> Total: 128GB
>>>
>>> # virsh nodeinfo
>>> CPU model:           x86_64
>>> CPU(s):              64
>>> CPU frequency:       2100 MHz
>>> CPU socket(s):       1
>>> Core(s) per socket:  64
>>> Thread(s) per core:  1
>>> NUMA cell(s):        1
>>> Memory size:         132013200 KiB
>>>
>>> # virsh capabilities
>>> <snip>
>>>         <topology sockets='1' cores='64' threads='1'/>
>>> <snip>
>>>       <topology>
>>>         <cells num='8'>
>>> <snip>
>>>
>>> I've hand-verified all the values in
>>> /sys/devices/system/node/nodeX/cpuX/topology/physical_package_id to show
>>> that the NUMA nodes pair up per physical package (0&1, 2&3, 4&5, 6&7).
>>>
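
[ A compact version of that hand check (a sketch; the exact sysfs layout can
vary with kernel version) is to count distinct physical packages versus NUMA
nodes: ]

% cat /sys/devices/system/cpu/cpu*/topology/physical_package_id | sort -un
% ls -d /sys/devices/system/node/node* | wc -l
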
>>> I need to give git a whirl, as I know it's got somewhat different code
>>> than 1.0.1, but I'll report back.

As far as I can see, Peter committed more patches to fix the CPU topology
parsing on AMD platforms. Perhaps he will know whether this is fixed in the
new release.

>>>
>> For AMD 62xx CPUs, the output is expected.
>>
>> Check out this bug:
>> virsh nodeinfo can't get the right info on AMD Bulldozer cpu
>> https://bugzilla.redhat.com/show_bug.cgi?id=874050
>>
>> Wayne Sun
>> 2013-01-30
>>
>
> Wayne,
>
> I'd argue we need to determine what format we really need the data in.
> Do we actually care about physical sockets, or should we care about
> packages? Because with this specific CPU there are 2 packages in 1
> physical socket, each forming its own NUMA node.

Agreed. Though the total number of CPUs is correct, which guarantees that
most of the stuff related to CPU topology works, it still should be fixed.
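
[ As a side note, util-linux's lscpu (assuming it is installed on the host)
reports sockets and NUMA nodes independently of libvirt's parsing, which may
help decide which of the two nodeinfo ought to expose: ]

% lscpu | grep -E 'Socket|per socket|NUMA'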

>
> The reason I say this is that we went from the domain working with NUMA
> defined to the domain failing to start up with a cryptic error message,
> which IMHO is worse.
>
> The flip side of the coin is that we can just strip out all the NUMA
> settings when starting the domain up if we know it won't work.
>



