[libvirt] [PATCH 0/3] several cgroups/cpuset fixes

Henning Schild henning.schild at siemens.com
Fri Jan 8 11:46:22 UTC 2016


On Thu, 7 Jan 2016 19:56:33 -0500
John Ferlan <jferlan at redhat.com> wrote:

> 
> 
> On 01/07/2016 02:01 PM, Henning Schild wrote:
> > On Thu, 7 Jan 2016 11:20:23 -0500
> > John Ferlan <jferlan at redhat.com> wrote:
> > 
> >>
> >> [...]
> >>
> >>>> No problem - although it seems they've generated a regression in
> >>>> the virttest memtune test suite.  I'm 'technically' on vacation
> >>>> for the next couple of weeks; however, I think/perhaps the
> >>>> problem is a result of this patch and the change to adding the
> >>>> task to the cgroup at the end of the for loop, but perhaps the
> >>>> following code causes the control to jump back to the top of the
> >>>> loop:
> >>>>
> >>>>              if (!cpumap)
> >>>>                  continue;
> >>>>
> >>>>              if (qemuSetupCgroupCpusetCpus(cgroup_vcpu, cpumap) < 0)
> >>>>                  goto cleanup;
> >>>>
> >>>> not allowing the
> >>>>
> >>>>
> >>>>         /* move the thread for vcpu to sub dir */
> >>>>         if (virCgroupAddTask(cgroup_vcpu,
> >>>>                              qemuDomainGetVcpuPid(vm, i)) < 0)
> >>>>             goto cleanup;
> >>>>
> >>>> to be executed.
> >>>>
> >>>> The code should probably change to be (like IOThreads):
> >>>>
> >>>>              if (cpumap &&
> >>>>                  qemuSetupCgroupCpusetCpus(cgroup_vcpu, cpumap) < 0)
> >>>>                  goto cleanup;
> >>>>
> >>>>
> >>>> As for the rest, I suspect things will be quite quiet around here
> >>>> over the next couple of weeks. A discussion to perhaps start in
> >>>> the new year.
> >>>
> >>> Same here. I will have a look at that regression after my
> >>> vacation, should it still be there.
> >>>
> >>> Henning
> >>>
> >>
> >> More data from the issue...  While the above mentioned path is an
> >> issue, I don't believe it's what's causing the test failure.
> >>
> >> I haven't quite figured out why yet, but it seems
> >> the /proc/#/cgroup file isn't getting the proper path for the
> >> 'memory' slice and thus the test fails because it's looking at the:
> >>
> >>    /sys/fs/cgroup/memory/machine.slice/memory.*
> >>
> >> files instead of the
> >>
> >>     /sys/fs/cgroup/memory/machine.slice/$path/memory.*
> > 
> > To be honest i did just look at the cgroup/cpuset/ hierarchy, but i
> > just browsed cgroup/memory/ as well.
> > 
> > The target of my patch series was to get
> > cgroup/cpuset/machine.slice/tasks to be empty, all tasks should be
> > in their sub-cgroup under the machine.slice. And the ordering
> > patches make sure the file is always empty.
> > 
> > In the memory cgroups all tasks are in the parent group (all in
> > machine.slice/tasks). machine.slice/*/tasks are empty. I am not sure
> > whether that is intended, i can just assume it is a bug in the
> > memory cgroup subsystem. Why are the groups created and tuned when
> > the tasks stay in the big superset?
> 
> TBH - there's quite a bit of this that mystifies me... Use of cgroups
> is not something I've spent a whole lot of time looking at...
> 
> I guess I've been working under the assumption that when the
> machine.slice/$path is created, the domain would use that for all
> cgroup specific file adjustments for that domain. Not sure how the
> /proc/$pid/cgroup is related to this.
> 
> My f23 system seems to generate the /proc/$pid/cgroup with the
> machine.slice/$path/ for each of the cgroups libvirt cares about while
> the f20 system with the test only has that path for cpuset and
> cpu,cpuacct. Since that's what the test uses to find the memory
> path for validation, that's why it fails.
> 
> I've been looking through the libvirtd debug logs to see if anything
> jumps out at me, but it seems both the systems I've looked at will
> build the path for the domain using the machine.slice/$path as seen
> during domain startup.
> 
> Very odd - perhaps looking at it too long right now though!
> 
> 
> > /proc/#/cgroup is showing the correct path, libvirt seems to fail to
> > migrate tasks into memory subgroups. (i am talking about a patched
> > 1.2.19 where vms do not have any special memory tuning)
> 
> I'm using latest upstream 1.3.1 - it seems to set the
> machine.slice/$path for blkio, cpu,cpuacct, cpuset, memory, and
> devices entries.
> 
> > 
> > Without my patches the first qemu thread was in
> > "2:cpuset:/machine.slice" and the name did match
> > "4:memory:/machine.slice". Now if the test wants matching names the
> > test might just be wrong. Or as indicated before there might be a
> > bug in the memory cgroups.
> > 
> 
> I'm leaning towards something in the test. I'll check if reverting
> these changes alters the results. I don't imagine it will.

The real question is which thread it fails on and at what point in
time. My patches only changed the order of operations, so threads
enter the cpuset cgroups at a slightly different time. The qemu main
thread never enters the parent group; it becomes an emulator thread.
Could you point to the exact assertion that fails, with a link to the
test code? And yes, if you can confirm that the patches are to blame,
that would be a good first step ;).
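
To narrow down which thread is affected, something along these lines
might help. It is only a rough sketch, not taken from libvirt or from
the test suite: the function names are made up for illustration, and it
assumes the cgroup v1 layout discussed in this thread, the
"hierarchy-ID:controller-list:path" line format of /proc/<pid>/cgroup,
and the per-thread files under /proc/<pid>/task/<tid>/cgroup.

```python
import os


def parse_cgroup_text(text):
    """Map each controller name to its cgroup path.

    Each line of /proc/<pid>/cgroup is expected to look like
    hierarchy-ID:controller-list:path (see cgroups(7)).
    """
    paths = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at most twice: the path itself may contain ':'.
        _hier_id, controllers, path = line.split(":", 2)
        # Co-mounted controllers share one line, e.g. "cpu,cpuacct".
        for controller in controllers.split(","):
            paths[controller] = path
    return paths


def thread_cgroups(pid):
    """Return {tid: {controller: path}} for every thread of pid."""
    result = {}
    task_dir = "/proc/%d/task" % pid
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "cgroup")) as f:
                result[int(tid)] = parse_cgroup_text(f.read())
        except OSError:
            # The thread exited while we were looking; skip it.
            continue
    return result


def threads_in_parent(per_thread, controller, parent="/machine.slice"):
    """List the tids whose path for controller is still the parent group."""
    return sorted(tid for tid, paths in per_thread.items()
                  if paths.get(controller) == parent)
```

Calling threads_in_parent(thread_cgroups(qemu_pid), "memory") and then
the same with "cpuset" should show which threads of a given qemu
process are still sitting in machine.slice for each controller.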

Thanks,
Henning

> John
> >> Where $path is "machine-qemu\x2dvirt\x2dtests\x2dvm1.scope"
> >>
> >> This affects the virsh memtune $dom command test suite which uses
> >> the /proc/$pid/cgroup file in order to find the path for the
> >> 'memory' or 'cpuset' or 'cpu,cpuacct' cgroup paths.
> >>
> >> Seems to be some interaction with systemd that I haven't quite
> >> figured out.
> >>
> >> I'm assuming this is essentially the issue you were trying to fix -
> >> that is changes to values should be done to the machine-qemu*
> >> specific files rather than the machine.slice files.
> >>
> >> The good news is I can see the changes occurring in the
> >> machine-qemu* specific files, so it seems libvirt is doing the
> >> right thing.
> >>
> >> However, there's something strange with perhaps previously
> >> existing/running domains where that /proc/$pid/cgroup file doesn't
> >> get the $path for the memory entry, thus causing the test
> >> validation to look in the wrong place.
> >>
> >> Hopefully this makes sense. What's really strange (for me at
> >> least) is that it's only occurring on one test system. I can set
> >> up the same test on another system and things work just fine.  I'm
> >> not quite sure what interaction generates that /proc/$pid/cgroup
> >> file - hopefully someone else understands it and can help me make
> >> sense of it.
> > 



