[libvirt] [PATCH 5/6] qemu: Leave cpuset.mems in parent cgroup alone

Martin Kletzander mkletzan at redhat.com
Tue Dec 23 11:40:23 UTC 2014


On Mon, Dec 22, 2014 at 10:45:33AM +0100, Martin Kletzander wrote:
>On Wed, Dec 17, 2014 at 04:59:30PM -0700, Eric Blake wrote:
>>On 12/17/2014 08:06 AM, Martin Kletzander wrote:
>>> On Wed, Dec 17, 2014 at 12:00:36AM -0700, Eric Blake wrote:
>>>> On 12/16/2014 11:51 PM, Eric Blake wrote:
>>>>> On 12/15/2014 12:58 AM, Martin Kletzander wrote:
>>>>>> Instead of setting the value of cpuset.mems once when the domain starts
>>>>>> and then re-calculating the value every time we need to change the
>>>>>> child
>>>>>> cgroup values, leave the cgroup alone and rather set the child data
>>>>>> every time there is new cgroup created.  We don't leave any task in the
>>>>>> parent group anyway.  This will ease both current and future code.
>>>>>>
>>>>>> Signed-off-by: Martin Kletzander <mkletzan at redhat.com>
>>>>>> ---
>>>>>>  src/qemu/qemu_cgroup.c | 67
>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++--
>>>>>>  src/qemu/qemu_driver.c | 59
>>>>>> +++++++++++++++-----------------------------
>>>>>>  2 files changed, 85 insertions(+), 41 deletions(-)
>>>>>
>>>>> This patch causes libvirtd to segfault on startup:
>>>>
>>>> More particularly, I'm doing an upgrade from older libvirtd to this
>>>> version, while leaving a transient domain running that was started by
>>>> the older libvirtd.  Hope that helps you narrow in on the problem.
>>
>>Weird - At the time I made the report, I ran 'git bisect' and reliably
>>reproduced the crash across multiple libvirtd restarts (a restart for
>>each new build while trying to nail down the culprit commit), as long as
>>I left that transient domain running.
>>

I managed to reproduce this (or actually talk to a person who did) and
found out a bit more.  As you say, until the domain was running
everything was reproducible.  However, when I started it with the old
libvirt second time to find out the root cause, it just worked...

>>>>
>>>
>>> I tried that and it works for me.  And I tried various domains, both
>>> heavily cgroup dependent and simple ones.
>>
>>But now that I've rebooted, and then proceeded to do incremental builds
>>from both before and after the patch, I can't reproduce the crash.
>>Although my formula for creating my transient domain was the same both
>>yesterday and today, there may be some difference in the version of
>>libvirtd that was running at the time I created the domain that then
>>affected the XML affecting the libvirtd restarts.
>>
>>>
>>>> Reverting 86759e and af2a1f0 was sufficient to get me going again
>>>> locally, but I'm not going to push my reversions until you've first had
>>>> a chance to address the regression.
>>>>
>>
>>>>> #2  0x00007ffff7405673 in virCgroupHasEmptyTasks (cgroup=0x0,
>>>>> controller=2)
>>>>>     at util/vircgroup.c:3935
>>>
>>> From this line it looks like priv->cgroup was not initialized.  I did
>>> not add a check for that, so that may be the cause.  I'll send a patch
>>> soon.
>>>
>>> But I wonder how did you manage to do that, is that a session libvirtd
>>> you restarted?  Otherwise how come virCgroupNewDetectMachine() didn't
>>> fill the cgroup?
>>
>>I'm not spotting anything obvious why it wasn't initialized at the time
>>I reproduced the crash, nor why I can't reproduce it now.  Hopefully
>>it's not a lurking time bomb.
>>

It might be and it may cause more problems that just non-working
hot-plugging of vCPUs.  And it is not caused by these patches I
added.  The thing is that old libvirt starts a domain, but it
*somehow* (this is the part I was not able to find out yet) doesn't
add the task into the cgroup hierarchy for all cgroup controllers.
The /proc/$(pidof qemu-kvm)/cgroups file looks like this then:

10:hugetlb:/
9:perf_event:/machine.slice/machine-qemu\x2da.scope
8:blkio:/
7:net_cls:/machine.slice/machine-qemu\x2da.scope
6:freezer:/machine.slice/machine-qemu\x2da.scope
5:devices:/
4:memory:/
3:cpuacct,cpu:/
2:cpuset:/machine.slice/machine-qemu\x2da.scope/emulator
1:name=systemd:/machine.slice/machine-qemu\x2da.scope

And when new libvirtd is starting, it detects the cgroup mounts and
placements (in this example the placement for memory cgroup is "/")
and the it goes and validates them.  If the cgroup doesn't have a
placement that ends with machine-qemu-blah or libvirt-qemu or
whatever, the following condition in virCgroupValidateMachineGroup()
fails:

if (STRNEQ(tmp, name) &&
    STRNEQ(tmp, partname) &&
    STRNEQ(tmp, scopename))

[in here the variable tmp points after the last slash in the partition
 string]

The function returns false, hence virCgroupNewDetectMachine() frees
the cgroup (and sets it to NULL) and returns success.  This means we
will still properly detect the machine but we won't have access to any
cgroup settings of that particular machine.

The patch I posted fixes it to the extent that the daemon doesn't
crash, but it doesn't fix the root cause (which is unknown).

I played with it a bit more and I've found out that a domain, which
had its own partition in all the cgroups controllers when started,
*lost* some of them after a while.  I might have been caused by
libvirtd restarting, but that's highly unlikely, I guess.

Even though inotify works for cgroup fs it does not show us who
accessed it.  I'll have to resort to using systemtap again.

>>If it happens again, I'll definitely report it; but for now, without a
>>reliable reproduction, it could easily have been something caused on my
>>end (since it is my dev machine, it may have been caused by a half-baked
>>patch on my end that I was testing at the time I created the transient
>>domain, where using only upstream patches wouldn't see the issue).  So
>>for now, don't worry too hard if you can't find it either.
>>
>
>There is a valid code path that allows the cgroup to be NULL, although
>not easy to reach.  However, since it is valid, we added a patch that
>just skips the cgroup restoration for cgroup == NULL.
>
>If you happen to stumble upon a similar problem, I'd be happy to try
>dealing with it.
>
>Martin



>--
>libvir-list mailing list
>libvir-list at redhat.com
>https://www.redhat.com/mailman/listinfo/libvir-list
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: Digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20141223/895d0fa8/attachment-0001.sig>


More information about the libvir-list mailing list