[libvirt] [REPOST] regarding cgroup v2 support in libvirt

Lennart Poettering mzerqung at 0pointer.de
Fri Oct 21 12:13:17 UTC 2016


On Fri, 21.10.16 11:19, Daniel P. Berrange (berrange at redhat.com) wrote:

> On Thu, Oct 20, 2016 at 02:59:45PM -0400, Tejun Heo wrote:
> > (reposting w/ libvir-list cc'd, sorry about the delay in reposting,
> >  was traveling and then on vacation)
> > 
> > Hello, Daniel.  How have you been?
> > 
> > We (facebook) are deploying cgroup v2 and internally use libvirt to
> > manage virtual machines, so I'm trying to add cgroup v2 support to
> > libvirt.
> > 
> > Because cgroup v2's resource configurations differ from v1 in varying
> > degrees depending on the specific resource type, it unfortunately
> > introduces new configurations (some completely new configs, others
> > just a different range / format).  This means that adding cgroup v2
> > support to libvirt requires adding new config options to it and maybe
> > implementing some form of translation mechanism between overlapping
> > configs.
> > 
> > The upcoming systemd release includes all that's necessary to support
> > v1/v2 compatibility so that users setting resource configs through
> > systemd don't have to worry about whether v1 or v2 is in use.  I'm
> > wondering whether it would make sense to make libvirt use dbus calls
> > to systemd to set resource configs when systemd is in use, so that it
> > can piggyback on systemd's v1/v2 compatibility.
> 
> The big question I have around cgroup v2 is state of support for all
> controllers that libvirt uses (cpu, cpuacct, cpuset, memory, devices,
> freezer, blkio).  IIUC, not all of these have been ported to cgroup
> v2 setup and the cpu port in particular was rejected by Linux maintainers.
> Libvirt has a general policy that we won't support features that only
> exist in out of tree patches (applies to kernel and any other software
> we build against or use).
> 
> IIRC from earlier discussions, the model for dealing with processes in
> cgroup v2 was quite different. In libvirt we rely on the ability to
> assign different threads within a process to different cgroups, because
> we need to control CPU scheduler parameters on different threads in
> QEMU. E.g. we have vCPU threads, I/O threads and general emulator threads
> each of which get different policies.
> 
> When I spoke with Lennart about cgroup v2, way back in Jan, he indicated
> that while systemd can technically work with a system where some
> controllers are mounted as v1, while others are mounted as v2, this
> would not be an officially supported solution. Thus systemd in Fedora
> was not likely to switch to v2 until all required controllers could use
> v2. I'm not sure if this still corresponds to Lennart's current views, so
> CC'ing him to confirm/deny.

So, the "hybrid" mode is probably nothing RHEL or so would want to
support. However, I think it might be a good step for Fedora at
least. But yes, supporting this mode means additional porting effort
for the various daemons that access cgroupfs...
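
For daemons that need to know which layout they are running on, here is a
minimal sketch of one possible check. It assumes that the mode can be inferred
from the filesystem types listed in /proc/self/mountinfo ("cgroup" for v1
hierarchies, "cgroup2" for the v2 hierarchy). The function name and the
returned labels are hypothetical, and a v1 hierarchy mounted without any
controller still counts as v1 here.

```python
def classify_cgroup_setup(mountinfo_text):
    """Classify the cgroup layout from the text of /proc/self/mountinfo.

    Returns "hybrid" if both v1 and v2 hierarchies are mounted,
    "v2" or "v1" if only one kind is mounted, and "none" otherwise.
    The labels are illustrative, not an official naming scheme.
    """
    fs_types = set()
    for line in mountinfo_text.splitlines():
        # In mountinfo, the filesystem type is the first field
        # after the " - " separator.
        _, sep, tail = line.partition(" - ")
        if not sep:
            continue
        fields = tail.split()
        if fields:
            fs_types.add(fields[0])

    has_v1 = "cgroup" in fs_types
    has_v2 = "cgroup2" in fs_types
    if has_v1 and has_v2:
        return "hybrid"
    if has_v2:
        return "v2"
    if has_v1:
        return "v1"
    return "none"
```

On a live system this could be called as
classify_cgroup_setup(open("/proc/self/mountinfo").read()).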

> I recall that systemd policy for v2 was intended to be that no app
> should write to cgroup sysfs except for systemd, unless there was
> a sub-tree created with Delegate=yes set on the scope. So this clearly
> means when using v2 we'll have to use the systemd DBus APIs for managing
> cgroups v2 on such hosts.

Yes, this is our policy: the cgroup tree is private property of
systemd (at least regarding write access), except when you have a
service or scope unit where Delegate=yes is set, in which case you can
manage your own subtree of that freely.
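
As an illustration of the delegation case, here is a minimal sketch that only
builds a command line asking systemd to start a program in a transient scope
unit with Delegate=yes set. It assumes the systemd-run tool is available. The
helper name and the unit name in the usage note are hypothetical, and the
sketch does not execute anything.

```python
def delegated_scope_command(unit_name, argv):
    """Build a systemd-run command line for a transient scope unit
    with Delegate=yes, wrapping the program given in argv.

    This only constructs the argument list. Running it, e.g. with
    subprocess.run(), is left to the caller.
    """
    if not argv:
        raise ValueError("argv must name a program to run")
    return [
        "systemd-run",
        "--scope",                   # create a scope unit, not a service
        f"--unit={unit_name}",       # explicit name for the transient unit
        "--property=Delegate=yes",   # hand the unit's subtree to the program
        "--",
        *argv,
    ]
```

For example, delegated_scope_command("demo.scope", ["sleep", "60"]) yields a
list that can be passed to subprocess.run(). The program started this way may
then manage cgroups below its own scope, while the rest of the tree stays
under systemd's control.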

Lennart

-- 
Lennart Poettering, Red Hat
