Red Hat Blog
For an OpenShift Container Platform operator, managing resources on nodes is one of the most important tasks. Setting LimitRange and Quota is the right way to limit resources. Many blog posts cover Quota and LimitRange from the OpenShift Container Platform perspective, but they do not explain the relationship between those Kubernetes objects and control groups (cgroups) in the Linux kernel. Since I haven’t seen this covered elsewhere, I decided to dig into the connection, with particular attention to CPU and memory limits.
Needed: basic knowledge of Red Hat Enterprise Linux 7
Red Hat OpenShift has Red Hat Enterprise Linux at its foundation. To understand Quota and LimitRange in OpenShift, we first need to look at the underlying Red Hat Enterprise Linux features. I will cover the basics of cgroups, systemd, and related concepts in Red Hat Enterprise Linux 7.
The cgroups feature has existed in Linux for quite some time, but it has recently become more prominent because of Linux containers and Kubernetes. It allows us to limit the resource usage of processes. In Red Hat Enterprise Linux 7, cgroups are enabled by default, and systemd mounts the important resource controllers under the /sys/fs/cgroup directory.
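To see which controllers are mounted on a given host, a small sketch like the one below can help. It assumes the cgroup v1 layout that Red Hat Enterprise Linux 7 uses, where each controller appears as a directory (or symlink) directly under /sys/fs/cgroup. The function name list_controllers is my own, not a system tool.

```shell
#!/usr/bin/env bash
# Sketch: list the resource controllers mounted under a cgroup root.
# Assumes the cgroup v1 layout (one directory per controller).
# On a cgroup v2 host the same path holds child cgroups instead.

list_controllers() {
    local root="${1:-/sys/fs/cgroup}"
    local dir
    for dir in "$root"/*/; do
        # Skip the unexpanded pattern when the root is missing or empty.
        [ -d "$dir" ] || continue
        basename "$dir"
    done
}

list_controllers
```

On a Red Hat Enterprise Linux 7 host I would expect names such as cpu, cpuacct, cpuset and memory in the output, but the exact list depends on the kernel configuration.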
The systemd system and service manager is responsible for controlling how services are started, stopped and otherwise managed on Red Hat Enterprise Linux 7 systems.
Slice: A slice unit, according to the systemd.slice man page, is a concept for hierarchically managing resources of a group of processes. A slice divides up computer resources (such as CPU and memory) and applies them to selected units.
Scope: A unit that groups processes created by something other than systemd. Unlike service units, scope units manage externally created processes and do not fork off processes on their own.
Service: A unit configuration file whose name ends in ".service" encodes information about a process controlled and supervised by systemd.
The relationship between slice, scope, service and processes
Let’s take a quick look at how these terms relate to one another. A slice organizes scopes and services into a hierarchy. Processes are attached to services or scopes, not to slices.
Now that we know the definitions, let’s try them in practice. This example command is from an article by Frederic Giloux: Controlling resources with cgroups for performance testing. Here we create a scope called “fredunit” and then check its status using systemctl.
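A minimal sketch of that step might look like this. The slice name fredslice.slice and the sleep 300 workload are my placeholders, not taken from the article, and the run helper is a hypothetical wrapper that only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: start a command in a transient scope and inspect it.
# "fredslice.slice" and "sleep 300" are placeholder assumptions.
# DRY_RUN=1 (the default here) only prints the commands.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

start_scope() {
    local unit="$1" slice="$2"
    shift 2
    # --scope keeps the process a child of the caller, so systemd wraps it
    # in <unit>.scope instead of starting it as a service.
    run systemd-run --unit="$unit" --scope --slice="$slice" "$@"
}

scope_status() {
    run systemctl status "$1.scope"
}

start_scope fredunit fredslice.slice sleep 300
# With DRY_RUN=0 the call above stays in the foreground, so the status
# check below would be run from a second terminal.
scope_status fredunit
```

Running the real commands needs root and a host where systemd is the init system.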
This next example will use a systemd service. If you’d like to learn more about services, Jayaraj Deenadayalan has written a good article to help us understand a Red Hat Enterprise Linux 7 systemd unit file, and how to generate one from traditional sysV init scripts.
The next example shows what the slice looks like. As you can see, the slice organizes the scope and service hierarchies.
Resource management in cgroups
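Slices divide resources among different kinds of workloads. systemd creates four slices by default, each with its own cgroup:
The “root” slice (-.slice)
And the other slices: system.slice, user.slice and machine.slice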
A slice with its own cgroup lets you control the amount of resources it can use.
Processes under a slice share its resources.
A slice can set CPU and memory limits.
A systemd unit is always associated with its own cgroup.
With systemd's use of cgroups, precise limits can be set on CPU and memory usage, as well as other resources.
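As a rough illustration of setting such limits, the sketch below uses systemctl set-property with the CPUShares= and MemoryLimit= settings, which are the cgroup v1 properties available on Red Hat Enterprise Linux 7. The unit name httpd.service and the values are placeholders I picked, and the run helper is a hypothetical wrapper that only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: cap CPU weight and memory for a unit, then read the values back.
# "httpd.service", 600 and 500M are placeholder assumptions.
# DRY_RUN=1 (the default here) only prints the commands.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

limit_unit() {
    local unit="$1" cpu_shares="$2" mem_limit="$3"
    # CPUShares= is a relative weight; MemoryLimit= is a hard cap.
    run systemctl set-property "$unit" "CPUShares=$cpu_shares" "MemoryLimit=$mem_limit"
}

show_limits() {
    run systemctl show "$1" -p CPUShares -p MemoryLimit
}

limit_unit httpd.service 600 500M
show_limits httpd.service
```

According to the systemctl man page, set-property applies the change immediately and also stores it on disk for later boots unless --runtime is given.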
Useful systemd commands
To see what services and other units (service, mount, path, socket, and so on) are associated with a particular target, type this command:
systemctl list-dependencies multi-user.target
To see the dependencies of a service, use the list-dependencies command:
systemctl list-dependencies atomic-openshift-node.service
To list specific types of units:
systemctl list-units --type service
systemctl list-units --type mount
To list all units installed on the system, along with their current states:
systemctl list-unit-files
To view resource usage for the processes associated with a particular service (cgroup), use systemd-cgtop. Once systemd-cgtop is running, you can press keys to sort by memory (m), CPU (c), task (t), path (p), or I/O load (i):
systemd-cgtop
To output a recursive list of cgroup content:
systemd-cgls
Using cgroups, we can divide resources among processes. From a Red Hat Enterprise Linux perspective, this is how we set CPU and memory limits and how we monitor the resources assigned through cgroups.
I created two different scenarios for setting limits in cgroups. The following external URL gives the detailed steps.
Scenario 1: Use the cpuset hierarchy by creating a folder under /sys/fs/cgroup (scope mode)
Scenario 2: Use a conf file to set the CPU/memory limits (service mode)
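For a feel of what Scenario 1 involves, here is a minimal sketch of the folder-based approach. It is not the detailed procedure from the linked steps. The group name demo, the CGROUP_ROOT variable and both function names are my own assumptions, and the file names (cpuset.cpus, cpuset.mems, tasks) are those of the cgroup v1 cpuset controller.

```shell
#!/usr/bin/env bash
# Sketch of Scenario 1: pin a process to chosen CPUs through a hand-made
# cpuset cgroup (cgroup v1). CGROUP_ROOT can point at a scratch directory
# to walk through the steps without touching the real hierarchy.

CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

create_cpuset() {
    local name="$1" cpus="$2" mems="$3"
    local dir="$CGROUP_ROOT/cpuset/$name"
    # Creating the folder is what creates the cgroup.
    mkdir -p "$dir" || return 1
    # Both cpuset.cpus and cpuset.mems must be set before tasks can join.
    echo "$cpus" > "$dir/cpuset.cpus" || return 1
    echo "$mems" > "$dir/cpuset.mems" || return 1
    echo "$dir"
}

attach_pid() {
    # Writing a PID to "tasks" moves that process into the cgroup.
    echo "$2" > "$1/tasks"
}

# Usage (as root on a cgroup v1 host), pinning the current shell to CPU 0:
#   dir=$(create_cpuset demo 0 0) && attach_pid "$dir" "$$"
```

Scenario 2 takes the other route and leaves the cgroup bookkeeping to systemd through the unit's configuration.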
Now that we know how to set the limit, let’s test it. To do this, we will put load on memory and CPU and see whether the limit configuration really keeps the process from exceeding its resource limits.
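One way to run such a check is sketched below, under the assumption of cgroup v1 memory controller files (memory.usage_in_bytes, memory.limit_in_bytes, memory.failcnt). The function names burn_cpu and mem_report are hypothetical helpers of mine, not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch: generate CPU load, then compare a cgroup's memory usage with its
# limit. File names are those of the cgroup v1 memory controller.

burn_cpu() {
    # Busy-loop for the given number of seconds (default 10).
    timeout "${1:-10}" sh -c 'while :; do :; done'
}

mem_report() {
    local dir="$1" usage limit failcnt
    usage=$(cat "$dir/memory.usage_in_bytes") || return 1
    limit=$(cat "$dir/memory.limit_in_bytes") || return 1
    # failcnt counts how many times the group hit its memory limit.
    failcnt=$(cat "$dir/memory.failcnt") || return 1
    echo "usage=$usage limit=$limit failcnt=$failcnt"
    # Succeed only while usage stays within the limit.
    [ "$usage" -le "$limit" ]
}

# Usage (paths are examples only):
#   mem_report /sys/fs/cgroup/memory/system.slice/httpd.service
```

While burn_cpu runs inside a limited unit, systemd-cgtop in another terminal should show the CPU column levelling off near the configured limit. A rising failcnt suggests the memory limit is being enforced.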
Lastly, I will try to build slices similar to the ones Kubernetes uses, and I hope that gives you insight into how Kubernetes uses cgroups for LimitRange. Kubernetes assigns each pod one of three Quality of Service (QoS) classes (Guaranteed, Burstable, or BestEffort) and creates slices based on the QoS class. The slices are generated by creating folders under /sys/fs/cgroup, and the result looks like a chain of slices.
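To make the chain concrete, the sketch below maps a QoS class and a pod UID to the slice path I would expect under each controller directory. It assumes the kubelet runs with the systemd cgroup driver and the default kubepods root, so treat the layout as an assumption to verify on a real node. The function name pod_slice_path and the sample UID are made up.

```shell
#!/usr/bin/env bash
# Sketch: where a pod's slice lands for each QoS class, assuming the
# systemd cgroup driver and the default "kubepods" root slice.

pod_slice_path() {
    local qos="$1" uid
    # systemd uses "-" as the hierarchy separator in slice names, so the
    # dashes in a pod UID are written as underscores.
    uid=$(printf '%s' "$2" | tr '-' '_')
    case "$qos" in
        Guaranteed)
            echo "kubepods.slice/kubepods-pod${uid}.slice" ;;
        Burstable)
            echo "kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod${uid}.slice" ;;
        BestEffort)
            echo "kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod${uid}.slice" ;;
        *)
            echo "unknown QoS class: $qos" >&2
            return 1 ;;
    esac
}

pod_slice_path Burstable 1234-abcd
```

Each name repeats its parent's prefix, which is how systemd encodes nesting in slice names and why the folders read like a chain.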
The results of Scenario 4 are briefly summarized as follows:
The slices that I created
The slices that Kubernetes created
What do you think? They are very similar to each other, aren’t they?
Red Hat OpenShift Container Platform/Kubernetes uses cgroups because it uses containers, which means the way to set limits is the same. OpenShift Container Platform/Kubernetes uses QoS (Quality of Service), and the chain of slices is created recursively in /sys/fs/cgroup based on QoS. The chain of slices allows limits to be set on each container’s resources just as on a normal process. Through the series of demo scenarios, I hope you have a better understanding of how OpenShift Container Platform sets limits through cgroups.
I would like to thank Frédéric Giloux and Marc Richter. This blog builds on their wonderful blogs: “Controlling resources with cgroups for performance testing” and the “World domination with cgroups” series.
About the author
Jooho Lee is a senior OpenShift Technical Account Manager (TAM) in Toronto supporting middleware products (EAP/DataGrid/Web Server) and cloud products (Docker/Kubernetes/OpenShift/Ansible). He is an active member of JBoss User Group Korea and the OpenShift/Ansible Group.