Exactly What Are OpenShift Runlevels and Should You Really Use Them?

July 8, 2021 | Mark Cooper | 4-minute read

Runlevels in the OpenShift Container Platform (OCP) help manage the startup of major API components such as the kube-apiserver and openshift-apiserver. They are used to indicate that a namespace contains pods that must be running before the openshift-apiserver pods. Because runlevels are applied as labels on a Kubernetes namespace, other products and components can also implement them.

It is important to note that the concept of runlevels in OpenShift does not relate to Linux runlevels. For example, in OCP, runlevel 1 only dictates the start order of components (after the kube-apiserver); it has nothing to do with the single-user text mode of Linux init.

However, while a runlevel may seem like a good way to start a component as early as possible, using one weakens the security protections in the cluster. Runlevels are intended for use during startup, but no Security Context Constraints (SCCs) are applied to any pod in the labeled namespace. This is significant: without an SCC, any workload running in that namespace may be highly privileged, a level that should be reserved for trusted workloads. Early runlevels are used for namespaces containing pods that provide admission webhooks for workload pods.

The general advice is to avoid their usage entirely.

What are runlevels?

In OpenShift, the concept of runlevels is closely tied to the startup order of components, but remember: they do not relate to the Linux init runlevels. There are two main runlevels of interest:

Runlevel 0 is required to start the kube-apiserver
Runlevel 1 is required to start the openshift-apiserver and oauth-apiserver

The concept of runlevels is not new in Kubernetes; however, it is only mentioned in passing in the documentation, and hence the implementation here is specific to OpenShift. Runlevels first allow the kube-apiserver to start, then the openshift-apiserver, and then subsequent components.

Runlevels are essentially implemented as a built-in admission hook that determines which admission plug-ins to run at each level and, more importantly, which plug-ins to skip. This is notable because the admission process in OpenShift applies SCCs to help enforce a minimum security level across all pods. Any namespace in runlevel 0 or 1 skips the enforcement of Security Context Constraints (SCCs) entirely.

Within a default install of OpenShift, there is a static list of existing namespaces that have a runlevel set for the core components:

default	kube-system	kube-public
openshift	openshift-infra	openshift-node

However, it is important to note that the runlevels are inclusive, meaning that runlevel 1 includes everything set in runlevel 0:

func init() {
    runLevelOneNamespaces.Insert(runLevelZeroNamespaces.List()...)
}
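To make the "inclusive" behaviour concrete, here is a minimal standalone sketch in Go. It does not use the Kubernetes sets package; the helper name includeLower is hypothetical, and the assignment of the six namespaces to the two levels is an assumption based on the two rows of the list above.

```go
package main

import (
	"fmt"
	"sort"
)

// includeLower returns the sorted union of two namespace lists, mirroring how
// runlevel 1 absorbs every namespace that is already in runlevel 0.
// (Hypothetical helper written for this sketch, not OpenShift code.)
func includeLower(zero, one []string) []string {
	seen := make(map[string]bool)
	merged := []string{}
	for _, list := range [][]string{one, zero} {
		for _, ns := range list {
			if !seen[ns] {
				seen[ns] = true
				merged = append(merged, ns)
			}
		}
	}
	sort.Strings(merged)
	return merged
}

func main() {
	// Assumed grouping, for illustration only.
	zero := []string{"default", "kube-system", "kube-public"}
	one := []string{"openshift", "openshift-infra", "openshift-node"}

	// After the merge, the runlevel 1 set contains all six namespaces.
	fmt.Println(includeLower(zero, one))
}
```

The practical effect is that anything skipped for runlevel 1 is also skipped for runlevel 0 namespaces.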

The runlevels also dictate which admission controls to skip:

SkipRunLevelOnePlugins = sets.NewString(
    imagepolicyapiv1.PluginName, // "image.openshift.io/ImagePolicy"
    "quota.openshift.io/ClusterResourceQuota",
    "security.openshift.io/SecurityContextConstraint",
    "security.openshift.io/SCCExecRestrictions",
)

Hence, namespaces in runlevel 0 and runlevel 1 do not have these admission plug-ins applied.
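The effect of the skip list can be sketched as a simple filter. This is an illustration only, not the OpenShift implementation: the function pluginsToRun and the plug-in name example.io/HypotheticalPlugin are made up, while the four skipped names are copied from the list above.

```go
package main

import "fmt"

// Plug-in names copied from the SkipRunLevelOnePlugins list in the article.
var skipRunLevelOnePlugins = map[string]bool{
	"image.openshift.io/ImagePolicy":                  true,
	"quota.openshift.io/ClusterResourceQuota":         true,
	"security.openshift.io/SecurityContextConstraint": true,
	"security.openshift.io/SCCExecRestrictions":       true,
}

// pluginsToRun returns the admission plug-ins that still run for a namespace.
// inRunLevel is true when the namespace is in runlevel 0 or 1.
// (Hypothetical helper written for this sketch.)
func pluginsToRun(configured []string, inRunLevel bool) []string {
	kept := []string{}
	for _, name := range configured {
		if inRunLevel && skipRunLevelOnePlugins[name] {
			continue // skipped: this plug-in is never consulted for the pod
		}
		kept = append(kept, name)
	}
	return kept
}

func main() {
	configured := []string{
		"security.openshift.io/SecurityContextConstraint",
		"example.io/HypotheticalPlugin", // made-up name, not a real plug-in
	}
	fmt.Println("runlevel namespace:", pluginsToRun(configured, true))
	fmt.Println("regular namespace: ", pluginsToRun(configured, false))
}
```

In the runlevel case the SCC plug-in drops out of the list, which is why pods there never receive an SCC.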

This also applies to any new namespace created in a running cluster that carries the label:

labels:
    openshift.io/run-level: "0"

Similar to the static list discussed above, the runlevel 1 label check is inclusive of runlevel 0:

skipRunLevelOneSelector, err = labels.Parse(runLevelLabel + " notin ( 0,1 )")
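The "notin" operator in that selector follows the usual Kubernetes set-based semantics: it matches when the label is absent or when its value is not in the listed set. The sketch below reimplements just that rule with the standard library so it can run on its own; the function notIn is hypothetical, and reading a match as "this namespace is outside runlevels 0 and 1" is an assumption about how the selector is used.

```go
package main

import "fmt"

const runLevelLabel = "openshift.io/run-level"

// notIn mimics the Kubernetes set-based "notin" operator: it matches when the
// key is absent, or when its value is not one of the listed values.
// (Hypothetical helper; the real code uses labels.Parse from apimachinery.)
func notIn(nsLabels map[string]string, key string, values ...string) bool {
	v, ok := nsLabels[key]
	if !ok {
		return true // no label at all: the namespace is not in any runlevel
	}
	for _, candidate := range values {
		if v == candidate {
			return false
		}
	}
	return true
}

func main() {
	namespaces := []map[string]string{
		{runLevelLabel: "0"},
		{runLevelLabel: "1"},
		{}, // regular namespace without the label
	}
	for _, nsLabels := range namespaces {
		fmt.Printf("%v outside runlevels 0 and 1: %v\n",
			nsLabels, notIn(nsLabels, runLevelLabel, "0", "1"))
	}
}
```

Note that the label value is a string, which is why the YAML above quotes "0".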

Regardless of a given user's permissions, any pod created here will not receive an SCC context. Users other than cluster-admin must first be granted the appropriate permissions by a cluster-admin to create pods in these namespaces, because such requests are forbidden by default.

Consequences

As stated previously, because runlevels are applied at the namespace level, the permissions of the current user are not taken into account. Even if a given user does not have permission to create a namespace with runlevel 1, a pod that the user creates in a namespace that already has the runlevel set will still not receive an SCC context.

For example, if a user is permitted to create a pod in the default namespace, the pod is created with no SCC:

$ oc get pod tests -o yaml | grep -i scc

Whereas a pod created in a regular namespace (regular meaning no runlevel specified) is annotated with its SCC:

$ oc get pod tests -o yaml | grep -i scc

openshift.io/scc: restricted

Without the SCC restrictions enforced in these namespaces, the power to create pods in these namespaces is equivalent to root on the node. Security measures like requiring workloads to run as pseudo-random UIDs (a good thing for multitenancy and helping to protect against container escapes) and dropping some capabilities are never applied.

As a simple example, when the Grafana container runs in a runlevel 1 namespace, it can be observed that OCP has honoured the UID that the container asked for:

$ id
uid=472(grafana) gid=472(grafana) groups=472(grafana)

Instead of what it should be:

$ id
uid=1000680000(1000680000) gid=0(root) groups=0(root),1000680000

And likewise, the pod is granted unwanted capabilities such as:

- KILL
- MKNOD
- SETUID
- SETGID

$ capsh --print
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service+i
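A check like this can be automated. The following sketch parses a capsh "Current:" line and reports which of the four capabilities listed above are present. It assumes the simple single-group format shown in the sample line, and the helper names parseCurrent and unwantedPresent are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCurrent extracts capability names from a capsh "Current:" line.
// Assumption: one comma-separated group followed by a flag suffix such as "+i".
func parseCurrent(line string) map[string]bool {
	caps := make(map[string]bool)
	eq := strings.Index(line, "=")
	if eq < 0 {
		return caps
	}
	list := line[eq+1:]
	if plus := strings.Index(list, "+"); plus >= 0 {
		list = list[:plus] // drop the flag suffix
	}
	for _, name := range strings.Split(list, ",") {
		name = strings.TrimSpace(name)
		if name != "" {
			caps[strings.ToUpper(strings.TrimPrefix(name, "cap_"))] = true
		}
	}
	return caps
}

// unwantedPresent returns which of the capabilities named in the article
// (KILL, MKNOD, SETUID, SETGID) appear in the parsed set.
func unwantedPresent(caps map[string]bool) []string {
	found := []string{}
	for _, name := range []string{"KILL", "MKNOD", "SETUID", "SETGID"} {
		if caps[name] {
			found = append(found, name)
		}
	}
	return found
}

func main() {
	line := "Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service+i"
	fmt.Println(unwantedPresent(parseCurrent(line)))
}
```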

This alone does not automatically mean that a potential attacker can perform a container escape, but it does significantly weaken the defense-in-depth approach of OpenShift.

Conclusion

Setting runlevels in OpenShift for components such as operators (or anything else, really) should be avoided, as each workload must have an SCC context.

Historically, in older versions of OCP (4.4), there was a significant delay in the bootstrapping flow. This meant that if a component (that is, a pod) existed in a namespace which used SCC, there would be a delay before it could start.

In newer versions of OCP (4.6+), this delay has been virtually eliminated, meaning that runlevels should no longer be required at all. Hence the primary alternative is simply to try the workload without any runlevel specified to begin with.

If a workload does not function correctly due to SCC, then the solution is not to specify a runlevel, but rather to use one of the other provided SCCs.

Special thanks to David Eads for his assistance in understanding how runlevels work within OCP.

About the author

Mark Cooper

Security Engineer

Mark Cooper is a Product Security Engineer at Red Hat, specializing in a number of cloud technologies such as OpenShift and ServiceMesh.
