Runlevels in the OpenShift Container Platform (OCP) help manage the startup of major API components such as the kube-apiserver and openshift-apiserver. They are used to indicate that a namespace contains pods that must be running before the openshift-apiserver pods. Because runlevels are applied as labels on a Kubernetes namespace, other products and components can also implement them.

It is important to note that the concept of runlevels in OpenShift does not relate to Linux runlevels. For example, in OCP, runlevel 1 only dictates the start order of components (after the kube-apiserver); it does not correspond to the single-user text mode of Linux init.

While runlevels may seem like a good way to start a component as early as possible, using them weakens the security protections in the cluster. The intent is to use runlevels during startup; however, no Security Context Constraints (SCCs) are applied to any pod in the labeled namespace. This is significant: without an SCC, any workload running in that namespace may be highly privileged, a level that should be reserved for trusted workloads. Early runlevels are intended for namespaces containing pods that provide admission webhooks for workload pods.

The general advice is to avoid using runlevels entirely.

What are runlevels?

In OpenShift, the concept of runlevels is closely tied to the startup order of components, but remember: they do not relate to the Linux init runlevels. There are two main runlevels of interest:

  • Runlevel 0 is required to start the kube-apiserver
  • Runlevel 1 is required to start the openshift-apiserver and oauth-apiserver

The concept of runlevels is not new in Kubernetes; however, it is only mentioned in passing in the documentation, and hence the implementation here is specific to OpenShift. Runlevels first allow the kube-apiserver to start, then the openshift-apiserver, and then subsequent components.

Runlevels are essentially implemented as a built-in admission hook that determines which admission plug-ins to run at each level and, more importantly, which plug-ins to skip. This is notable because the admission process in OpenShift applies SCCs to help enforce a minimum security level across all pods. Any namespace in runlevel 0 or 1 skips the enforcement of SCCs entirely.

Within a default install of OpenShift, there is a static list of existing namespaces that have a runlevel set for the core components:

default
kube-system
kube-public
openshift
openshift-infra
openshift-node

However, it is important to note that the runlevels are inclusive, meaning that runlevel 1 includes everything set in runlevel 0:

func init() {
   runLevelOneNamespaces.Insert(runLevelZeroNamespaces.List()...)
}

The runlevels also dictate which admission controls to skip:

SkipRunLevelOnePlugins = sets.NewString(
   imagepolicyapiv1.PluginName, // "image.openshift.io/ImagePolicy"
   "quota.openshift.io/ClusterResourceQuota",
   "security.openshift.io/SecurityContextConstraint",
   "security.openshift.io/SCCExecRestrictions",
)

Hence, namespaces in runlevel 0 or 1 do not have these admission plug-ins applied.
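
The combined effect of the two snippets above can be sketched as a small standalone Go program. This is an illustration only, not the OpenShift source: the split of the static namespaces between runlevel 0 and runlevel 1 is an assumption made for the example, plain maps stand in for the sets package, the image policy plug-in is written out using the string from its comment, and skippedPlugins and my-app are hypothetical names.

```go
package main

import "fmt"

// Assumed split of the static namespace list, for illustration only.
var runLevelZeroNamespaces = map[string]bool{
	"default":     true,
	"kube-system": true,
	"kube-public": true,
}

var runLevelOneNamespaces = map[string]bool{
	"openshift":       true,
	"openshift-infra": true,
	"openshift-node":  true,
}

// Plug-in names copied from the SkipRunLevelOnePlugins list.
var skipRunLevelOnePlugins = []string{
	"image.openshift.io/ImagePolicy",
	"quota.openshift.io/ClusterResourceQuota",
	"security.openshift.io/SecurityContextConstraint",
	"security.openshift.io/SCCExecRestrictions",
}

func init() {
	// Runlevels are inclusive: every runlevel 0 namespace is also runlevel 1.
	for ns := range runLevelZeroNamespaces {
		runLevelOneNamespaces[ns] = true
	}
}

// skippedPlugins (hypothetical helper) returns the admission plug-ins that
// are not run for pods in the named namespace, or nil if none are skipped.
func skippedPlugins(namespace string) []string {
	if runLevelOneNamespaces[namespace] {
		return skipRunLevelOnePlugins
	}
	return nil
}

func main() {
	// "my-app" is a made-up namespace with no runlevel.
	for _, ns := range []string{"kube-system", "openshift-infra", "my-app"} {
		fmt.Printf("%s: %d plug-ins skipped\n", ns, len(skippedPlugins(ns)))
	}
}
```

Because of the init step, a runlevel 0 namespace such as kube-system skips the same plug-ins as a runlevel 1 namespace, while a namespace outside both sets skips none.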

This also applies to any new namespace created in a running cluster that includes the label:

labels:
   openshift.io/run-level: "0"

As with the static list discussed above, the label check for runlevel 1 also covers runlevel 0:

skipRunLevelOneSelector, err = labels.Parse(runLevelLabel + " notin ( 0,1 )")

Regardless of a given user's permissions, any pod created in a runlevel 0 or 1 namespace will not receive an SCC context. A user who is not a cluster-admin must be granted the appropriate permissions by a cluster-admin before creating pods in these namespaces, as such requests are forbidden by default.
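
To make the label selector above concrete, the following Go sketch reimplements the "notin ( 0,1 )" check with the standard library only. It assumes the skippable plug-ins run only for namespaces that match the selector, and it relies on the Kubernetes rule that notin also matches when the label key is absent. matchesNotInZeroOne is a hypothetical helper name, not part of OpenShift.

```go
package main

import "fmt"

// runLevelLabel is the namespace label key shown above.
const runLevelLabel = "openshift.io/run-level"

// matchesNotInZeroOne (hypothetical helper) mimics the selector
// `openshift.io/run-level notin (0,1)`. It returns true when the label is
// absent or holds any value other than "0" or "1".
func matchesNotInZeroOne(nsLabels map[string]string) bool {
	value, ok := nsLabels[runLevelLabel]
	if !ok {
		return true // notin also matches namespaces without the label
	}
	return value != "0" && value != "1"
}

func main() {
	namespaces := []map[string]string{
		{runLevelLabel: "0"},
		{runLevelLabel: "1"},
		{runLevelLabel: "2"},
		{}, // no runlevel label at all
	}
	for _, nsLabels := range namespaces {
		// Assumption: plug-ins are applied only when the selector matches.
		fmt.Printf("labels=%v plug-ins applied=%t\n", nsLabels, matchesNotInZeroOne(nsLabels))
	}
}
```

Under that assumption, only namespaces labeled "0" or "1" fall outside the selector and therefore miss the SCC plug-ins.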

Consequences

As stated previously, because runlevels are applied at the namespace level, the permissions of the current user are not taken into account. Even if a user does not have permission to create a namespace with runlevel 1, a pod that the user creates in a namespace that already has the runlevel set will still not receive an SCC context.

For example, if a user is permitted to create a pod in the default namespace, the pod is created with no SCC:

$ oc get pod tests -o yaml | grep -i scc

Whereas a pod created in a regular namespace (regular meaning no runlevel specified) does receive one:

$ oc get pod tests -o yaml | grep -i scc

openshift.io/scc: restricted
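
The same check can be scripted rather than piped through grep. The sketch below is an illustration, not an official tool: it reads a pod as JSON (for example, the output of oc get pod tests -o json) and reports whether the openshift.io/scc annotation is present. sccOf is a hypothetical helper name, and the embedded manifests are trimmed stand-ins for real pods.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pod models only the metadata fields needed for this check.
type pod struct {
	Metadata struct {
		Annotations map[string]string `json:"annotations"`
	} `json:"metadata"`
}

// sccOf (hypothetical helper) returns the SCC recorded on the pod and
// whether the openshift.io/scc annotation was present at all.
func sccOf(podJSON []byte) (string, bool, error) {
	var p pod
	if err := json.Unmarshal(podJSON, &p); err != nil {
		return "", false, err
	}
	scc, ok := p.Metadata.Annotations["openshift.io/scc"]
	return scc, ok, nil
}

func main() {
	// Trimmed, illustrative manifests; real input would come from the cluster.
	samples := []struct {
		name string
		doc  string
	}{
		{"regular namespace", `{"metadata":{"annotations":{"openshift.io/scc":"restricted"}}}`},
		{"runlevel namespace", `{"metadata":{"annotations":{}}}`},
	}
	for _, s := range samples {
		scc, ok, err := sccOf([]byte(s.doc))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: scc=%q present=%t\n", s.name, scc, ok)
	}
}
```

A pod with no annotation at all is reported as not present, which is the signal that admission skipped SCC for its namespace.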

Without the SCC restrictions enforced in these namespaces, the power to create pods in these namespaces is equivalent to root on the node.  Security measures like requiring workloads to run as pseudo-random UIDs (a good thing for multitenancy and helping to protect against container escapes) and dropping some capabilities are never applied. 

As a simple example, when the Grafana container runs in a runlevel 1 namespace, it can be observed that OCP has honoured the UID that the container asked for:

$ id
uid=472(grafana) gid=472(grafana) groups=472(grafana)

Instead of what it should be:

$ id
uid=1000680000(1000680000) gid=0(root) groups=0(root),1000680000

And likewise, the pod is granted unwanted capabilities such as: 

- KILL
- MKNOD
- SETUID
- SETGID

$ capsh --print
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service+i

This alone does not automatically mean that a potential attacker can perform a container escape, but it does significantly weaken the defense-in-depth approach of OpenShift.

Conclusion

Setting runlevels for components such as operators (or for any other workload) should be avoided, as each workload must have an SCC context.

Historically, in older versions of OCP (4.4), there was a significant delay in the bootstrapping flow. This meant that if a component (that is, a pod) existed in a namespace which used SCC, there would be a delay before it could start. 

In newer versions of OCP (4.6+), this delay has been virtually eliminated, meaning that the usage of runlevels should now not be required at all. Hence the primary alternative is to simply try the workload without any runlevel specified to begin with.

If a workload does not function correctly due to SCC, then the solution is not to specify a runlevel, but rather to use one of the other provided SCCs.

Special thanks to David Eads for his assistance in understanding how runlevels work within OCP.


About the author

Mark Cooper is a Product Security Engineer at Red Hat, specializing in a number of cloud technologies such as OpenShift and ServiceMesh.

 