In our last post, we discussed SELinux and how it can be used to improve container security. We also looked at the Multi-Level Security (MLS) and Multi-Category Security (MCS) models. In this post, we'll compare those models and explain why we believe MCS to be a better approach to container security.
We often describe SELinux policy for containers as "what happens in Vegas stays in Vegas." By this we mean that we use SELinux to keep container processes confined to the container's file system. If a process somehow breaks out of container confinement, SELinux can prevent it from reading and writing other content on the host's file systems.
SELinux has been proven to block container breakouts that rely on file system attacks. The goal of MLS is similar, in that it allows processes running at the same sensitivity level to read and write all of the content at that level.
Container environments use type enforcement to protect the host file system from container processes. Almost every container runs with the SELinux process type container_t, and all of the content inside the container is labeled with the SELinux file type container_file_t. The SELinux rules state that container_t can read, write, and execute all content labeled container_file_t. This is the only type that container_t can write to, so if a container breaks out, it is blocked from writing content of any other type.
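The default-deny nature of type enforcement can be sketched with a toy model. This is a simplified illustration assuming a single hypothetical allow rule; it is not the real container-selinux policy, and etc_t is used only as an example of a host file type.

```python
# Toy model of SELinux type enforcement: access is allowed only if an
# explicit allow rule grants it; everything else is denied by default.
# The rule table below is hypothetical and far smaller than the real policy.
ALLOW_RULES: dict[tuple[str, str], set[str]] = {
    ("container_t", "container_file_t"): {"read", "write", "execute"},
}


def te_allowed(process_type: str, file_type: str, perm: str) -> bool:
    """Return True if an allow rule grants `perm` on `file_type` to `process_type`."""
    return perm in ALLOW_RULES.get((process_type, file_type), set())


if __name__ == "__main__":
    # Writing content inside the container: permitted by the allow rule.
    print(te_allowed("container_t", "container_file_t", "write"))  # True
    # A process that broke out tries to write a host file type: no rule, denied.
    print(te_allowed("container_t", "etc_t", "write"))  # False
```

The important design point is the default: nothing in the table means "deny", so any type the policy never mentions is automatically off-limits.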
Type enforcement alone is not enough in a container system, since we want to run more than one container on the system at a time. We could create a new process type for each container, but then we would need to create new file types as well, and the system would quickly become unmanageable. As stated above, all processes in every container run as container_t, so we need a mechanism to prevent one container from attacking another.
We use MCS labels to achieve this. The container engine (Podman, Docker, Buildah, CRI-O …) picks a unique, random MCS label with two categories. The engine then labels the container image content with the container_file_t type and that MCS label. Once the image is mounted, the container engine launches the container process with the matching MCS label. Each container is launched with a different MCS label, and the kernel prevents container processes with one MCS label from interacting with container processes and files labeled with a different MCS label.
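The allocation and isolation steps can be sketched as follows. This is a simplified model, not the engines' actual code: it assumes categories c0 through c1023 (the common default MCS range s0-s0:c0.c1023) and reduces the kernel's MCS check to "the process's categories must be a superset of the file's categories."

```python
import random

# Assumption: categories c0..c1023, matching the common default MCS range.
NUM_CATEGORIES = 1024


def allocate_mcs_level(in_use: set[frozenset[int]], rng: random.Random) -> frozenset[int]:
    """Pick two distinct random categories whose pair no other container uses."""
    while True:
        pair = frozenset(rng.sample(range(NUM_CATEGORIES), 2))
        if pair not in in_use:
            in_use.add(pair)
            return pair


def format_label(role: str, type_: str, cats: frozenset[int]) -> str:
    """Render a context such as system_u:object_r:container_file_t:s0:c12,c345."""
    cat_str = ",".join(f"c{c}" for c in sorted(cats))
    return f"system_u:{role}:{type_}:s0:{cat_str}"


def mcs_allowed(process_cats: frozenset[int], file_cats: frozenset[int]) -> bool:
    """Simplified MCS check: process categories must dominate the file's."""
    return process_cats >= file_cats


if __name__ == "__main__":
    rng = random.Random(0)
    used: set[frozenset[int]] = set()
    a = allocate_mcs_level(used, rng)
    b = allocate_mcs_level(used, rng)
    print(format_label("system_r", "container_t", a))        # process label
    print(format_label("object_r", "container_file_t", a))   # image content label
    print(mcs_allowed(a, a))  # True: same container
    print(mcs_allowed(a, b))  # False: two different pairs never dominate each other
```

Because every container gets a distinct two-category pair, neither pair can be a superset of the other, so the check fails across containers even though all of them share the container_t type.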
The bottom line: in an MCS system, the categories carry no meaning other than providing uniqueness to each container, and they are used to guarantee that a hacked container cannot attack other containers on the system.
In MLS, the sensitivity and categories mean something. MLS was designed to control information flow between different processes on the system. It controls whether a process can raise or lower the sensitivity level of data, either by writing it to the file system or by communicating over sockets. MLS can also be used to control information flow on different network cards.
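MLS information-flow control is modeled on the Bell-LaPadula rules, sketched below. This is the textbook version, not SELinux's implementation; the reference policy's MLS constraints are stricter in places (for example, file writes generally require equal levels rather than merely allowing "write up").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """An MLS level: a hierarchical sensitivity plus a set of categories."""

    sensitivity: int                          # 0 for s0, 1 for s1, ...
    categories: frozenset[str] = frozenset()  # e.g. frozenset({"c1"})

    def dominates(self, other: "Level") -> bool:
        """True if this level is at least as high and covers all of other's categories."""
        return (self.sensitivity >= other.sensitivity
                and self.categories >= other.categories)


def can_read(subject: Level, obj: Level) -> bool:
    # Simple security property ("no read up"): subject must dominate object.
    return subject.dominates(obj)


def can_write(subject: Level, obj: Level) -> bool:
    # Star property ("no write down"): object must dominate subject, so a
    # process can never copy data down to a lower sensitivity.
    return obj.dominates(subject)


if __name__ == "__main__":
    secret, public = Level(1), Level(0)
    print(can_read(secret, public))   # True: reading down is allowed
    print(can_read(public, secret))   # False: no read up
    print(can_write(secret, public))  # False: no write down
    print(can_write(public, secret))  # True: writing up is allowed
```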
On container systems, we prevent all information flow between different container processes on the system. Container processes can only communicate with other processes in different containers over the network.
Container systems use virtual private networks to wire containers together. This means that workloads running at different security levels and/or with different categories cannot communicate with other containers via the file system, and they can communicate over the network only with the containers they are wired to. This makes the MLS features much less valuable.
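The "wired together" rule can be sketched as a simple membership check. This is a hypothetical model: the network and container names are made up, and it ignores routing between networks, published ports, and host networking.

```python
# Hypothetical wiring: network name -> names of containers attached to it.
Networks = dict[str, set[str]]


def can_communicate(networks: Networks, a: str, b: str) -> bool:
    """True if at least one network has both containers attached."""
    return any(a in members and b in members for members in networks.values())


def peers(networks: Networks, name: str) -> set[str]:
    """All containers reachable from `name` over some shared network."""
    reachable: set[str] = set()
    for members in networks.values():
        if name in members:
            reachable |= members
    reachable.discard(name)
    return reachable


if __name__ == "__main__":
    wiring: Networks = {"frontend": {"web", "api"}, "backend": {"api", "db"}}
    print(can_communicate(wiring, "web", "db"))  # False: no shared network
    print(sorted(peers(wiring, "api")))          # ['db', 'web']
```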
I would argue that a container system based on MLS would be less secure than one based on MCS, since you would tend to run multiple containers on the same system with the same MLS label, because they handle similar data. If a container running with the label MLS1 were hacked, its process would be allowed (from an SELinux point of view) to access not only the content in its own container but also the content of every other container running with the same MLS label, through the file system as well as the network. On an MCS system, a hacked container running as MCS1 could not attack other containers on the system except through the network.
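The argument can be made concrete by computing the file-system "blast radius" of one compromised container. This sketch assumes read requires the process to dominate the file and write requires the file to dominate the process, so read-plus-write access reduces to identical labels; the container names and levels are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    name: str
    sensitivity: int           # 0 = s0, 1 = s1, ...
    categories: frozenset[str]


def dominates(a: Container, b: Container) -> bool:
    return a.sensitivity >= b.sensitivity and a.categories >= b.categories


def fs_reachable(compromised: Container, fleet: list[Container]) -> list[str]:
    """Other containers whose files the compromised process could read AND write."""
    return [
        c.name
        for c in fleet
        if c.name != compromised.name
        and dominates(compromised, c)   # read: process dominates file
        and dominates(c, compromised)   # write: file dominates process
    ]


if __name__ == "__main__":
    no_cats: frozenset[str] = frozenset()
    # MLS-style deployment: containers handling similar data share label s1.
    mls_fleet = [Container("a", 1, no_cats), Container("b", 1, no_cats),
                 Container("c", 1, no_cats)]
    # MCS-style deployment: all at s0, each with a unique category pair.
    mcs_fleet = [Container("a", 0, frozenset({"c1", "c2"})),
                 Container("b", 0, frozenset({"c3", "c4"})),
                 Container("c", 0, frozenset({"c5", "c6"}))]
    print(fs_reachable(mls_fleet[0], mls_fleet))  # ['b', 'c']
    print(fs_reachable(mcs_fleet[0], mcs_fleet))  # []
```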
Bottom line: in containers, we also take advantage of other kernel features, such as:
Network Namespaces and VPNs: the kernel controls which containers can talk to other containers over the network.
PID Namespace: the kernel shows the processes inside a container only to other processes in that container. Processes in one container cannot see processes in other containers.
Mount Namespace: each container sees only the content it needs, at its sensitivity level. Host data and other containers' data are hidden from the processes inside the container.
User Namespace, to lock down even processes that require root.
Dropped capabilities to limit the power of root.
Seccomp syscall filtering, to limit the syscalls available to processes inside the container.
SELinux to control access to the file system and other labeled parts of the OS.
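Several of these layers are visible from inside a running process. The sketch below reads the real Linux interfaces /proc/self/status (CapEff, Seccomp, NoNewPrivs fields) and /proc/self/attr/current (the process's LSM label). The parsing helpers are illustrative; on a host without SELinux, the label file may be missing or hold a non-SELinux value, which the code tolerates.

```python
import os


def parse_status(text: str) -> dict[str, str]:
    """Parse 'Key:<tab>value' lines from /proc/<pid>/status."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def effective_capability_count(cap_eff_hex: str) -> int:
    """Count bits set in the CapEff mask (one bit per capability)."""
    return bin(int(cap_eff_hex, 16)).count("1")


def parse_selinux_context(context: str) -> dict[str, str]:
    """Split user:role:type:level; the MLS/MCS level may itself contain ':'."""
    user, role, type_, level = context.strip("\x00\n").split(":", 3)
    return {"user": user, "role": role, "type": type_, "level": level}


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    with open("/proc/self/status") as f:
        status = parse_status(f.read())
    print("Effective capabilities:", effective_capability_count(status.get("CapEff", "0")))
    print("Seccomp mode:", status.get("Seccomp"))  # 0 = disabled, 1 = strict, 2 = filter
    print("NoNewPrivs:", status.get("NoNewPrivs"))
    try:
        with open("/proc/self/attr/current") as f:
            print("SELinux context:", parse_selinux_context(f.read()))
    except (OSError, ValueError):
        print("No SELinux context available")
```

Run inside a container, the type field should read container_t and the level should show the container's unique category pair.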
In this second part of the series, we explained that if you are going to run containers with different sensitivity levels on the same system, you should use MCS separation rather than MLS to guarantee isolation. Our next post will focus on creating a more secure pipeline via containers.
About the authors
Daniel Walsh has worked in the computer security field for over 30 years. Dan is a Senior Distinguished Engineer at Red Hat. He joined Red Hat in August 2001. Dan has led the Red Hat Container Engineering team since August 2013, but has been working on container technology for several years.
Lukas Vrabec is a Senior Software Engineer and SELinux technology evangelist at Red Hat. He is part of the Security Controls team, working on SELinux projects with a particular focus on security policies. Lukas is the author of udica, a tool for generating custom SELinux profiles for containers, and currently maintains the selinux-policy packages for the Fedora and Red Hat Enterprise Linux distributions.
Simon Sekidde is a Solution Architect for the North America Red Hat Public Sector team specializing in the application of open source enterprise technologies for the Federal Department of Defense (DoD) customers.
Ben Bennett is a Senior Principal Software Engineer and is the group lead for the SDN, Routing, DNS, and Storage components of Red Hat OpenShift. He has more than 25 years of experience working with networking, distributed systems, and Linux.