In our last post, we discussed SELinux and how it can be used to improve container security. We also looked at the Multi-Level Security (MLS) and Multi-Category Security (MCS) models. In this post, we'll compare those models and explain why we believe MCS to be a better approach to container security.
We often describe SELinux policy for containers as "what happens in Vegas stays in Vegas." By this we mean that we use SELinux to keep container processes confined to the container's file system. If they somehow break out of container confinement, SELinux can prevent them from reading and writing other content on the host's file systems.
SELinux has been proven to block container breakouts based on file system attacks. The goal of MLS is similar in that it allows processes running at the same sensitivity level to read and write all of the content at that level.
Container environments use type enforcement to protect the host file system from container processes. Almost every container runs with an SELinux process type of container_t. All of the content inside the container is labeled with the SELinux file type container_file_t. The SELinux rules state that container_t can read/write/execute all content labeled container_file_t. This is the only type that container_t can write to, so if a container process breaks out, it is blocked from writing to any other type.
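The rule described above can be pictured as a lookup in a table of allow rules. The following is a toy model in Python, not real SELinux policy syntax; the function name te_allows and the type hypothetical_host_file_t are made up for illustration, and real policy contains many more rules than this single entry.

```python
# Toy model of type enforcement (not real SELinux policy syntax).
# The single rule mirrors the text: container_t may read/write/execute
# content labeled container_file_t. Anything without a rule is denied.
ALLOW_RULES = {
    ("container_t", "container_file_t"): {"read", "write", "execute"},
}


def te_allows(process_type: str, file_type: str, permission: str) -> bool:
    """Return True only if an explicit allow rule grants the permission."""
    return permission in ALLOW_RULES.get((process_type, file_type), set())


# A container process writing inside its own file system.
print(te_allows("container_t", "container_file_t", "write"))
# The same process trying to write a host file after a breakout.
# "hypothetical_host_file_t" is an invented type name for this sketch.
print(te_allows("container_t", "hypothetical_host_file_t", "write"))
```

The point of the sketch is the default: access is denied unless a rule explicitly allows it, which is why a breakout does not automatically grant write access to host content.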
Type enforcement alone is not enough in a container system, since we want to run more than one container on the system at a time. We could create a new process type for each container, but then we would need to create new file types as well, and the system would quickly become unmanageable. As stated above, the processes in every container run as container_t, so we need a mechanism to prevent one container from attacking another.
We use MCS labels to achieve this. The container engine (Podman, Docker, Buildah, CRI-O …) picks a unique random MCS label with two categories. The engine then labels the container image content with that MCS label and the container_file_t type. Once the image is mounted, the container engine launches the container process with the matching MCS label. Each container is launched with a different MCS label, and the kernel prevents container processes with one MCS label from interacting with container processes and container files labeled with a different MCS label.
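The label selection and the resulting separation can be sketched in a few lines of Python. This is a simplified model, not the code any container engine actually runs: the helper names are hypothetical, the range of 1024 categories (c0 to c1023) and the s0 sensitivity are assumptions about a typical MCS policy, and the access check reduces the kernel's rules to a plain "process categories must cover the file's categories" test.

```python
import random

NUM_CATEGORIES = 1024  # assumption: categories c0..c1023


def pick_mcs_label(in_use: set, rng: random.Random) -> str:
    """Pick an unused label of the form s0:cA,cB with A < B."""
    while True:
        a, b = sorted(rng.sample(range(NUM_CATEGORIES), 2))
        label = f"s0:c{a},c{b}"
        if label not in in_use:
            in_use.add(label)
            return label


def categories(label: str) -> frozenset:
    """Extract the category set from a label such as 's0:c1,c2'."""
    return frozenset(label.split(":", 1)[1].split(","))


def mcs_allows(process_label: str, file_label: str) -> bool:
    """Simplified check: the process categories must cover the file's."""
    return categories(process_label) >= categories(file_label)


rng = random.Random(0)
in_use = set()
web = pick_mcs_label(in_use, rng)  # label for a first container
db = pick_mcs_label(in_use, rng)   # label for a second container
print(web, db)
print(mcs_allows(web, web), mcs_allows(web, db))
```

Because every label holds exactly two categories and no two containers share a pair, neither container's category set can cover the other's, so the check fails in both directions.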
Bottom line: in an MCS system the categories carry no meaning of their own. They exist only to make each container's label unique, which guarantees that a hacked container cannot attack other containers on the system.
In MLS, the sensitivity levels and categories mean something. MLS was designed to control information flow between different processes on the system. It controls whether a process can raise or lower the sensitivity level of data, either by writing it to the file system or by communicating over sockets. MLS can also be used to control information flow across different network cards.
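To make the idea of information flow control concrete, here is a minimal sketch of the classic "no read up, no write down" model that MLS systems are built around. It is an illustration under stated assumptions, not the actual SELinux MLS policy, whose rules are more detailed and can be stricter; the Level class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    sensitivity: int                     # e.g. 0 for s0, 1 for s1
    categories: frozenset = frozenset()  # e.g. frozenset({"c1", "c2"})


def dominates(a: Level, b: Level) -> bool:
    """a dominates b if it is at least as sensitive and covers b's categories."""
    return a.sensitivity >= b.sensitivity and a.categories >= b.categories


def can_read(subject: Level, obj: Level) -> bool:
    """No read up: a subject may read only data it dominates."""
    return dominates(subject, obj)


def can_write(subject: Level, obj: Level) -> bool:
    """No write down: a subject may write only to data that dominates it."""
    return dominates(obj, subject)


low = Level(0)
high = Level(1)
print(can_read(high, low), can_read(low, high))
print(can_write(high, low), can_write(low, high))
```

In this model the labels encode a real ordering of data, which is exactly what the random categories in an MCS container setup do not do.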
On container systems, we prevent all information flow between different container processes through the file system. Processes in different containers can communicate with each other only over the network.
Container systems use virtual private networks to wire containers together. This means that workloads running at different security levels and/or with different categories cannot communicate with other containers via the file system, and they can communicate over the network only with the containers they are wired to. This makes the MLS features much less valuable.
I would argue that a container system based on MLS would be less secure than one based on MCS, since you would tend to run multiple different containers on the same system with the same MLS label because they handle similar data. If a container running with a label of MLS1 were hacked, its processes would be allowed access (from an SELinux point of view) not only to the content in that container, but also to every other container running with the same MLS label on the system, through the file system as well as the network. On an MCS system, a hacked container running as MCS1 would not be allowed to attack other containers on the system, other than through the network.
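The difference can be shown with a small simulation. This is a sketch under simplifying assumptions: access is reduced to label dominance, the container names and labels are invented for the example, and the network path is ignored entirely.

```python
def parse(label: str):
    """Split a label such as 's1' or 's0:c1,c2' into (sensitivity, categories)."""
    sens, _, cats = label.partition(":")
    return int(sens[1:]), frozenset(c for c in cats.split(",") if c)


def dominates(a: str, b: str) -> bool:
    """Label a dominates label b (simplified access check)."""
    sens_a, cats_a = parse(a)
    sens_b, cats_b = parse(b)
    return sens_a >= sens_b and cats_a >= cats_b


def exposed_files(hacked: str, labels: dict) -> list:
    """Other containers whose files the hacked container's label dominates."""
    return sorted(
        name
        for name, label in labels.items()
        if name != hacked and dominates(labels[hacked], label)
    )


# Hypothetical MLS-style setup: similar workloads share one label.
mls_style = {"app1": "s1", "app2": "s1", "app3": "s1"}
# Hypothetical MCS-style setup: every container gets its own category pair.
mcs_style = {"app1": "s0:c1,c2", "app2": "s0:c3,c4", "app3": "s0:c5,c6"}

print(exposed_files("app1", mls_style))
print(exposed_files("app1", mcs_style))
```

With the shared MLS-style label, a compromised app1 reaches the files of both other containers; with unique MCS-style labels, the list of exposed containers is empty.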
Bottom line: in containers, we also take advantage of other kernel features, such as:
Network Namespaces and VPNs
User Namespace, to lock down even processes that require root.
Dropped capabilities to limit the power of root.
Seccomp syscall filtering to limit the syscalls available to processes inside of the container.
SELinux to control access to the file system and other labeled parts of the OS.
In this second part of the series, we learned that if you are going to run containers with different sensitivity levels on the same system, you should use MCS separation rather than MLS to guarantee isolation. Our next post will focus on creating a more secure pipeline via containers.