The Application Apartment Complex: Red Hat Enterprise Linux & Linux Containers

March 31, 2014 | Bhavna Sarathy | 6-minute read

The advent of any new technology tends to generate a lot of excitement. Over the course of my career, however, I have never experienced “a buzz” like the one we are seeing around Linux containers: application packaging and isolation, and containerized applications built in the Docker format. From my perspective, the ways in which containers may influence our ever-evolving technological ecosystem are, quite possibly, limitless... okay, “limitless” may be strong, and while “game-changing technology” may sound cliché, it’s not far from the truth in this case.

Let’s dive into the world of containers. Red Hat customers often want their applications to run in a secure environment and have been seeking, whether they know it or not, a fully supported, lightweight application isolation solution. At the other end of the spectrum, ISVs want to develop software applications that are easy to deploy, update, and scale. They may also want to control certain runtime elements more precisely to reduce the risk of application failure, and they want the host OS kept separate from runtime images. And let’s not forget that developers stand to benefit from a world where they can build and package their applications into a small, portable runtime image; just think of the possibilities! Red Hat is paying close attention to these needs and is working to a) help Red Hat Enterprise Linux customers take full advantage of this application evolution and b) build a strong foundation for the rest of the Red Hat product portfolio.

Red Hat Enterprise Linux 7 Beta enables operating system level application isolation and fully supports the core Linux Container capabilities. So, what are these core container capabilities? We believe there are four key elements that an operating system should implement. They are:

Resource Management
Process Isolation
Security
Tooling/CLI

An apartment complex makes a useful analogy, as these key container elements are not dissimilar from the set of elements required when creating, supporting, and managing an apartment complex. For example, in terms of resource management, each apartment requires hot water and electricity, and these resources should be distributed fairly. With respect to isolation, the apartment complex has walls to keep people and pets separate from their neighbors. While often taken for granted, each apartment also has a door, lock, and keys for security. Finally, most apartment complexes benefit from a manager who works to keep operations running consistently and smoothly.

In the context of Linux containers, resource management is provided by control groups (cgroups), process isolation is provided by kernel namespaces, security is provided by SELinux, and overall management is provided by the Docker CLI.

Let’s explore each element in more detail... first up: resource management as provided by cgroups.

In a nutshell, cgroups allow a user to allocate resources such as CPU time, system memory, network bandwidth, block IO or any combination of these resources to a set of user-defined task groups or processes running on a given system. Users can then monitor any cgroups they configure, deny cgroups access to certain resources, and even dynamically reconfigure cgroups on a running system. By using cgroups, system administrators gain fine-grained control over allocating, prioritizing, denying, managing, and monitoring system resources. Hardware resources can be smartly divided among tasks and users, often increasing overall system efficiency. For those who may not know, cgroups are not a new concept and the use of cgroups dates back to Red Hat Enterprise Linux 6. Red Hat Enterprise Linux 7 Beta has improved management capabilities of cgroups through systemd, which is a system and service manager.
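To make the idea of cgroup membership concrete, here is a minimal sketch that reads which cgroup a process belongs to for each controller. It assumes the kernel exposes membership in /proc/<pid>/cgroup as lines of the form "hierarchy-id:controller-list:path"; the function name `parse_proc_cgroup` and the sample text are illustrative assumptions, not part of any Red Hat tooling.

```python
from typing import Dict

# Hypothetical sample in the /proc/<pid>/cgroup line format, used as a
# fallback when the real file is not available.
SAMPLE = "3:cpuacct,cpu:/demo\n2:memory:/demo\n"


def parse_proc_cgroup(text: str) -> Dict[str, str]:
    """Map each controller name to the cgroup path the process is in."""
    membership = {}
    for line in text.splitlines():
        # Each line is "hierarchy-id:controller-list:path"; split at most
        # twice so that a ':' inside the path is preserved.
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _hierarchy_id, controllers, path = parts
        # An empty controller list is stored under the empty-string key.
        for name in controllers.split(","):
            membership[name] = path
    return membership


if __name__ == "__main__":
    try:
        with open("/proc/self/cgroup") as handle:
            text = handle.read()
    except OSError:  # not Linux, or /proc is unavailable
        text = SAMPLE
    for controller, path in sorted(parse_proc_cgroup(text).items()):
        print(f"{controller or '(no controller listed)'} -> {path}")
```

Running this inside and outside a container is one quick way to see that the container's processes sit in their own task groups.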

Process isolation, the heart of the Linux container architecture, is provided by kernel namespaces within Red Hat Enterprise Linux. Currently, Linux implements six different types of namespaces, the purpose of each being to wrap a particular global system resource in an abstraction. This makes each resource appear as an isolated instance to the processes within the namespace, creating the illusion that this group of processes is alone on the system. But why do we have to implement namespaces? For the simple reason that the Linux kernel is not container aware: a container is a user space concept, so namespaces are how we “teach” the kernel to work with the notion of an isolated environment.
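One way to see namespaces from user space is to inspect the entries under /proc/<pid>/ns, a sketch of which follows. It assumes a kernel that exposes those entries as symbolic links whose targets look like "pid:[4026531836]"; the helper names are hypothetical. Comparing inode numbers alone is a simplification, but two processes that report the same number for an entry are in the same namespace of that type.

```python
import os
import re
from typing import Dict, List, Tuple

# Link targets under /proc/<pid>/ns look like "pid:[4026531836]".
_NS_LINK = re.compile(r"^([a-z_]+):\[(\d+)\]$")


def parse_ns_link(target: str) -> Tuple[str, int]:
    """Split a namespace link target into (namespace type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link target: {target!r}")
    return match.group(1), int(match.group(2))


def namespaces_of(pid: str = "self") -> Dict[str, int]:
    """Return {entry name: inode number} for every entry in /proc/<pid>/ns."""
    ns_dir = f"/proc/{pid}/ns"
    found = {}
    for entry in sorted(os.listdir(ns_dir)):
        _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        found[entry] = inode
    return found


def shared_namespaces(a: Dict[str, int], b: Dict[str, int]) -> List[str]:
    """Names of the namespace entries on which two processes agree."""
    return sorted(name for name in a if name in b and a[name] == b[name])


if __name__ == "__main__":
    try:
        mine = namespaces_of("self")
    except (OSError, ValueError) as err:  # not Linux, or an unexpected layout
        print(f"cannot read namespaces: {err}")
    else:
        for name, inode in mine.items():
            print(f"{name:20} {inode}")
```

Calling `shared_namespaces` on the results for a host process and a containerized process shows which kinds of isolation actually separate the two.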

Red Hat Enterprise Linux 7 Beta implements the following namespaces:

PID namespaces provide isolation in the process ID namespace, allowing processes in different PID namespaces to have the same PID. One of the main benefits of PID namespaces is that containers can be migrated between hosts while keeping the same process IDs for the processes inside the container. PID namespaces also allow each container to have its own init process that manages various system initialization tasks and the container lifecycle.
Network namespaces provide isolation of network controllers, system resources associated with networking, firewall and routing tables. A network namespace allows each container to have its own virtual network stack that is associated with a process group. Each namespace has its own loopback device and process space. Virtual or real devices can be added to each network namespace, and IP addresses can be assigned to these devices so that they can be used as a network node.
UTS namespaces isolate two system identifiers, nodename and domainname, returned by the uname() system call. The UTS namespace feature allows each container to have its own hostname and NIS domain name. This is useful for initialization and configuration scripts that tailor their actions based on these names.
Mount namespaces isolate the set of filesystem mount points seen by a group of processes, and facilitate the creation of different read-only filesystems. Processes in different mount namespaces can have different views of the filesystem hierarchy. With the addition of mount namespaces, the mount() and umount() system calls cease to operate on a global set of mount points (visible to all processes) on the system and instead perform operations that affect just the mount namespace associated with the container process.
IPC namespaces isolate certain interprocess communication (IPC) resources, such as System V IPC objects and POSIX message queues. Each IPC namespace has its own set of System V IPC identifiers and its own POSIX message queue filesystem.
User namespaces isolate the user and group ID number spaces, such that a process's user and group IDs can be different inside and outside a user namespace. The most interesting case here is that a process can have a normal unprivileged user ID outside a user namespace while at the same time having a user ID of 0 inside the namespace. This means that the process has full root privileges for operations inside the user namespace, but is unprivileged for operations outside the namespace. While user namespaces are very promising, there are still a few kinks to work out before they meet our standards for enabling them in Red Hat Enterprise Linux 7 Beta. Rest assured, we are working on stabilizing the code base and solving the issues so we can consider turning them on in the future when they become enterprise ready.
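The user namespace case above, root inside but unprivileged outside, comes down to an ID mapping. The sketch below assumes the mapping uses the three-column layout of /proc/<pid>/uid_map on kernels with user namespaces enabled (first ID inside, first ID outside, range length); the function names and the example mapping are hypothetical.

```python
from typing import List, Optional, Tuple

# Each row is (first ID inside the namespace, first ID outside, range length).
IdMap = List[Tuple[int, int, int]]


def parse_id_map(text: str) -> IdMap:
    """Parse uid_map or gid_map text into a list of mapping rows."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            inside_start, outside_start, length = (int(f) for f in fields)
            rows.append((inside_start, outside_start, length))
    return rows


def to_outside_id(inside_id: int, id_map: IdMap) -> Optional[int]:
    """Translate an ID seen inside the namespace to the ID seen outside."""
    for inside_start, outside_start, length in id_map:
        if inside_start <= inside_id < inside_start + length:
            return outside_start + (inside_id - inside_start)
    return None  # this ID has no mapping in the namespace


if __name__ == "__main__":
    # Hypothetical mapping: UID 0 inside is UID 1000 outside, nothing else.
    example_map = parse_id_map("0 1000 1\n")
    print(to_outside_id(0, example_map))  # root inside is UID 1000 outside
    print(to_outside_id(1, example_map))  # None: UID 1 is not mapped
```

With such a mapping, a process that appears as root inside the container is still an ordinary user as far as the host is concerned.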

Security is the role of SELinux, which, like cgroups, is most certainly not a new concept: it has been central to the Red Hat Enterprise Linux security strategy since its introduction in Red Hat Enterprise Linux 4. SELinux applies security labels and policies to Linux containers and their resources, providing an additional layer of security above and beyond the isolation provided by kernel namespaces.
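As a rough illustration of what a security label looks like, the sketch below splits an SELinux context into its fields and extracts the categories from the level. It assumes the four-field user:role:type:level layout used by policies with MLS/MCS levels; the label values shown are hypothetical examples, category ranges such as "c0.c1023" are not expanded, and the helper names are not part of any SELinux library.

```python
from typing import FrozenSet, NamedTuple


class SELinuxContext(NamedTuple):
    user: str
    role: str
    setype: str
    level: str


def parse_context(label: str) -> SELinuxContext:
    """Split "user:role:type:level" into fields; the level may contain ':'."""
    parts = label.strip().split(":", 3)
    if len(parts) != 4:
        raise ValueError(f"expected user:role:type:level, got {label!r}")
    return SELinuxContext(*parts)


def categories(level: str) -> FrozenSet[str]:
    """Return the category names in a level such as "s0:c1,c2"."""
    _sensitivity, _sep, cats = level.partition(":")
    return frozenset(c for c in cats.split(",") if c)


if __name__ == "__main__":
    # Hypothetical labels for two containers that share a type but not
    # their categories.
    first = parse_context("system_u:system_r:svirt_lxc_net_t:s0:c1,c2")
    second = parse_context("system_u:system_r:svirt_lxc_net_t:s0:c3,c4")
    print(first.setype == second.setype)                          # True
    print(categories(first.level) & categories(second.level))     # frozenset()
```

Two labels with the same type but disjoint categories are one way a policy can tell otherwise identical containers apart.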

So where does Docker come into the picture? Well, we are working closely with Docker, Inc. in the open source way; our collaboration started with the Docker 0.7 release, with key contributions from the Red Hat team towards a new storage driver (device mapper thin provisioning) that allowed Docker to run on Red Hat Enterprise Linux. Red Hat is doing a lot of heavy lifting on extending capabilities in the Docker upstream project, accelerating the pace of development and bringing in new features at a faster clip. One of the hallmarks of Docker 0.9 is a new built-in execution driver based on libcontainer, developed to access the kernel’s container APIs directly, without any other tooling dependencies. This native toolkit can manipulate core system capabilities such as cgroups, namespaces, network interfaces, the firewall, and other kernel features. Our collaboration continues to make Docker containers enterprise ready and to bring image-based life cycle management capabilities to Red Hat Enterprise Linux.

In summary, Linux Containers have emerged as a key open source application packaging and delivery technology, combining lightweight application isolation with the flexibility of image-based deployment methods. The Linux container apartment complex in Red Hat Enterprise Linux 7 Beta provides each of its tenants with a secure and isolated home, oblivious to its neighbors. Red Hat Enterprise Linux 7 Beta provides a certified, stable software stack, optimized for a specific application, with underlying capabilities to package an application inside a Docker image.

So what do you think of the Linux container core capabilities I have outlined here? Is the Linux container topic in Red Hat Enterprise Linux 7 Beta relevant to your own day-to-day operations? I look forward to reading your feedback, comments and questions.