Introduction
Container technology has radically changed the way that applications are being developed and deployed. Notably, containers dramatically ease dependency management, so shipping new features or code is faster than ever before. While Docker containers and Kubernetes are great for DevOps, they also present new security challenges that both security practitioners and developers must understand and address with diligence. Docker’s team of security experts has built some valuable security features into the Docker platform over the last several years. This blog post offers up valuable tips and practical suggestions on how containers and hosts can be hardened as a first step toward a more secure container environment. You should also check out our blog post on container security best practices you should be following.
A brief look at containers from a security perspective
In essence, Docker containers are a wrapper around Linux control groups (cgroups) and namespaces. Cgroups are used in the Linux kernel for monitoring and restricting resources among a group of processes. Namespaces determine what a process can see. For example, the PID namespace restricts which processes can be seen within a container.
Each container running on a host shares a common underlying kernel. Containers are isolated from one another, which – from a security standpoint – is advantageous. However, if the host OS is compromised, all containers running on it are at risk. Similarly, if a container is using a vulnerable library, it could be exploited to gain access to the underlying host.
Securing the host OS
The underlying host OS needs to be secured in order to prevent container breaches from affecting the host. For this, Linux provides several out-of-the-box security mechanisms. Some of the popular ones are SELinux, AppArmor and seccomp. One can also develop custom security modules using the Linux Security Modules (LSM) framework.
SELinux
SELinux is a type of Mandatory Access Control (MAC) security module based on type enforcement. Type enforcement revolves around defining types and assigning privileges to those types. Here’s a simple example of an SELinux policy:
policy_module(localpolicy, 1.0)
gen_require(`
    type user_t;
    type var_log_t;
')
allow user_t var_log_t:dir { getattr search open read };
This allows processes running in the user_t domain to perform directory operations (getattr, search, open, read) on directories labeled var_log_t.
Even though SELinux is comprehensive, you can see the policy language is pretty hard to get right. Often users will end up creating different SELinux policies over time to address different scenarios.
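To make the type enforcement idea concrete, here is a small Python sketch that models allow rules as records and checks an access request against them. It is a toy model, not how the kernel evaluates policy. The `AllowRule` class and the `is_allowed` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowRule:
    source_type: str        # domain of the process, e.g. user_t
    target_type: str        # label on the object, e.g. var_log_t
    object_class: str       # kind of object, e.g. dir
    permissions: frozenset  # operations the rule grants


# Mirrors: allow user_t var_log_t:dir { getattr search open read };
RULES = [
    AllowRule("user_t", "var_log_t", "dir",
              frozenset({"getattr", "search", "open", "read"})),
]


def is_allowed(source_type, target_type, object_class, permission, rules=RULES):
    """Default deny: access is granted only if some rule explicitly allows it."""
    return any(
        r.source_type == source_type
        and r.target_type == target_type
        and r.object_class == object_class
        and permission in r.permissions
        for r in rules
    )


print(is_allowed("user_t", "var_log_t", "dir", "read"))   # granted by the rule
print(is_allowed("user_t", "var_log_t", "dir", "write"))  # no rule grants write
```

The point of the sketch is the default-deny shape: anything not covered by an allow rule is refused, which is why real policies grow as new scenarios appear.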
AppArmor
AppArmor is another MAC solution. It is based on file system paths rather than defining types. Users can specify the file path to a binary and the permissions it has. Here’s a simple example of confining nginx:
#include <tunables/global>
/usr/sbin/nginx {
  #include <abstractions/apache2-common>
  #include <abstractions/base>
  #include <abstractions/nis>
  capability dac_override,
  capability dac_read_search,
  capability net_bind_service,
  capability setgid,
  capability setuid,
  /data/www/safe/* r,
  deny /data/www/unsafe/* r,
  /etc/group r,
  /etc/nginx/conf.d/ r,
  /etc/nginx/mime.types r,
  /etc/nginx/nginx.conf r,
  /etc/nsswitch.conf r,
  /etc/passwd r,
  /etc/ssl/openssl.cnf r,
  /run/nginx.pid rw,
  /usr/sbin/nginx mr,
  /var/log/nginx/access.log w,
  /var/log/nginx/error.log w,
}
This AppArmor profile allows nginx to read files under /data/www/safe/ and explicitly denies reads under /data/www/unsafe/. It may not be suitable for all situations; users may have to customize it. AppArmor is good for restricting application access. However, there is a learning curve to writing good enforcement profiles.
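The path-based model can be sketched in a few lines of Python. This is a simplified illustration, not AppArmor's actual matcher: it handles only the `*` and `**` globs, uses three rules copied from the profile above, and relies on hypothetical helper names (`glob_to_regex`, `may_access`). It does reflect two properties of AppArmor rules: `*` does not cross a `/`, and a matching deny rule overrides any allow rule.

```python
import re


def glob_to_regex(pattern):
    """Translate a simplified AppArmor-style glob: '**' crosses '/', '*' does not."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


# (path pattern, permissions, is_deny) for a few lines of the nginx profile
RULES = [
    ("/data/www/safe/*", "r", False),
    ("/data/www/unsafe/*", "r", True),
    ("/run/nginx.pid", "rw", False),
]


def may_access(path, perm, rules=RULES):
    """Deny rules win over allow rules; anything unmatched is refused."""
    allowed = False
    for pattern, perms, is_deny in rules:
        if perm in perms and glob_to_regex(pattern).fullmatch(path):
            if is_deny:
                return False
            allowed = True
    return allowed


for path in ("/data/www/safe/index.html", "/data/www/unsafe/secret.html"):
    print(path, may_access(path, "r"))
```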
Seccomp
Seccomp (short for ‘Secure Computing’) is another security mechanism included in many Linux distributions that allows users to restrict system calls. Users can specify the action to be taken when a certain system call is made. The actions are allow, kill, errno, and trap. Seccomp can be used to sandbox applications that handle untrusted user input by restricting them to a subset of system calls.
The first step in using seccomp is to determine all the system calls an application makes when it runs. This can be a difficult and error-prone exercise that should be conducted when the application is written. Users can use tools like audit to profile the system calls an application makes by exercising it in different ways.
With container runtimes such as Docker, seccomp policies are defined using JSON files. A sample seccomp policy looks like this:
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "name": "mkdir",
      "action": "SCMP_ACT_ERRNO"
    },
    {
      "name": "chown",
      "action": "SCMP_ACT_ERRNO"
    }
  ]
}
This policy causes an error to be returned when mkdir or chown is called.
The drawback with seccomp is that the profile has to be applied when the application is launched. Restricting individual system calls is also very fine-grained work, and coming up with good profiles requires extensive working knowledge of Linux.
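As a rough illustration of how such a profile is read, the Python sketch below loads the sample policy and looks up which action applies to a given system call. It only understands the `name`/`action` layout used in the sample, and `action_for` is a hypothetical helper, not part of any seccomp tooling.

```python
import json

PROFILE_JSON = """
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {"name": "mkdir", "action": "SCMP_ACT_ERRNO"},
    {"name": "chown", "action": "SCMP_ACT_ERRNO"}
  ]
}
"""


def action_for(profile, syscall):
    """Return the action listed for `syscall`, or the profile's default action."""
    for rule in profile.get("syscalls", []):
        if rule.get("name") == syscall:
            return rule["action"]
    return profile["defaultAction"]


profile = json.loads(PROFILE_JSON)
for name in ("mkdir", "chown", "read"):
    print(name, "->", action_for(profile, name))
```

With Docker, a profile file like this is passed at launch time with `--security-opt seccomp=<path-to-profile>`.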
Capabilities
Linux capabilities are groups of permissions that can be given to child processes. Child processes cannot acquire additional capabilities. The idea behind capabilities is that no process should have all privileges, but should instead have only enough privileges to perform its intended service. By bootstrapping processes with limited privileges, we can “contain” the damage that can occur if they are ever compromised.
Some capabilities give excessive privileges to processes, such as:
- SYS_ADMIN: This gives processes many privileges, some of which are given only to the root user.
- SETUID: Many Linux distributions ship binaries that run with the setuid bit set to give root privileges by default. Instead of relying on the setuid bit, these binaries can be given capabilities to provide more granular privileges.
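One way to see which capabilities a process actually holds is to decode the `CapEff` bitmask that the kernel reports in `/proc/<pid>/status`. The Python sketch below does this for a handful of capabilities. The bit positions come from `<linux/capability.h>`; the table is deliberately partial, and the sample status text and helper names are made up for illustration.

```python
# Bit positions for a few capabilities, as defined in <linux/capability.h>.
CAP_BITS = {
    "CAP_CHOWN": 0,
    "CAP_DAC_OVERRIDE": 1,
    "CAP_SETGID": 6,
    "CAP_SETUID": 7,
    "CAP_NET_BIND_SERVICE": 10,
    "CAP_SYS_ADMIN": 21,
}


def decode_caps(mask_hex):
    """Return the names (from CAP_BITS) whose bit is set in a hex capability mask."""
    mask = int(mask_hex, 16)
    return sorted(name for name, bit in CAP_BITS.items() if mask & (1 << bit))


def effective_caps(status_text):
    """Find the CapEff line in /proc/<pid>/status content and decode it."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return decode_caps(line.split()[1])
    return []


# Hypothetical status excerpt for a process holding three capabilities.
sample_status = "Name:\tnginx\nCapEff:\t00000000000004c0\n"
print(effective_caps(sample_status))
```

A process whose decoded set includes `CAP_SYS_ADMIN` deserves a closer look, since that capability covers many root-only operations.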
These options work well on the host OS, but it can be challenging to adapt them for containers. Let’s look at some of the relevant attack surfaces and practical ways to secure the Docker container runtime.
Container Runtime Security Practices
There are various factors to consider when adopting Docker containers for production. To run Docker containers securely, users can follow these recommendations.
Unix socket (/var/run/docker.sock)
By default, the Docker client communicates with the Docker daemon using a Unix socket. This socket can also be mounted by any other container unless proper permissions are in place. Once it is mounted, it is very easy to spin up any container, create new images, or shut down existing containers.
Solution: Set up appropriate SELinux/AppArmor profiles to limit containers mounting this socket.
Volume mounts
Docker allows mounting sensitive host directories. Also, the contents of the host file system can be changed directly from the container. For application containers with direct Internet exposure, it is important to be extra careful when mounting sensitive host directories (/etc/, /usr/). Any breach can lead to damaging data loss.
Solution: Mount host-sensitive directories as read-only.
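A quick audit for this can be sketched in Python. The function below takes one container object from parsed `docker inspect` JSON output and reports sensitive host paths that are mounted read-write, using the `Source` and `RW` fields of each `Mounts` entry. The list of sensitive prefixes and the sample data are assumptions for the example.

```python
SENSITIVE_PREFIXES = ("/etc", "/usr")  # the directories named above; extend as needed


def writable_sensitive_mounts(container):
    """Return host paths under SENSITIVE_PREFIXES that are mounted read-write."""
    flagged = []
    for mount in container.get("Mounts", []):
        source = mount.get("Source", "")
        is_sensitive = any(
            source == prefix or source.startswith(prefix + "/")
            for prefix in SENSITIVE_PREFIXES
        )
        if is_sensitive and mount.get("RW", False):
            flagged.append(source)
    return flagged


# Hypothetical excerpt of one container's `docker inspect` output.
sample = {
    "Mounts": [
        {"Source": "/etc", "Destination": "/host-etc", "RW": True},
        {"Source": "/usr/share/zoneinfo", "Destination": "/zoneinfo", "RW": False},
        {"Source": "/srv/data", "Destination": "/data", "RW": True},
    ]
}
print(writable_sensitive_mounts(sample))
```

A mount is made read-only by appending `:ro` to the volume specification, for example `--volume=/etc:/host-etc:ro`.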
Privileged containers
Privileged containers can do almost anything the host can do. They run with all capabilities.
Solution: Use capabilities to grant fine-grained privileges instead.
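As a sketch of what this looks like in practice, the Python helper below builds a `docker run` command line that drops every capability and adds back only the ones listed. `--cap-drop` and `--cap-add` are real Docker flags; the helper name, the image name, and the chosen capability are placeholders, and which capabilities a given image needs has to be worked out per application.

```python
import shlex


def docker_run_args(image, needed_caps=()):
    """Build a `docker run` argv that drops all capabilities, then adds back a few."""
    args = ["docker", "run", "--cap-drop=ALL"]
    for cap in needed_caps:
        args.append(f"--cap-add={cap}")
    args.append(image)
    return args


# "my-image" and NET_BIND_SERVICE are placeholders for this example.
print(shlex.join(docker_run_args("my-image", ["NET_BIND_SERVICE"])))
```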
SSH within container
Running an ssh service within containers makes managing ssh keys and access policies difficult. This should be avoided if possible.
Solution:
- Do not run ssh services inside a container.
- Instead, run ssh on the host and use docker exec or docker attach to interact with the container.
Binding privileged ports
By default, Docker allows binding privileged ports (<1024) to a container. Normal users cannot bind to these ports. In many cases, mapping http port 80 and https port 443 is necessary for running servers in a container.
Solution:
- List all containers and their port mappings using the code below to ensure that the container’s ports are not mapped to host ports below port 1024.
docker ps --quiet | xargs docker inspect --format '{{ .Id }}: Ports={{ .NetworkSettings.Ports }}'
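The same check can be automated against the JSON that `docker inspect` prints. The Python sketch below assumes one parsed container object and walks its `NetworkSettings.Ports` map, where each container port maps to a list of host bindings (or null when the port is not published). The function name and sample data are made up for the example.

```python
def privileged_host_ports(container):
    """Return (container_port, host_port) pairs where the host port is below 1024."""
    ports = container.get("NetworkSettings", {}).get("Ports") or {}
    found = []
    for container_port, bindings in ports.items():
        for binding in bindings or []:  # unpublished ports have a null binding list
            host_port = binding.get("HostPort")
            if host_port and int(host_port) < 1024:
                found.append((container_port, int(host_port)))
    return found


# Hypothetical excerpt of one container's `docker inspect` output.
sample = {
    "NetworkSettings": {
        "Ports": {
            "80/tcp": [{"HostIp": "0.0.0.0", "HostPort": "80"}],
            "8080/tcp": [{"HostIp": "0.0.0.0", "HostPort": "8080"}],
            "9000/tcp": None,
        }
    }
}
print(privileged_host_ports(sample))
```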
Exposing ports
Ports not necessary for the service must not be exposed.
Solution: List all the containers and their exposed ports using the following:
docker ps --quiet | xargs docker inspect --format '{{ .Id }}: Ports={{ .NetworkSettings.Ports }}'
- Ensure that there are no unnecessary ports exposed.
Running without default AppArmor/SELinux or seccomp
Docker runs containers with default AppArmor/SELinux and seccomp profiles. They can be disabled with the --security-opt option, for example --security-opt seccomp=unconfined or --security-opt apparmor=unconfined.
Solution: Do not disable the default profiles that Docker supplies.
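To spot containers that were started with a default profile switched off, one can look at `HostConfig.SecurityOpt` in the `docker inspect` output. The Python sketch below assumes one parsed container object and checks for the values `seccomp=unconfined`, `apparmor=unconfined`, and `label=disable` (the last one turns off SELinux labeling). The function name and sample data are illustrative only.

```python
DISABLING_OPTS = {"seccomp=unconfined", "apparmor=unconfined", "label=disable"}


def disabled_profiles(container):
    """Return the security options that switch off a default profile."""
    opts = container.get("HostConfig", {}).get("SecurityOpt") or []
    return sorted(opt for opt in opts if opt in DISABLING_OPTS)


# Hypothetical excerpt of one container's `docker inspect` output.
sample = {"HostConfig": {"SecurityOpt": ["seccomp=unconfined", "no-new-privileges"]}}
print(disabled_profiles(sample))
```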
Sharing host namespaces
Sharing host namespaces can have dangerous consequences if not managed properly. Containers can be started with --pid=host to join the host PID namespace or --net=host to share the host's network namespace. These options allow containers to see and kill processes running on the host or even bind to privileged ports.
Solution: Avoid sharing host namespaces with containers.
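Containers that share a host namespace can be found by checking `HostConfig.PidMode` and `HostConfig.NetworkMode` in the `docker inspect` output, both of which read `host` when the corresponding namespace is shared. The Python sketch below assumes one parsed container object; the function name and sample data are made up for the example.

```python
# docker inspect field to check for each namespace discussed above
NAMESPACE_FIELDS = {"pid": "PidMode", "network": "NetworkMode"}


def shared_host_namespaces(container):
    """Return the namespaces this container shares with the host."""
    host_config = container.get("HostConfig", {})
    return sorted(
        namespace
        for namespace, field in NAMESPACE_FIELDS.items()
        if host_config.get(field) == "host"
    )


# Hypothetical excerpt of one container's `docker inspect` output.
sample = {"HostConfig": {"PidMode": "host", "NetworkMode": "bridge"}}
print(shared_host_namespaces(sample))
```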
Enabling TLS
If the Docker daemon is running on a TCP endpoint, it is advised to run with TLS enabled.
Solution: Docker offers a helpful guide on enabling TLS with Docker.
Do not set mount propagation mode to shared
Mount propagation mode allows mounting volumes in shared, slave or private mode on a container. Do not use shared mount propagation mode unless needed.
A shared mount is replicated at all mounts, and the changes made at any mount point are propagated to all mounts. Mounting a volume in shared mode does not prevent any other container from mounting that volume and making changes to it. This might be catastrophic if the mounted volume is sensitive to changes.
Solution: Run the following command to list the propagation mode for mounted volumes:
docker ps --quiet --all | xargs docker inspect --format '{{ .Id }}: Propagation={{range $mnt := .Mounts}} {{json $mnt.Propagation}} {{end}}'
- Ensure that the mode is not set to shared unless needed.
- Do not start a container with the following invocation:
docker run --volume=/hostPath:/containerPath:shared <image> <command>
Restrict a container from acquiring new privileges
A process can set the no_new_privs bit in the kernel. It persists across fork, clone and execve. The no_new_privs bit ensures that the process and its child processes do not gain any additional privileges via setuid or setgid bits.
Solution:
List the security options for all the containers using the following command:
docker ps --quiet --all | xargs docker inspect --format '{{ .Id }}: SecurityOpt={{.HostConfig.SecurityOpt }}'
- The security options should list no-new-privileges as one of them.
- One can start a container with no-new-privileges as below:
docker run <run-options> --security-opt=no-new-privileges <image> <cmd>
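The check on the security options can also be scripted. The Python sketch below assumes one parsed container object from `docker inspect` and reports whether `no-new-privileges` is set in `HostConfig.SecurityOpt`. It accepts the bare option as well as the `=true` and `:true` spellings. The function name is hypothetical.

```python
def has_no_new_privileges(container):
    """True if the container was started with the no-new-privileges option."""
    opts = container.get("HostConfig", {}).get("SecurityOpt") or []
    for opt in opts:
        # Treat "key:value" like "key=value" so both spellings are handled.
        key, _, value = opt.replace(":", "=", 1).partition("=")
        if key == "no-new-privileges" and value in ("", "true"):
            return True
    return False


# Hypothetical excerpts of `docker inspect` output for two containers.
hardened = {"HostConfig": {"SecurityOpt": ["no-new-privileges"]}}
default = {"HostConfig": {"SecurityOpt": None}}
print(has_no_new_privileges(hardened), has_no_new_privileges(default))
```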
Conclusion
Naturally, new technologies present new security challenges to organizations that choose to deploy them. As has been the case with the disruptive infrastructure technologies that preceded containers, the first step toward establishing a stronger security posture is to reduce the overall attack surface by hardening the environment. While the practices presented above are effective in making containers and hosts far less susceptible to exploits, the major container security challenges lie within the runtime phase. This is where it is critical for enterprise organizations to employ a dedicated container security platform.