This article looks at the mount namespace and is the third in the Linux Namespace series. In the first article, I gave an introduction to the seven most commonly used namespaces, laying the groundwork for the hands-on work started in the user namespaces article. My goal is to build out some fundamental knowledge of how the underpinnings of Linux containers work. If you're interested in how Linux controls the resources on a system, check out the CGroup series I wrote earlier. Hopefully, by the time you're done with the namespaces hands-on work, I can tie CGroups and namespaces together in a meaningful way, completing the picture for you.
For now, however, this article examines the mount namespace and how it can help you get closer to understanding the isolation that Linux containers bring to sysadmins and, by extension, platforms like OpenShift and Kubernetes.
The mount namespace
The mount namespace doesn't behave as you might expect after creating a new user namespace. By default, if you were to create a new mount namespace with unshare -m, your view of the system would remain largely unchanged and unconfined. That's because whenever you create a new mount namespace, a copy of the mount points from the parent namespace is created in the new mount namespace. That means that any action taken on files inside a poorly configured mount namespace will impact the host.
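One way to see this copy for yourself is to compare the namespace identifier and the number of mount points on the host with those inside a fresh namespace. The following is a minimal sketch rather than a definitive test: it assumes a Linux host with /proc mounted and unprivileged user namespaces enabled, and the helper name mnt_ns_summary is made up for illustration.

```shell
# Hypothetical helper: print the mount namespace ID and the number of
# mount points visible to the calling process.
mnt_ns_summary() {
    printf '%s %s\n' "$(readlink /proc/self/ns/mnt)" "$(wc -l < /proc/self/mountinfo)"
}

echo "host:  $(mnt_ns_summary)"

# Shell functions are not inherited by the child shell, so the same two
# reads are repeated inline. The call is guarded because some hosts
# disable unprivileged user namespaces.
if ! unshare -Urm sh -c 'echo "child: $(readlink /proc/self/ns/mnt) $(wc -l < /proc/self/mountinfo)"' 2>/dev/null; then
    echo "child: unshare was not permitted here"
fi
```

The two namespace IDs should differ, while the mount counts are typically the same because the child starts from a copy of the parent's mount points.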
Some setup steps for mount namespaces
So what use is the mount namespace then? To help demonstrate this, I use an Alpine Linux tarball.
In summary, download it, untar it, and move it into a new directory, giving the top-level directory permissions for an unprivileged user:
[root@localhost ~]# export CONTAINER_ROOT_FOLDER=/container_practice
[root@localhost ~]# mkdir -p ${CONTAINER_ROOT_FOLDER}/fakeroot
[root@localhost ~]# cd ${CONTAINER_ROOT_FOLDER}
[root@localhost ~]# wget https://dl-cdn.alpinelinux.org/alpine/v3.13/releases/x86_64/alpine-minirootfs-3.13.1-x86_64.tar.gz
[root@localhost ~]# tar xvf alpine-minirootfs-3.13.1-x86_64.tar.gz -C fakeroot
[root@localhost ~]# chown -R container-user: ${CONTAINER_ROOT_FOLDER}/fakeroot
The fakeroot directory needs to be owned by the user container-user because once you create a new user namespace, the root user in the new namespace will be mapped to container-user outside of the namespace. A process inside the new namespace will think that it has the capabilities required to modify its files. Without the ownership change, however, the host's file system permissions would prevent the container-user account from changing the Alpine files from the tarball (which have root as the owner).
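The mapping described here is recorded in /proc/<pid>/uid_map, where each line holds the first ID inside the namespace, the first ID outside it, and the length of the range. The following is a small sketch, assuming unprivileged user namespaces are enabled on your host; the function name outer_id_of_root is hypothetical.

```shell
# Hypothetical helper: read uid_map-style lines on stdin and print the
# outside ID that ID 0 (root inside the namespace) maps to.
outer_id_of_root() {
    awk '$1 == 0 { print $2; exit }'
}

# unshare -Ur maps root inside the namespace to the invoking user.
# Guarded, because some hosts disable unprivileged user namespaces.
if map=$(unshare -Ur cat /proc/self/uid_map 2>/dev/null); then
    echo "root in the namespace is UID $(printf '%s\n' "$map" | outer_id_of_root) on the host"
else
    echo "unshare -Ur was not permitted here"
fi
```

Run as container-user, the reported UID should be that account's UID, which is why the Alpine files need to belong to it.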
So what happens if you simply start a new mount namespace?
PS1='\u@new-mnt$ ' unshare -Umr
Now that you're inside the new namespace, you might not expect to see any of the original mount points from the host. However, this isn't the case:
root@new-mnt$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cs-root 36G 5.2G 31G 15% /
tmpfs 737M 0 737M 0% /sys/fs/cgroup
devtmpfs 720M 0 720M 0% /dev
tmpfs 737M 0 737M 0% /dev/shm
tmpfs 737M 8.6M 728M 2% /run
tmpfs 148M 0 148M 0% /run/user/0
/dev/vda1 976M 197M 713M 22% /boot
root@new-mnt$ ls /
bin container_practice etc lib media opt root sbin sys usr
boot dev home lib64 mnt proc run srv tmp var
The reason for this is that the new namespace starts with a copy of its parent's mount points, which systemd marks as recursively shared by default. If you mounted a tmpfs filesystem somewhere, for example, /mnt inside the new mount namespace, can the host see it?
root@new-mnt$ mount -t tmpfs tmpfs /mnt
root@new-mnt$ findmnt |grep mnt
└─/mnt tmpfs tmpfs rw,relatime,seclabel,uid=1000,gid=1000
The host, however, doesn't see this, because unshare marks the mounts in the new namespace as private unless told otherwise:
[root@localhost ~]# findmnt |grep mnt
So at the very least, you know that the mount namespace is functioning correctly. This is a good time to take a small detour to discuss the propagation of mount points. I'm only summarizing here, but if you are interested in a greater understanding, have a look at Michael Kerrisk's LWN article as well as the man page for the mount namespace. I don't normally rely so much on the man pages, as I often find that they're not easily digestible. However, in this case, they are full of examples and written in (mostly) plain English.
Theory of mountpoints
Mounts propagate by default because of a kernel feature called shared subtrees. This allows every mount point to have its own propagation type associated with it. This metadata determines whether new mounts under a given path are propagated to other mount points. The example given in the man page is that of an optical disk. If your optical disk automatically mounted under /cdrom, the contents would only be visible in other namespaces if the appropriate propagation type is set.
Peer groups and mount states
The kernel documentation says that a "peer group is defined as a group of vfsmounts that propagate events to each other." Events are things such as mounting a network share or unmounting an optical device. Why is this important, you ask? Well, when it comes to the mount namespace, peer groups are often the deciding factor as to whether or not a mount is visible and can be interacted with. A mount state determines whether a member in a peer group can receive the event. According to the same kernel documentation, there are five mount states:
- shared - A mount that belongs to a peer group. Any changes that occur will propagate through all members of the peer group.
- slave - One-way propagation. The master mount point will propagate events to a slave, but the master will not see any actions the slave takes.
- shared and slave - Indicates that the mount point has a master, but it also has its own peer group. The master will not be notified of changes to a mount point, but any peer group members downstream will.
- private - Does not receive or forward any propagation events.
- unbindable - Does not receive or forward any propagation events and cannot be bind mounted.
It's important to note that the mount point state is per mount point. This means that if you have / and /boot, for example, you'd have to separately apply the desired state to each mount point.
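The state of each mount point is visible in the optional fields of /proc/self/mountinfo: shared:N marks a member of peer group N, master:N marks a slave of peer group N, unbindable marks an unbindable mount, and a mount with none of these tags is private. The sketch below classifies each line accordingly. The function name propagation_state is made up for illustration, and the parsing assumes the field layout documented in the proc man page (optional fields start at field 7 and end at a lone "-").

```shell
# Hypothetical helper: read mountinfo lines on stdin and print each
# mount point followed by its propagation state.
propagation_state() {
    awk '{
        shared = 0; slave = 0; unbindable = 0
        # Optional fields run from field 7 up to the "-" separator.
        for (i = 7; i <= NF && $i != "-"; i++) {
            if ($i ~ /^shared:/) shared = 1
            else if ($i ~ /^master:/) slave = 1
            else if ($i == "unbindable") unbindable = 1
        }
        if (unbindable) state = "unbindable"
        else if (shared && slave) state = "shared and slave"
        else if (shared) state = "shared"
        else if (slave) state = "slave"
        else state = "private"
        print $5 "\t" state
    }'
}

# Classify the mounts of the current process, if /proc is available.
if [ -r /proc/self/mountinfo ]; then
    propagation_state < /proc/self/mountinfo
fi
```

If you would rather not parse the file yourself, findmnt -o TARGET,PROPAGATION reports the same information.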
In case you're wondering about containers, most container engines use private mount states when mounting a volume inside a container. Don't worry too much about this for now. I just want to provide some context. If you want to try some specific mounting scenarios, look at the man pages as the examples are quite good.
Creating our mount namespace
If you're using a programming language like Go or C, you could use raw system calls to create the appropriate environment for your new namespace(s). However, since the intent behind this is to help you understand how to interact with a container that already exists, you'll have to do some bash trickery to get your new mount namespace into the desired state.
First, create the new mount namespace as a regular user:
unshare -Urm
Once you're inside the namespace, look at the findmnt output for the mapper device, which contains the root file system (for brevity, I removed most of the mount options from the output):
findmnt |grep mapper
/ /dev/mapper/cs-root xfs rw,relatime,[...]
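There is only one mount point backed by the root device mapper, and the Alpine directory is just an ordinary directory on it. This is important because one of the things you have to do is bind mount the Alpine directory onto itself so that it becomes a mount point in its own right: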
export CONTAINER_ROOT_FOLDER=/container_practice
mount --bind ${CONTAINER_ROOT_FOLDER}/fakeroot ${CONTAINER_ROOT_FOLDER}/fakeroot
cd ${CONTAINER_ROOT_FOLDER}/fakeroot
This is because you're using a utility called pivot_root to perform a chroot-like action. pivot_root takes two arguments: new_root and old_root (sometimes referred to as put_old). pivot_root moves the root file system of the current process to the directory put_old and makes new_root the new root file system. Because new_root must itself be a mount point, the bind mount is required.
IMPORTANT: A note about chroot. chroot is often thought of as having extra security benefits. To some extent, this is true, as it takes a more significant amount of expertise to break free of it. A carefully constructed chroot can be very secure. However, chroot does not modify or restrict Linux capabilities, which I touched on in the previous namespace article. Nor does it limit system calls to the kernel. This means that a sufficiently skilled aggressor could potentially escape a chroot that has not been well thought through. The mount and user namespaces help to solve this problem.
If you use pivot_root without the bind mount, the command responds with:
pivot_root: failed to change root from `.' to `old_root/': Invalid argument
To switch to the Alpine root filesystem, first make a directory for old_root and then pivot into the intended (Alpine) root filesystem. Since the Alpine Linux root filesystem doesn't have symlinks for /bin and /sbin, you'll have to add those to your path and then, finally, unmount the old_root:
mkdir old_root
pivot_root . old_root
PATH=/bin:/sbin:$PATH
umount -l /old_root
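The interactive steps above can also be collected into one non-interactive sketch. It assumes the Alpine tree from earlier is unpacked at the path you pass in and that unprivileged user namespaces are enabled; the function name enter_fakeroot and the final ls are illustrative choices, not part of the walkthrough itself.

```shell
# Hypothetical helper: enter the given root file system inside new user
# and mount namespaces, then list its top-level directory.
enter_fakeroot() {
    root=$1
    # Refuse to continue unless the directory looks like an unpacked root.
    if [ ! -d "$root/bin" ]; then
        echo "enter_fakeroot: $root does not contain a bin directory" >&2
        return 1
    fi
    unshare -Urm sh -c '
        set -e
        # pivot_root needs new_root to be a mount point, hence the bind mount.
        mount --bind "$1" "$1"
        cd "$1"
        mkdir -p old_root
        pivot_root . old_root
        PATH=/bin:/sbin:$PATH
        # Lazily detach the old root so the host tree is no longer reachable.
        umount -l /old_root
        exec /bin/sh -c "ls /"
    ' sh "$root"
}
```

Run as container-user, it would be called as enter_fakeroot ${CONTAINER_ROOT_FOLDER}/fakeroot. Everything happens in a child process, so your own shell stays in the host's namespaces.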
You now have a nice environment where the user and mount namespaces work together to provide a layer of isolation from the host. You no longer have access to binaries on the host. Try issuing the findmnt command that you used before:
root@new-mnt$ findmnt
-bash: findmnt: command not found
You can also look at the root filesystem or attempt to see what's mounted:
root@new-mnt$ ls -l /
total 12
drwxr-xr-x 2 root root 4096 Jan 28 21:51 bin
drwxr-xr-x 2 root root 18 Feb 17 22:53 dev
drwxr-xr-x 15 root root 4096 Jan 28 21:51 etc
drwxr-xr-x 2 root root 6 Jan 28 21:51 home
drwxr-xr-x 7 root root 247 Jan 28 21:51 lib
drwxr-xr-x 5 root root 44 Jan 28 21:51 media
drwxr-xr-x 2 root root 6 Jan 28 21:51 mnt
drwxrwxr-x 2 root root 6 Feb 17 23:09 old_root
drwxr-xr-x 2 root root 6 Jan 28 21:51 opt
drwxr-xr-x 2 root root 6 Jan 28 21:51 proc
drwxr-xr-x 2 root root 6 Feb 17 22:53 put_old
drwx------ 2 root root 27 Feb 17 22:53 root
drwxr-xr-x 2 root root 6 Jan 28 21:51 run
drwxr-xr-x 2 root root 4096 Jan 28 21:51 sbin
drwxr-xr-x 2 root root 6 Jan 28 21:51 srv
drwxr-xr-x 2 root root 6 Jan 28 21:51 sys
drwxrwxrwt 2 root root 6 Feb 19 16:38 tmp
drwxr-xr-x 7 root root 66 Jan 28 21:51 usr
drwxr-xr-x 12 root root 137 Jan 28 21:51 var
root@new-mnt$ mount
mount: no /proc/mounts
Interestingly, there is no proc filesystem mounted by default. Try to mount it:
root@new-mnt$ mount -t proc proc /proc
mount: permission denied (are you root?)
root@new-mnt$ whoami
root
Because proc is a special type of mount related to the PID namespace, you can't mount it even though you're in your own mount namespace. This goes back to the capability inheritance that I discussed earlier. I'll pick up this discussion in the next article when I cover the PID namespace. However, as a reminder about inheritance, have a look at the diagram below:
In the next article, I'll rehash this diagram, but if you've followed along since the beginning, you should be able to make some inferences before then.
Wrapping up
In this article, I covered some deeper theory around the mount namespace. I discussed peer groups and how they relate to the mount states that are applied to each mount point on a system. For the hands-on part, you downloaded a minimal Alpine Linux file system and then walked through how to use the user and mount namespaces to create an environment that looks a lot like chroot except potentially more secure.
For now, test mounting file systems inside and outside of your new namespace. Try creating new mount points that use the shared, private, and slave mount states. In the next article, I'll use the PID namespace to continue building out the primitive container to gain access to the proc file system and process isolation.
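As a starting point for those experiments, here is a minimal sketch that mounts a tmpfs on /mnt inside a throwaway namespace, requests a mount state, and prints what findmnt reports. It assumes /mnt exists, unprivileged user namespaces are enabled, and findmnt is installed; the function name try_propagation is hypothetical.

```shell
# Hypothetical helper: apply the requested state to a fresh tmpfs mount
# inside new user and mount namespaces and show the resulting state.
try_propagation() {
    case $1 in
        shared|private|slave) ;;
        *) echo "usage: try_propagation shared|private|slave" >&2; return 2 ;;
    esac
    unshare -Urm sh -c '
        mount -t tmpfs tmpfs /mnt &&
        mount --make-"$1" /mnt &&
        findmnt -n -o TARGET,PROPAGATION /mnt
    ' sh "$1"
}
```

Call it as try_propagation shared, for example. The reported state need not match the request: a mount that has no master to follow may stay private when asked to become a slave.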
About the author
Steve is a dedicated IT professional and Linux advocate. Prior to joining Red Hat, he spent several years in financial, automotive, and movie industries. Steve currently works for Red Hat as an OpenShift consultant and has certifications ranging from the RHCA (in DevOps), to Ansible, to Containerized Applications and more. He spends a lot of time discussing technology and writing tutorials on various technical subjects with friends, family, and anyone who is interested in listening.