How containers are stored on disk is often a mystery to users working with the containers. In this post, we’re going to look at how containers images are stored and some of the tools that you can use to work with those images directly -Podman, Skopeo, and Buildah.
Evolution of Container Image Storage
When I first started working with containers, one of the things I did not like about Docker’s architecture was that the daemon hid the information about the image store within itself. The only realistic way someone could use the images was through the daemon. We were working on the
atomic tool and wanted a way to mount the container images so that we could scan them. After all a container image was just a mount point under devicemapper or overlay.
The container runtime team at Red Hat created the
atomic mountcommand to mount images under Docker and this was used within
atomic scan. The issue here was that the daemon did not know about this so if someone attempted to remove the image while we mounted it, the daemon would get confused. The locking and manipulation had to be done within the daemon.
When we began to create new container engines, the first thing we required was to build a new library containers/storage, which did not require a daemon to control it. We wanted to allow multiple tools to use the storage at the same time, without needing to know about each other.
We would use file system locking to control access to storage data. The first step was to separate out the containers/storage under the Docker Project, called the graphdriver. These storage drivers implement different copy on write (COW) storage drivers including overlay, devicemapper, btrfs, xfs, vfs, aufs … If you want to use the library in a go project, then you just implement a store.
Note that the container/storage library is not related to Container Storage Interface (CSI). Container/storage is about storing container images on COW file systems, while CSI is for the volumes that containers write to. For example you might use a CSI to store the database that is used by the MariaDB container image, which is stored in container/storage. I hope this clears up any confusion.
Container storage configuration is defined in the storage.conf file. For containers engines that run as root, the storage.conf file is stored in
/etc/containers/storage.conf. If you are running rootless with a tool like Podman, then the storage.conf file is stored in
Now let’s look at
$ cat /etc/containers/storage.conf
# This file is is the configuration file for all tools
# that use the containers/storage library.
# See man 5 containers-storage.conf for more information
# The "container storage" table contains all of the server options.
# Default Storage Driver
driver = "overlay"
The driver field is critical. In container/storage we default to the overlay driver. In Docker world there are two overlay drivers, overlay and overlay2, today most users use the overlay2 driver, so we just use that one, and called it overlay. If you accidentally use overlay2 in the config containers storage is smart enough to alias it to overlay.
# Temporary storage location
runroot = "/var/run/containers/storage"
# Primary Read/Write location of container storage
graphroot = "/var/lib/containers/storage"
The graph root, defines the location where the actual images will be stored. We recommend that you set up a lot of space on this location, since people tend to store lots of images over time. No special tools are required to set up storage, you should set up storage in any manner that best fits your needs using standard Linux commands, but we recommend that you mount a large device on /var/lib/containers.
# Storage options to be passed to underlying storage drivers
There are a lot of per graphdriver storage options. Some of these allow you to do interesting things with containers storage, I will talk about some of them below.
# AdditionalImageStores is used to pass paths to additional Read/Only image stores
# Must be comma separated list.
additionalimagestores = [
Additional image stores is a cool feature that allows you to set up additional read only stores of images. For example you could set up an NFS share with many overlay container images and share them with all of your container engines via NFS. Then rather than requiring each node running a container engine to pull down huge images, they could use the image on the NFS store, and start the container.
# Size is used to set a maximum size of the container image. Only supported by
# certain container storage drivers.
size = ""
Size controls the size of a container image, if you are running a system where lots of users are going to be pulling images, you might want to set a quota to make sure that no user is able to pull in huge images. OpenShift.com for example uses this feature to control its users, especially when using OpenShift Online.
# Path to an helper program to use for mounting the file system instead of mounting it
# mount_program = "/usr/bin/fuse-overlayfs"
# mountopt specifies comma separated list of extra mount options
mountopt = "nodev"
This flag allows you to pass special mount options into the drivers. For example setting the
nodev field prevents users from using device nodes that show up in a container image. Container engines provide devices on a tmpfs mounted at
/dev, so there is no reason to have devices imbedded in images, especially when they could be used to circumvent security.
# Remap-UIDs/GIDs is the mapping from UIDs/GIDs as they should appear inside of
# a container, to UIDs/GIDs as they should appear outside of the container, and
# the length of the range of UIDs/GIDs. Additional mapped sets can be listed
# and will be heeded by libraries, but there are limits to the number of
# mappings which the kernel will allow when you later attempt to run a
# remap-uids = 0:1668442479:65536
# remap-gids = 0:1668442479:65536
The remap uids and gids flags tells container/storage to store images in a remapped format, for user within the specified user namespace. If you set a
0:100000:65536, this tells containers storage when storing images to remap files owned by
100,000, UID=1 to
100,0002 and so on, up to uid
65536. Now if a container engine runs a container within the mapping, it will run more securely using the uids associated with the user rather than root.
# Remap-User/Group is a name which can be used to look up one or more UID/GID
# ranges in the /etc/subuid or /etc/subgid file. Mappings are set up starting
# with an in-container ID of 0 and the a host-level ID taken from the lowest
# range that matches the specified name, and using the length of that range.
# Additional ranges are then assigned, using the ranges which specify the
# lowest host-level IDs first, to the lowest not-yet-mapped container-level ID,
# until all of the entries have been used for maps.
# remap-user = "storage"
# remap-group = "storage"
# Storage Options for thinpool
The rest of the options are used for the creation of thinpool with drivers like devicemapper, along with a few other options. You can refer to the
/etc/containers/storage.conf file on disk for a description as well as the storage.conf(5) man page for further information.
Using container storage
Container engines and tools like Podman, Buildah, CRI-O, Skopeo share container storage at the same time. They can all see each others images and can be used in conjunction with each other or totally separately, based on file locks. This means podman can mount and examine a container. While they share the actual storage, they do not necessarily share container information. Some tools have different use cases of containers, and will not display other tools’ containers. For example buildah creates build containers just for the process of building container images, since these do not require all of the content of a podman container, it has a separate database. Bot tools can remove each others container images, but they treat them separately.
# podman create -ti --name fedora-ctr fedora sh
sh-4.4# podman mount fedora-ctr
# ls /var/lib/containers/storage/overlay/b16991596db22b90b78ef10276e2ae73a1c2ca9605014cad95aac00bff6608bc/merged
bin boot dev etc home lib lib64 lost+found media mnt opt proc root run sbin srv sys tmp usr var
While buildah and CRI-O can use the same Fedora image
buildah from fedora
You can even use Skopeo to preload the container/storage at boot time, to have the container/images ready for use by any of the container tool technologies. And remember there is no container daemon controlling this access, just standard file system tools.
In the next post, I will show you more things you can do with container/storage.