Recent versions of Podman, Buildah, and CRI-O have started to take advantage of a new kernel feature, volatile overlay mounts. This feature allows you to mount an overlay file system with a flag that tells the kernel not to sync its writes to disk.

If you need a reminder about the use and benefits of overlay mounts, check out my article from last summer.

What is syncing, and why is it important?

In Linux, when you write to a file or directory, the kernel does not instantly write the data to disk. Instead, it buffers up a bunch of writes and then periodically saves the data to disk to increase performance. This is called syncing. The problem with this is that a process thinks the data was saved when the write call completes, but it really isn't saved until the kernel syncs it. This means that if you wrote data and the kernel then crashed, there is a chance that the data was never saved.
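To see the difference between a write completing and the data actually reaching disk, here is a minimal shell sketch (the file path and sizes are arbitrary examples):

    # dd returns as soon as the data is in the kernel's page cache,
    # not necessarily when it is on disk.
    dd if=/dev/zero of=/tmp/testfile bs=1M count=100

    # Ask the kernel to flush all dirty data to disk now.
    sync

    # Or have dd request an fsync itself, so the command does not return
    # until the file's data has actually been written out.
    dd if=/dev/zero of=/tmp/testfile bs=1M count=100 conv=fsync

The second dd typically takes noticeably longer than the first because it waits for the physical write rather than just the copy into memory.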

Because of this, lots of file systems sync regularly, and tools can also request syncs explicitly. When a sync occurs, the kernel takes a lock, stops processing writes, and flushes all of the dirty data to disk. Of course, this hurts performance. If you have a process that triggers syncs frequently, your job's performance can really suffer. Certain tools, like RPM, request a sync after every file is written to disk, causing all of the dirty pages for that file to be flushed, which is considerable overhead.
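If you want to watch this happen, the kernel reports the amount of dirty (written but not yet synced) data in /proc/meminfo; a rough way to observe it (the one-second interval is just an example) is:

    # Watch how much page-cache data is waiting to be written back to disk.
    watch -n 1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'

Run a large package install in another terminal and you can see the Dirty counter grow and then drop as syncs occur.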


Containers may not need syncing

In the container world, we have many use cases where we don’t care if the data is saved. If the kernel crashed, we would not use the written data anyway.

When doing a buildah bud or podman build, the container image is written to an overlay mount point, often using DNF or YUM. If the kernel crashed in the middle of creating an image, the content written to the overlay layer would be useless and would have to be cleaned up by the user anyway; anything that failed to write would simply be deleted. When the build completes, though, the overlay layer is tarred up into an image bundle, which can then be synced to disk.
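For example, a build like the following (the image tag is a placeholder, and the current directory is assumed to contain a Containerfile) writes all of its package payloads into a temporary overlay layer that only matters once the final image is committed:

    # Everything DNF/YUM writes during the build lands in a scratch overlay layer.
    # If the build is interrupted, that layer gets thrown away anyway.
    buildah bud -t myimage .

    # Or, equivalently:
    podman build -t myimage .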

Another use case for volatile overlay mounts is running Podman with the --rm flag. The --rm flag tells Podman to destroy the container and the overlay mount point when the container exits. A kernel crash would only leave behind content that the user has already indicated they have no use for, so there is no reason to care about whether a write was successful.
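On a kernel and Podman version that support the feature, you can see the flag in use; something like the following (the image and container names are arbitrary, and this assumes rootful Podman with the native overlay driver) should show volatile among the overlay mount options while the container is running:

    # Start a throwaway container; --rm tells Podman its layer can be discarded.
    podman run -d --rm --name demo fedora sleep 300

    # Look for the container's overlay mount and check its options.
    grep overlay /proc/self/mountinfo | grep volatile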

In the Kubernetes world, CRI-O is the container engine. Kubernetes is almost always set up to remove all containers at boot time. Basically, it wants to start with a clean state. This means if the kernel crashed while data was being written to the overlay mount, this data would be destroyed as soon as the system boots. It is also safe to use such configurations with stateful containers because the data is usually written to external volumes that won’t be affected by the “volatile” flag at runtime.

Adding a volatile option

Container team engineer Giuseppe Scrivano noticed these use cases and realized that we could improve performance by adding a volatile option to the Linux kernel's overlay file system, and he implemented this behavior. As a result, newer versions of Buildah, Podman, and CRI-O will default to using the volatile flag in these use cases and should see better performance.
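Outside of the container engines, the kernel feature itself is just an extra overlay mount option. Here is a minimal sketch, assuming a kernel new enough to support it (roughly Linux 5.10 and later), run as root, with arbitrary directory names:

    # Prepare the directories overlayfs needs.
    mkdir -p /tmp/lower /tmp/upper /tmp/work /tmp/merged

    # Mount the overlay with the volatile option: writes under /tmp/merged
    # skip syncing, so they can be lost if the system crashes.
    mount -t overlay overlay \
        -o lowerdir=/tmp/lower,upperdir=/tmp/upper,workdir=/tmp/work,volatile \
        /tmp/merged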

Note that any volumes mounted into the container will continue to have the default syncing behavior of typical file systems, so you do not need to worry about losing data written to permanent storage.
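In other words, anything that must survive should live on a volume or bind mount, where normal syncing still applies. A quick sketch (the host path and image are placeholders):

    # The container's own overlay layer may be volatile, but /data is a bind
    # mount backed by a regular file system, so writes there sync as usual.
    # The host directory is assumed to exist already.
    podman run --rm -v /srv/appdata:/data:Z fedora sh -c 'echo important > /data/state'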

The graph below shows how the number of write IOPS is reduced in a container that runs yum install -y texlive on a machine with 16 GB of RAM. In addition, when the container runs with the volatile flag turned on, its wall-clock time also improves, and the job finishes sooner.

The dirty pages will eventually be written to storage once either the dirty ratio is reached or the inode writeback timeout expires, since those settings are not affected by the volatile mount flag.
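Those thresholds are the standard kernel writeback knobs, which you can inspect on any system, for example:

    # The memory-percentage thresholds and the age at which dirty data is
    # written back; none of these change when an overlay is mounted volatile.
    sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_expire_centisecs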

[Graph: write IOPS with and without the volatile overlay flag]
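If you want a rough feel for the difference on your own hardware, you can time the same install in a throwaway container (results will vary with kernel version, storage driver, and available memory):

    # With a Podman and kernel that support volatile overlay mounts, --rm makes
    # the container's layer volatile; watch write IOPS with a tool like iostat
    # in another terminal while this runs.
    time podman run --rm fedora yum install -y texlive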

Wrap up

With container technology, we constantly push the envelope of what the Linux system can handle and experiment with new use cases. Adding a volatile option to the kernel's overlay file system helps increase performance, allowing containers to continue to evolve and provide greater benefits.



About the author

Daniel Walsh has worked in the computer security field for over 30 years. Dan is a Senior Distinguished Engineer at Red Hat. He joined Red Hat in August 2001. Dan has led the Red Hat Container Engineering team since August 2013, but has been working on container technology for several years.

Dan helped develop sVirt (Secure Virtualization) as well as the SELinux Sandbox, an early desktop container tool, back in RHEL 6. Previously, Dan worked at Netect/Bindview on vulnerability assessment products and at Digital Equipment Corporation on the Athena Project and the AltaVista Firewall/Tunnel (VPN) products. Dan has a BA in Mathematics from the College of the Holy Cross and an MS in Computer Science from Worcester Polytechnic Institute.
