At the 2017 Red Hat Summit, several people asked me “We normally use full VMs to separate network services like DNS and DHCP, can we use containers instead?”. The answer is
yes, and here’s an example of how to create a system container in Red Hat Enterprise Linux 7 today.
Create a network service that can be updated independently of any other services of the system, yet easily managed and updated from the host.
Let’s explore setting up a BIND server running under systemd in a container. In this part, we’ll look at building our container, as well as managing the BIND configuration and data files.
In Part Two, we’ll look at how systemd on the host integrates with systemd in the container. We’ll explore managing the service in the container, and enabling it as a service on the host.
CREATING THE BIND CONTAINER
To get systemd working inside a container easily, we first need to add two packages on the host:
oci-systemd-hook hook allows us to run systemd in a container without needing to use a privileged container or manually configuring tmpfs and cgroups. The
oci-register-machine hook allows us to keep track of the container with the systemd tools like
[root@rhel7-host ~]# yum install oci-register-machine oci-systemd-hook
On to creating our BIND container. The Red Hat Enterprise Linux 7 base image includes systemd as an init system. We can install and enable BIND the same way we would on a typical system. You can download this Dockerfile from the git repository in the Resources.
[root@rhel7-host bind]# vi Dockerfile
# Dockerfile for BIND
ENV container docker
RUN yum -y install bind && \
yum clean all && \
systemctl enable named
CMD [ "/sbin/init" ]
Since we’re starting with an init system as PID 1, we need to change the signal sent by the docker CLI when we tell the container to stop. From the
kill system call man pages (
man 2 kill):
The only signals that can be sent to process ID 1, the init
process, are those for which init has explicitly installed
signal handlers. This is done to assure the system is not
brought down accidentally.
For the systemd signal handlers,
SIGRTMIN+3 is the signal that corresponds to
systemd start halt.target. We also expose both TCP and UDP ports for BIND, since both protocols could be in use.
With a functional BIND service, we need a way to manage the configuration and zone files. Currently those are inside the container, so we could enter the container any time we wanted to update the configs or make a zone file change. This isn’t ideal from a management perspective. We’ll need to rebuild the container when we need to update BIND, so changes in the images would be lost. Having to enter the container any time we need to update a file or restart the service adds steps and time.
Instead, we’ll extract the configuration and data files from the container and copy them to the host, then mount them at run time. This way we can easily restart or rebuild the container without losing changes. We can also modify configs and zones by using an editor outside of the container. Since this container data looks like “site-specific data served by this system”, let’s follow the File System Hierarchy and create
/srv/named on the local host to maintain administrative separation.
[root@rhel7-host ~]# mkdir -p /srv/named/etc
[root@rhel7-host ~]# mkdir -p /srv/named/var/named
NOTE: If you are migrating an existing configuration, you can skip the following step and copy it directly to the
/srv/named directories. You may still want to check the container assigned GID with a temporary container.
Let’s build and run an temporary container to examine BIND. With a init process as PID 1, we can’t run the container interactively to get a shell. We’ll exec into it after it launches, and check for important files with
[root@rhel7-host ~]# docker build -t named .
[root@rhel7-host ~]# docker exec -it $( docker run -d named ) /bin/bash
[root@0e77ce00405e /]# rpm -ql bind
For this example, we’ll need
/etc/named.conf and everything under
/var/named/. We can extract these with
machinectl. If there’s more than one container registered, we can see what’s running in any machine with
machinectl status. Once we have the configs we can stop the temporary container.
There’s also a sample
named.conf and zone files for
example.com in the Resources if you prefer.
[root@rhel7-host bind]# machinectl list
MACHINE CLASS SERVICE
8824c90294d5a36d396c8ab35167937f container docker
[root@rhel7-host ~]# machinectl copy-from 8824c90294d5a36d396c8ab35167937f /etc/named.conf /srv/named/etc/named.conf
[root@rhel7-host ~]# machinectl copy-from 8824c90294d5a36d396c8ab35167937f /var/named /srv/named/var/named
[root@rhel7-host ~]# docker stop infallible_wescoff
To create and run the final container, add the volume options to mount:
Since this is our final container, we’ll also provide a meaningful name that we can refer to later.
[root@rhel7-host ~]# docker run -d -p 53:53 -p 53:53/udp -v /srv/named/etc/named.conf:/etc/named.conf:Z -v /srv/named/var/named:/var/named:Z --name named-container named
With the final container running, we can modify the local configs to change the behavior of BIND in the container. The BIND server will need to listen on any IP that the container might be assigned. Be sure the GID of any new file matches the rest of the BIND files from the container.
[root@rhel7-host bind]# cp named.conf /srv/named/etc/named.conf
[root@rhel7-host ~]# cp example.com.zone /srv/named/var/named/example.com.zone
[root@rhel7-host ~]# cp example.com.rr.zone /srv/named/var/named/example.com.rr.zone
Curious why I didn't need to change SELinux context on the host directories?
We’ll reload the config by exec’ing the
rndc binary provided by the container. We can use
journald in the same fashion to check the BIND logs. If you run into errors, you can edit the file on the host, and reload the config. Using
dig on the host, we can check the responses from the contained service for example.com.
[root@rhel7-host ~]# docker exec -it named-container rndc reload
server reload successful
[root@rhel7-host ~]# docker exec -it named-container journalctl -u named -n
-- Logs begin at Fri 2017-05-12 19:15:18 UTC, end at Fri 2017-05-12 19:29:17 UTC. --
May 12 19:29:17 ac1752c314a7 named: automatic empty zone: 9.E.F.IP6.ARPA
May 12 19:29:17 ac1752c314a7 named: automatic empty zone: A.E.F.IP6.ARPA
May 12 19:29:17 ac1752c314a7 named: automatic empty zone: B.E.F.IP6.ARPA
May 12 19:29:17 ac1752c314a7 named: automatic empty zone: 8.B.D.0.1.0.0.2.IP6.ARPA
May 12 19:29:17 ac1752c314a7 named: reloading configuration succeeded
May 12 19:29:17 ac1752c314a7 named: reloading zones succeeded
May 12 19:29:17 ac1752c314a7 named: zone 1.0.10.in-addr.arpa/IN: loaded serial 2001062601
May 12 19:29:17 ac1752c314a7 named: zone 1.0.10.in-addr.arpa/IN: sending notifies (serial 2001062601)
May 12 19:29:17 ac1752c314a7 named: all zones loaded
May 12 19:29:17 ac1752c314a7 named: running
[root@rhel7-host bind]# host www.example.com localhost
Using domain server:
www.example.com is an alias for server1.example.com.
server1.example.com is an alias for mail
Did your zone file not update? It might be your editor not the serial number.
THE FINISH LINE (?)
We’ve got what we set out to accomplish. DNS requests and zones are being served from a container. We've got a persistent location to manage data and configurations across updates.
In Part 2 of this series, we’ll see how to treat the container as a normal service on the host.
Follow the RHEL Blog to receive updates on Part 2 of this series and other new posts via email.
You may have noticed that when I copied the files from the container to the local host, I didn’t run a
chcon to change the files on the host to type
svirt_sandbox_file_t. Why didn’t it break? Copying a file into
/srv should have made that file label type
var_t. Did I
Of course not, that would make Dan Walsh cry. And yes,
machinectl did indeed set the label type as expected, take a look:
Before starting the container:
[root@rhel7-host ~]# ls -Z /srv/named/etc/named.conf
-rw-r-----. unconfined_u:object_r:var_t:s0 /srv/named/etc/named.conf
No, I used a volume option in run that makes Dan Walsh happy,
:Z. This part of the command
-v /srv/named/etc/named.conf:/etc/named.conf:Z does two things: first it says this needs to be relabeled with a private volume SELinux label, and second it says to mount it read / write.
After starting the container:
[root@rhel7-host ~]# ls -Z /srv/named/etc/named.conf
-rw-r-----. root 25 system_u:object_r:svirt_sandbox_file_t:s0:c821,c956 /srv/named/etc/named.conf
If you made the edits to the config file with
vim on the local host and you aren’t seeing the changes in the container, you may have inadvertently created a new file that the container isn’t aware of. There are three
vim settings that affect backup copies during editing: backup, writebackup, and backupcopy.
I’ve snipped out the defaults that apply for RHEL 7 from the official VIM backup_table [http://vimdoc.sourceforge.net/htmldoc/editing.html#backup-table]
off on backup current file, deleted afterwards (default)
So we don’t create tilde copies that stick around, but we are creating backups. The other setting is backupcopy, where
auto is the shipped default:
"yes" make a copy of the file and overwrite the original one
"no" rename the file and write a new one
"auto" one of the previous, what works best
This combo means that when you edit a file, unless
vim sees a reason not to (check the docs for the logic) you will end up with a new file that contains your edits, which will be renamed to the original filename when you save. This means the file gets a new inode. For most situations this isn’t a problem, but here the bind mount into the container *is* senstive to inode changes. To solve this, you need to change the backupcopy behavior.
Either in the
vim session or in your
set backupcopy=yes. This will make sure the original file gets truncated and overwritten, preserving the inode and propagating the changes into the container.