In this blog:

  • Walkthrough of setting up updates for edge systems running Red Hat Enterprise Linux (RHEL), including kickstart files. 

  • Discussion of RHEL for Edge components.


With the release of Red Hat Enterprise Linux (RHEL) 8.3 last November, Red Hat added support for resilient edge computing architectures based on proven features within RHEL. RHEL for Edge creates operating system images that update "atomically" with automatic rollback on failure. 

These operating system images provide efficient updates to RHEL systems by delivering only the deltas to an updated operating system image instead of the complete image. RHEL for Edge enables rapid application of security updates and patches with minimum downtime and continuity of operations if updates should fail.

RHEL for Edge components

This post augments the content in the RHEL for Edge technical video and dives deeper into the configuration needed to showcase the various capabilities built into RHEL.

Follow the full instructions to demonstrate the RHEL for Edge capabilities. Whereas the GitHub repository tracks the contents of the RHEL for Edge summit session, this post will discuss in greater detail how all these features are configured and used to achieve the desired outcome.

Blueprints

The blueprint file, which follows Tom's Obvious Minimal Language (TOML) formatting, codifies the contents of the RHEL for Edge operating system image. The blueprint reference describes the various table headers and key/value pairs that appear in this file. TOML files resemble the familiar INI file format. The blueprint contains several important sections. At the top is the metadata definition for this image, including the name, description, and version, as shown in this snippet.

name = "RFE"
description = "RHEL for Edge"
version = "0.0.1"

Packages, modules, and groups entries define the contents of the operating system image. For example, the packages entry can be repeated to describe the list of packages added to the default minimal set included in the image. An entry that installs the latest available version of the strace package would look like:

[[packages]]
name = "strace"
version = "*"

Subsequent sections in the blueprint file describe customizations to the generated operating system image. TCP port 8080 is being opened since the image will host a web server, and a user with administrator privileges is also defined. A hashed password is included in the file, but you can also use a plaintext value.
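Based on the blueprint reference format, those customizations might look something like this sketch (the user name matches the demo; the password hash is a placeholder):

[customizations.firewall]
ports = ["8080:tcp"]

[[customizations.user]]
name = "core"
description = "Administrative user"
password = "$6$placeholder"
groups = ["wheel"]
home = "/var/home/core"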

Operating system images output by image-builder contain a defined set of packages based on the image type being generated. At a minimum, the packages from the comps core group are included.

Image Builder for Red Hat Enterprise Linux

RHEL for Edge includes an image builder backend with a web interface. This service, accessed from either the command line or via a web console plug-in, composes operating system images based on the contents of the blueprint file. The resulting rpm-ostree operating system images are then deployed to the relevant systems using a range of methods.
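As a sketch of the command-line flow (the ostree image type name varies by release: rhel-edge-commit on RHEL 8.3, edge-commit on RHEL 8.4):

# upload the blueprint, then start a compose of the rpm-ostree commit
composer-cli blueprints push blueprint.toml
composer-cli compose start RFE rhel-edge-commit

# watch progress and, once finished, download the commit tarball
# (using the compose UUID reported by the status output)
composer-cli compose status
composer-cli compose image <UUID>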

rpm-ostree operating system image

A simple web server offers the operating system as an expanded rpm-ostree tarball. The rpm-ostree format strikes a middle ground between package-based and image-based operating systems: it enables atomic image updates with automatic rollback on failure while still allowing persistent configuration in file system locations such as /etc and /var. RHEL 8.4 expands the delivery options to include an OCI container that wraps the rpm-ostree image with a built-in web server to provide content to RHEL for Edge systems. Alternatively, the rpm-ostree content can be packaged in a bootable ISO with a self-launching installer for systems in disconnected environments.
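As a minimal sketch, serving the expanded tarball requires nothing more than a static web server; here Python's built-in module is used purely for illustration (the tarball name is illustrative), on the same port the kickstart's ostreesetup URL expects:

# the compose tarball expands to an OSTree repo/ directory
tar -xf commit.tar

# serve it at http://<host>:8000/repo/ for ostreesetup to consume
python3 -m http.server 8000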

The kickstart file

Configuration of the demo occurs principally within the kickstart file. This file installs the initial rpm-ostree image and then configures the RHEL for Edge system. The various scripts in the repository ensure that the installation is started without operator intervention by creating customized boot ISOs that include both the kickstart file and necessary kernel boot parameters. The remainder of this post walks through the content of the kickstart file, explaining the relevance of each section.

Initial installation

The beginning stanza launches installation of our RHEL for Edge device. Review the lines in the kickstart file below.

  5 # set locale defaults for the Install
  6 lang en_US.UTF-8
  7 keyboard us
  8 timezone UTC
  9
 10 # initialize any invalid partition tables and destroy all of their contents
 11 zerombr
 12
 13 # erase all disk partitions and create a default label
 14 clearpart --all --initlabel
 15
 16 # automatically create xfs partitions with no LVM and no /home partition
 17 autopart --type=plain --fstype=xfs --nohome
 18
 19 # poweroff after installation is successfully completed
 20 poweroff
 21
 22 # installation will run in text mode
 23 text
 24
 25 # activate network devices and configure with DHCP
 26 network --bootproto=dhcp --noipv6
 27
 28 # Kickstart requires that we create default user 'core' with sudo
 29 # privileges using password 'edge'
 30 user --name=core --groups=wheel --password=edge --homedir=/var/home/core
 31
 32 # set up the OSTree-based install with disabled GPG key verification, the base
 33 # URL to pull the installation content, 'rhel' as the management root in the
 34 # repo, and 'rhel/8/x86_64/edge' as the branch for the installation
 35 ostreesetup --nogpg --url=http://${HOSTIP}:8000/repo/ --osname=rhel --remote=edge --ref=rhel/8/x86_64/edge

Most of this content is boilerplate. Lines 5 through 17 ensure fresh disk partitions for the installation. Lines 20 through 26 power down the system after installation completes, select a text-based installation, and configure networking, respectively.

Of note, line 30 defines user "core" with system administration privileges. The blueprint file used to generate the rpm-ostree operating system image also defines the user "core." Line 30 is necessary to ensure that the kickstart installation runs unattended and does not pause for operator input: Anaconda requires a root or other user to be defined for a fully automated installation.

Line 35 is where most of the work is done. The "ostreesetup" command reaches out to the defined URL and pulls the rpm-ostree content from the specific reference branch to install the operating system. Beyond this point, various "%post" stanzas configure the remainder of the system. We’ll discuss each "%post" section in more detail.

Automated failover with keepalived

Ensuring continuity of operations requires highly available systems with the ability to take over functionality when a primary system fails. The keepalived feature within RHEL enables that behavior. This post focuses specifically on what is enabled for the demonstration.

For the demonstration we configure primary and backup RHEL for Edge devices. The primary and backup devices use the virtual router redundancy protocol (VRRP), implemented by keepalived, to govern ownership of a virtual IP address. The primary edge device has a higher priority and will own the virtual IP address whenever it's active. Review the excerpted lines from the kickstart file here:

 43 # parse out boot parameters beginning with "vip"
 44 cmdline=`cat /proc/cmdline`
 45 params=(${cmdline// / })
 46 for param in "${params[@]}"; do
 47   if [[ $param =~ "vip" ]]; then
 48     eval $param
 49   fi
 50 done

Lines 43 through 50 do a little kickstart shell magic to read the kernel boot parameters that configure keepalived. The specific values being passed are the device's state and priority.
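For illustration, the boot line for the primary device might carry values like these, with the backup device using the BACKUP state and a lower priority:

vip_state=MASTER vip_priority=100

The next snippet writes the configuration for keepalived using these values.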

 53 cat << EOF > /etc/keepalived/keepalived.conf
 54 vrrp_instance RFE_VIP {
 55     state $vip_state
 56     interface enp1s0
 57     virtual_router_id 50
 58     priority $vip_priority
 59     advert_int 1
 60     authentication {
 61         auth_type PASS
 62         auth_pass edge123
 63     }
 64     virtual_ipaddress {
 65         $VIP_IP/$VIP_MASK
 66     }
 67 }
 68 EOF

Lines 53 through 67 create the configuration file for keepalived. Here we create a VRRP instance called "RFE_VIP" and map the variables for state and priority into the configuration file. VRRP advertisements are sent once a second, with failover typically taking three seconds or less. The above linked article series on keepalived goes into greater detail on the various parameters and settings. This is a minimal configuration that enables failover to the backup instance and restoration to the primary instance once it is functional again.
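Once both devices are up, it's easy to verify which node currently owns the virtual IP address and to watch a failover happen:

# the virtual IP appears as an additional address on the active node
ip addr show enp1s0

# keepalived logs its MASTER/BACKUP state transitions
journalctl -u keepalived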

 72 cat << EOF > /etc/sysconfig/keepalived
 73 KEEPALIVED_OPTIONS="-D --vrrp"
 74 EOF

Lines 72 through 74 tailor the keepalived service to enable both logging and running only the VRRP subsystem. The keepalived service offers greater functionality, but the demonstration is configured to only handle failover from primary to backup edge instances.

 77 cat << EOF > /etc/systemd/system/enable-vrrp-protocol.service
 78 [Unit]
 79 Wants=firewalld.service
 80 After=firewalld.service
 81
 82 [Service]
 83 Type=oneshot
 84 ExecStart=firewall-cmd --add-rich-rule="rule protocol value='vrrp' accept" --permanent
 85 ExecStartPost=firewall-cmd --reload
 86
 87 [Install]
 88 WantedBy=multi-user.target default.target
 89 EOF
 90
 91 systemctl enable enable-vrrp-protocol.service

Lines 77 through 91 are critically important to the proper operation of keepalived. The primary and backup edge devices need to send multicast advertisements to each other to ensure proper governance of the virtual IP address. 

By default, VRRP packets are blocked by the firewall, and the firewall rules cannot be changed from within the kickstart (there is no running firewalld service while the kickstart executes). A simple one-shot systemd service that runs at boot is therefore created and enabled to add the VRRP rule to the firewall.

Keeping rpm-ostree up to date

Operating systems should receive regular updates, whether running at the edge or not. Operating at the edge, however, presents unique challenges since there’s limited infrastructure and network connectivity. 

Methods should be devised to provide updates that are non-disruptive to normal operations and fit within the constraints of the operational environment. RHEL for Edge is designed so that a failure during an operating system update triggers an automatic rollback to the prior system state, so the edge device is never left inoperable.

In disrupted, disconnected, intermittent, and limited-bandwidth environments, alternate schemes should be explored. For example, an edge device could delay updates until the system is in range of supporting infrastructure. A temporary network cable could then be connected and a reboot used to trigger the download and application of the updates.

This section of the kickstart file configures periodic checks for operating system updates and then the automatic application of those updates. This assumes an environment where network connectivity is available to support downloading updated content, staging it, and then automatically upgrading.

102 echo AutomaticUpdatePolicy=stage >> /etc/rpm-ostreed.conf

Line 102 modifies the update policy to stage any downloaded operating system updates that will then be applied at the next reboot.

112 cat > /etc/systemd/system/applyupdate.service << 'EOF'
113 [Unit]
114 Description=Apply Update Check
115
116 [Service]
117 Type=oneshot
118 ExecStart=/bin/sh -c 'if [[ $(rpm-ostree status -v | grep "Staged: yes") ]]; then systemctl --message="Applying OTA update" reboot; else logger "Running latest available update"; fi'
119 EOF
120
121 # This systemd timer activates every thirty seconds to check for staged
122 # updates to the operating system
123 cat > /etc/systemd/system/applyupdate.timer <<EOF
124 [Unit]
125 Description=Daily Update Reboot Check.
126
127 [Timer]
128 # activate every thirty seconds
129 OnBootSec=30
130 OnUnitActiveSec=30
131
132 #weekly example for Sunday at midnight
133 #OnCalendar=Sun *-*-* 00:00:00
134
135 [Install]
136 WantedBy=multi-user.target
137 EOF

Lines 112 through 137 create a systemd timer and accompanying service that periodically checks whether updated operating system content has been staged. If so, an automatic reboot is triggered. This approach is great for a demonstration, but it can be disruptive to normal operations. The key takeaway is that systemd timers and services, together with the underlying tooling for RHEL for Edge, allow a broad range of solutions to ensure systems are up to date.

It’s also important to note that systemd timers offer virtually unlimited flexibility in specifying when and at what intervals a service executes. Lines 127 through 130 specify execution every thirty seconds after boot, but this could follow the commented example of every Sunday at midnight or almost any desired periodicity. Timer events can also be "splayed," or distributed over a period of time, to avoid devices overtaxing available infrastructure when requesting updates, as sketched below.
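For example, replacing lines 128 through 130 with something like the following (a sketch using standard systemd timer directives) would run the check weekly, splayed across a four-hour window:

[Timer]
OnCalendar=Sun *-*-* 00:00:00
RandomizedDelaySec=4h
Persistent=true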

142 systemctl enable rpm-ostreed-automatic.timer applyupdate.timer

Line 142, the last line in this section, enables not only the applyupdate.timer we just defined, but also a second timer included with RHEL for Edge that checks for updates and stages them per the earlier policy setting. By default, the rpm-ostreed-automatic.timer triggers every hour to run a service of the same name; the timing can be modified to override the default settings, as shown below.
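Rather than editing the shipped unit in place, inspect it and apply a drop-in override:

# show the timer's current settings
systemctl cat rpm-ostreed-automatic.timer

# create a drop-in override with a different [Timer] schedule
systemctl edit rpm-ostreed-automatic.timer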

Image registry policy

In the demonstration, systemd services launch container applications via podman. Policy settings tailor both the image registries that are searched and their security settings.

151 cat > /etc/containers/registries.conf <<EOF
152 [registries.search]
153 registries = ['registry.access.redhat.com', 'registry.redhat.io', 'docker.io']
154 [registries.insecure]
155 registries = ['${HOSTIP}:5000']
156 [registries.block]
157 registries = []
158 EOF

Lines 151 through 158 customize the registry list and mark the self-hosted demonstration registry as insecure, allowing container images to be pulled over plain HTTP without TLS verification. For obvious reasons, registry access should be secured in operational environments.

Rootless, socket-activated container application

This next section is vitally important to the demonstration, and it illustrates many features of Podman and of systemd sockets, timers, and services. For a deeper explanation of how this all works, please review this excellent article covering podman rootless containers with socket activation. Images from that blog post are excerpted here for clarity.

Systemd sockets, after receiving a request, start a service of the same name. The socket creates a file descriptor for the request and passes it to the underlying service, which processes the request and listens for additional ones. To enable socket-activated container applications, it's necessary to use a proxy. This is fully explored in the above linked article, but suffice it to say that podman is not able to accept a file descriptor and then listen for further requests on it. The flow that occurs is illustrated here.

Figure 1: On-demand activation flow from systemd socket to proxy to container service.

A client request to the systemd socket triggers the proxy service to start. The proxy service in turn triggers a dependent container service to start, and the request is forwarded to it. This flow enables on-demand activation of both the proxy and the container services, which matters for the failover demonstration: neither the primary nor the backup system runs these services until they are needed, acting as "scale-from-zero" services. With a future systemd release, it will also be possible to "scale to zero," creating a truly serverless experience.
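As an aside, newer systemd releases (v246 and later) add an idle-exit option to systemd-socket-proxyd; assuming such a release, the proxy service definition shown later could be extended to stop itself after a quiet period:

ExecStart=/usr/lib/systemd/systemd-socket-proxyd --exit-idle-time=30s 127.0.0.1:8080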

There’s quite a bit of configuration for this in the kickstart file so each section will be broken down and discussed separately.

168 # create systemd user directories for rootless services, timers,
171 mkdir -p /var/home/core/.config/systemd/user/sockets.target.wants
172 mkdir -p /var/home/core/.config/systemd/user/timers.target.wants
173 mkdir -p /var/home/core/.config/systemd/user/multi-user.target.wants

Lines 168 through 173 create the directories needed for rootless systemd sockets, timers, and services. Normally these directories would be created through "systemctl --user" commands, but that's not possible in the limited shell environment of a kickstart file. To work around this limitation, the directories and soft links are created directly.

176 cat << EOF > /var/home/core/.config/systemd/user/container-httpd-proxy.socket
177 [Socket]
178 ListenStream=$VIP_IP:8080
179 FreeBind=true
180
181 [Install]
182 WantedBy=sockets.target
183 EOF

Lines 176 through 183 define the socket listener for web requests. Line 178 binds the socket listener to the virtual IP address and port 8080. Since the virtual IP address is only configured for one edge device at a time, the FreeBind option in line 179 allows the binding to occur even if the IP address is not yet configured for the device. This option is described in the ip(7) man page with the relevant section excerpted here.

IP_FREEBIND (since Linux 2.4)

If enabled, this boolean option allows binding to an IP address that is nonlocal or does not (yet) exist.  This permits listening on a socket, without requiring the underlying network interface or the specified dynamic IP address to be up at the time that the application is trying to bind to it.

187 cat << EOF > /var/home/core/.config/systemd/user/container-httpd-proxy.service
188 [Unit]
189 Requires=container-httpd.service
190 After=container-httpd.service
191 Requires=container-httpd-proxy.socket
192 After=container-httpd-proxy.socket
193
194 [Service]
195 ExecStart=/usr/lib/systemd/systemd-socket-proxyd 127.0.0.1:8080
196 EOF

Lines 187 through 196 define the proxy service that accepts the file descriptor from the socket and then listens for additional http requests. The proxy forwards requests to the container web server application on the localhost address. The Requires and After directives define the dependencies and ordering so that components are started in the correct order when a request is received.

203 cat > /var/home/core/.config/systemd/user/container-httpd.service <<EOF
204 # container-httpd.service
205 # autogenerated by Podman 3.0.2-dev
206 # Thu May 20 10:16:40 EDT 2021
207
208 [Unit]
209 Description=Podman container-httpd.service
210 Documentation=man:podman-generate-systemd(1)
211
212 [Service]
213 Environment=PODMAN_SYSTEMD_UNIT=%n
214 Restart=on-failure
215 TimeoutStopSec=70
216 ExecStartPre=/bin/rm -f %t/container-httpd.pid %t/container-httpd.ctr-id
217 ExecStart=/usr/bin/podman run --conmon-pidfile %t/container-httpd.pid --cidfile %t/container-httpd.ctr-id --cgroups=no-conmon --replace -d --label io.containers.autoupdate=image --name httpd -p 127.0.0.1:8080:80 ${HOSTIP}:5000/httpd:prod
218 ExecStartPost=/bin/sleep 1
219 ExecStop=/usr/bin/podman stop --ignore --cidfile %t/container-httpd.ctr-id -t 10
220 ExecStopPost=/usr/bin/podman rm --ignore -f --cidfile %t/container-httpd.ctr-id
221 PIDFile=%t/container-httpd.pid
222 Type=forking
223 EOF

Lines 203 through 223 define the systemd service for the container web server. This content was auto-generated, as described in the above linked article, using the command podman generate systemd, and the settings were then tailored. Notably, line 217 adds the label io.containers.autoupdate=image to enable auto-update of the container application, discussed in more detail below. Line 218 adds a short pause to ensure that the web server, launched as a container by podman, is fully up and ready to receive an http request.
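For reference, the auto-generated starting point can be reproduced with something like this sketch (container name and image are the demo's; the flags are standard podman options):

# run the container once, then emit a unit file that recreates it
podman run -d --name httpd -p 127.0.0.1:8080:80 ${HOSTIP}:5000/httpd:prod
podman generate systemd --new --files --name httpd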

240 cat > /var/home/core/.config/systemd/user/podman-auto-update.service <<EOF
241 [Unit]
242 Description=Podman auto-update service
243 Documentation=man:podman-auto-update(1)
244
245 [Service]
246 ExecStart=/usr/bin/podman auto-update
247
248 [Install]
249 WantedBy=multi-user.target default.target
250 EOF
251
252 # This timer ensures podman auto-update is run every thirty seconds
253 cat > /var/home/core/.config/systemd/user/podman-auto-update.timer <<EOF
254 [Unit]
255 Description=Podman auto-update timer
256
257 [Timer]
258 # This example runs the podman auto-update daily within a two-hour
259 # randomized window to reduce system load
260 #OnCalendar=daily
261 #Persistent=true
262 #RandomizedDelaySec=7200
263
264 # activate every thirty seconds
265 OnBootSec=30
266 OnUnitActiveSec=30
267
268 [Install]
269 WantedBy=timers.target
270 EOF

Lines 240 through 270 define both a system timer and corresponding service to update the container application if the container image in the registry differs from what’s currently running on the edge device. The timer and service periodically run the command podman auto-update that then does all the work necessary to check if a changed image is in the registry, download the image content, stop the currently running container application, and finally restart the container application with the updated container image. 

By default, this service is triggered every thirty seconds after boot, but lines 260 through 262 illustrate how it could instead run daily within a randomized two-hour window so requests to the registry are "splayed."
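In the demonstration environment, exercising this path is as simple as pushing a changed image under the same tag (registry address and tag as configured above):

# rebuild and push; the next auto-update run pulls the new image
# and restarts the container
podman build -t ${HOSTIP}:5000/httpd:prod .
podman push --tls-verify=false ${HOSTIP}:5000/httpd:prod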

273 cat > /var/home/core/.config/systemd/user/pre-pull-container-image.service <<EOF
274 [Service]
275 Type=oneshot
276 ExecStart=podman pull $HOSTIP:5000/httpd:prod
277
278 [Install]
279 WantedBy=multi-user.target default.target
280 EOF

Lines 273 through 280 implement an optimization that pre-pulls the container image to speed up starting the container web server application when a request is received. This runs once at boot.

282 # enable socket listener
283 ln -s /var/home/core/.config/systemd/user/container-httpd-proxy.socket /var/home/core/.config/systemd/user/sockets.target.wants/container-httpd-proxy.socket
284
285 # enable timer
286 ln -s /var/home/core/.config/systemd/user/podman-auto-update.timer /var/home/core/.config/systemd/user/timers.target.wants/podman-auto-update.timer
287
288 # enable pre-pull container image
289 ln -s /var/home/core/.config/systemd/user/pre-pull-container-image.service /var/home/core/.config/systemd/user/default.target.wants/pre-pull-container-image.service
290 ln -s /var/home/core/.config/systemd/user/pre-pull-container-image.service /var/home/core/.config/systemd/user/multi-user.target.wants/pre-pull-container-image.service

Lines 282 through 290 manually create the soft links that enable the rootless systemd sockets, timers, and services to run. This is necessary since it's not possible to run systemctl --user commands during kickstart, as there is no logged-in user or fully configured shell.

293 chown -R core: /var/home/core
294 restorecon -vFr /var/home/core

Lines 293 and 294 ensure that both discretionary and mandatory access controls are correct for the rootless systemd unit files.

297 cat << EOF > /etc/systemd/system/enable-linger.service
298 [Service]
299 Type=oneshot
300 ExecStart=loginctl enable-linger core
301
302 [Install]
303 WantedBy=multi-user.target default.target
304 EOF
305
306 systemctl enable enable-linger.service

And wrapping this section up, lines 297 through 306 create a service run once at boot to enable linger for user core. Linger enables rootless systemd sockets, timers, and services to run whether the associated user is logged in or not.

Greenboot and automated rollbacks

RHEL for Edge supports running conditional checks during system startup to determine whether the current operating system image meets all operational requirements. The requirements are defined in the /etc/greenboot directory and enforced via the greenboot facility, a generic health check framework for systemd. Greenboot checks are implemented as simple shell scripts, placed in a prescribed directory structure, that return pass/fail results. The directory structure is shown below:

    /etc/greenboot
    +-- check
    |   +-- required.d  /* these scripts MUST succeed */
    |   +-- wanted.d    /* these scripts SHOULD succeed */
    +-- green.d         /* scripts run after success */
    +-- red.d           /* scripts run after failure */

All scripts in the required.d directory must return a successful result for startup to proceed. After three failed boot attempts, the operating system is rolled back to the previous version and restarted. Scripts in the wanted.d directory should succeed, but their failure won't trigger a rollback. The green.d directory contains any scripts that should run after a successful boot, and the scripts in the red.d directory run after a failure.

The demonstration uses a custom greenboot check script to control whether a rollback occurs to the previous version of the operating system. This is explained below.

322 mkdir -p /etc/greenboot/check/required.d
323 cat > /etc/greenboot/check/required.d/01_check_upgrade.sh <<EOF
324 #!/bin/bash
325
326 #
327 # This test fails if the current commit identifier is different
328 # than the original commit
329 #
330
331 if [ ! -f /etc/greenboot/orig.txt ]
332 then
333     rpm-ostree status | grep -A2 '^\*' | grep Commit > /etc/greenboot/orig.txt
334 fi
335
336 rpm-ostree status | grep -A2 '^\*' | grep Commit > /etc/greenboot/current.txt
337
338 diff -s /etc/greenboot/orig.txt /etc/greenboot/current.txt
339 EOF
340
341 chmod +x /etc/greenboot/check/required.d/01_check_upgrade.sh

Lines 322 to 339 define the script itself, and line 341 ensures it is executable by the greenboot facility. At system startup, the file orig.txt is created only if it doesn't already exist. This file contains the unique commit identifier of the current operating system image. On each startup, the file current.txt is overwritten with the latest commit identifier for the current operating system image. The two files are then compared, and if they differ, the greenboot check fails.

For the demonstration, this mechanism deliberately blocks upgrades: the system attempts three boots with an updated operating system image and then rolls back to the previous image. To allow a successful upgrade, simply remove the orig.txt file before attempting the upgrade.
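In practice, that looks like:

# confirm the currently booted commit, then clear the recorded value
rpm-ostree status
rm /etc/greenboot/orig.txt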

Conclusion

This blog article fully describes the configuration of the RHEL for Edge servers supporting the demonstration. This includes the various systemd timers, sockets, and services that enable a rootless, on-demand container web server; automated failover from primary to backup edge servers; automated updates of the container web server; and automated operating system image upgrades with rollback on failure. RHEL provides a complete yet lightweight approach to computing at the edge.


About the author

As Principal Solution Architect to the DoD, Rich brings almost twenty years of experience working directly for large and small DoD system integrators. With over ten years at Red Hat, Rich focuses on the needs of DoD customers to adopt technologies and platforms that simplify tactical operational workloads and enable the military to adapt to rapidly changing environments.