In this blog:
- Walkthrough of setting up updates for edge systems running Red Hat Enterprise Linux (RHEL), including kickstart files.
- Discussion of RHEL for Edge components.
With the release of Red Hat Enterprise Linux (RHEL) 8.3 last November, Red Hat added support for resilient edge computing architectures based on proven features within RHEL. RHEL for Edge creates operating system images that update "atomically" with automatic rollback on failure.
These operating system images provide efficient updates to RHEL systems by delivering only the deltas to an updated operating system image instead of the complete image. RHEL for Edge enables rapid application of security updates and patches with minimal downtime, and it preserves continuity of operations if an update fails.
RHEL for Edge components
This post augments the content in the RHEL for Edge technical video and dives deeper into the configuration needed to showcase the various capabilities built into RHEL.
Follow the full instructions to demonstrate the RHEL for Edge capabilities. Whereas the GitHub repository tracks the contents of the RHEL for Edge summit session, this post will discuss in greater detail how all these features are configured and used to achieve the desired outcome.
Blueprints
The blueprint file, which follows Tom's Obvious Minimal Language (TOML) formatting, codifies the contents of the RHEL for Edge operating system image. The blueprint reference describes the various table headers and key/value pairs that appear in this file. TOML files resemble the familiar INI file format. The blueprint contains several important sections. At the top is the metadata for this image, including the name, description, and version, as shown in this snippet:
name = "RFE"
description = "RHEL for Edge"
version = "0.0.1"
Packages, modules, and groups entries define the contents of the operating system image. For example, the [[packages]] table can be repeated to list the packages added to the default minimal set included in the image. An entry that installs the latest available version of the strace package looks like this:
[[packages]]
name = "strace"
version = "*"
Subsequent sections in the blueprint file describe customizations to the generated operating system image. TCP port 8080 is opened because the image will host a web server, and a user with administrator privileges is also defined. The demo blueprint includes a hashed password, but a plaintext value can also be used.
Operating system images output by image-builder contain a defined set of packages based on the image type being generated. At a minimum, the packages from the comps core group are included.
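Putting the pieces above together, a complete blueprint might look like the one written by the shell function below. This is a sketch, not the demo's actual file: the function name write_blueprint and the password placeholder CHANGE_ME are hypothetical, and the [customizations.firewall] and [[customizations.user]] tables follow the table names used in the image builder blueprint reference.

```bash
#!/usr/bin/env bash
# Sketch only: writes a blueprint resembling the one described in this post.
# write_blueprint and the CHANGE_ME password are hypothetical placeholders;
# use a hashed password (or at least a real secret) in practice.
write_blueprint() {
  local out="$1"
  cat > "$out" <<'EOF'
name = "RFE"
description = "RHEL for Edge"
version = "0.0.1"

# "*" selects whatever version of strace the repositories offer (latest)
[[packages]]
name = "strace"
version = "*"

# Open TCP port 8080 for the containerized web server
[customizations.firewall]
ports = ["8080:tcp"]

# Administrator account: membership in wheel grants sudo rights
[[customizations.user]]
name = "core"
password = "CHANGE_ME"
groups = ["wheel"]
EOF
}
```

After writing the file, composer-cli blueprints push uploads it to image builder, and composer-cli compose start RFE followed by an image type starts a build. The edge image type names differ between releases (RHEL 8.3 uses rhel-edge-commit), so check composer-cli compose types on your system first.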
Image Builder for Red Hat Enterprise Linux
RHEL for Edge includes an image builder backend with a web interface. This service, accessed from either the command line or via a web console plug-in, composes operating system images based on the contents of the blueprint file. The resulting rpm-ostree operating system images are then deployed to the relevant systems using a range of methods.
rpm-ostree operating system image
In its simplest form, the rpm-ostree operating system image is delivered as a tarball that is expanded and offered by a simple web server. The rpm-ostree format creates a middle ground between package-based and image-based operating systems: it enables atomic image updates with automatic rollback on failure while also allowing persistent configuration in file system locations such as /etc and /var. RHEL 8.4 expands the delivery options to include an OCI container that wraps the rpm-ostree image with a built-in web server to provide content to RHEL for Edge systems. Alternatively, the rpm-ostree content can be packaged in a bootable ISO with a self-launching installer for systems in disconnected environments.
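As a minimal sketch of the simple web server option, the function below unpacks a commit tarball into a web root. It assumes the tarball contains a top-level repo/ directory, which is the layout the kickstart URL http://${HOSTIP}:8000/repo/ expects; the function name unpack_commit is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: unpack an edge commit tarball so a static web server can publish it.
# Assumption: the tarball holds a top-level repo/ directory (the OSTree repo).
unpack_commit() {
  local tarball="$1" webroot="$2"
  mkdir -p "$webroot"
  tar -xf "$tarball" -C "$webroot" || return 1
  # Fail loudly if the expected OSTree repository is not there
  if [ ! -d "$webroot/repo" ]; then
    echo "no repo/ directory found in $tarball" >&2
    return 1
  fi
}
```

Any static web server will do for the demo; for example, running python3 -m http.server 8000 from the web root serves the repository on the port the kickstart file uses.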
The kickstart file
Configuration of the demo occurs principally within the kickstart file. This file installs the initial rpm-ostree image and then configures the RHEL for Edge system. The various scripts in the repository ensure that the installation is started without operator intervention by creating customized boot ISOs that include both the kickstart file and necessary kernel boot parameters. The remainder of this post walks through the content of the kickstart file, explaining the relevance of each section.
Initial installation
The beginning stanza launches installation of our RHEL for Edge device. Review the lines in the kickstart file below.
5 # set locale defaults for the Install
6 lang en_US.UTF-8
7 keyboard us
8 timezone UTC
9
10 # initialize any invalid partition tables and destroy all of their contents
11 zerombr
12
13 # erase all disk partitions and create a default label
14 clearpart --all --initlabel
15
16 # automatically create xfs partitions with no LVM and no /home partition
17 autopart --type=plain --fstype=xfs --nohome
18
19 # poweroff after installation is successfully completed
20 poweroff
21
22 # installation will run in text mode
23 text
24
25 # activate network devices and configure with DHCP
26 network --bootproto=dhcp --noipv6
27
28 # Kickstart requires that we create default user 'core' with sudo
29 # privileges using password 'edge'
30 user --name=core --groups=wheel --password=edge --homedir=/var/home/core
31
32 # set up the OSTree-based install with disabled GPG key verification, the base
33 # URL to pull the installation content, 'rhel' as the management root in the
34 # repo, and 'rhel/8/x86_64/edge' as the branch for the installation
35 ostreesetup --nogpg --url=http://${HOSTIP}:8000/repo/ --osname=rhel --remote=edge --ref=rhel/8/x86_64/edge
Most of this content is boilerplate. Lines 5 through 8 set locale defaults, and lines 10 through 17 ensure fresh disk partitions for our installation. Lines 20 through 26 power down the system after installation, enable a text-based installation, and configure networking, respectively.
Of note, line 30 defines user “core” with system administration privileges. The blueprint file used to generate the rpm-ostree operating system image also defines the user “core.” Line 30 is still necessary so that the kickstart installation runs unattended and does not pause for operator input: Anaconda needs either a root password or another user defined to perform a fully automated, unattended installation.
Line 35 is where most of the work is done. The “ostreesetup” command reaches out to the defined URL and pulls the rpm-ostree content from the specific reference branch to install the operating system. Beyond this point, various “%post” stanzas configure the remainder of the system. We’ll discuss each “%post” section in more detail.
Automated failover with keepalived
Ensuring continuity of operations requires highly available systems that can take over functionality when a primary system fails. The keepalived feature within RHEL enables that behavior. This post limits the discussion to what is enabled for the demonstration.
For the demonstration we configure primary and backup RHEL for Edge devices. The primary and backup devices use the virtual router redundancy protocol (VRRP), implemented by keepalived, to govern the ownership of a virtual IP address. The primary edge device has a higher priority and it will own the virtual IP address when it’s active. Review the excerpted lines from the kickstart file here:
43 # parse out boot parameters beginning with "vip"
44 cmdline=`cat /proc/cmdline`
45 params=(${cmdline// / })
46 for param in "${params[@]}"; do
47     if [[ $param == vip* ]]; then
48         eval $param
49     fi
50 done
Lines 43 through 50 do a little kickstart shell magic to get the kernel boot parameters for configuring keepalived. The specific values being passed are the device’s state and priority. The next snippet writes the configuration for keepalived.
53 cat << EOF > /etc/keepalived/keepalived.conf
54 vrrp_instance RFE_VIP {
55     state $vip_state
56     interface enp1s0
57     virtual_router_id 50
58     priority $vip_priority
59     advert_int 1
60     authentication {
61         auth_type PASS
62         auth_pass edge123
63     }
64     virtual_ipaddress {
65         $VIP_IP/$VIP_MASK
66     }
67 }
68 EOF
Lines 53 through 67 create the configuration file for keepalived. Here we create a VRRP instance called “RFE_VIP” and then map the variables for state and priority into the configuration file. VRRP advertisements are sent once a second, with failovers taking around three seconds or less. The above linked article series for keepalived goes into greater detail on the various parameters and settings. This is a minimal configuration to enable failover to the backup instance and then restoration to the primary instance when it’s functional again.
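The eval in lines 43 through 50 runs each matching kernel parameter as a shell command, so a malformed boot parameter can execute arbitrary code or silently produce a broken configuration. The sketch below is an eval-free alternative that only accepts the two expected keys and validates them before writing the same VRRP instance. The function names parse_vip_params and write_keepalived_conf are hypothetical, and the validation rules are assumptions: keepalived's state is MASTER or BACKUP, and the priority is restricted to 1 through 254 because RFC 5798 reserves 255 for the address owner and 0 for a master that is giving up.

```bash
#!/usr/bin/env bash
# Sketch: eval-free parsing of vip_state/vip_priority plus a validated writer.

# Set the globals vip_state and vip_priority from a kernel command line string.
parse_vip_params() {
  local param params
  vip_state="" vip_priority=""
  # read -a splits on whitespace without glob expansion
  read -r -a params <<< "$1"
  for param in "${params[@]}"; do
    case "$param" in
      vip_state=*)    vip_state="${param#vip_state=}" ;;
      vip_priority=*) vip_priority="${param#vip_priority=}" ;;
    esac
  done
}

# Write a minimal VRRP instance; refuse values keepalived would misread.
write_keepalived_conf() {
  local state="$1" priority="$2" iface="$3" vip="$4" mask="$5" out="$6"
  case "$state" in
    MASTER|BACKUP) ;;
    *) echo "vip_state must be MASTER or BACKUP, got '$state'" >&2; return 1 ;;
  esac
  # 1-3 digits, no leading zero, and at most 254
  if ! [[ "$priority" =~ ^[1-9][0-9]{0,2}$ ]] || (( priority > 254 )); then
    echo "vip_priority must be 1-254, got '$priority'" >&2
    return 1
  fi
  cat > "$out" <<EOF
vrrp_instance RFE_VIP {
    state $state
    interface $iface
    virtual_router_id 50
    priority $priority
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass edge123
    }
    virtual_ipaddress {
        $vip/$mask
    }
}
EOF
}
```

In a kickstart %post section this would be called as parse_vip_params "$(cat /proc/cmdline)" followed by write_keepalived_conf "$vip_state" "$vip_priority" enp1s0 "$VIP_IP" "$VIP_MASK" /etc/keepalived/keepalived.conf, so a typo in the boot parameters produces an error message instead of a broken keepalived.conf.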
72 cat << EOF > /etc/sysconfig/keepalived
73 KEEPALIVED_OPTIONS="-D --vrrp"
74 EOF
Lines 72 through 74 tailor the keepalived service to enable detailed logging (-D) and to run only the VRRP subsystem (--vrrp). The keepalived service offers greater functionality, but the demonstration is configured to only handle failover from primary to backup edge instances.
77 cat << EOF > /etc/systemd/system/enable-vrrp-protocol.service
78 [Unit]
79 Wants=firewalld.service
80 After=firewalld.service
81
82 [Service]
83 Type=oneshot
84 ExecStart=firewall-cmd --add-rich-rule="rule protocol value='vrrp' accept" --permanent
85 ExecStartPost=firewall-cmd --reload
86
87 [Install]
88 WantedBy=multi-user.target default.target
89 EOF
90
91 systemctl enable enable-vrrp-protocol.service
Lines 77 through 91 are critically important to the proper operation of keepalived. The primary and backup edge devices need to send multicast advertisements to each other to ensure proper governance of the virtual IP address.
By default, the firewall blocks VRRP packets. Some commands cannot run within the kickstart; for example, firewall-cmd needs a running firewalld service, and none exists while the kickstart executes. Instead, the kickstart creates and enables a simple one-shot systemd service that runs at boot, adds a permanent rich rule accepting VRRP packets, and reloads the firewall.
Keeping rpm-ostree up to date
Operating systems should receive regular updates, whether running at the edge or not. Operating at the edge, however, presents unique challenges since there’s limited infrastructure and network connectivity.
Methods should be devised to provide updates that are non-disruptive to normal operations and fit within the constraints of the operational environment. RHEL for Edge is designed so that a failure during an operating system update triggers an automatic rollback to the prior system state, so a failed update does not leave the edge device inoperable.
In disrupted, disconnected, intermittent, and limited-bandwidth environments, alternate schemes should be explored. For example, an edge device could delay updates until the system is in range of supporting infrastructure. A temporary network cable could then be connected and a reboot used to trigger the download and application of the updates.
This section of the kickstart file configures periodic checks for operating system updates and then the automatic application of those updates. This assumes an environment where network connectivity is available to support downloading updated content, staging it, and then automatically upgrading.
102 echo AutomaticUpdatePolicy=stage >> /etc/rpm-ostreed.conf
Line 102 modifies the update policy to stage any downloaded operating system updates that will then be applied at the next reboot.
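Appending with >> works on a fresh image, but running it twice leaves duplicate keys, and it relies on [Daemon] being the last section in the file. A sketch of an idempotent alternative follows; the function name set_update_policy and its optional path argument are hypothetical, while none, check, and stage are the policy values documented in rpm-ostreed.conf(5). It assumes GNU sed for in-place editing.

```bash
#!/usr/bin/env bash
# Sketch: set AutomaticUpdatePolicy in rpm-ostreed.conf without duplicating it.
set_update_policy() {
  local policy="$1" conf="${2:-/etc/rpm-ostreed.conf}"
  case "$policy" in
    none|check|stage) ;;
    *) echo "unsupported policy: $policy" >&2; return 1 ;;
  esac
  if grep -qs '^#\{0,1\}AutomaticUpdatePolicy=' "$conf"; then
    # Replace an existing (possibly commented-out) entry in place
    sed -i "s/^#\{0,1\}AutomaticUpdatePolicy=.*/AutomaticUpdatePolicy=${policy}/" "$conf"
  elif grep -qs '^\[Daemon\]' "$conf"; then
    # Add the key directly under the [Daemon] section header (GNU sed "a")
    sed -i "/^\[Daemon\]/a AutomaticUpdatePolicy=${policy}" "$conf"
  else
    # No usable file yet: create the section and the key
    printf '[Daemon]\nAutomaticUpdatePolicy=%s\n' "$policy" >> "$conf"
  fi
}
```

On a running system, follow the change with rpm-ostree reload so the daemon rereads its configuration; in a kickstart %post section that step is unnecessary because the daemon starts later with the new file.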
112 cat > /etc/systemd/system/applyupdate.service << 'EOF'
113 [Unit]
114 Description=Apply Update Check
115
116 [Service]
117 Type=oneshot
118 ExecStart=/bin/bash -c 'if [[ $(rpm-ostree status -v | grep "Staged: yes") ]]; then systemctl --message="Applying OTA update" reboot; else logger "Running latest available update"; fi'
119 EOF
120
121 # This systemd timer activates every 30 seconds to check for staged
122 # updates to the operating system
123 cat > /etc/systemd/system/applyupdate.timer <<EOF
124 [Unit]
125 Description=Daily Update Reboot Check.
126
127 [Timer]
128 # activate 30 seconds after boot, then every 30 seconds
129 OnBootSec=30
130 OnUnitActiveSec=30
131
132 #weekly example for Sunday at midnight
133 #OnCalendar=Sun *-*-* 00:00:00
134
135 [Install]
136 WantedBy=multi-user.target
137 EOF
Lines 112 through 137 create a systemd timer and an accompanying service that periodically check whether updated operating system content has been staged. If so, an automatic reboot is triggered. This approach is great for a demonstration, but it can be disruptive to normal operations. The key takeaway is that systemd timers and services, together with the underlying RHEL for Edge tooling, allow a broad range of solutions to keep systems up to date.
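As one example of a less disruptive policy, the sketch below reboots only when an update is staged and the current hour falls inside a maintenance window. It reuses the same "Staged: yes" marker that line 118 looks for in rpm-ostree status -v output; the function names and the window boundaries are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: reboot for a staged update only inside a maintenance window.

# Succeeds if the rpm-ostree status text on stdin reports a staged deployment.
has_staged_update() {
  grep -q 'Staged: yes'
}

# Succeeds if HOUR (0-23) lies in [START, END); the window may wrap midnight.
in_window() {
  local hour=$((10#$1)) start=$2 end=$3   # 10# accepts "08" from date +%H
  if (( start <= end )); then
    (( hour >= start && hour < end ))
  else
    (( hour >= start || hour < end ))
  fi
}

# Reboot decision: a staged update AND the current hour inside the window.
should_reboot() {
  local status_text="$1" hour="$2" start="$3" end="$4"
  printf '%s\n' "$status_text" | has_staged_update && in_window "$hour" "$start" "$end"
}
```

A script that defines these functions could then run should_reboot "$(rpm-ostree status -v)" "$(date +%H)" 1 4 && systemctl --message="Applying OTA update" reboot from the service's ExecStart, keeping reboots between 01:00 and 04:00 local time. Windows that wrap past midnight, such as 22 to 3, are handled as well.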
It’s also important to note that systemd timers offer very flexible ways to specify when and how often a service executes. Lines 129 and 130 schedule the first run thirty seconds after boot and subsequent runs every thirty seconds, but the schedule can follow the commented example of every Sunday at midnight or almost any other desired periodicity. Timer events can also be “splayed,” or distributed over a period of time, to keep many devices from overtaxing the available infrastructure when requesting updates.
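To make the commented weekly schedule concrete, the sketch below writes an applyupdate.timer that fires on Sunday at midnight and spreads the actual activation over a random delay with RandomizedDelaySec=. The function name, the default one-hour splay, Persistent=true (which runs a missed activation at the next boot), and installing under timers.target are illustrative choices, not part of the demo.

```bash
#!/usr/bin/env bash
# Sketch: a weekly, splayed variant of applyupdate.timer.
write_weekly_timer() {
  local unit_dir="$1" splay="${2:-1h}"
  mkdir -p "$unit_dir"
  cat > "$unit_dir/applyupdate.timer" <<EOF
[Unit]
Description=Weekly Update Reboot Check

[Timer]
# Sunday at midnight, then wait a random 0..$splay before firing
OnCalendar=Sun *-*-* 00:00:00
RandomizedDelaySec=$splay
# Catch up at the next boot if the device was off at the scheduled time
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

Running systemd-analyze calendar 'Sun *-*-* 00:00:00' prints when such an expression next elapses, which is a quick way to check a schedule before deploying it.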
142 systemctl enable rpm-ostreed-automatic.timer applyupdate.timer
Line 142, the last line in this section, enables not only the applyupdate.timer just defined but also a second timer, included with RHEL for Edge, that checks for updates and stages them according to the policy set on line 102. By default, the rpm-ostreed-automatic.timer first fires one hour after boot and then about once a day, each time running the service with the matching name. The timing can be modified to override these defaults.
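Rather than editing the packaged unit, the schedule of rpm-ostreed-automatic.timer can be overridden with a drop-in file. The sketch below writes one; the nightly 02:00 schedule, the 30-minute splay, and the function name are hypothetical, and the root argument only exists so the helper can be tried in a scratch directory (pass / in a kickstart %post section).

```bash
#!/usr/bin/env bash
# Sketch: override the rpm-ostreed-automatic.timer schedule with a drop-in.
override_automatic_timer() {
  local root="${1:-/}"
  local dropin_dir="$root/etc/systemd/system/rpm-ostreed-automatic.timer.d"
  mkdir -p "$dropin_dir"
  cat > "$dropin_dir/override.conf" <<'EOF'
[Timer]
# Empty assignments clear the monotonic triggers inherited from the packaged unit
OnBootSec=
OnUnitInactiveSec=
# Check nightly at 02:00, spread over 30 minutes across devices
OnCalendar=*-*-* 02:00:00
RandomizedDelaySec=30min
EOF
}
```

On a running system, run systemctl daemon-reload afterwards; systemctl cat rpm-ostreed-automatic.timer then shows the packaged unit together with the drop-in, and systemctl list-timers shows the next scheduled run.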
Image registry policy
In the demonstration, systemd services launch container applications via podman. Policy settings tailor both the image registries that are searched and their security settings.
151 cat > /etc/containers/registries.conf <<EOF
152 [registries.search]
153 registries = ['registry.access.redhat.com', 'registry.redhat.io', 'docker.io']
154 [registries.insecure]
155 registries = ['${HOSTIP}:5000']
156 [registries.block]
157 registries = []
158 EOF
Lines 151 through 158 set the registries that are searched for unqualified image names and mark the self-hosted demonstration registry as insecure, so podman can pull from it over plain HTTP or without verifying its TLS certificate. For obvious reasons, registry access should be secured in operational environments.
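The file above uses the original (version 1) registries.conf syntax. Newer container tooling documents a version 2 syntax in containers-registries.conf(5), and the two formats should not be mixed in one file. A sketch of the same policy in version 2 form follows; the function name and its arguments are hypothetical, and the empty block list is simply omitted.

```bash
#!/usr/bin/env bash
# Sketch: the demo registry policy in version 2 registries.conf syntax.
# REGISTRY is e.g. "${HOSTIP}:5000"; OUT defaults to the system-wide file.
write_registries_v2() {
  local registry="$1" out="${2:-/etc/containers/registries.conf}"
  cat > "$out" <<EOF
unqualified-search-registries = ["registry.access.redhat.com", "registry.redhat.io", "docker.io"]

# Demo-only registry: allow plain HTTP and unverified TLS
[[registry]]
location = "$registry"
insecure = true
EOF
}
```

For a one-off test, podman pull --tls-verify=false ${HOSTIP}:5000/httpd:prod relaxes TLS checks for a single pull without changing the system-wide policy.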
Rootless, socket-activated container application
This next section is vitally important to the demonstration and it illustrates a great many features of both Podman and systemd sockets, timers, and services. For a deeper explanation of how this all works, please review this excellent article covering podman rootless containers with socket activation. Images from that blog post are excerpted here for clarity.
Systemd sockets, after receiving a request, start a service of the same name and hand the listening socket to that service as a file descriptor. The service then processes the request and listens for additional requests. To enable socket-activated container applications, it’s necessary to use a proxy. This is fully explored in the article linked above, but suffice it to say that podman is not able to accept a file descriptor and then listen for requests. The flow that occurs is illustrated here.
A client request to the systemd socket triggers the proxy service to start. The proxy service in turn starts its dependent container service, and the request is forwarded to the container service. This flow enables on-demand activation of both the proxy and the container services, which is important for the failover demonstration: neither the primary nor the backup system runs these services until they are needed, so they act as “scale-from-zero” services. With a future systemd release, it will be possible to “scale-to-zero,” creating a truly serverless experience.
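Later systemd releases added an --exit-idle-time= option to systemd-socket-proxyd (systemd 246, newer than the systemd 239 shipped in RHEL 8), which lets the proxy itself exit after a period without connections. The sketch below writes the proxy service used in this post and adds that option only when the target's systemd version supports it. The function name and arguments are hypothetical, and stopping the container along with the proxy would need additional unit dependencies that are not shown here.

```bash
#!/usr/bin/env bash
# Sketch: emit the proxy unit, adding --exit-idle-time only on systemd >= 246.
write_proxy_service() {
  local systemd_version="$1" idle="$2" out="$3"
  local opts=""
  if (( systemd_version >= 246 )) && [ -n "$idle" ]; then
    # Let the proxy exit after $idle without connections
    opts="--exit-idle-time=$idle "
  fi
  cat > "$out" <<EOF
[Unit]
Requires=container-httpd.service
After=container-httpd.service
Requires=container-httpd-proxy.socket
After=container-httpd-proxy.socket

[Service]
ExecStart=/usr/lib/systemd/systemd-socket-proxyd ${opts}127.0.0.1:8080
EOF
}
```

The version number can be read on the target with systemctl --version | awk 'NR==1 {print $2}'.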
There’s quite a bit of configuration for this in the kickstart file so each section will be broken down and discussed separately.
168 # create systemd user directories for rootless services, timers,
171 mkdir -p /var/home/core/.config/systemd/user/sockets.target.wants
172 mkdir -p /var/home/core/.config/systemd/user/timers.target.wants
173 mkdir -p /var/home/core/.config/systemd/user/multi-user.target.wants
Lines 168 through 173 create the directories needed for rootless systemd sockets, timers, and services. Normally these directories would be created by “systemctl --user” commands, but that’s not possible in the limited shell environment of a kickstart file. To work around this limitation, the directories and symbolic links are created directly.
176 cat << EOF > /var/home/core/.config/systemd/user/container-httpd-proxy.socket
177 [Socket]
178 ListenStream=$VIP_IP:8080
179 FreeBind=true
180
181 [Install]
182 WantedBy=sockets.target
183 EOF
Lines 176 through 183 define the socket listener for web requests. Line 178 binds the socket listener to the virtual IP address and port 8080. Since the virtual IP address is only configured on one edge device at a time, the FreeBind option in line 179 allows the binding to occur even if the IP address is not yet configured on the device. This option is described in the ip(7) man page, with the relevant section excerpted here.
IP_FREEBIND (since Linux 2.4)
If enabled, this boolean option allows binding to an IP address that is nonlocal or does not (yet) exist. This permits listening on a socket, without requiring the underlying network interface or the specified dynamic IP address to be up at the time that the application is trying to bind to it.
187 cat << EOF > /var/home/core/.config/systemd/user/container-httpd-proxy.service
188 [Unit]
189 Requires=container-httpd.service
190 After=container-httpd.service
191 Requires=container-httpd-proxy.socket
192 After=container-httpd-proxy.socket
193
194 [Service]
195 ExecStart=/usr/lib/systemd/systemd-socket-proxyd 127.0.0.1:8080
196 EOF
Lines 187 through 196 define the proxy service that accepts the file descriptor from the socket and then listens for additional HTTP requests. The proxy forwards requests to the container web server application on the localhost address. The Requires and After directives define the dependencies and ordering so that components are started in the correct order when a request is received.
203 cat > /var/home/core/.config/systemd/user/container-httpd.service <<EOF
204 # container-httpd.service
205 # autogenerated by Podman 3.0.2-dev
206 # Thu May 20 10:16:40 EDT 2021
207
208 [Unit]
209 Description=Podman container-httpd.service
210 Documentation=man:podman-generate-systemd(1)
211
212 [Service]
213 Environment=PODMAN_SYSTEMD_UNIT=%n
214 Restart=on-failure
215 TimeoutStopSec=70
216 ExecStartPre=/bin/rm -f %t/container-httpd.pid %t/container-httpd.ctr-id
217 ExecStart=/usr/bin/podman run --conmon-pidfile %t/container-httpd.pid --cidfile %t/container-httpd.ctr-id --cgroups=no-conmon --replace -d --label io.containers.autoupdate=image --name httpd -p 127.0.0.1:8080:80 ${HOSTIP}:5000/httpd:prod
218 ExecStartPost=/bin/sleep 1
219 ExecStop=/usr/bin/podman stop --ignore --cidfile %t/container-httpd.ctr-id -t 10
220 ExecStopPost=/usr/bin/podman rm --ignore -f --cidfile %t/container-httpd.ctr-id
221 PIDFile=%t/container-httpd.pid
222 Type=forking
223 EOF
Lines 203 through 223 define the systemd service for the container web server. This content was auto-generated with the podman generate systemd command, as described in the article linked above, and the settings were then tailored. Notably, line 217 adds the label io.containers.autoupdate=image to enable automatic updates of the container application, discussed in more detail below. Line 218 adds a short pause to give the web server, launched as a container by podman, time to become ready to receive an HTTP request.
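For reference, a unit like the one above can be produced with Podman 3.x by creating the container with the auto-update label and then running podman generate systemd. The sketch below shows those two commands; the run wrapper and the DRY_RUN switch are hypothetical helpers that print the commands instead of executing them, and the image argument is whatever the demo registry serves (for example ${HOSTIP}:5000/httpd:prod).

```bash
#!/usr/bin/env bash
# Sketch: generate container-httpd.service with Podman 3.x.

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

generate_httpd_unit() {
  local image="$1"
  # Create (but do not start) a container carrying the auto-update label;
  # auto-update expects a fully qualified image reference.
  run podman create --name httpd \
    --label io.containers.autoupdate=image \
    -p 127.0.0.1:8080:80 "$image"
  # --new: the unit creates a fresh container on every start;
  # --files: write container-httpd.service into the current directory.
  run podman generate systemd --new --files --name httpd
}
```

The --new flag matters here: podman-auto-update(1) expects units generated with --new, which is why the ExecStart in line 217 runs podman run with --replace rather than podman start. The ExecStartPost=/bin/sleep 1 in line 218 was added by hand afterwards.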
240 cat > /var/home/core/.config/systemd/user/podman-auto-update.service <<EOF
241 [Unit]
242 Description=Podman auto-update service
243 Documentation=man:podman-auto-update(1)
244
245 [Service]
246 ExecStart=/usr/bin/podman auto-update
247
248 [Install]
249 WantedBy=multi-user.target default.target
250 EOF
251
252 # This timer ensures podman auto-update is run every 30 seconds
253 cat > /var/home/core/.config/systemd/user/podman-auto-update.timer <<EOF
254 [Unit]
255 Description=Podman auto-update timer
256
257 [Timer]
258 # This example runs the podman auto-update daily within a two-hour
259 # randomized window to reduce system load
260 #OnCalendar=daily
261 #Persistent=true
262 #RandomizedDelaySec=7200
263
264 # activate 30 seconds after boot, then every 30 seconds
265 OnBootSec=30
266 OnUnitActiveSec=30
267
268 [Install]
269 WantedBy=timers.target
270 EOF
Lines 240 through 270 define a systemd timer and a corresponding service that update the container application if the container image in the registry differs from what’s currently running on the edge device. The timer periodically triggers the service, which runs the command podman auto-update. That command does all the work needed to check whether a changed image is in the registry, download the image content, stop the currently running container application, and restart it with the updated container image.
As configured, the timer triggers this service thirty seconds after boot and every thirty seconds thereafter, but the commented-out lines 260 through 262 show how to run it daily within a randomized two-hour window so that requests to the registry are “splayed.”
273 cat > /var/home/core/.config/systemd/user/pre-pull-container-image.service <<EOF
274 [Service]
275 Type=oneshot
276 ExecStart=podman pull $HOSTIP:5000/httpd:prod
277
278 [Install]
279 WantedBy=multi-user.target default.target
280 EOF
Lines 273 through 280 implement an optimization that pre-pulls the container image so the container web server application starts faster when a request is received. This service runs once at boot.
282 # enable socket listener
283 ln -s /var/home/core/.config/systemd/user/container-httpd-proxy.socket /var/home/core/.config/systemd/user/sockets.target.wants/container-httpd-proxy.socket
284
285 # enable timer
286 ln -s /var/home/core/.config/systemd/user/podman-auto-update.timer /var/home/core/.config/systemd/user/timers.target.wants/podman-auto-update.timer
287
288 # enable pre-pull container image
289 ln -s /var/home/core/.config/systemd/user/pre-pull-container-image.service /var/home/core/.config/systemd/user/default.target.wants/pre-pull-container-image.service
290 ln -s /var/home/core/.config/systemd/user/pre-pull-container-image.service /var/home/core/.config/systemd/user/multi-user.target.wants/pre-pull-container-image.service
Lines 282 through 290 manually create the symbolic links that enable the rootless systemd sockets, timers, and services. This is necessary because the systemctl --user commands cannot run during the kickstart: there is no logged-in user or fully configured shell.
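The same effect can be wrapped in a small helper that mimics systemctl --user enable for a unit with WantedBy=TARGET: it creates the TARGET.wants directory if it is missing and links the unit into it. This is a sketch; enable_user_unit is a hypothetical name, and creating the .wants directory on demand means a link never fails because its directory does not exist yet.

```bash
#!/usr/bin/env bash
# Sketch: what `systemctl --user enable` does for a unit with WantedBy=TARGET.
enable_user_unit() {
  local home="$1" unit="$2" target="$3"
  local unit_dir="$home/.config/systemd/user"
  [ -f "$unit_dir/$unit" ] || { echo "missing $unit_dir/$unit" >&2; return 1; }
  mkdir -p "$unit_dir/$target.wants"
  # -f replaces an existing link, so reruns are harmless
  ln -sf "$unit_dir/$unit" "$unit_dir/$target.wants/$unit"
}
```

For example, enable_user_unit /var/home/core podman-auto-update.timer timers.target reproduces line 286. The ownership and SELinux fix-ups in lines 293 and 294 are still needed afterwards.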
293 chown -R core: /var/home/core
294 restorecon -vFr /var/home/core
Lines 293 and 294 ensure that both discretionary (file ownership) and mandatory (SELinux context) access controls are correct for the rootless systemd unit files.
297 cat << EOF > /etc/systemd/system/enable-linger.service
298 [Service]
299 Type=oneshot
300 ExecStart=loginctl enable-linger core
301
302 [Install]
303 WantedBy=multi-user.target default.target
304 EOF
305
306 systemctl enable enable-linger.service
And wrapping this section up, lines 297 through 306 create a one-shot service that runs at boot to enable linger for user core. Linger lets the rootless systemd sockets, timers, and services run whether or not the associated user is logged in.
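loginctl enable-linger records the setting as an empty file named after the user in /var/lib/systemd/linger, which systemd-logind reads when it starts. Creating that file directly from %post is a commonly used alternative to the boot-time service above. The sketch below does that; the function name is hypothetical, and the root argument only exists so it can be exercised in a scratch directory.

```bash
#!/usr/bin/env bash
# Sketch: enable lingering by creating logind's marker file directly.
enable_linger_file() {
  local user="$1" root="${2:-/}"
  local dir="$root/var/lib/systemd/linger"
  mkdir -p "$dir"
  # An empty file named after the user is all logind looks for
  : > "$dir/$user"
}
```

After boot, loginctl show-user core --property=Linger should report Linger=yes. If SELinux labels matter on your image, run restorecon on the new file, as the post does for the home directory.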
Greenboot and automated rollbacks
RHEL for Edge supports running health checks during system startup to determine whether the current operating system image meets all operational requirements. The requirements are defined in the /etc/greenboot directory and enforced via the greenboot facility, a generic health check framework for systemd. The checks are simple shell scripts, placed in a prescribed directory structure, that return pass/fail results. The directory structure is shown below:
/etc/greenboot
+-- check
|   +-- required.d  /* these scripts MUST succeed */
|   +-- wanted.d    /* these scripts SHOULD succeed */
+-- green.d         /* scripts run after success */
+-- red.d           /* scripts run after failure */
All scripts in the required.d directory must succeed for startup to proceed. After three failed boot attempts, the operating system is rolled back to the previous version and restarted. Scripts in the wanted.d directory should succeed, but their failure does not trigger a rollback. The green.d directory contains scripts that run as part of a successful boot, and the scripts in the red.d directory run if there’s a failure.
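A sketch that creates this layout and installs one wanted.d check follows. The script name 10_keepalived_active.sh and the keepalived check are hypothetical examples: because the script sits in wanted.d, a failure is reported but does not count toward a rollback. The root argument only exists so the helper can be tried in a scratch directory.

```bash
#!/usr/bin/env bash
# Sketch: create the greenboot layout and install a "wanted" health check.
install_greenboot_check() {
  local root="${1:-/}"
  local base="$root/etc/greenboot"
  mkdir -p "$base/check/required.d" "$base/check/wanted.d" \
           "$base/green.d" "$base/red.d"
  cat > "$base/check/wanted.d/10_keepalived_active.sh" <<'EOF'
#!/bin/bash
# Succeeds only if the keepalived service is running
systemctl is-active --quiet keepalived.service
EOF
  # greenboot only runs scripts that are executable
  chmod 0755 "$base/check/wanted.d/10_keepalived_active.sh"
}
```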
The demonstration uses a custom greenboot check script to control whether a rollback occurs to the previous version of the operating system. This is explained below.
322 mkdir -p /etc/greenboot/check/required.d
323 cat > /etc/greenboot/check/required.d/01_check_upgrade.sh <<EOF
324 #!/bin/bash
325
326 #
327 # This test fails if the current commit identifier is different
328 # than the original commit
329 #
330
331 if [ ! -f /etc/greenboot/orig.txt ]
332 then
333     rpm-ostree status | grep -A2 '^\*' | grep Commit > /etc/greenboot/orig.txt
334 fi
335
336 rpm-ostree status | grep -A2 '^\*' | grep Commit > /etc/greenboot/current.txt
337
338 diff -s /etc/greenboot/orig.txt /etc/greenboot/current.txt
339 EOF
340
341 chmod +x /etc/greenboot/check/required.d/01_check_upgrade.sh
Lines 322 through 339 define the script itself, and line 341 makes it executable by the greenboot facility. At system startup, the file orig.txt is created only if it doesn’t already exist. This file contains the unique commit identifier of the current operating system image. On each startup, the file current.txt is overwritten with the commit identifier of the currently booted operating system image. The two files are then compared, and if they differ, the greenboot check fails.
This mechanism blocks any upgrade: during the demonstration, the system attempts three boots with the updated operating system image and then rolls back to the previous image. To allow a successful upgrade, simply remove the orig.txt file before attempting the upgrade.
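The comparison logic is easier to test when it is separated from rpm-ostree. In the sketch below, the booted commit identifier is passed in as an argument and state_dir stands in for /etc/greenboot; the function name is hypothetical. As in the post, the first commit seen is recorded in orig.txt, and deleting that file lets the next image become the new baseline.

```bash
#!/usr/bin/env bash
# Sketch: succeed only if CURRENT matches the first commit ever recorded.
check_commit_unchanged() {
  local state_dir="$1" current="$2"
  # Record the first commit seen; later boots compare against it
  if [ ! -f "$state_dir/orig.txt" ]; then
    printf '%s\n' "$current" > "$state_dir/orig.txt"
  fi
  [ "$(cat "$state_dir/orig.txt")" = "$current" ]
}
```

A required.d script that defines this function could call check_commit_unchanged /etc/greenboot "$(rpm-ostree status | grep -A2 '^\*' | grep Commit)", reusing the extraction from lines 333 and 336; a nonzero exit marks the boot as failed.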
Conclusion
This post describes in detail the configuration of the RHEL for Edge servers supporting the demonstration. This includes the systemd timers, sockets, and services that enable a rootless, on-demand container web server; automated failover from primary to backup edge servers; automated updates of the container web server; and automated operating system image upgrades with rollback on failure. RHEL provides a complete and lightweight approach to computing at the edge.