Introducing hirte, a deterministic multi-node service controller

April 26, 2023 | Pierre-Yves Chibon, Dan Walsh | 8-minute read

A few months ago, we discussed the challenges of running containers in cars. Some of the challenges included:

Non-deterministic systems in environments requiring a high level of determinism on fixed hardware
Multiple nodes that require coordinated state changes
Differences in handling failures and degraded mode
The performance overhead of Kubernetes, along with its manager being written in Go, makes functional safety (FuSa) certification much harder to achieve

With these challenges in mind, the usual container orchestration solution is not ideally suited for cars.

In this article, we are pleased to introduce hirte, a deterministic multi-node service controller. Hirte was designed for highly regulated industries, such as automotive and others that have functional safety requirements. Hirte manages the states of different services across multiple nodes, integrating with systemd via its D-Bus API and relaying D-Bus messages over TCP for multi-node support.

In Podman at the edge, Valentin Rothberg wrote, "Running containerized workloads in systemd is a simple yet powerful means for reliable and rock-solid deployments [..] This integration allows systemd to manage service dependencies, monitor the lifecycle and service state, and possibly also restart services in case of failure."

And now it is easier than ever to run containers using systemd thanks to Alexander Larsson’s Quadlet work in Podman 4.4.

By integrating hirte with Quadlet, Podman and systemd, you have the magical quartet you need to build a strong architecture to manage services and containers in safety-critical environments.

Hirte components

Hirte is built around three components:

The hirte service, which is the primary controller that runs on the primary node
The hirte-agent services, which run on each managed node and talk locally to systemd to instruct it to act on services
The hirtectl command line, which is used by administrators to test, debug and manually manage services across nodes

Hirte is meant to be used in conjunction with a state manager (a program or person) that knows the desired state of the systems. This design choice has a few consequences:

Hirte does not know the desired final state of the systems; it only knows how to transition between states, such as how to start, stop or restart a service on one or more nodes
Hirte monitors and reports changes in the services that are running and alerts the state manager when a service stops or when the connection to a node is lost, but hirte itself does not act on these notifications
Hirte does not handle the initial setup of the system; the system boots into a desired state and hirte handles the transitions from this state

The state manager program integrates with hirte over D-Bus. The state manager tells hirte to perform actions and receives the outcome of those actions. Hirte monitors services and nodes via D-Bus and reports state changes back to the state manager. Administrators can use the hirtectl interface to avoid interacting directly with hirte via D-Bus.

Hirte architecture

The state manager talks to hirte over the system D-Bus daemon. In turn, hirte talks to the hirte-agents using a dedicated D-Bus connection over TCP/IP for remote nodes. The hirte-agent running on each node talks to systemd using D-Bus via the systemd UNIX domain socket. Systemd then interacts with the local services and notifies the agent over the system-wide D-Bus. The agent forwards the notification to hirte on the primary node, which then replays it on the D-Bus daemon for the benefit of the state manager or any other program listening to it.

This graphic visualizes the hirte architecture and workflow:

[Figure: illustration of the hirte architecture and workflow]

Using hirte

In the following sections, we’ll demonstrate how to install, configure and use hirte on a two-node system. The primary node, with the IP address 192.168.42.10, runs on a laptop. The worker node, with the IP address 192.168.42.15, runs on a Raspberry Pi 4, but you could run it on any system.

Installing hirte

On Fedora systems, install hirte directly from the Fedora repository. On CentOS Stream systems, install hirte from the COPR repository that ships the latest hirte code.

To enable the COPR repository:

dnf copr enable mperina/hirte-snapshot centos-stream-9

NOTE: This step will no longer be necessary after hirte is made available in EPEL.

On the laptop, install hirte, the hirte-agent and hirtectl:

dnf install hirte hirte-agent hirtectl

On the Raspberry Pi 4, install only the agent:

dnf install hirte-agent

Configuring hirte

After hirte and its agents are installed, you must configure them so that the agents know where the primary node is located and hirte can identify the nodes.

The configuration files for hirte are under /etc/hirte/.

On the laptop where both hirte and hirte-agent run, configure hirte in /etc/hirte/hirte.conf:

[hirte]
ManagerPort=2020
AllowedNodeNames=laptop,rpi4

NOTE: The default port used by hirte is 842, which is considered a privileged port that must be open in the firewall. To simplify this demo, we used port 2020. This non-privileged port does not require a firewall change if you use Fedora with the default settings.
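
Because hirte identifies nodes by name, every agent's NodeName has to appear in AllowedNodeNames. As a minimal sketch, the following shell function checks a name against the configuration shown above. The function name node_is_allowed is hypothetical, and the check assumes the key is written exactly as in this example, with no spaces around the equals sign or the commas.

```shell
# Succeed if NODE is listed in the AllowedNodeNames line of the hirte.conf
# content read from stdin.
# Usage: node_is_allowed NODE < /etc/hirte/hirte.conf
# Assumption: the line looks like "AllowedNodeNames=a,b" (no extra spaces).
node_is_allowed() {
    awk -F= -v node="$1" '
        $1 == "AllowedNodeNames" {
            n = split($2, names, ",")
            for (i = 1; i <= n; i++)
                if (names[i] == node) ok = 1
        }
        END { exit !ok }
    '
}
```

For example, node_is_allowed rpi4 < /etc/hirte/hirte.conf returns success with the configuration above, while a name that is not listed returns failure.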

Configure the agent in /etc/hirte/agent.conf:

[hirte-agent]
NodeName=laptop
ManagerHost=127.0.0.1
ManagerPort=2020

NOTE: The IP address in the ManagerHost line can be either 127.0.0.1 (that is, localhost) or the public IP address of the node, 192.168.42.10.

On the Raspberry Pi 4, where only the agent is running, configure /etc/hirte/agent.conf:

[hirte-agent]
NodeName=rpi4
ManagerHost=192.168.42.10
ManagerPort=2020

In this case, the IP address in the ManagerHost line refers to the IP address of the laptop, because that's where hirte runs.
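
The two agent configurations differ only in the node name and the manager address, so they can be generated rather than typed by hand. The sketch below is one possible helper. The function name agent_conf is hypothetical, it uses only the three keys shown in this article, and it defaults to the demo port 2020.

```shell
# Print a hirte-agent configuration for one node on stdout.
# Usage: agent_conf NODE_NAME MANAGER_HOST [MANAGER_PORT]
# MANAGER_PORT defaults to 2020, the port used in this demo.
agent_conf() {
    printf '[hirte-agent]\nNodeName=%s\nManagerHost=%s\nManagerPort=%s\n' \
        "$1" "$2" "${3:-2020}"
}
```

On the Raspberry Pi 4, running agent_conf rpi4 192.168.42.10 > /etc/hirte/agent.conf as root would write the same file as the one shown above.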

Starting hirte and the agent

Starting hirte and the agent is now as simple as starting normal systemd services.

Start hirte and the hirte-agent using systemd on the primary system, the laptop:

systemctl start hirte hirte-agent

Start the hirte-agent via systemd on the worker node, the Raspberry Pi 4:

systemctl start hirte-agent

On each system, monitor the services accordingly:

journalctl -lfu hirte

journalctl -lfu hirte-agent

After the services are running, the laptop logs show that the Raspberry Pi successfully connected to the laptop:

Mar 13 10:20:36 flame.pingoured.fr systemd[1]: Started hirte.service - Hirte systemd service controller manager daemon.
Mar 13 10:20:36 flame.pingoured.fr hirte[124510]: 10:20:36 INFO    ../src/manager/node.c:602 node_method_register    msg="Registered managed node from fd 8 as 'laptop'"
Mar 13 10:20:38 flame.pingoured.fr hirte[124510]: 10:20:38 INFO    ../src/manager/node.c:602 node_method_register    msg="Registered managed node from fd 9 as 'rpi4'"
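
To check registration from a script instead of reading the journal by eye, the node names can be pulled out of these log lines. This is a rough sketch that assumes the message format shown above, which may differ in other hirte versions. The function name registered_nodes is hypothetical.

```shell
# Print the name of each node that the hirte logs report as registered.
# Reads journal lines on stdin and relies on the message text shown above:
#   Registered managed node from fd N as 'NAME'
# Usage: journalctl -u hirte --no-pager | registered_nodes
registered_nodes() {
    sed -n "s/.*Registered managed node from fd [0-9]* as '\([^']*\)'.*/\1/p"
}
```

With the log lines above as input, the function prints laptop and rpi4, one per line.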

Testing hirte

Now you can use hirtectl on the laptop to control services running on either node.

In this example, we are going to start httpd on the Raspberry Pi 4 from the laptop. Before doing so, let's verify which services are running:

# hirtectl list-units
NODE            |ID                                                     |   ACTIVE|  SUB
====================================================================================================
laptop          |time-sync.target                                       | inactive| dead
laptop          |nfs-idmapd.service                                     | inactive| dead
laptop          |sys-devices-platform-serial8250-tty-ttyS5.device       |   active|  plugged
laptop          |dev-disk-by\x2did-wwn\x2d0x5001b448b9db9490\x2dpart3.device|   active|  plugged
laptop          |podman.socket                                          |   active|listening
....

NOTE: This returns the list of all units and their state on all nodes.

Restrict the list to a certain node by running:

# hirtectl list-units rpi4
ID                                                                          |   ACTIVE|  SUB
====================================================================================================
systemd-update-done.service                                                 |   active|   exited
boot.mount                                                                  |   active|  mounted
dbus-broker.service                                                         |   active|  running
system-getty.slice                                                          |   active|   active
sshd-keygen@ecdsa.service                                                   | inactive| dead
....

Verify the status of httpd on the Raspberry Pi by looking for httpd.service in this list.

If it does not show as active, httpd.service is not running and you can start it:

# hirtectl start rpi4 httpd.service

On the Raspberry Pi, the logs of hirte-agent show the service being started:

Mar 13 10:21:05 Host-002 hirte-agent[1556]: 10:21:05 INFO    ../src/agent/agent.c:836 agent_run_unit_lifecycle_method    msg="Request to StartUnit unit: httpd.service - Action: replace"

Verify the outcome from the laptop:

# hirtectl list-units rpi4 | grep httpd
httpd-init.service                                                          | inactive| dead
httpd.service                                                               |   active|  running
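
The grep above also matches httpd-init.service, so a script that needs a yes or no answer is better served by an exact match on the unit name. The following sketch assumes the three-column, pipe-separated layout of the per-node output shown above. The function name unit_is_active is hypothetical.

```shell
# Succeed if UNIT is reported as "active" in the per-node unit list on stdin.
# Expects the "ID | ACTIVE | SUB" layout printed by: hirtectl list-units NODE
# Usage: hirtectl list-units rpi4 | unit_is_active httpd.service
unit_is_active() {
    awk -F'|' -v unit="$1" '
        {
            id = $1; state = $2
            gsub(/[ \t]/, "", id)      # strip column padding
            gsub(/[ \t]/, "", state)
        }
        id == unit && state == "active" { found = 1 }
        END { exit !found }
    '
}
```

The exit status can then drive a conditional, for instance hirtectl list-units rpi4 | unit_is_active httpd.service && echo "httpd is up".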

In addition to using hirtectl to start, stop, restart or reload units, you can also use hirtectl to list units on any or all nodes. All of those actions can also be performed independently by using the hirte D-Bus API. The repository on GitHub contains a few examples of that D-Bus API.
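
Before writing a D-Bus client, it can help to see what hirte exposes on the system bus of the primary node. The sketch below uses busctl, which ships with systemd, and deliberately does not hard-code hirte's bus name or object path. It filters the bus listing for names containing "hirte", on the assumption that the service name does. The function name hirte_bus_names is hypothetical.

```shell
# Print every bus name containing "hirte" from `busctl list` output on stdin.
# Usage on the primary node:
#   busctl --system list --no-legend | hirte_bus_names
# The result can then be explored with:
#   busctl --system tree NAME
#   busctl --system introspect NAME PATH
# where NAME comes from this function and PATH from the tree output.
hirte_bus_names() {
    awk 'tolower($1) ~ /hirte/ { print $1 }'
}
```

The introspection output lists the methods and signals that a state manager can call or subscribe to. The examples in the GitHub repository remain the reference for the actual API.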

Conclusion

Hirte controls services across multiple systems with very limited overhead. Services can be regular processes or containerized applications. In environments where determinism is a requirement, where the workload can be statically assigned to certain systems, and where the services running may change over time, hirte provides a great solution.

About the authors

Pierre-Yves Chibon

Principal Software Engineer, Automotive

Pierre-Yves Chibon (aka pingou) is a Principal Software Engineer who spent nearly 15 years in the Fedora community and is now looking at the challenges the automotive industry offers to the FOSS ecosystems.

Dan Walsh

Senior Distinguished Engineer

Daniel Walsh has worked in the computer security field for over 30 years. Dan is a Senior Distinguished Engineer at Red Hat. He joined Red Hat in August 2001. Dan has led the Red Hat Container Engineering team since August 2013, but has been working on container technology for several years.

Dan helped develop sVirt (Secure Virtualization) as well as the SELinux Sandbox back in RHEL 6, an early desktop container tool. Previously, Dan worked on Netect/Bindview's Vulnerability Assessment Products, and at Digital Equipment Corporation on the Athena Project and the AltaVista Firewall/Tunnel (VPN) products. Dan has a BA in Mathematics from the College of the Holy Cross and an MS in Computer Science from Worcester Polytechnic Institute.
