A little over a year ago, Red Hat announced its intention to collaborate with the automotive industry to help drive the transition to software-defined vehicles (SDVs). In May 2022, this intention became concrete with Red Hat and General Motors announcing their collaboration to help trailblaze SDVs at the edge. Our goal is to produce a base operating system to run all sorts of in-vehicle software for safety-critical use cases as well as non-safety ones.
Containers have become the de facto standard in the wider IT industry, and they are front and center in the vision of the software-defined vehicle, allowing for applications to be isolated, providing more flexibility for developers and deployment, and generally allowing for faster innovation. Red Hat leads the work on containers in the cloud, and now it’s time to take that expertise to the automotive industry.
The word “container” can mean many things, however, and the meaning often varies from person to person and from organization to organization. This ranges from low-level details like "containers mean kernel support for application isolation" to large-scale ideas like "containers mean support for running distributed software on planet-scale clusters."
While all these meanings make sense for some use cases, not all are necessarily applicable to the in-vehicle use case. This article discusses some automotive industry-specific requirements and how we can take advantage of containers in those cases.
One of the core differentiators for software in the automotive industry is the set of requirements around functional safety. The main objective of functional safety is to ensure that a product is free from unreasonable risk. Functional safety extends beyond the classic hardware and software question of "Does the product do what users expect?" (quality) to "Could this product fail due to malfunctioning behavior and result in physical harm to people?" (safety). Theoretically, the safest car is the one that's never driven, but this is obviously impractical. This means that it’s not just quality, but also availability, reliability, security and functional safety that serve as the key factors in developing an automotive experience.
Linux, and Red Hat products in particular, are widely used and tested, so the level of quality is generally high. But functional safety is not typically addressed in traditional open source development. To prove that a system meets the required level of “safe,” the developer must make safety arguments for all of the risks to be addressed in the entire system. This is a journey that Red Hat started at the beginning of 2021.
The more complex the system is, the harder it is to make a functional safety argument about it. It therefore helps to have less code, to use simpler constructs, and to apply techniques such as formal methods.
Not all software on a system needs to have the same safety requirements. Typically, there is a mix of safety levels. This is referred to as "mixed criticality". Containers, by their nature, allow systems to run multiple types of workloads and applications, safety-certified or not. This flexibility makes container technologies an ideal solution for mixed criticality. However, to run functionally-safe code in a container, the container runtime and management system must also be functionally safe.
In order to deliver a system where functionally-safe code can run in containers, we need to be very careful of what goes into the container runtime and management system.
The concept of containers is very often used in distributed systems. A distributed system is one in which many loosely-coupled and typically geographically-distributed systems are combined into a cluster. Distributed systems are also generally scalable, which means that as the requirements change over time, you can extend the resources at run time by adding more computers to the cluster.
Distributed systems are complex because the network is unreliable and the participating computers run concurrently. For example, individual computers can disagree or fail, or a network failure can partition the cluster into separate subsets, and distributed systems need to account for these situations.
This complexity is handled by resource over-allocation and complex algorithms, such as voting, and a concept called "eventual consistency." Eventual consistency means that the system might not always be in a consistent state, but there is always progress toward it.
Overall, distributed systems are not deterministic. For example, individual runs can execute steps in a different order or on different systems, so different code paths are executed without anyone knowing in advance which ones. This nondeterminism further increases the complexity of the system, which, in turn, adds to the difficulty of crafting a functional safety argument about the system as a whole.
For many use cases, the complexities of using distributed systems are worth it because of the enormous flexibility and dynamic properties you gain with such systems. Furthermore, many of these use cases must remain distributed due to the nature of the problem they exist to solve. For example, Netflix streaming around the world will always be distributed.
On the other hand, the software in a car is not distributed in any real sense. The hardware is fixed, meaning it doesn't change while the car is in use, so there is also no real scalability to worry about. Also, the typical distributed-systems way of handling failures, running the system in a previously untested degraded mode, is generally not safe. Any such degraded state in a car should be deemed unsafe, and the car should switch to a safe mode (such as slowly coming to a stop) instead of continuing in a best-effort mode.
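The difference between best-effort operation and a safe-mode fallback can be made concrete with a small sketch. The Python example below is only an illustration under stated assumptions: the `Mode` names, the `next_mode` function and the idea of a per-workload health flag are hypothetical and are not taken from any Red Hat product.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    SAFE_STOP = "safe_stop"


def next_mode(current, health):
    """Pick the next mode from a mapping of workload name to health flag."""
    if current is Mode.SAFE_STOP:
        # Recovery is deliberately out of scope: stay in safe mode once entered.
        return Mode.SAFE_STOP
    if all(health.values()):
        return Mode.NORMAL
    # Any failed workload is treated as unsafe; there is no best-effort mode.
    return Mode.SAFE_STOP


if __name__ == "__main__":
    # Hypothetical workload names, with one workload reporting a failure.
    print(next_mode(Mode.NORMAL, {"braking": True, "steering": False}))
```

The point of the design is that there is no third, partially working mode to reason about, which keeps the set of states that need a safety argument small.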
Much of the container ecosystem is focused on distributed systems that are simply not suitable for in-vehicle use cases. In fact, the complexity that is derived from the strengths of these systems works against in-vehicle use cases from a functional safety perspective.
So, what are the actual requirements of containers in cars? This, of course, depends on exactly what you want to do. We have been in discussions with key players in the automotive industry, and some commonalities have emerged.
First, we believe that containers will be an integral part of the evolution of the automotive industry towards software-defined vehicles. We also believe that these containers must be the same as those that run elsewhere, such as on your laptop or in the cloud. This is important both for the developer experience and for making testing in the cloud easy and cost-effective.
Second, the containers need to be efficient in terms of resource use. Modern cars have powerful computers compared to older vehicles, but there are still some hard resource limits.
Third, there needs to be some form of high-level container management. For example, the required containers must start when the car starts, in the correct order, and with sufficient monitoring to enforce that they behave as expected.
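The ordering part of this requirement can be illustrated with a short sketch. The Python example below derives a start order from declared dependencies using the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later). The container names and the dependency map are hypothetical, and in practice this job would more likely belong to a service manager than to application code.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical containers, each mapped to the containers it must start after.
DEPENDENCIES = {
    "vehicle-bus": set(),
    "diagnostics": {"vehicle-bus"},
    "cluster-display": {"vehicle-bus"},
    "infotainment": {"cluster-display", "diagnostics"},
}


def start_order(dependencies):
    """Return a start order in which every container follows its dependencies."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # args[1] holds the list of nodes that form the detected cycle.
        raise ValueError(f"circular start dependency: {err.args[1]}") from err


if __name__ == "__main__":
    for name in start_order(DEPENDENCIES):
        print("start", name)
```

Rejecting circular dependencies up front means an invalid configuration is caught before the car ever starts, rather than showing up as a hang at boot.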
Fourth, there is the expectation that the system as a whole is in one of a fixed set of global states, each with its own set of workloads running, and that the main system will be able to transition the entire car between these states. All of these states need to be scheduled and validated ahead of time to confirm that they fit the resource requirements.
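Validating the states ahead of time could look roughly like the following Python sketch, which checks that the combined workloads of each global state fit within a fixed hardware budget. The state names, workload names and resource figures are all hypothetical, and a real check would cover more than CPU and memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    cpu_millicores: int
    memory_mib: int


@dataclass(frozen=True)
class Budget:
    cpu_millicores: int
    memory_mib: int


# Hypothetical hardware budget and global states; the numbers are illustrative.
BUDGET = Budget(cpu_millicores=4000, memory_mib=8192)

STATES = {
    "parked": [Workload("telemetry", 200, 256)],
    "driving": [
        Workload("telemetry", 200, 256),
        Workload("driver-assist", 2000, 4096),
        Workload("infotainment", 1000, 2048),
    ],
}


def over_budget_states(states, budget):
    """Return the names of states whose combined workloads exceed the budget."""
    failures = []
    for state, workloads in states.items():
        cpu = sum(w.cpu_millicores for w in workloads)
        memory = sum(w.memory_mib for w in workloads)
        if cpu > budget.cpu_millicores or memory > budget.memory_mib:
            failures.append(state)
    return failures


if __name__ == "__main__":
    print(over_budget_states(STATES, BUDGET))
```

Because the set of states is fixed, this kind of check can run once at build time instead of being repeated while the vehicle is in use.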
Finally, the entire container and management system must be simple enough to make a supportable safety argument that proves we can run functionally-safe code in containers onboard a vehicle.
Red Hat has long been working on the Podman project. Podman is a fully-featured container system that is a core part of Red Hat OpenShift, which is based on Kubernetes. Podman is also well tested, with millions of hours of runtime in critical systems around the world.
We believe Podman will make a great container system for in-vehicle use, particularly in combination with crun, which is a new, lighter-weight container runtime that is enabled by default in RHEL 9.
When it comes to orchestration, while OpenShift is a very powerful product in the cloud sphere, it is not a great fit for the in-vehicle use case because it is targeted at distributed system use cases. However, Podman also integrates well with systemd, which is the regular service manager in RHEL. Systemd is a much better fit for the requirements we have seen in terms of in-vehicle container management. In particular, it is much simpler, more deterministic and has native support for the kind of global state transitions that are required in cars.
Kubernetes is the tool of choice when it comes to cloud work, including the developer and testing experience, among other services and solutions. It would be beneficial if we could integrate and reuse the available software, workflows, experience and concepts that make sense for the in-vehicle use case and the automotive industry.
An example is recent work on Podman to support Kubernetes application descriptions when used together with systemd. This leads to an ideal combination in which Kubernetes can be used in the cloud, and the same Kubernetes descriptions can then be used in a vehicle without the overhead and complexity of Kubernetes itself. This would be the first step to joining the best of both worlds: cars and the cloud.
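To give a sense of what such an application description contains, the Python sketch below builds a minimal Kubernetes Pod manifest as a dictionary and prints it as JSON, a format Kubernetes accepts alongside YAML. The `pod_manifest` helper, the workload name and the image reference are hypothetical, and how the description is handed to Podman and systemd is not shown here.

```python
import json


def pod_manifest(name, image):
    """Build a minimal Kubernetes Pod description with a single container."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    # Hypothetical workload name and image reference.
    manifest = pod_manifest("radio", "registry.example.com/radio:1.0")
    print(json.dumps(manifest, indent=2))
```

The appeal of reusing this format is that the description itself says nothing about clusters, so the same file can describe a workload in the cloud and in the vehicle.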
Here we've described the high-level requirements gathered from talking with a number of companies in the automotive industry. We also hinted at which solutions we are looking at to satisfy these requirements. If you are interested in more technical details or in Red Hat’s vision for running in-vehicle containers, stay tuned. We'll talk about both of these things in future articles.