We introduced the new image mode for Red Hat Enterprise Linux (RHEL) a little over three months ago, but image-based operations aren’t particularly new. Most of the devices we use daily update via images. The container movement brought image-based updates to the application world. For the desktop world, image-based deployment and updates have become fairly common. Outside of a few areas like high performance computing (HPC), image-based operations for the server operating system (OS) haven't been particularly popular. While the tools may not have been mainstream, I think the problem has been us. No, not Red Hat, us the practitioners.
Those who went before
Why did we call it "image mode" in the first place? According to Oxford, “mode” is “a particular way of doing something.” This new thing we introduced is just that, a particular way of operating a RHEL host, but using an image-based workflow instead of the typical RPM package-based workflows.
Image-based operations are at least 25 years old. That’s how long it has been since I first saw a commercially viable datacenter virtualization platform, at LinuxWorld in San Jose. Virtualization is basically image-based disks plus some metadata that tells an engine what virtual machine (VM) capabilities to create. Other than a few key features like snapshots and cloning, we treated VMs as if they were servers. In doing so, we also pushed the tools away from image operations toward the standard server-based tools we had in our pockets. This makes sense: Change is hard for political, budgetary, time and lots of other reasons.
We got a second bite at image-centered operations for servers with the introduction of cloud computing. Looking back, most of the naysayers for cloud computing were basing their opinions on the needs of the datacenter, and not wanting to change paradigms for the new ephemeral, image-based infrastructure. Ten years ago, I spent time in a workshop at OSCON to learn how Netflix created infrastructure in the cloud with their AMI Bakery tools like AMInator, Asgard and the Simian Army. All open source, all very automated, all very image oriented.
But again, change is hard, so we pushed for more familiar tools and methods that let us reuse our server-based work. This turned out to be less successful than the virtualization push because we no longer controlled the environment; we consumed it. Even so, for most folks, cloud operations stayed generally similar to their datacenter operations unless they had an opportunity to start a fresh project with a new mindset. That has led to a new set of issues, but this is basically the world most of us inhabit today.
This little history lesson raises the question: If the old ways mostly still work fine, why would I ever change? That’s what I’d really like to address here. We have other places you can look to see how to start an experiment with image mode for RHEL, but I'd like to explain why I think you’d benefit from changing your point of view.
What’s in it for you?
There are quite a few ways to think about the benefits of an image mode mindset, but here are a few that stand out to me.
Smoother updates
Single transactions are safer than ones that combine multiple components. I recently spent 30 minutes updating my Fedora laptop. It wasn’t the number of packages, and nothing went completely sideways. I just had to run it twice. Why? Something transient. The software was (and is) fine. DNF (Fedora's package manager) did exactly what it was supposed to. One package install failed, so the whole transaction failed. Likely, the wifi hiccuped, or maybe some other CPU-intensive process kicked off, or maybe disk usage spiked. In any case, it was completely environmental. And transient. I just reran the update, and everything went smoothly. Then I rebooted and got back to work.
If I’d been using something with image-based updates, that update would have been built somewhere other than my machine and presented as a single update to be downloaded, prepared and set to be active on the next reboot. Thirty minutes of one day may not seem like a big deal, but let’s extrapolate my laptop to a group of 100 production servers during a maintenance window. Let’s say each DNF full transaction takes 10 minutes, you troubleshoot for 5, and reboots take 5 to get back to fully operational. This isn’t a serial process, of course, so we’re not talking 16+ hours, but every 15 minutes of troubleshooting and rerunning an update adds up quickly. What if you find you need to free up space in /var just to download the packages on one system? How long will that take, especially when you’re working under the constraints of a service level agreement (SLA)?
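To make that arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. The per-host minutes come from the scenario above. The number of hosts that hit a transient failure is an assumption I picked purely for illustration, and the totals are worst-case serial figures, not a prediction for a parallel rollout.

```python
# Figures from the scenario above (minutes per host).
HOSTS = 100
UPDATE_MIN = 10        # one full DNF transaction
TROUBLESHOOT_MIN = 5   # time spent figuring out a transient failure
REBOOT_MIN = 5         # reboot until fully operational


def transactions_only_hours(hosts=HOSTS):
    """Serial hours spent purely in DNF transactions, with no failures."""
    return hosts * UPDATE_MIN / 60


def serial_hours(failed_hosts=0, hosts=HOSTS):
    """Serial hours for update plus reboot on every host.

    Each failed host adds troubleshooting plus one rerun of the update.
    failed_hosts is a hypothetical input, not a measured failure rate.
    """
    base = hosts * (UPDATE_MIN + REBOOT_MIN)
    retries = failed_hosts * (TROUBLESHOOT_MIN + UPDATE_MIN)
    return (base + retries) / 60


print(round(transactions_only_hours(), 2))  # 16.67
print(serial_hours())                       # 25.0
print(serial_hours(failed_hosts=10))        # 27.5
```

Each retry costs the same 15 minutes mentioned above (5 to troubleshoot, 10 to rerun), so ten flaky hosts add two and a half hours of serial work before anyone touches disk space in /var.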
Better recovery and visibility
What if there was a problem with the updates that were installed rather than a hiccup in the process? Let’s say one of the updates was a new version of OpenSSL, and there was a change that didn’t affect development but broke SSL certificates in production. Do you have a backup? Do you push through troubleshooting SSL in production? Is it your app? Image-based systems have a safety valve: the rollback. Since these are complete images on disk, if you have the previous working install available, you can switch back to the known good host and troubleshoot elsewhere. Oh, and since you know what image that production system was running, you can bring up an exact duplicate in a development environment for troubleshooting. You’ll know fairly quickly if it’s the system install, the app configs, the component configs or something else.
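The rollback idea can be sketched as a toy model in Python. This is not how image mode for RHEL is implemented. It only illustrates the "keep the previous image around and swap back" behavior described above. The `Host` class and the image names are hypothetical, and `rollback()` here stands in for "roll back and reboot" as a single step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    """Toy model of a host that keeps its previous image on disk."""
    booted: str
    previous: Optional[str] = None
    staged: Optional[str] = None

    def stage(self, image: str) -> None:
        # Download and prepare the new image; the running system is untouched.
        self.staged = image

    def reboot(self) -> None:
        # A staged image becomes active; the old one is kept for rollback.
        if self.staged is not None:
            self.previous, self.booted = self.booted, self.staged
            self.staged = None

    def rollback(self) -> None:
        # Swap back to the known good image (modeled as rollback plus reboot).
        if self.previous is None:
            raise RuntimeError("no previous image available to roll back to")
        self.booted, self.previous = self.previous, self.booted


host = Host(booted="app-os:v1")   # hypothetical image names
host.stage("app-os:v2")
host.reboot()
print(host.booted)                # app-os:v2
host.rollback()
print(host.booted)                # app-os:v1
```

Because the image name identifies exactly what was running, the same `app-os:v2` image could be started in a development environment to reproduce the problem while production runs on `app-os:v1`.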
The ability to quickly determine exactly what versions of software are on any host is also a benefit of image-based operations. Drift happens when you can make small local changes. Drift happens when package versions aren’t controlled quite as tightly between environments. Drift happens when we start with a standard build but then layer on lots of changes to support different applications without creating downstream standards. Image-based systems track a known resource. The right tools allow you not only to see what’s on a host but also what’s in a particular image.
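Here is a minimal Python sketch of why tracking a known resource makes drift easier to spot. With packages, you compare every name and version on every host. With images, you compare one identifier per host. The function names, host names, versions and digests below are all made-up placeholders, and how you would collect this data from real hosts is left out.

```python
def package_drift(baseline, host):
    """Return {package: (baseline_version, host_version)} for every mismatch.

    A missing package shows up with None on the side where it is absent.
    """
    names = baseline.keys() | host.keys()
    return {
        name: (baseline.get(name), host.get(name))
        for name in sorted(names)
        if baseline.get(name) != host.get(name)
    }


def image_drift(expected_digest, reported):
    """Return the hosts whose reported image digest differs from the expected one."""
    return sorted(h for h, digest in reported.items() if digest != expected_digest)


# Placeholder data for illustration only.
baseline = {"openssl": "3.0", "httpd": "2.4"}
web2_packages = {"openssl": "3.1", "httpd": "2.4", "vim": "9.0"}
print(package_drift(baseline, web2_packages))
# {'openssl': ('3.0', '3.1'), 'vim': (None, '9.0')}

reported = {"web1": "sha256:aaaa", "web2": "sha256:bbbb"}
print(image_drift("sha256:aaaa", reported))
# ['web2']
```

The package comparison grows with every package on every host, while the image comparison stays at one value per host, with the package-level detail available from the image itself when you need it.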
Simpler update operations, rollbacks and traceability are all great tools for control, but they’re also a little defensive. These will help smooth out operations, but is that enough? Can we move faster with image-based systems?
Faster experiments
If swapping out the entire operating system on a host is easy enough to do for updates, it is just as easy for any other reason you might want to change the role of a host. One of the biggest farms of servers I managed was always the development environment. We had to run several parallel stacks to support different experiments for the applications we built. Some were application-level changes, some were component-level changes like exploring new versions of Java. But each of these required completely new hardware and full installs of the app and data. What if, instead, you could roll a new version of the OS that had the new component and could drop it into an existing application stack? You’d be able to A/B test the latest breaking changes to your PHP app by just rebooting the app server.
More options
Speaking of A/B testing, or blue/green deployments, you could move features into production in controlled experiments the same way application deployments can today. You could build a fairly sophisticated set of operational models that are very difficult to do today with most package-based operating systems.
These are just some of the main reasons why I’m excited about this latest chapter of image-based operations. The tools have come a long way in 25 years, and image mode for RHEL matches a lot of the modern infrastructure we have available today. What could you do with a way to think differently about your operating systems?
What’s next?
If any of this sounds useful, then image mode for RHEL is something that you should explore. And adding containers to handle the creation and curation of images takes this even further beyond where image-based operations are today.
More resources for image mode
If you’d like to learn more about image mode, why not try our quick start guide or one of the learning exercises? You can also explore on your own from any subscribed RHEL system that has podman installed.