As a Solution Architect, I’m often asked what Red Hat’s best practices are for patch management. In this article, I'm going to cut through the noise, linking to relevant work and materials where appropriate, to offer some focused guidance around what exactly a best practice is and what tools you can leverage as part of your patch management toolkit.
After reading this article, you'll have a clearer idea about the tools and approaches you can leverage to deliver patches—and the best practices around defining that process—for your organization.
Calling something a "best practice" is admittedly a little presumptuous. What's best for one organization may not be best for another. So rather than label one approach as "the best," I find it better to discuss the process in terms of what is most appropriate for a given level of risk. When you talk about patch management, you're ultimately talking about risk management or change management.
Every organization needs an approach to addressing risk. But each one needs a unique approach for how that gets defined, what they focus on and how they weigh various business constraints, impacts or outcomes.
I suggest "good practices" as a more appropriate term. For example, the Automation Community of Practice publishes an Automation Good Practices document based on insights from helping customers in the field.
What can we leverage as part of our patch management toolkit and approach?
There are tools and methodologies for defining how to manage patches in your infrastructure.
Red Hat breaks down its code or content changes (errata) into three different categories. Each category has a different purpose and a different rate of change. Depending on your organization’s goals and tolerance for change, one option might align to an objective and rate of change better than another.
- Red Hat Security Advisory (RHSA): Urgent changes have been made to protect your systems
- Red Hat Bug Advisory (RHBA): Fixes for bugs that may or may not have affected you
- Red Hat Enhancement Advisory (RHEA): New features have been added
For more detail, see Red Hat's articles on its security backporting practice and on what backporting is and how it applies to RHEL and other Red Hat products.
Using this principle, you can isolate errata by type and risk, and then choose which patches you consider required and which you consider optional.
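As a rough illustration of that idea, the following Python sketch sorts advisory IDs into "required" and "optional" based on their prefix. The advisory IDs and the function names are made up for illustration, and the policy (security only) is just one possible choice for a high-risk environment.

```python
# Advisory prefixes as described above: security, bug fix, enhancement.
ADVISORY_TYPES = {"RHSA": "security", "RHBA": "bugfix", "RHEA": "enhancement"}


def advisory_type(advisory_id):
    """Return the errata category for an ID such as 'RHSA-2024:0001'."""
    prefix = advisory_id.split("-", 1)[0].upper()
    return ADVISORY_TYPES.get(prefix, "unknown")


def split_by_policy(advisory_ids, required_types):
    """Group advisory IDs into 'required' and 'optional' for one environment."""
    plan = {"required": [], "optional": []}
    for advisory_id in advisory_ids:
        if advisory_type(advisory_id) in required_types:
            plan["required"].append(advisory_id)
        else:
            plan["optional"].append(advisory_id)
    return plan


# Hypothetical advisory IDs, for illustration only.
pending = ["RHSA-2024:0001", "RHBA-2024:0002", "RHEA-2024:0003"]

# Example policy for a high-risk environment: only security errata are required.
dmz_plan = split_by_policy(pending, required_types={"security"})
print(dmz_plan)
```

A lower-risk environment could pass a wider set, such as `{"security", "bugfix"}`, to the same function.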
For example, if you have internet-facing infrastructure or a DMZ (demilitarized zone) with a high security profile, you could choose to apply only the Red Hat Security Advisory errata in that environment, as soon as each one is released. This addresses the needs of high-risk infrastructure while breaking the change into small, easy-to-consume batches.
These links are examples of how to apply just the security errata in a patch process:
- Is it possible to limit yum so that it lists or installs only security updates?
- How to apply security errata from Red Hat Satellite?
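For a single host, the security-only approach can be sketched in Python as a helper that builds the relevant command lines. This is a minimal sketch that assumes the `--security` filter of `yum` (or `dnf`) is available on the host; the function name is hypothetical, and the commands are only printed here, not executed.

```python
import shlex


def security_update_commands(package_manager="yum"):
    """Build argument lists that list or apply only security errata."""
    return {
        # Show pending updates that are tied to security advisories.
        "list": [package_manager, "check-update", "--security"],
        # Apply only the updates that are tied to security advisories.
        "apply": [package_manager, "-y", "update", "--security"],
    }


# Print the commands instead of running them, so the sketch is safe anywhere.
for step, argv in security_update_commands().items():
    print(f"{step}: {shlex.join(argv)}")
```

To run a command for real, pass the argument list to `subprocess.run`. Keep in mind that `check-update` signals "updates are available" with exit code 100, so a non-zero exit there is not necessarily an error.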
An additional approach to consider is to patch in subsets of packages to reduce change risk even further. This also allows the flexibility of a rollback, should you find that the patch interacts with your system in unexpected ways. As with any approach, there are trade-offs to be aware of and to manage. This might be too granular a process for some, while for others it may better fit their tolerance for change.
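One simple way to picture patching in subsets is to split the list of pending packages into small batches and apply them one batch at a time. The sketch below assumes you already have that list; the package names, the batch size, and the function name are illustrative only.

```python
def batches(packages, size):
    """Split package names into sorted, fixed-size batches."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    ordered = sorted(packages)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


# Hypothetical list of packages with pending updates.
pending = ["openssl", "kernel", "glibc", "bash", "curl"]

for number, batch in enumerate(batches(pending, size=2), start=1):
    # Each batch becomes its own small change that can be verified
    # (and, if needed, rolled back) before the next one is applied.
    print(f"batch {number}: yum -y update {' '.join(batch)}")
```

Smaller batches mean more maintenance steps but a narrower search when something goes wrong, which is the trade-off described above.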
This article provides a thorough guide on how to use Red Hat Satellite to enable and manage a standard operating environment. It covers every topic in great detail, and is meant to be a reference point to answer the most common questions about best practices. While this content was written several years ago when Red Hat Satellite 6 was first introduced, its foundational concepts still apply today.
Content management and content staging are of the highest importance to the patching process. These set the foundation for how patch content is presented to infrastructure, and play a big role in the patching process for risk management and risk reduction.
Focus particularly on Satellite's Lifecycle Environments and Content Views. These are critical concepts to the staging of content, and its promotion (or rollback) to infrastructure.
Secondarily, focus on Organizations, Locations and Host Groups for classifying infrastructure and inventory. Far too often, I see folks get too granular with Lifecycle Environments or Content Views, trying to use these structures to solve for classifying geographic locations, cloud environments, similar infrastructure, or different teams within an organization.
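To make the staging idea concrete, here is a toy Python model of content moving through an ordered path of environments. It is not the Satellite API. The class, its methods, and the environment names after "Library" are assumptions made purely to illustrate promotion and rollback of content versions.

```python
class ContentPipeline:
    """Toy model: content versions move through ordered environments."""

    def __init__(self, environments):
        self.environments = list(environments)
        self.history = {env: [] for env in self.environments}

    def current(self, env):
        """Return the version currently presented to an environment."""
        versions = self.history[env]
        return versions[-1] if versions else None

    def publish(self, version):
        """New content always enters at the first environment."""
        self.history[self.environments[0]].append(version)

    def promote(self, version, env):
        """Promote a version only if the previous environment has seen it."""
        index = self.environments.index(env)
        if index == 0:
            raise ValueError("publish new versions to the first environment")
        previous = self.environments[index - 1]
        if version not in self.history[previous]:
            raise ValueError(f"{version} has not been staged in {previous}")
        self.history[env].append(version)

    def rollback(self, env):
        """Return an environment to the version it presented before."""
        versions = self.history[env]
        if len(versions) < 2:
            raise ValueError(f"no earlier version to roll back to in {env}")
        versions.pop()
        return versions[-1]


pipeline = ContentPipeline(["Library", "Dev", "QA", "Prod"])
for version in ("1.0", "2.0"):
    pipeline.publish(version)
    for env in ("Dev", "QA", "Prod"):
        pipeline.promote(version, env)

print(pipeline.current("Prod"))   # 2.0
print(pipeline.rollback("Prod"))  # 1.0
```

Note that the model has no notion of location, team, or cloud provider. Those classifications belong to the inventory side (Organizations, Locations, Host Groups), which keeps the content path short and easy to reason about.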
Live kernel patching has been available for several years and it's a part of several major RHEL releases. You can live-patch a kernel without a reboot in order to address a vulnerability very quickly. This provides administrators the ability to address risk immediately, and an opportunity to schedule a more appropriate maintenance window when a reboot can be taken.
This can also be orchestrated with Red Hat Satellite.
Depending on the update and errata content, there may be no reason to reboot after a patch. Sometimes, restarting a service after an update might be all that's needed to make use of an updated binary.
Beginning in RHEL 7, yum-utils includes a plug-in called needs-restarting. This utility reports whether a reboot is needed after applying patches. This can help you understand the need for change in a system, and reduce the need to perform reboots when patching.
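A patching script can use this check to decide whether to schedule a reboot. The Python sketch below assumes the installed version of needs-restarting supports the `-r` (reboot hint) option, which exits with 0 when no reboot is needed and 1 when one is. The function name and the simulated runner are hypothetical, and the simulation exists only so the example runs on machines without yum-utils.

```python
import subprocess
from types import SimpleNamespace


def reboot_required(run=subprocess.run):
    """Return True when `needs-restarting -r` reports that a reboot is needed."""
    result = run(["needs-restarting", "-r"], capture_output=True, text=True)
    if result.returncode == 0:
        return False
    if result.returncode == 1:
        return True
    raise RuntimeError(f"unexpected exit code {result.returncode}")


def simulated_run(argv, **kwargs):
    """Stand-in for subprocess.run that pretends a reboot is needed."""
    return SimpleNamespace(returncode=1)


# On a real host, call reboot_required() with no arguments instead.
if reboot_required(run=simulated_run):
    print("Reboot needed: schedule a maintenance window.")
else:
    print("No reboot needed: restarting affected services may be enough.")
```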
Satellite also has a capability called Tracer. Tracer accomplishes the same thing from the Satellite perspective, helping administrators identify applications that need to be restarted after a system is patched.
5. Consider opening a proactive Red Hat Support case
Having advanced knowledge of a given maintenance activity, and the opportunity to do information and data gathering in advance, can reduce downtime and some of the risk associated with patching.
Opening a proactive support case can significantly reduce the troubleshooting time necessary for the recovery of a service, as well as provide an opportunity to review any procedures or approaches ahead of time. This can have a measurable impact on reducing outages and MTTR (mean time to recovery) metrics.
6. Automate it!
It's no secret that replacing manual steps with an automated approach can reduce human error, increase reliability, and speed up the patching process. Automation is a key differentiator in adopting the appropriate best practices for your organization.
Red Hat Satellite has the capability to use remote execution with Ansible. Red Hat Satellite can run Ansible Playbooks and Roles over its RHEL inventory, and can easily automate and schedule patching activity.
In fact, with the official redhat.satellite Ansible collection provided in Red Hat Ansible Automation Platform, you can automate and orchestrate the entire content management side of Satellite in addition to applying patch content after it's been staged.
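Whatever tool does the work, the control flow of an automated patch run often looks similar: patch one group of hosts, check the result, and only then move on. The following Python sketch shows that flow in isolation. It does not call Satellite or Ansible; `patch_host` is a hypothetical callback standing in for whatever actually applies the patches, and the host and wave names are invented.

```python
def patch_in_waves(waves, patch_host):
    """Patch each wave in order; halt before the next wave if any host fails."""
    results = {}
    for wave_name, hosts in waves:
        failed = []
        for host in hosts:
            succeeded = patch_host(host)
            results[host] = succeeded
            if not succeeded:
                failed.append(host)
        if failed:
            # Stop here so a bad patch never reaches later waves.
            return results, wave_name
    return results, None


# Hypothetical inventory: lower-risk hosts go first.
waves = [("dev", ["dev01", "dev02"]), ("prod", ["prod01"])]


def fake_patch_host(host):
    """Simulated patch job in which dev02 fails."""
    return host != "dev02"


results, halted_at = patch_in_waves(waves, fake_patch_host)
print(results)
print(halted_at)
```

In this simulated run the failure on dev02 halts the process in the "dev" wave, so prod01 is never touched.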
Red Hat has published An Open Approach to Vulnerability Management, a concise yet comprehensive methodology on how we as an organization approach vulnerability management. Risk management is a decision you must make after considering and weighing business constraints, impacts and outcomes.
The question is: Do all vulnerabilities really matter? This article highlights some of the realities and trade-offs related to evaluating constraints, impacts, and outcomes. This is a good example of weighing the desire for outcomes against the cost and benefit of achieving them. While every risk should be considered, not everything should be weighed the same, and no one has unlimited resources to address every single risk. This example is about prioritizing what is most important and having a strategy for classifying risk that is worth mitigating, and risk that is comfortable to live with.
Simply put, you need iteration and commitment. None of the examples in this article were the first attempt at a strategy. They were each revisited and refined over and over until we found a risk management balance that was right for Red Hat. And these methods continue to be refined as we learn new things and as the industry landscape changes. It's as important to continue to evaluate risk tolerance and posture as it is to have an approach defined in the first place. The continuous tailoring and refinement are what really put the "best" in "best practice".
About the author
Andrew Ludwar is an enthusiastic open source advocate, with a background in systems administration and enterprise architecture. He's been in the IT and open source field for 18+ years, spending most of his time in the telecommunication and energy industries. Andrew holds a B.Sc in Computer Science and a Master's certificate in Systems Design and Project Leadership. Ludwar works for Red Hat as a Senior Solutions Architect helping customers in Western Canada.