How we keep our Linux systems patched with automation
An automated patch-management system helps keep your server infrastructure patched and maintained in a timely manner.
Keeping critical systems patched and updated is one of an IT team's most important duties. However, patching servers manually, or even automating patching using advanced tools such as Ansible on its own, is not the best way to protect your systems from vulnerabilities. A better solution is to implement a Linux patch-management system with automation.
This article focuses on how our team at IBM does Red Hat Enterprise Linux (RHEL) patch management using Red Hat Satellite in conjunction with Ansible automation. Red Hat Satellite Server is the recommended patch management solution for RHEL servers. RHEL accounts for 33% of the paid enterprise server operating system market, making it the leading Linux platform in the enterprise.
What is Red Hat Satellite?
As its FAQ says, "Red Hat Satellite Server is a system management solution that makes Red Hat infrastructure easier to deploy, scale, and manage across physical, virtual, and cloud environments." Satellite allows our team to provision, configure, and update systems to ensure they run efficiently and comply with relevant standards.
Many large companies have thousands of RHEL servers in their datacenters, all of which must be patched regularly and without causing critical outages. Satellite's architecture provides essential features to improve patching aligned with an organization's designated lifecycle environment and needs, including:
- Content repositories of packages that can be made available to any host
- Distribution of content as close as possible to the endpoint
- Reports on hosts that need updates, fixes, or enhancements
- Groups for systems based on criteria that are important to an organization
- Scalable automation to respond quickly to patching requirements
Scheduled patching with lifecycles
One advantage we've found from using Satellite versus patching servers manually or with other automation methods is the ability to use lifecycles, which automate the patching schedule and process.
For example, you could set up a development (DEV) and production patching lifecycle in which the Satellite server syncs RHEL content directly from Red Hat each week. Ansible playbooks in Ansible automation controller first promote content from the initial sync's base library to the DEV lifecycle. Two weeks later, the playbook automatically promotes this content into production, giving the DevOps teams two weeks to test and validate patches against the DEV servers and verify they are safe to promote into production.
This test window is a critical step in protecting your production environments from patches that can cause unwanted outages. This method also allows you to lock repositories during the lifecycles to ensure all DEV or production servers have the same patches.
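As a rough illustration of this cadence, the following Python sketch computes when a weekly sync would reach each lifecycle environment. It assumes that content is promoted to DEV on the sync day itself and reaches production exactly 14 days later. The function name and environment labels are hypothetical, and the real promotion is performed by Ansible playbooks, not by this code.

```python
from datetime import date, timedelta

# Assumption: a two-week test window in DEV before promotion to production.
TEST_WINDOW = timedelta(weeks=2)


def promotion_schedule(sync_day: date) -> dict[str, date]:
    """Map each lifecycle environment to the day it receives the synced content."""
    return {
        "Library": sync_day,             # content lands here from the weekly sync
        "DEV": sync_day,                 # assumed: promoted to DEV the same day
        "PROD": sync_day + TEST_WINDOW,  # promoted once the test window has passed
    }


if __name__ == "__main__":
    for env, day in promotion_schedule(date(2024, 1, 7)).items():
        print(f"{env:<8}{day.isoformat()}")
```

Keeping the test window in one constant makes it easy to see how a longer or shorter soak period would shift the production date.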
How we patch with Satellite and Ansible
At IBM, we use capsule servers to deploy patches throughout our environment. The main Satellite server is located in one of our datacenters in the United States, and we have multiple capsule servers within the United States and several locations worldwide. This allows patching to be as close as possible to each host. The capsule servers sync the repositories in Satellite to each region and provide the same patch level to all hosts, no matter where they reside in the world.
We can use Ansible automation controller or Satellite's built-in Ansible functions to automate the entire server patching process with playbooks.
We use both Ansible automation controller and the built-in Ansible playbook feature for patch management. First, a playbook in Ansible automation controller scans all servers in the environment to identify the RHEL servers. Next, the playbook determines if the RHEL server is registered properly. If it is, nothing is done.
If not, the server is unregistered and cleaned, and the local capsule server installs the proper Katello certificate, which automatically configures the host and installs the subscription manager along with its dependencies if they are missing. Then the server is registered using the appropriate activation key from Satellite. The playbook looks at which version of RHEL is installed and how the device is classified in ServiceNow to determine if this server should be in the DEV or production lifecycle. This playbook runs monthly to ensure all servers remain in compliance with registration.
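The decision logic of this registration playbook can be sketched in plain Python. This is only a model of the flow described above, not the playbook itself: the `Host` fields, the classification values, and the activation-key naming scheme are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    is_rhel: bool
    rhel_major: int      # e.g. 8 or 9; ignored for non-RHEL hosts
    registered_ok: bool  # outcome of the registration check
    cmdb_class: str      # hypothetical ServiceNow classification value


def lifecycle_for(host: Host) -> str:
    """Pick a lifecycle from the CMDB classification (assumed values)."""
    return "PROD" if host.cmdb_class.lower() == "production" else "DEV"


def registration_plan(host: Host) -> list[str]:
    """Return the ordered actions to take for one host."""
    if not host.is_rhel or host.registered_ok:
        return []  # out of scope, or already registered: nothing to do
    # Hypothetical activation-key naming scheme: rhel<major>-<lifecycle>.
    key = f"rhel{host.rhel_major}-{lifecycle_for(host).lower()}"
    return [
        "unregister and clean the host",
        "install the Katello certificate from the local capsule",
        f"register with activation key {key}",
    ]


if __name__ == "__main__":
    for step in registration_plan(Host("db01", True, 8, False, "production")):
        print(step)
```

Returning a list of planned actions instead of executing them keeps the sketch side-effect free, which also makes the decision easy to test.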
After running this patching automation for a few weeks, we found several standard issues that caused patching to fail in our environment. We developed another playbook using Satellite's internal Ansible that validates that there is enough space on the server to accommodate a kernel patch. We run this playbook the day before a patch window, and if it finds insufficient space, it cleans out older kernels. It then checks to see what repositories are active on the server. If it finds one that is not one of the approved repositories required for proper server operations, then it renames the repository to disable it. This prevents broken repositories from causing the entire server to fail during the patch window. These two automated tasks significantly reduce the time the operations team spends manually resolving server patching issues.
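The two pre-patch checks can be modeled as small Python functions. This is a sketch under stated assumptions: the free-space threshold, the number of kernels to keep, and the function names are hypothetical, and the kernel list is assumed to be ordered from oldest to newest. It only decides what to clean up and does not modify the system.

```python
import shutil

# Hypothetical threshold: free space wanted before a kernel update.
MIN_FREE_BYTES = 500 * 1024**2


def has_room(path: str = "/boot", needed: int = MIN_FREE_BYTES) -> bool:
    """Check whether the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= needed


def kernels_to_remove(installed: list[str], running: str, keep: int = 2) -> list[str]:
    """Select older kernels for removal.

    `installed` must be ordered oldest to newest. The newest `keep` kernels
    and the running kernel are never selected.
    """
    cutoff = max(len(installed) - keep, 0)
    return [k for k in installed[:cutoff] if k != running]


def repos_to_disable(active: list[str], approved: list[str]) -> list[str]:
    """Return active repositories that are not on the approved list."""
    return sorted(set(active) - set(approved))


if __name__ == "__main__":
    kernels = ["k-1", "k-2", "k-3", "k-4"]  # placeholder names, oldest first
    print(kernels_to_remove(kernels, running="k-2"))
    print(repos_to_disable(["baseos", "appstream", "thirdparty"], ["baseos", "appstream"]))
```

Protecting the running kernel explicitly matters because a host that has not rebooted since its last update may still be running one of the older kernels.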
The final stage of our patch cycle is the patching playbook, also done with Satellite's internal Ansible. We schedule the playbook to run ahead of time against a host collection, a group of servers set to be patched. There are multiple groupings, depending on whether they are DEV or production servers and on their patch window's scheduled day and time. The playbook validates the server is ready, performs any last-minute cleaning activities, and deploys security and bug-fix patches on the servers in the group at the specified time. Afterward, the playbook checks to see if a reboot is required; if so, it initiates the reboot and reports back once the server is online.
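The control flow of this final stage can be sketched as follows. The four step functions are passed in as callables because their real implementations live in the playbook; every name here is hypothetical, and the stubs in the usage example stand in for the actual readiness check, update, and reboot tasks.

```python
from typing import Callable


def patch_host(
    host: str,
    is_ready: Callable[[str], bool],
    apply_updates: Callable[[str], None],
    needs_reboot: Callable[[str], bool],
    reboot_and_wait: Callable[[str], None],
) -> str:
    """Run the patch steps for one host and return a short status string."""
    if not is_ready(host):
        return "skipped"          # failed validation: leave the host untouched
    apply_updates(host)           # security and bug-fix updates
    if needs_reboot(host):
        reboot_and_wait(host)     # returns once the host is back online
        return "patched, rebooted"
    return "patched"


def patch_collection(hosts: list[str], **steps) -> dict[str, str]:
    """Patch every host in a host collection and collect the results."""
    return {host: patch_host(host, **steps) for host in hosts}


if __name__ == "__main__":
    report = patch_collection(
        ["app01", "app02"],
        is_ready=lambda h: True,
        apply_updates=lambda h: None,
        needs_reboot=lambda h: h == "app01",
        reboot_and_wait=lambda h: None,
    )
    print(report)
```

Returning a per-host status gives the operations team a single report to review after the window, which mirrors the "reports back" step in the text.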
Integration with Red Hat Insights
One thing we really like about Satellite is that it also integrates with Red Hat Insights, which uses advanced analytics to help detect risks within an environment before they can become a problem. Insights identifies security risks on a server and provides instructions on resolving the issue. Many of these issues can be resolved with the help of Ansible playbooks written to mitigate the security findings. You can do all of this through the standard Satellite interface.
Results of automating patches with Satellite and Ansible
When we started using Satellite instead of patching servers using only Ansible playbooks, we noticed a dramatic decrease in the time needed to complete a patch cycle. Previously, patching took the operations team an entire weekend; now, it takes only two hours. This is due to Satellite's advance work to ensure servers are registered properly and resolve most known failures ahead of time.
We also found that having seven different capsule servers patching hundreds of servers simultaneously greatly improved our overall patching efficiency. Initially, we had a 95% success rate across about 3,500 RHEL servers around the globe. This percentage rose with each successive patch cycle as we resolved one-off issues on servers with patching failures.
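To put the initial success rate in concrete terms, a quick calculation shows how many servers still needed attention after the first cycle. It treats "about 3,500" as exactly 3,500, which is an assumption made only for the arithmetic.

```python
TOTAL_SERVERS = 3500   # "about 3,500" in the text, treated as exact here
SUCCESS_PERCENT = 95   # initial success rate

# Integer arithmetic avoids floating-point rounding in the percentage.
succeeded = TOTAL_SERVERS * SUCCESS_PERCENT // 100
failed = TOTAL_SERVERS - succeeded

print(f"{succeeded} patched, {failed} needing manual follow-up")
```

Under that assumption, roughly 175 servers failed in the first cycle, which is the pool of one-off issues that shrank with each subsequent cycle.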
Another key benefit of automation and using Satellite is that the operations team now touches far fewer servers during the patch window. Not only does this make their job easier, but it also reduces problems as there is less manual interaction (and therefore fewer human-caused errors) with the servers.
A properly configured patch management system is critical to keeping a RHEL server infrastructure patched and maintained in a timely manner. Because Satellite is included if you have RHEL entitlements with Smart Management, you may be able to deploy this at little to no cost.
This article originally appeared on Hybrid Cloud How-tos and is republished with permission.