Multicluster management has been a rapidly evolving part of ITOps over the past several years. As organizations deploy hundreds to thousands of clusters across distributed environments, it’s important they assess their options for platforms that can handle critical workloads at scale. Their goals include operational consistency, reduced manual intervention, improved security posture, and a streamlined, automated lifecycle. 

Red Hat integrates several key technologies into a fully automated, security-focused, and efficient workflow for managing Day 2 operations in OpenShift environments. By combining Red Hat solutions, organizations can create scalable and automated operations that reduce manual intervention, minimize the risk of human error, and strengthen the overall security posture of large-scale OpenShift deployments. These solutions draw on the following capabilities, delivered through several Red Hat product offerings:

  • Red Hat Advanced Cluster Management for Kubernetes: Provides comprehensive, end-to-end management for multiple OpenShift clusters, simplifying operations, governance, and application lifecycle management across your hybrid and multicloud environment.
  • Topology aware lifecycle manager: A Kubernetes operator that facilitates software lifecycle management of fleets of clusters, including platform, operator, and configuration updates, by using Red Hat Advanced Cluster Management policies to support remediation.
  • Multicluster global hub: Aggregates events from all connected hub clusters into a central Kafka instance, providing the infrastructure to transport, translate, and store the cloud events.
  • Red Hat Ansible Automation Platform: Provides the central platform to automate management at scale across the entire lifecycle of cluster operations, including credential retrieval, secret rotation, and workload orchestration.
  • Event-Driven Ansible: A built-in capability of Ansible Automation Platform that monitors real-time event streams and initiates automated responses to environmental changes as they happen. It detects external events or alerts, such as topology aware lifecycle manager events, and lets you design automated actions for them. This results in quicker, more consistent responses to issues and dynamic conditions. Technically, the integration uses Ansible Rulebooks configured with specific event source plugins. These rulebooks process CloudEvents emitted by topology aware lifecycle manager and aggregated by multicluster global hub, such as CguSuccess and CguTimedOut (the documentation lists many more). When a rule's condition is satisfied, Event-Driven Ansible triggers the related automation, effectively separating event detection from the operational response.
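
The separation of event detection from the operational response can be illustrated with a small Python sketch. It is not a rulebook and does not use any Event-Driven Ansible API; the event shape (a dictionary with a "reason" field) and the action names are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumption: each event is a CloudEvent-like dict whose "reason" field
# carries the topology aware lifecycle manager reason.
Event = dict


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[Event], bool]  # detection: does this event match?
    action: str                         # response: hypothetical automation name


# Hypothetical rules; the action names are placeholders, not real job templates.
RULES = [
    Rule("cluster ready",
         lambda e: e.get("reason") == "CguSuccess",
         "post-provision-workflow"),
    Rule("remediation timed out",
         lambda e: e.get("reason") == "CguTimedOut",
         "collect-diagnostics"),
]


def match(event: Event, rules=RULES) -> Optional[str]:
    """Return the action of the first rule whose condition holds, else None."""
    for rule in rules:
        if rule.condition(event):
            return rule.action
    return None
```

Because the conditions and the actions live in separate fields, a new response can be attached to an existing condition (or the reverse) without touching the matching loop, which is the property the rulebook model provides.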

Combined, these technologies provide a reliable solution to help simplify operations across a distributed OpenShift landscape. 

Figure 1. This diagram illustrates a hierarchical multicluster architecture where a global hub centralizes event orchestration via Ansible Automation Platform, Event-Driven Ansible, and multicluster global hub, while leaf hubs manage the provisioning and event transmission for distributed managed clusters.

Use cases for event-driven automation with Red Hat OpenShift

By integrating topology aware lifecycle manager events with Event-Driven Ansible, organizations can move beyond reactive management to achieve a highly automated, efficient, and resilient OpenShift environment. Event-Driven Ansible acts as an automation mechanism for reacting to changing conditions, allowing you to design automated responses to critical points in an OpenShift cluster's lifecycle.

Here are some use cases and examples based on the integration of topology aware lifecycle manager and Event-Driven Ansible: 

  • Automated credential retrieval for newly provisioned clusters: This addresses the problem of securely obtaining credentials for any managed cluster, a task that becomes hard to scale in large environments. A topology aware lifecycle manager event serves as a universal signal that a cluster is fully provisioned and ready for any subsequent client operation. Once triggered, Event-Driven Ansible can execute a playbook using 2 primary methods for secure credential acquisition: retrieving static Kubeconfig files or rotating authentication bearer tokens. Once obtained, the credentials are safely stored, making them immediately consumable by other operational teams or automated pipelines without manual intervention.
  • Automatic deployment of an application on cluster readiness: A topology aware lifecycle manager event is triggered once a cluster is fully provisioned and configured, enabling the automatic deployment of the right workload on top of that cluster. The moment a CguSuccess event is received, Ansible Automation Platform can execute a playbook that orchestrates the entire process in the provisioned OpenShift cluster, including tasks such as creating a namespace, deploying an application, and making the workload accessible. Custom health checks or verifications against the managed cluster can also be integrated.
  • Event-driven issue response: Event-Driven Ansible can evaluate specific topology aware lifecycle manager failure events, which are categorized by reasons such as CguTimedOut (the remediation failed and timed out) or CguValidationFailure (the validation of the CGU failed). An alert can drive a workflow of automated troubleshooting, fact-gathering, and reporting with ticket creation in an IT service management (ITSM) solution. This places valuable data in the hands of the support teams, saving them time and reducing mean time to resolution (MTTR). Additionally, events can be relayed to an operations team's Slack channel to provide real-time notifications for critical operational insights.
  • Automated configuration and compliance: Topology aware lifecycle manager events serve as triggers for subsequent client operations, such as registering the cluster in a Configuration Management Database (CMDB), running compliance scans, notifying teams, or configuring workloads. For example, as you make changes to systems, you can trigger Event-Driven Ansible to run compliance checks and update the CMDB and ITSM records for you. If systems are found to be out of compliance, Event-Driven Ansible could create an ITSM ticket and restore configurations from the source of truth.

Figure 2: Event-Driven Ansible, part of Ansible Automation Platform, enables automated response to OpenShift topology aware lifecycle manager operator events for use cases such as automated credential retrieval, automated application deployments, configuration and compliance, and issue remediation. 
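
As a rough illustration of the event-driven issue response use case, the following Python sketch turns a failure event into a ticket-like record. The two reasons and their descriptions come from the text above; the event fields ("reason", "cluster", "message") and the ticket layout are hypothetical and do not correspond to any particular ITSM API.

```python
from typing import Optional

# Failure reasons and descriptions as given in the use case above.
FAILURE_REASONS = {
    "CguTimedOut": "the remediation failed and timed out",
    "CguValidationFailure": "the validation of the CGU failed",
}


def build_ticket(event: dict) -> Optional[dict]:
    """Build a ticket-like dict for a failure event, or None for other events.

    Assumption: the event is a dict with "reason", "cluster", and "message"
    keys. A real integration would map these onto the ITSM tool's own schema.
    """
    reason = event.get("reason")
    if reason not in FAILURE_REASONS:
        return None  # not a failure we know how to triage
    cluster = event.get("cluster", "unknown")
    return {
        "summary": f"{reason} on {cluster}",
        "description": (
            f"Cause: {FAILURE_REASONS[reason]}. "
            f"Details: {event.get('message', 'n/a')}"
        ),
        "labels": ["openshift", reason],
    }
```

The same record could feed both the ticket and a chat notification, so support teams see identical facts in both places.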

Credential management scenarios

Managing credentials in a multihub environment requires a security-hardened, hierarchical, and auditable methodology. Ansible Automation Platform simplifies this by offering 2 primary methods for credential acquisition, depending on your security and operational requirements:

  • Scenario 1: Kubeconfig as a cluster credential:
  • Scenario 2: Bearer tokens via ManagedServiceAccount:
    • This method takes advantage of the ManagedServiceAccount add-on available in Red Hat Advanced Cluster Management to provide scoped, rotating tokens. This approach is highly security-focused, as it supports automated token rotation and allows for fine-grained, role-based access control (RBAC) permissions rather than full cluster-admin access. By using modular Ansible roles specifically designed to interact with the OpenShift API, we automate the entire lifecycle of these service accounts. This helps ensure that scoped credentials are synchronized and validated across the global hub and leaf hubs without manual intervention. Detailed information on this scenario can be found in the published article, “Event-driven automated credential retrieval on OpenShift cluster deployments.”

In both scenarios, the automation can traverse the hierarchy from a global hub to a leaf hub to find the final managed cluster, keeping the credential "chain of trust" intact and auditable. Thanks to the modularity of Ansible Automation Platform, users can integrate credential management with Ansible Automation Platform credentials or with third-party vaults (e.g., HashiCorp Vault, Amazon Web Services KMS, or Microsoft Azure Key Vault), again without manual intervention.
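
The traversal from global hub to leaf hub to managed cluster can be sketched in Python as a simple lookup that returns the full path, which is also what makes the chain auditable. The topology below and every name in it are hypothetical; in practice this information would come from the hubs themselves rather than a static dictionary.

```python
# Hypothetical topology: global hub -> leaf hubs -> managed clusters.
TOPOLOGY = {
    "global-hub": {
        "leaf-hub-east": ["sno-001", "sno-002"],
        "leaf-hub-west": ["sno-101"],
    }
}


def credential_path(cluster: str, topology=TOPOLOGY) -> list:
    """Return [global hub, leaf hub, managed cluster] for the given cluster.

    Raises LookupError when no leaf hub manages the cluster, so a caller
    never falls back to an unverified path.
    """
    for global_hub, leaf_hubs in topology.items():
        for leaf_hub, clusters in leaf_hubs.items():
            if cluster in clusters:
                return [global_hub, leaf_hub, cluster]
    raise LookupError(f"{cluster} is not managed by any known leaf hub")
```

Recording the returned path alongside each stored credential gives reviewers a trace of which hubs were involved in retrieving it.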

Scaling Day 2 operations with multicluster global hub

Ansible Automation Platform works in conjunction with the multicluster global hub to manage the complexities of a federated architecture. Multicluster global hub acts as the central nervous system, collecting critical cluster lifecycle events via Kafka, while Ansible Automation Platform acts as the engine that executes the work. 

Once a cluster is provisioned, automation supports ongoing Day 2 management, including: 

  • Orchestrated workflows: Running health checks and compliance validations across thousands of OpenShift clusters. These workflows go beyond simple task execution. By taking advantage of the kubernetes.core Ansible Content Collection, Ansible Automation Platform provides a native way to manage Kubernetes resources as code. This allows for deep cluster inspections and state validations that can adapt to the specific configuration of each managed cluster, whether it’s a standard multi-node OpenShift or a single node OpenShift cluster at the edge.
  • Policy management: Enforcing firmware, network, and storage policies to eliminate configuration drift across the fleet.
  • Dynamic inventory: Using the k8s_info module from the kubernetes.core collection to dynamically populate your Ansible inventory based on real-time hardware and cluster data.
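
To make the dynamic inventory idea concrete, here is a minimal Python sketch that converts a list of ManagedCluster-style resources (the kind of list a k8s_info query returns under its resources key) into the JSON structure Ansible expects from an inventory script. The group names are invented for this example, and the condition type string is an assumption that should be checked against the resources in your own hub.

```python
import json


def is_available(cluster: dict) -> bool:
    """True when the resource reports an Available condition with status "True".

    Assumption: the condition type is "ManagedClusterConditionAvailable".
    """
    conditions = cluster.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "ManagedClusterConditionAvailable"
        and c.get("status") == "True"
        for c in conditions
    )


def build_inventory(clusters: list) -> dict:
    """Group clusters by availability and expose their labels as host variables."""
    inventory = {
        "_meta": {"hostvars": {}},
        "available": {"hosts": []},      # hypothetical group name
        "unavailable": {"hosts": []},    # hypothetical group name
    }
    for cluster in clusters:
        name = cluster["metadata"]["name"]
        group = "available" if is_available(cluster) else "unavailable"
        inventory[group]["hosts"].append(name)
        inventory["_meta"]["hostvars"][name] = {
            "labels": cluster["metadata"].get("labels", {})
        }
    return inventory


if __name__ == "__main__":
    # Hypothetical sample data standing in for a live query result.
    sample = [
        {"metadata": {"name": "sno-001", "labels": {"site": "edge-a"}},
         "status": {"conditions": [
             {"type": "ManagedClusterConditionAvailable", "status": "True"}]}},
        {"metadata": {"name": "sno-002"}, "status": {"conditions": []}},
    ]
    print(json.dumps(build_inventory(sample), indent=2))
```

Keeping the labels as host variables lets later plays target clusters by site or role without querying the hub again.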

By integrating topology aware lifecycle manager events with Event-Driven Ansible, organizations can move beyond reactive management to achieve a highly automated, efficient, and resilient OpenShift environment. This integration establishes a fully automated, security-focused, and efficient workflow that minimizes the risk of human error and strengthens the overall security posture of large-scale OpenShift deployments.
