This post was written by: Swati Sehgal, Alexey Perevalov, Killian Muldoon & Francesco Romani
At the node level, resource alignment is handled by Topology Manager, a native part of Kubelet. By default, Kubelet won’t apply any specific alignment constraints, but a Topology Manager policy can be set to enforce resource alignment. Part 1 of this blog series is here.
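As a refresher, here is a minimal sketch of how the Topology Manager policies treat a pod once Kubelet has found the best alignment hint it can. The policy names ("none", "best-effort", "restricted", "single-numa-node") are the Kubelet's documented policies. The `admit` function and its `preferred` flag are hypothetical simplifications of Kubelet's hint-merging logic, not its actual code.

```python
from enum import Enum


class Policy(Enum):
    # The Topology Manager policy names a Kubelet can be configured with.
    NONE = "none"
    BEST_EFFORT = "best-effort"
    RESTRICTED = "restricted"
    SINGLE_NUMA_NODE = "single-numa-node"


def admit(policy: Policy, best_hint_numa_nodes: frozenset[int], preferred: bool) -> bool:
    """Simplified (hypothetical) admission decision for one pod.

    best_hint_numa_nodes: NUMA nodes covered by the best hint Kubelet found.
    preferred: whether that hint represents the tightest possible alignment.
    """
    if policy in (Policy.NONE, Policy.BEST_EFFORT):
        # No alignment is enforced: the pod is always admitted.
        return True
    if policy is Policy.RESTRICTED:
        # Reject the pod unless the preferred alignment was found.
        return preferred
    # single-numa-node: reject unless everything fits on exactly one NUMA node.
    return preferred and len(best_hint_numa_nodes) == 1
```

Only the last two policies ever reject a pod, which is why a mismatch between a workload and a node running "single-numa-node" can stop the workload outright.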
Topology Manager is the key part of Kubernetes’ resource topology management system. It makes sure that Pods get resources with the correct alignment as they enter runtime. However, Kubelet only has a node-level view and, by design, does not know about the rest of the cluster. This knowledge gap can lead to failures ranging from unexpectedly low performance to an application failing to run at all.
This system works well once a pod lands on a node. Kubelet can take into account available resources and make sure that pods get the best possible alignment. What Kubelet can’t do, however, is tell us whether there’s a better resource alignment available elsewhere in the cluster. This is a job for the Scheduler.
Working in concert: Topology Manager and Topology Aware Scheduling
The worst-case scenario for resource topology today arises when there’s a complete mismatch between the workload request and the policy set on the compute node. If Kubelet enforces the “single-numa-node” Topology Manager policy, this kind of mismatch causes the pod to fail, which shows up in the cluster as a Topology Affinity Error.
Take a heavy workload (Pod 1) that requests 20 dedicated CPUs in its Pod spec, and two worker nodes: Node A with 20 cores in total, 10 in each NUMA zone, and Node B with 40 cores in total, 20 in each NUMA zone. Both Node A and Node B run the single-numa-node policy.
Figure 1: Diagram of Machine Layout
The diagram above makes it clear that only Node B can meet the resource requirements of Pod 1, but this is not at all clear from the scheduler’s point of view when it reads the Kubernetes API. Here’s what it sees:
Figure 2: Diagram of Kube-Scheduler’s View
Even with Topology Manager enabled on the nodes, the scheduler sees both Node A and Node B as suitable platforms for running Pod 1. However, if Pod 1 is deployed to Node A, it fails with a “Topology Affinity Error.” This prevents the workload from running and can have knock-on effects in the cluster. For more discussion of this issue, see “scheduler being topology-unaware can cause runaway pod creation.”
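The scenario can be reproduced in a few lines as a toy model. This is a sketch under stated assumptions: the `Node` class, the function names and the "node-a"/"node-b" labels are hypothetical, CPUs are the only resource modeled, and the scheduler's filtering is reduced to comparing the request with a node's aggregated CPU count, which is all a topology-unaware scheduler can see.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    numa_cpus: list[int]  # CPUs per NUMA zone: visible to the Kubelet only

    @property
    def allocatable_cpus(self) -> int:
        # The single aggregated number the scheduler reads from the API.
        return sum(self.numa_cpus)


def unaware_filter(nodes: list[Node], cpus_requested: int) -> list[Node]:
    """Topology-unaware filtering: compare the request with node totals only."""
    return [n for n in nodes if n.allocatable_cpus >= cpus_requested]


def kubelet_admit(node: Node, cpus_requested: int) -> bool:
    """single-numa-node policy: the whole request must fit in one NUMA zone."""
    return any(free >= cpus_requested for free in node.numa_cpus)


nodes = [Node("node-a", [10, 10]), Node("node-b", [20, 20])]
candidates = unaware_filter(nodes, 20)
print([n.name for n in candidates])  # ['node-a', 'node-b']

# If the scheduler picks node-a, the Kubelet rejects Pod 1 at admission time.
print(kubelet_admit(candidates[0], 20))  # False
```

Both nodes pass the aggregate check because 20 and 40 are both at least 20, yet only Node B has a single NUMA zone with 20 free CPUs.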
If we enable Topology Aware Scheduling, the scheduler begins to see the resource topology complexity that underlies the simplified node-level resource view. With topology awareness enabled in the scheduler, this is what it sees:
Figure 3: Diagram of Scheduler’s View With Topology Aware Scheduling
With this view, the scheduler will not deploy Pod 1 to Node A, avoiding the Topology Affinity Error. The situation described in the example above becomes increasingly likely as more distinct resource requests are added to a Pod spec and more heterogeneous types of machines are added to the cluster.
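A topology-aware filter needs per-NUMA-zone availability to reject Node A. The sketch below is a hypothetical illustration, not the actual scheduler plugin: each node is described as a list of per-zone resource dictionaries, and a node passes only if one zone can satisfy every requested resource at once, which is what the single-numa-node policy demands. The resource name `example.com/nic` and its placement are invented to show how a second resource type makes node totals even less informative.

```python
# Hypothetical per-node view: one dict of available resources per NUMA zone.
Zones = list[dict[str, int]]


def fits_single_zone(zones: Zones, request: dict[str, int]) -> bool:
    """True if ONE NUMA zone can provide every requested resource."""
    return any(
        all(zone.get(resource, 0) >= qty for resource, qty in request.items())
        for zone in zones
    )


def topology_aware_filter(cluster: dict[str, Zones], request: dict[str, int]) -> list[str]:
    """Keep only nodes that can satisfy the whole request from a single zone."""
    return [name for name, zones in cluster.items() if fits_single_zone(zones, request)]


cluster = {
    "node-a": [{"cpu": 10}, {"cpu": 10, "example.com/nic": 1}],
    "node-b": [{"cpu": 20, "example.com/nic": 1}, {"cpu": 20}],
}

print(topology_aware_filter(cluster, {"cpu": 20}))                        # ['node-b']
print(topology_aware_filter(cluster, {"cpu": 11, "example.com/nic": 1}))  # ['node-b']
```

In the second request, node-a has 20 CPUs and one NIC in total, but no single zone holds both 11 CPUs and the NIC, so it is filtered out.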
For more information on Topology Manager, see “Kubernetes Topology Manager Moves to Beta - Align Up!”
What does this add to Kubernetes?
Topology Aware Scheduling is designed to enable new kinds of workloads to run on bare-metal Kubernetes clusters.
The design is primarily concerned with offering coherent, predictable resource alignment decisions in a Kubernetes cluster. With it enabled, workloads should never be placed on platforms that cannot meet their resource needs with the alignment their topology preferences require.
High-performance and low-latency computing rely on near-absolute resource guarantees to deliver predictable performance. These workloads are tuned to squeeze the maximum possible performance from a platform, with minimal disruption over the lifetime of the workload.
In Kubernetes today, NUMA-based servers require significant workarounds in order to deliver that performance. Either something outside of Kubernetes implements the constraints – such as having virtualized nodes – or the flexibility of Node and Pod configuration is reduced.
Packet-processing workloads, like those found in 5G core and edge networks, and machine learning workloads are the first targets for Resource Topology alignment. But many other workloads may benefit from the kind of guarantees the system is able to offer.
How is this all going to work?
We’ll take a deep dive later in the series into what really drives Topology Aware Scheduling. At a high level, three components make up the solution:
Figure 4: System Level Diagram of Topology-Aware Scheduling
1) Kubelet is responsible for making information on existing Resource Topology available through the PodResources API. This API is being enhanced as part of the work on Topology Aware Scheduling.
2) Node Feature Discovery will read from the Kubelet endpoint and make Resource Topology information available through Custom Resources (CRs) corresponding to the nodes in the cluster.
3) Kubernetes Scheduler reads the information exported by Node Feature Discovery and blocks scheduling to nodes that cannot satisfy the needs of specific workloads.
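The three-step flow can be sketched as an in-process simulation. Everything here is a hypothetical stand-in: `ZoneReport`, `TopologyCR`, the function names and the dictionary playing the role of the Kubernetes API are illustrative only, and the real Custom Resource schema and scheduler plugin are not shown. The point is the direction of data: Kubelet reports per-zone resources, Node Feature Discovery publishes one object per node, and the scheduler filters on those objects.

```python
from dataclasses import dataclass, field


@dataclass
class ZoneReport:
    # Hypothetical per-NUMA-zone data, as Kubelet might expose it.
    zone: str
    available: dict[str, int]


@dataclass
class TopologyCR:
    # Hypothetical per-node Custom Resource published by Node Feature Discovery.
    node_name: str
    zones: list[ZoneReport] = field(default_factory=list)


def kubelet_report(node_name: str) -> list[ZoneReport]:
    """Step 1 (stand-in for the PodResources API): per-zone resources of a node."""
    fake_inventory = {
        "node-a": [ZoneReport("numa-0", {"cpu": 10}), ZoneReport("numa-1", {"cpu": 10})],
        "node-b": [ZoneReport("numa-0", {"cpu": 20}), ZoneReport("numa-1", {"cpu": 20})],
    }
    return fake_inventory[node_name]


def nfd_publish(node_name: str, api: dict[str, TopologyCR]) -> None:
    """Step 2: an NFD-like agent turns the report into one CR per node."""
    api[node_name] = TopologyCR(node_name, kubelet_report(node_name))


def scheduler_filter(api: dict[str, TopologyCR], request: dict[str, int]) -> list[str]:
    """Step 3: keep only nodes with a single zone that satisfies the request."""
    return [
        cr.node_name
        for cr in api.values()
        if any(all(z.available.get(r, 0) >= q for r, q in request.items()) for z in cr.zones)
    ]


api_server: dict[str, TopologyCR] = {}  # stand-in for the Kubernetes API
for name in ("node-a", "node-b"):
    nfd_publish(name, api_server)

print(scheduler_filter(api_server, {"cpu": 20}))  # ['node-b']
```

Keeping the scheduler as a pure reader of published objects mirrors the design described here, where the components only communicate through Kubernetes APIs.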
Topology Aware Scheduling integrates with existing Kubernetes components, including the community-sponsored Node Feature Discovery, to offer a drop-in solution for cluster-level topology management:
Figure 5: Sequence Diagram of Topology-Aware Scheduling
The components communicate with each other through Kubernetes APIs.
Look out for more articles in this series, which will trace Resource Topology management from the node all the way up to the scheduler.