This post was written by: Swati Sehgal, Alexey Perevalov, Killian Muldoon & Francesco Romani
At the node level, resource alignment is handled by Topology Manager, a native part of Kubelet. By default Kubelet won’t try to apply any specific constraints, but a Topology Manager policy can be set to enforce resource alignment. Part 1 of this blog is here.
Topology Manager is the key part of Kubernetes’ resource topology management system. It makes sure that Pods get resources with the correct alignment as they enter runtime. Kubelet is necessarily kept from knowing everything about a cluster, however. The knowledge gap can result in failures, from unexpectedly low performance to stopping an application completely.
This system works well once a pod lands on a node. Kubelet can take into account available resources and make sure that pods get the best possible alignment. What Kubelet can’t do, however, is tell us whether there’s a better resource alignment available elsewhere in the cluster. This is a job for the Scheduler.
Working in concert: Topology Manager and Topology Aware Scheduling.
The worst-case scenario for resource topology today comes when there’s a complete mismatch between the workload request and the policy set on the compute node. If Kubelet is trying to enforce a “single-numa-node” policy for Resource Topology, this sort of mismatch can cause pod failure. This presents in the cluster as a Topology Affinity Error.
Take a heavy workload (Pod 1) requesting 20 dedicated CPUs in its Pod spec, and two worker nodes: Node A with 20 total cores, 10 in each NUMA zone, and Node B with 40 total cores, 20 in each NUMA zone. Both Node A and Node B are running the single-numa-node policy.
Figure 1: Diagram of Machines Layout
It may be clear from the diagram above that only Node B can meet the resource requirements of Pod 1, but it’s not at all clear from the scheduler’s point of view when it reads the Kubernetes API. Here’s what it sees:
Figure 2: Diagram of Kube-Scheduler’s View
With Topology Manager enabled, the scheduler sees both Node A and Node B as suitable platforms for running Pod 1. However, when Pod 1 is deployed to Node A, we get a “Topology Affinity Error.” This prevents the workload from running and can have knock-on effects in the cluster. For more discussion of this issue, see scheduler being topology-unaware can cause runaway pod creation.
If we enable Topology Aware Scheduling, the scheduler begins to see the resource topology complexity that underlies the simplified node-level resource view. With Topology Awareness enabled in the scheduler, this is what it sees:
Figure 3: Diagram of Scheduler’s View With Topology Aware Scheduling
The above view means the scheduler will not deploy Pod 1 to Node A, avoiding the Topology Affinity Error. The situation described in the above article becomes increasingly likely as more distinct resource requests are added to a Pod spec and more heterogeneous types of machines are added to the cluster.
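The two views can be contrasted in a minimal Python sketch that uses the numbers from the example above. The names (Node, fits_node_level, fits_single_numa_node) are hypothetical and only illustrate the idea; this is not the scheduler's actual code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical model of a worker node: free CPUs listed per NUMA zone."""
    name: str
    free_cpus_per_zone: List[int]


def fits_node_level(node: Node, cpus_requested: int) -> bool:
    """Topology-unaware view: only the node-wide CPU total is compared."""
    return sum(node.free_cpus_per_zone) >= cpus_requested


def fits_single_numa_node(node: Node, cpus_requested: int) -> bool:
    """Topology-aware view: one NUMA zone must hold the whole request."""
    return any(free >= cpus_requested for free in node.free_cpus_per_zone)


node_a = Node("Node A", [10, 10])  # 20 cores, 10 per NUMA zone
node_b = Node("Node B", [20, 20])  # 40 cores, 20 per NUMA zone
pod_1_cpus = 20                    # Pod 1 requests 20 dedicated CPUs

for node in (node_a, node_b):
    print(
        node.name,
        "node-level:", fits_node_level(node, pod_1_cpus),
        "single-numa-node:", fits_single_numa_node(node, pod_1_cpus),
    )
```

Under these assumptions both nodes pass the node-level check, but only Node B passes the per-zone check, which is why a topology-aware scheduler would skip Node A.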
For more information on Topology Manager see Kubernetes Topology Manager Moves to Beta - Align Up!
What does this add to Kubernetes?
Topology Aware Scheduling is designed to enable new kinds of workloads to run on bare-metal Kubernetes clusters.
The design is primarily concerned with offering coherent, predictable resource alignment decisions in a Kubernetes cluster. With it enabled, workloads should never be placed on platforms that cannot meet their resource needs aligned to their topology preferences.
High-performance and low-latency computing rely on almost absolute resource guarantees to enable predictable performance. These workloads are tuned to make sure the absolute maximum performance can be squeezed from a platform, with the minimum amount of disruption over the lifetime of the workload.
In Kubernetes today, NUMA-based servers require significant workarounds in order to deliver that performance. Either something outside of Kubernetes implements the constraints – such as having virtualized nodes – or the flexibility of Node and Pod configuration is reduced.
Packet processing workloads, like those found in 5G core and edge networks, and machine learning workloads are the first targets for Resource Topology alignment. But many other workloads may benefit from the kind of guarantees the system is able to offer.
How is this all going to work?
We’ll be doing a deep dive later in the series on what’s really going to drive Topology Aware Scheduling. From a high level, there are three components that make up the solution:
Figure 4: System Level Diagram of Topology-Aware Scheduling
1) Kubelet is responsible for making information on existing Resource Topology available through the PodResources API. This API is being enhanced as part of the work on Topology Aware Scheduling.
2) Node Feature Discovery will read from the Kubelet endpoint and make Resource Topology information available through Custom Resources (CRs) corresponding to the nodes in the cluster.
3) Kubernetes Scheduler reads the information exported by Node Feature Discovery and blocks scheduling to nodes that cannot satisfy the needs of specific workloads.
Topology Aware Scheduling integrates with existing Kubernetes components, including the community-sponsored Node Feature Discovery, to offer a drop-in solution for cluster-level topology management:
Figure 5: Sequence Diagram of Topology-Aware Scheduling
The components communicate with each other through Kubernetes APIs.
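The flow through the three components can be sketched in Python as follows. Everything here is a hypothetical in-memory model: the report layout, export_node_topology, and filter_nodes are illustrative names, and the allocation numbers are made up. None of it reflects the real Kubelet API, the Node Feature Discovery CR schema, or the scheduler plugin interface.

```python
from typing import Dict, List

# Step 1 (Kubelet, hypothetical shape): per-NUMA-zone CPU capacity and the
# CPUs already assigned to running pods on each node. Values are illustrative.
kubelet_reports: Dict[str, Dict[str, List[int]]] = {
    "node-a": {"capacity": [10, 10], "allocated": [4, 0]},
    "node-b": {"capacity": [20, 20], "allocated": [0, 8]},
}


def export_node_topology(report: Dict[str, List[int]]) -> List[int]:
    """Step 2 (exporter): turn a Kubelet report into free CPUs per zone.

    In the real system this information would be published as a per-node
    custom resource rather than returned as a plain list.
    """
    return [
        cap - used
        for cap, used in zip(report["capacity"], report["allocated"])
    ]


def filter_nodes(topologies: Dict[str, List[int]], cpus_requested: int) -> List[str]:
    """Step 3 (scheduler): keep nodes where one zone can hold the request."""
    return [
        name
        for name, free_per_zone in topologies.items()
        if any(free >= cpus_requested for free in free_per_zone)
    ]


# One published record per node, keyed by node name.
published = {name: export_node_topology(r) for name, r in kubelet_reports.items()}
candidates = filter_nodes(published, cpus_requested=12)
print(published)
print(candidates)
```

The point of the split is that the scheduler never talks to Kubelet directly in this model: it only reads the per-node records, so the filter stays a simple lookup over already-published data.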
Look out for more articles in this series, which will trace Resource Topology management from the node all the way up to the scheduler.