This post was written by: Swati Sehgal, Alexey Perevalov, Killian Muldoon & Francesco Romani
How do you get the most out of your bare-metal hardware? Believe it or not, the physical layout in a computer of the resources a workload uses, from memory and CPU to storage and I/O, can have a dramatic impact on performance. Until recently Kubernetes users had no direct way to influence this key interaction between hardware and software, commonly called Resource Topology.
This blog post series describes Topology Aware Scheduling, a feature being rolled out in Kubernetes in 2021. Topology Aware Scheduling enables the Kubernetes control plane to honor Resource Topology constraints when placing Pods on Nodes. This approach complements Topology Manager, the node-level Resource Topology enforcer in the kubelet, which was initially introduced in Kubernetes 1.17. More on that later.
Why does resource topology matter?
Non-Uniform Memory Access (NUMA) is a compute platform architecture in which different CPUs access different regions of memory at different speeds. The relative locations of CPUs, memory, and PCI devices are what we’re talking about when we say Resource Topology.
This architecture has a major advantage: any CPU core can potentially access all memory on the system. The potential pitfall is performance. For example, in the diagram below, CPU core 1 can access the memory close to it more quickly than the memory close to CPU core 7.
FIGURE 1: A Non-uniform Memory Access (NUMA) system
It’s straightforward so far, and the underlying operating system will manage most of this, even in a Kubernetes cluster. When you’re trying to squeeze low-latency performance from bare metal, though, you need to dedicate isolated resources to specific applications. As we add new kinds of resources, things get increasingly complicated.
For I/O-constrained workloads, the network interface on a distant NUMA zone slows down how quickly information can reach the application. High-performance workloads, like those running the 5G network, can’t operate to spec under these conditions.
Take a pod that requests 2 CPUs and a PCI device. FIGURE 2 shows a scenario where the resources are not NUMA-aligned, whereas FIGURE 3 shows a scenario where they are:
FIGURE 2: A NUMA System with no Resource Alignment
FIGURE 3: A NUMA System with Resource Alignment
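To make the idea of alignment concrete, here is a minimal Python sketch. It is not how Kubernetes represents allocations: the `Assignment` type, its field names, and the NUMA node numbers are all hypothetical, chosen only to mirror a pod with 2 CPUs and one PCI device. The sketch treats a pod as aligned when every resource it holds sits on the same NUMA node.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Assignment:
    """Hypothetical record of where a pod's resources landed."""
    cpu_numa_nodes: Tuple[int, ...]     # NUMA node of each allocated CPU
    device_numa_nodes: Tuple[int, ...]  # NUMA node of each allocated PCI device


def is_numa_aligned(assignment: Assignment) -> bool:
    """Return True when all CPUs and devices share a single NUMA node."""
    nodes = set(assignment.cpu_numa_nodes) | set(assignment.device_numa_nodes)
    return len(nodes) <= 1


# Assumed layouts: CPUs split across nodes 0 and 1 with the device on node 1,
# versus everything on node 0.
unaligned = Assignment(cpu_numa_nodes=(0, 1), device_numa_nodes=(1,))
aligned = Assignment(cpu_numa_nodes=(0, 0), device_numa_nodes=(0,))

print(is_numa_aligned(unaligned))  # False
print(is_numa_aligned(aligned))    # True
```

In the unaligned case, at least one CPU has to reach across to the other NUMA node to talk to the device, which is the extra latency the figures illustrate.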
Without handling Resource Topology, Kubernetes as it exists in 1.20 can’t meet the needs of these sorts of applications. End users can find (and have found!) ways around this by adding constraints to their clusters. One option is to replace bare-metal deployments with VMs, while another is to limit the pod configs available to developers.
Does the Kubernetes default scheduler consider Resource Topology when assigning pods to nodes?
Kubernetes Topology Manager allows workloads to run in an environment optimized for low latency. Performance-critical workloads in industries like telecommunications, High Performance Computing (HPC), and Internet of Things (IoT) need topology information so that they can use co-located CPU cores and devices, but the current native scheduler does not select a node based on its topology. The scheduler has no knowledge of Resource Topology, which can lead to unpredictable application performance. In general, this means underperformance. In the worst case, resource requests and kubelet policies are completely mismatched: the scheduler places a pod on a node where it is destined to fail, potentially entering a failure loop.
Exposing cluster-level topology to the scheduler empowers it to make intelligent, NUMA-aware placement decisions that optimize the cluster-wide performance of workloads.
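The gap between a topology-unaware and a topology-aware placement decision can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scheduler's actual code: the per-zone data layout, the function names, and the node names are hypothetical, and the alignment rule assumed here is the strictest one, where the whole request must fit inside a single NUMA zone.

```python
from typing import Callable, Dict, List

# Hypothetical view of a node: free resources per NUMA zone.
Zone = Dict[str, int]
Request = Dict[str, int]


def fits_by_totals(zones: List[Zone], request: Request) -> bool:
    """Topology-unaware check: compare the request to node-wide totals."""
    return all(
        sum(zone.get(name, 0) for zone in zones) >= amount
        for name, amount in request.items()
    )


def fits_single_zone(zones: List[Zone], request: Request) -> bool:
    """Topology-aware check: the whole request must fit in one NUMA zone."""
    return any(
        all(zone.get(name, 0) >= amount for name, amount in request.items())
        for zone in zones
    )


def filter_nodes(
    cluster: Dict[str, List[Zone]],
    request: Request,
    fits: Callable[[List[Zone], Request], bool],
) -> List[str]:
    """Return the sorted names of nodes that pass the given check."""
    return sorted(name for name, zones in cluster.items() if fits(zones, request))


# Assumed cluster: node-a has enough resources in total, but split across zones.
cluster = {
    "node-a": [{"cpu": 1, "nic": 0}, {"cpu": 1, "nic": 1}],
    "node-b": [{"cpu": 4, "nic": 1}, {"cpu": 0, "nic": 0}],
}
request = {"cpu": 2, "nic": 1}

print(filter_nodes(cluster, request, fits_by_totals))    # ['node-a', 'node-b']
print(filter_nodes(cluster, request, fits_single_zone))  # ['node-b']
```

A scheduler that only sees totals considers node-a a valid target, even though no single NUMA zone there can satisfy the request. If the kubelet on node-a enforces strict alignment, the pod is rejected on arrival, which is the failure case described above.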
What is the business case for enabling Topology Aware Scheduling in Kubernetes?
A company could build a business by providing a public cloud or by selling a cloud solution to third parties (for example, to telecom operators for NFV use cases, among others). In the public cloud case, the provider can offer, through its end user agreement or public offer, only tariffs with a fixed number of resources. The problem of resource alignment is then solved at the IaaS level: the number of resources (NIC, GPU) in each tariff is aligned to the number available per NUMA node.
The other case is a company that sells cloud solutions to clients who demand more flexibility. For these clients, flexibility means the ability to work on bare metal and to request any number and kind of resources. A solution that makes kube-scheduler topology aware is therefore of interest to companies that sell cloud solutions to third parties.
In the next part of this blog post series, we talk about Topology Manager and explain the design of Topology Aware Scheduling in more detail.