
Red Hat's most recent posts about Performance, Scale, Chaos and more.


LATEST BLOGS

Red Hat Enterprise Linux Performance Results on 5th Gen Intel® Xeon® Scalable Processors

April 4, 2024 - Bill Gray, David Dumas, Douglas Shakshober, Michey Mehta

Intel recently launched the 5th generation of Intel® Xeon® Scalable processors (Intel Xeon SP), code-named Emerald Rapids, a family of high-end, enterprise-focused processors targeted at a diverse range of workloads. To explore how Intel’s new chips measure up, we’ve worked with Intel and others to run benchmarks with Red Hat Enterprise Linux 8.8 / 9.2 and greater…read more

Optimizing Quay/Clair: Database profiling results

March 19, 2024 - Vishnu Challa

Welcome to the second part of our exploration. In this continuation from our previous article, we will delve deeper into the results of our database profiling efforts and discuss strategies for further optimizing overall application performance…read more

Optimizing Quay/Clair: Profiling, performance, and efficiency

March 19, 2024 - Vishnu Challa

Red Hat Quay (also offered as a service via Quay.io) is a cloud-based container registry service that allows users to store, manage, and distribute container images. It provides a platform for hosting, sharing, and securing container images across multiple environments, including on-premise data centers, public cloud platforms, and hybrid cloud deployments…read more

Save memory with OpenShift Virtualization using Free Page Reporting

March 13, 2024 - Robert Krawitz

OpenShift Virtualization, a feature of Red Hat OpenShift, allows running virtual machines (VMs) alongside containers on the same platform, simplifying management. It allows using VMs in containerized environments by running VMs the same way as any other pod, so that organizations with significant investment in virtualization or who desire the greater isolation provided by VMs with legacy workloads can use them in an orchestrated containerized environment…read more

Test Kubernetes performance and scale with kube-burner

March 4, 2024 - Sai Sindhur Malleni, Vishnu Challa, Raul Sevilla Canavate

Three years ago, we introduced kube-burner to the Kubernetes performance and scale communities. Since then, kube-burner has steadily continued its journey, adding a diverse range of features that help solve unique challenges in performing and analyzing results from performance and scale tests on Kubernetes and Red Hat OpenShift.

Over the last few years, multiple new features and usability improvements were added to kube-burner. In this article, we will go beyond the basics, exploring some new bells and whistles added to the tool recently and laying out our vision for the future…read more

5 ways we work to optimize Red Hat Satellite

March 4, 2024 - Imaanpreet Kaur, Jan Hutař, Pablo Mendez Hernandez

In the ever-evolving landscape of technology, the hunt for optimal performance and scale is a constant challenge. With Red Hat Satellite 6.13 and 6.14 versions, we have embarked on an exciting journey to push the boundaries and elevate our capabilities. In this article, we'll take you behind the scenes to explore how we work to enhance performance and scale our operations…read more

Best practices for OpenShift Data Foundation disaster recovery resource planning

February 22, 2024 - Elvir Kuric

Red Hat OpenShift Data Foundation is a key storage component of Red Hat OpenShift. It offers unified block, file, and object storage capabilities to support a wide range of applications.

One of the new exciting features in OpenShift Data Foundation 4.14 is OpenShift Data Foundation Regional Disaster Recovery (RDR), which offers RDR capabilities for Rados Block Device pools (RBD pools) and Ceph File System (CephFS) (via volsync replication) pools. With RBD images replicated between clusters, OpenShift Data Foundation RDR protects customers from catastrophic failures. With OpenShift Data Foundation RDR, we can…read more

DPDK latency in OpenShift - Part II

February 20, 2024 - Rafael Folco, Karl Rister, Andrew Theurer

In a previous article, we shared the results of the DPDK latency tests conducted on a single node OpenShift (SNO) cluster. We were able to demonstrate that a packet can be transmitted and received back in as little as 3 µs, mostly under 7 µs and, in the worst case, 12 µs. These numbers represent the round trip latencies in OpenShift for a single queue transmission of a 64 byte packet, forwarding packets using an Intel E810 dual port adapter…read more

Correlating QPS rate with resource utilization in self-managed Red Hat OpenShift with Hosted Control Planes

January 23, 2024 - Guoqing Li

The general availability of hosted control planes (HCP) for self-managed Red Hat OpenShift Virtualization (KubeVirt) is an exciting milestone. However, the true test lies in system performance and scalability, which are both crucial factors that determine success. Understanding and pushing these limits is essential for making informed decisions. This article offers a comprehensive analysis and general sizing insights for consolidating existing bare metal resources using hosted control planes for self-managed OpenShift Virtualization. It delves into the resource usage patterns of the hosted control planes, examining their relationship with the KubeAPIServer QPS rate. Through various experiments, we established the linear regression model between the KubeAPI Server QPS rate and CPU/Memory/ETCD storage utilization, providing valuable insights for efficient resource consolidation and node capacity planning…read more
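As a rough illustration of the kind of linear regression the teaser above describes, the sketch below fits a straight line relating API server QPS to CPU usage with ordinary least squares. It is not the article's model: the function name `fit_line` and the sample values are hypothetical and chosen only to show the mechanics.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired samples")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of the x/y cross deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical samples for illustration only, NOT measurements from the article
qps = [50, 100, 150, 200]          # KubeAPIServer queries per second
cpu_cores = [1.5, 2.5, 3.5, 4.5]   # CPU cores consumed at each QPS level

slope, intercept = fit_line(qps, cpu_cores)
# Extrapolate the fitted line to a planned QPS rate
predicted_cores = slope * 300 + intercept
print(f"cores ~= {slope:.3f} * QPS + {intercept:.3f}")
print(f"estimated cores at 300 QPS: {predicted_cores:.2f}")
```

The same fit could be repeated with memory or etcd storage as the dependent variable; extrapolating far beyond the measured QPS range should be treated with caution.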

Continuous performance and scale validation of Red Hat OpenShift AI model-serving stack

January 17, 2024 - Kevin Pouget

The model serving stack included in OpenShift AI is generally available (GA) as of December 2023 (release 2.5), meaning that OpenShift AI is fully operational to deploy and serve inference models via the KServe API. You may have read my colleague David Gray's article about the performance of this model serving stack for large language models (LLMs). This article provides a different look at that same model serving stack. It discusses how we stress tested the model deployment and model serving controllers to confirm that they perform and scale well in single-model, multi-model and many-model environments…read more

Kube-burner: Fanning the flames of innovation in the CNCF Sandbox

January 16, 2024 - Sai Sindhur Malleni

We are thrilled to share some exciting news with you all – kube-burner, the robust performance and scale testing tool for Kubernetes and Red Hat OpenShift, has officially achieved the CNCF Sandbox status! In the evolving landscape of cloud-native technologies, the Cloud Native Computing Foundation (CNCF) serves as a hub for incubating and nurturing innovative projects. Kube-burner has become the first and only performance and scale testing tool to attain this recognition, thereby elevating the importance of performance and scale in the cloud native landscape…read more

Evaluating LLM inference performance on Red Hat OpenShift AI

January 16, 2024 - David Gray

The generative artificial intelligence (AI) landscape has undergone rapid evolution over the past year. As the power of generative large language models (LLMs) grows, organizations increasingly seek to harness their capabilities to meet business needs. Because of the intense computational demands of running LLMs, deploying them on a performant and reliable platform is critical to making cost-effective use of the underlying hardware, especially GPUs.

This article introduces the methodology and results of performance testing the Llama-2 models deployed on the model serving stack included with Red Hat OpenShift AI. OpenShift AI is a flexible, scalable MLOps platform with tools to build, deploy and manage AI-enabled applications. Built using open source technologies, it provides trusted, operationally consistent capabilities for teams to experiment, serve models and deliver innovative apps…read more

Operating Tekton at scale: 10 lessons learned from Red Hat Trusted Application Pipeline

January 11, 2024 - Pradeep Surisetty, Gabe Montero, Pavel Macik, Ann Marie Fred, Jan Hutař

Red Hat Trusted Application Pipeline is built on top of Red Hat OpenShift Pipelines and its upstream project, Tekton. We use Tekton’s Pipelines as Code, Tekton Chains and Tekton Results capabilities to provide a more scalable build environment with an enhanced security posture to power Red Hat's next-generation build systems.

This blog shares some generic learnings applicable to Trusted Application Pipeline or OpenShift Pipelines and any large workload distributing system on OpenShift or Kubernetes…read more

Behind the scenes: Introducing OpenShift Virtualization Performance and Scale

January 9, 2024 - Jenifer Abrams

Red Hat OpenShift Virtualization helps to remove workload barriers by unifying virtual machine (VM) deployment and management alongside containerized applications in a cloud-native manner. As part of the larger Performance and Scale team, we have been deeply involved in the measurement and analysis of VMs running on OpenShift since the early days of the KubeVirt open source project and have helped to drive product maturity through new feature evaluation, workload tuning and scale testing. This article dives into several of our focus areas and shares additional insights into running and tuning VM workloads on OpenShift…read more

KrknChaos is joining CNCF Sandbox

January 9, 2024 - Naga Ravi Chaitanya Elluri, Brian Riordan, Pradeep Surisetty

We are excited to announce that krknChaos, a chaos engineering tool for Kubernetes focused on improving resilience and performance, has been accepted as a Sandbox project by the Cloud Native Computing Foundation (CNCF). Additional details can be found in the proposal. We would like to thank the TAG App Delivery team (Josh Gavant, Karena Angell and team, to name a few) and the CNCF Technical Oversight Committee (TOC) for their invaluable guidance and support throughout the process, and of course the team and community for the contributions that were key to making this happen…read more

Supercharging chaos testing using AI

January 8, 2024 - Naga Ravi Chaitanya Elluri, Mudit Verma, Sandeep Hans

There has been a huge increase in demand for running complex systems with tens to hundreds of microservices at massive scale. End users expect 24/7 availability of services they depend on, so even a few minutes of downtime matters. A proactive chaos engineer helps meet user expectations by identifying bottlenecks and hardening services before downtime occurs in a production environment. Chaos engineering is vital to avoid losing the trust of your end users…read more

Quantifying performance of Red Hat OpenShift for Machine Learning (ML) Training on Supermicro A+ Servers with MLPerf Training v3.1

November 23, 2023 - Diane Feddema

We are proud to announce the first MLPerf Training submission on Red Hat OpenShift, which is also the first MLPerf Training submission on a variant of Kubernetes. Red Hat collaborated with Supermicro on this submission and ran the benchmarks on a Supermicro GPU A+ Server with eight NVIDIA H100 GPUs. Red Hat OpenShift helps make it easier to run, reproduce and monitor AI/ML workloads, while adding minimal overhead to your training jobs. In this blog we provide the performance numbers of our recent submission to MLPerf Training v3.1…read more

OpenShift Cluster Manager API: Load-testing, breaking, and improving it

October 26, 2023 - Vicente Zepeda Mas

Red Hat OpenShift Cluster Manager (OCM) is a managed service where you can install, modify, operate, and upgrade your Red Hat OpenShift clusters. Because OCM is a cornerstone of Red Hat’s hybrid cloud strategy, we strive to make sure that it is scalable enough to handle peak traffic and find bottlenecks that can be fixed to present a satisfying experience for our customers. The Performance & Scale team discovered a performance problem that affected a core component of the API. In this blog post, we discuss how we identified the problem, how we worked as a cross-functional team to identify and fix it, and the measures we implemented to prevent similar incidents from happening in the future...read more

Data Plane Development Kit (DPDK) latency in Red Hat OpenShift - Part I

October 11, 2023 - Rafael Folco, Karl Rister, Andrew Theurer

In this article, we present the results of DPDK latency tests conducted on a single node OpenShift (SNO) cluster. The tests were performed using the traffic generator MoonGen, which utilizes the hardware timestamping support for measuring packet latencies as they pass through the network adapters. The results of these tests provide insights into the performance of DPDK in a real-world environment and offer guidance for network architects and administrators seeking to optimize network latency…read more

Running 2500 pods per node on OCP 4.13

August 22, 2023 - Andrew Collins

The default pods-per-node (PPN) limit is 250. Customers who exceed 250 ask whether they can scale beyond the published maximum of 500. They ask, "How can we better utilize the capacity of our large bare metal machines?"…read more

Bulk API in Automation Controller

August 9, 2023 - Nikhil Jain

Automation controller has a rich ReSTful API. REST stands for Representational State Transfer and is sometimes spelled as “ReST”. It relies on a stateless, client-server, and cacheable communications protocol, usually the HTTP protocol. REST APIs provide access to resources (data entities) via URI paths. You can visit the automation controller REST API in a web browser at: http://<server name>/api/...read more

Red Hat Enterprise Linux achieves significant performance gains with Intel's 4th Generation Xeon Scalable Processors

April 20, 2023 - Michey Mehta, Bill Gray, David Dumas, Douglas Shakshober

Intel recently launched the 4th generation of Intel® Xeon® Scalable processors, a family of high-end, enterprise-focused processors targeted at a diverse range of workloads. To explore how Intel’s new chips measure up, we’ve worked with Intel and others to run benchmarks with Red Hat Enterprise Linux 8.4, 8.5, 8.6, 9.0 and 9.1, as well as CentOS Stream 9.2 (which will become Red Hat Enterprise Linux 9.2)...read more

OpenShift/Kubernetes Chaos Stories

March 15, 2023 - Naga Ravi Chaitanya Elluri

With the increase in adoption of and reliance on digital technology and microservices architecture, the uptime of an application has never been more important. Downtime of even a few minutes can lead to a huge loss of revenue and, most importantly, trust. This is exactly why we proactively focus on identifying bottlenecks and improving the resilience and performance of OpenShift under chaotic conditions…read more

Enhancing/Maximizing your Scaling capability with Automation Controller 2.3

March 13, 2023 - Nikhil Jain

Red Hat Ansible Automation Platform 2 is the next generation automation platform from Red Hat’s trusted enterprise technology experts. We are excited to announce that the Ansible Automation Platform 2.3 release includes automation controller 4.3…read more

Red Hat new Benchmark results on AMD EPYC4 (Genoa) processors

January 6, 2023 - Red Hat Performance Team

Red Hat has continued to work with our partners to better enable world class performance. Recently, AMD released its EPYC "Genoa" 4th Gen Data Center CPU, known as the AMD EPYC 9004 Series. Built on a 5 nm process, the new series increases the core count to 96 cores and 192 threads per socket, with 384 MB of L3 cache…read more

A Guide to Scaling OpenShift Data Science to Hundreds of Users and Notebooks

December 13, 2022 - Kevin Pouget

Red Hat OpenShift Data Science provides a fully managed cloud service environment for data scientists and developers of intelligent applications. It offers a fully supported environment in which to rapidly develop, train, and test machine learning (ML) models before deploying in production…read more

Run Windows workloads on OpenShift Container Platform

November 30, 2022 - Krishna Harsha Voora, Venkata Anil Kommaddi, Sai Sindhur Malleni

OpenShift helps bring the power of cloud-native and containerization to your applications, no matter what underlying operating systems they rely on. For use cases that require both Linux and Windows workloads, Red Hat OpenShift allows you to deploy Windows workloads running on Windows server while also supporting traditional Linux workloads…read more

A Guide to Functional and Performance Testing of the NVIDIA DGX A100

June 23, 2022 - Kevin Pouget

This blog post, part of a series on the DGX-A100 OpenShift launch, presents the functional and performance assessment we performed to validate the behavior of the DGX™ A100 system, including its eight NVIDIA A100 GPUs. This study was performed on OpenShift 4.9 with the GPU computing stack deployed by NVIDIA GPU Operator v1.9…read more

Scaling Automation Controller for API Driven Workloads

June 20, 2022 - Elijah Delee

When scaling automation controller in an enterprise organization, administrators are faced with more clients automating their interactions with its REST API. As with any web application, automation controller has a finite capacity to serve web requests, and web clients can…read more

Performance Improvements in Automation Controller 4.1

February 28, 2022 - Nikhil Jain

With the release of Ansible Automation Platform 2.1, users now have access to the latest control plane – automation controller 4.1. Automation controller 4.1 provides significant performance improvements when compared to its predecessor Ansible Tower 3.8. To put this into context, we used Ansible Tower 3.8 to run jobs, capture various metrics…read more

The Curious Case of the CPU Eating Gunicorn

June 2, 2022 - Gonza Rafuls

We decided to take a first try hands-on approach following the future QUADS roadmap and re-architect our legacy landing/requests portal application, previously a trusty LAMP stack, into a completely rewritten Flask / SQLAlchemy / Gunicorn / Nginx next-gen platform…read more

Entitlement-Free Deployment of the NVIDIA GPU Operator on OpenShift

December 14, 2021 - Kevin Pouget

Version 1.9.0 of the GPU Operator has just landed in OpenShift OperatorHub, with many different updates. We're proud to announce that this version comes with the support of the entitlement-free deployment of NVIDIA GPU Driver…read more

Red Hat collaborates with NVIDIA to deliver record-breaking STAC-A2 Market Risk benchmark

November 9, 2021 - Sebastian Jug

We are happy to announce a record-breaking performance with NVIDIA in the STAC-A2 benchmark, affirming Red Hat OpenShift's ability to run compute heavy, high performance workloads. The Securities Technology Analysis Center (STAC®) facilitates a large group of financial firms and technology vendors…read more

Red Hat Satellite 6.9 with Puma Web Server

September 15, 2021 - Imaanpreet Kaur

Until Red Hat Satellite 6.8, the Passenger web/app server was a core component of Red Hat Satellite. Satellite used Passenger to run Ruby applications such as Foreman. Satellite 6.9 is no longer using the Passenger web server. The Foreman application (main UI and API server) was ported to use the Puma project…read more

Using NVIDIA A100’s Multi-Instance GPU to Run Multiple Workloads in Parallel on a Single GPU

August 26, 2021 - Kevin Pouget

The new Multi-Instance GPU (MIG) feature lets GPUs based on the NVIDIA Ampere architecture run multiple GPU-accelerated CUDA applications in parallel in a fully isolated way. The compute units of the GPU, as well as its memory, can be partitioned into multiple MIG instances…read more

Multi-Instance GPU Support with the GPU Operator v1.7.0

June 15, 2021 - Kevin Pouget

Version 1.7.0 of the GPU Operator has just landed in OpenShift OperatorHub, with many different updates. We are proud to announce that this version comes with the support of the NVIDIA Multi-Instance GPU (MIG) feature for the A100 and A30 Ampere cards…read more

Making Chaos Part of Kubernetes/OpenShift Performance and Scalability Tests

March 17, 2021 - Naga Ravi Chaitanya Elluri

While we know how important performance and scale are, how can we engineer for them when chaos becomes common in complex systems? What role does Chaos/Resiliency testing play during Performance and Scalability evaluation? Let’s look at the methodology that we need to embrace to mimic a real world production environment, so that we can find the bottlenecks and fix them before they impact users and customers…read more

Demonstrating Performance Capabilities of Red Hat OpenShift for Running Scientific HPC Workloads

November 11, 2020 - David Gray and Kevin Pouget

This blog post is a follow-up to the previous blog post on running GROMACS on Red Hat OpenShift Container Platform (OCP) using the Lustre filesystem. In this post, we will show how we ran two scientific HPC workloads on a 38-node OpenShift cluster using CephFS with OpenShift Container Storage in external mode…read more

A Complete Guide for Running Specfem Scientific HPC Workload on Red Hat OpenShift

November 11, 2020 - Kevin Pouget

Specfem3D_Globe is a scientific high-performance computing (HPC) code that simulates seismic wave propagation, at a global or regional scale (website and repository). It relies on a 3D crustal model and takes into account parameters such as the Earth density, topography/bathymetry, rotation, oceans, or self-gravitation…read more

Running HPC workloads with Red Hat OpenShift Using MPI and Lustre Filesystem

October 29, 2020 - David Gray

The requirements associated with data science and AI/ML applications have pushed organizations toward using highly parallel and scalable hardware that often resembles high performance computing (HPC) infrastructure. HPC has been around for a while and has evolved to include ultra large supercomputers that run massively parallel tasks and operate at exascale (able to perform a billion billion operations per second)…read more

Introduction to Kraken, a Chaos Tool for OpenShift/Kubernetes

October 8, 2020 - Yashashree Suresh and Paige Rubendall

Chaos engineering helps in boosting confidence in a system's resilience by “breaking things on purpose.” While it may seem counterintuitive, it is crucial to deliberately inject failures into a complex system like OpenShift/Kubernetes and check whether the system recovers gracefully…read more


About the author

Red Hat Performance and Scale Engineering pushes Red Hat products to their limits. Every day we strive to reach greater performance for our customer workloads and scale the products to new levels. Our performance engineers benchmark configurations that range from far edge telco use cases to large scale cloud environments.
