Red Hat Performance and Scale Engineering

October 9, 2024 - Red Hat Performance Team - 20-minute read

Red Hat's most recent posts about Performance, Scale, Chaos and more.

LATEST BLOGS

A step by step guide to setting up OCP Virtualization on hyperconverged ODF and deploy 10K VMs

October 4, 2024 - Guoqing Li

OpenShift Virtualization enables running VMs alongside pods within the same cluster, which paves a path for infrastructure modernization. Using hyperconverged ODF as the backend storage for OpenShift Virtualization allows your VMs to exist on the same node where storage is attached, and provides the potential to maximize hardware resource utilization and cost savings. This document provides a detailed step-by-step guide to setting up OCP-Virt (OpenShift Virtualization) VMs backed by a hyperconverged ODF (OpenShift Data Foundation) storage system. At the end of this guide, we show you how to create thousands of VMs using a one-line bash command, and we also include some scale data comparing PVC and VolumeSnapshot cloning and VM boot time for both Windows and RHEL 9 VMs...read more

Virtualized database I/O performance improvements in RHEL 9.4

September 10, 2024 - Sanjay Rao, Stefan Hajnoczi

Databases are sensitive to disk I/O performance. The IOThread Virtqueue Mapping feature introduced in the QEMU virtual machine monitor in Red Hat Enterprise Linux (RHEL) 9.4 is designed to improve disk I/O performance for workloads that submit I/O from many vCPUs. In this article, we will look at how the IOThread Virtqueue Mapping feature can boost performance for database workloads running in QEMU/KVM guests. While RHEL guests have supported multi-queue virtio-blk devices for some time, the QEMU virtual machine monitor handled I/O requests in a single thread. This limits the performance benefit of having multiple queues because the single thread can become a bottleneck…read more

Use kube-burner to measure Red Hat OpenShift VM and storage deployment at scale

September 4, 2024 - Jenifer Abrams

Scale testing is critical for understanding how a cluster will hold up under production load. Generally, you may want to scale test to reach a certain max density as the end goal, but it is often also useful to scale up from smaller batch sizes to observe how performance may change as the overall cluster becomes more loaded. Those of us who work in the area of performance analysis know there are many ways to measure a workload, and standardizing on a tool can help provide more comparable results across different configurations and environments...read more

Scaling virtio-blk disk I/O with IOThread Virtqueue Mapping

September 5, 2024 - Stefan Hajnoczi, Kevin Wolf, Emanuele Giuseppe Esposito, Paolo Bonzini, and Peter Krempa

Modern storage evolved to keep pace with growing numbers of CPUs by providing multiple queues through which I/O requests can be submitted. This allows CPUs to submit I/O requests and handle completion interrupts locally. The result is good performance and scalability on machines with many CPUs.

Although virtio-blk devices in KVM guests have multiple queues by default, they do not take advantage of multi-queue on the host. I/O requests from all queues are processed in a single thread on the host for guests with the <driver io=native …> libvirt domain XML setting. This single thread can become a bottleneck for I/O bound workloads…read more
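To make the bottleneck described above concrete, here is a minimal conceptual sketch in Python. It is not QEMU or libvirt code: the function name `map_virtqueues` and the thread names are hypothetical, and the round-robin policy is an assumption chosen for illustration. It only shows how the same four queues land on one thread versus two.

```python
from collections import defaultdict


def map_virtqueues(num_queues: int, iothreads: list[str]) -> dict[str, list[int]]:
    """Assign virtqueue indices to IOThreads round-robin (illustrative only)."""
    if not iothreads:
        raise ValueError("at least one IOThread is required")
    mapping: dict[str, list[int]] = defaultdict(list)
    for queue in range(num_queues):
        # Queue i goes to thread i modulo the number of threads.
        mapping[iothreads[queue % len(iothreads)]].append(queue)
    return dict(mapping)


# One thread services every queue: the single-thread case described above.
single = map_virtqueues(4, ["iothread0"])
# Two threads split the same four queues between them.
spread = map_virtqueues(4, ["iothread0", "iothread1"])
print(single)
print(spread)
```

With one thread, all four queues funnel into the same worker. With two, each worker handles half of them, which is the intuition behind spreading virtqueues across IOThreads.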

Generative AI fine-tuning of LLMs: Red Hat and Supermicro showcase outstanding results for efficient Llama-2-70b fine tuning using LoRA in MLPerf Training v4.0

July 26, 2024 - Diane Feddema, Dr Nikola Nikolov

New generative AI (gen AI) training results were recently released by MLCommons in MLPerf Training v4.0. Red Hat, in collaboration with Supermicro, published outstanding MLPerf v4.0 Training results for fine-tuning of large language model (LLM) llama-2-70b with LoRA.

LoRA (Low-Rank Adaptation of LLMs) is a cost-saving parameter-efficient fine tuning method that can save many hours of training time and reduce compute requirements. LoRA allows you to fine tune a large model for your specific use case while updating only a small subset of parameters. Red Hat’s llama2-70b with LoRA submission on Supermicro hardware demonstrates the delivery of better performance, within 3.5% to 8.6% of submissions on similar hardware, while providing an improved developer, user and DevOps experience…read more
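As a rough illustration of why LoRA updates only a small subset of parameters, the sketch below counts trainable parameters for a single weight matrix. It assumes the standard LoRA formulation, where a frozen `d_out x d_in` weight is adapted by two low-rank factors of shapes `d_out x rank` and `rank x d_in`. The dimensions and rank are hypothetical and are not taken from the MLPerf submission.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters in one dense weight matrix (frozen under LoRA)."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors that LoRA actually trains."""
    return rank * (d_in + d_out)


# Hypothetical sizes chosen for illustration only.
d_in, d_out, rank = 8192, 8192, 16
trainable = lora_trainable_params(d_in, d_out, rank)
total = full_params(d_in, d_out)
fraction = trainable / total
print(f"trainable={trainable} of {total} ({fraction:.4%})")
```

For these assumed sizes the adapter trains well under 1% of the matrix's parameters, which is where the savings in compute and training time come from.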

Unleashing 100GbE network efficiency: SR-IOV in Red Hat OpenShift on OpenStack

July 25, 2024 - Pradipta Sahoo

Single Root I/O Virtualization (SR-IOV) is a technology that allows the isolation of PCI Express (PCIe) resources for better network performance. In this article, we explore a recent study by the Red Hat OpenStack Performance and Scale team, which demonstrated the capabilities of SR-IOV using 100GbE NVIDIA ConnectX-6 adapters within a Red Hat OpenShift on Red Hat OpenStack (ShiftonStack) setup…read more

Scaling Red Hat OpenStack Platform 17.1 to more than 1000+ virtual nodes

July 9, 2024 - Asma Suhani Syed Hameed, Rajesh Pulapakula

As Red Hat OpenStack Platform has evolved in recent years to accommodate a diverse range of customer needs, the demand for scalability has become increasingly vital. Customers depend on Red Hat OpenStack Platform to deliver a resilient and adaptable cloud infrastructure, and as its usage expands, so does the necessity for deploying more extensive clusters.

Over the past years we have undertaken efforts to scale Red Hat OpenStack Platform 16.1 to more than 700 baremetal nodes. This year, the Red Hat Performance & Scale Team has dedicated itself to pushing Red Hat OpenStack Platform's scalability to unprecedented heights. As demand for scaling Red Hat OpenStack Platform increased, we conducted an exercise to test scalability with more than 1000 virtual computes. Testing at such large scale typically requires substantial hardware resources for baremetal setups. In our endeavor, we achieved a new milestone by successfully scaling to more than 1000 overcloud nodes on Red Hat OpenStack Platform 17.1…read more

Sharing is caring: How to make the most of your GPUs (part 1 - time-slicing)

July 2, 2024 - Carlos Camacho, Kevin Pouget, David Gray, Will McGrath

As artificial intelligence (AI) applications continue to advance, organizations often face a common dilemma: a limited supply of powerful graphics processing unit (GPU) resources, coupled with an increasing demand for their utilization. In this article, we'll explore various strategies for optimizing GPU utilization via oversubscription across workloads in Red Hat OpenShift AI clusters. OpenShift AI is an integrated MLOps platform for building, training, deploying and monitoring predictive and generative AI (GenAI) models at scale across hybrid cloud environments…read more

Scale testing image-based upgrades for single node OpenShift

June 28, 2024 - Alex Kros

Image-based upgrades (IBU) are a developer preview feature in Red Hat OpenShift Container Platform 4.15 that reduce the time required to upgrade a single node OpenShift cluster. The image-based upgrade can perform both Z and Y stream upgrades, include operator upgrades in the image, and roll back to the previous version manually or automatically upon failure. Image-based upgrade can also directly upgrade OpenShift Container Platform 4.Y to 4.Y+2, whereas a traditional OpenShift upgrade would require two separate upgrades to achieve the same end result (4.Y to 4.Y+1 to 4.Y+2)…read more

How to create and scale 6,000 virtual machines in 7 hours with Red Hat OpenShift Virtualization

June 25, 2024 - Boaz Ben Shabat

In the world of organizational infrastructure, sometimes there’s an urgent need to rapidly scale up. Organizations may have a limited amount of time to stand up new infrastructure, a problem which is compounded by the size of the services in question.

In this learning path, we will explore a large-scale deployment scenario enabling 6,000 virtual machines (VMs) and 15,000 pods. This involves an external Red Hat® Ceph® Storage 12-node cluster and a Red Hat OpenShift® Virtualization 132 node cluster, integrated with an external Ceph Storage cluster…read more

Egress IP Scale Testing in OpenShift Container Platform

June 21, 2024 - Venkata Anil Kommaddi

This blog post explores how kube-burner-ocp, an opinionated wrapper designed on top of kube-burner, can be used to simplify performance and scale testing, and leveraged to evaluate egress IP scalability in OpenShift’s default CNI plugin, OVNKubernetes. We’ll delve into the intricacies of the egress IP feature, its role in traffic management, and how kube-burner and kube-burner-ocp are helping us understand its behavior under load. This blog also serves as a classic example of how the Red Hat Performance and Scale team works with the Development team to understand, test, characterize and improve features with a holistic approach, for the benefit of our customers who rely on OpenShift for their mission critical workloads on-prem and in the cloud…read more

IPsec Performance on Red Hat Enterprise Linux 9: A Performance Analysis of AES-GCM

June 13, 2024 - Otto Sabart, Adam Okuliar

In today's digital landscape, securing information over insecure channels is more crucial than ever. Traditionally, this vital task has been handled by specialized hardware devices known as concentrators, which come with considerable price tags. But what if you could achieve the same level of security and performance using readily available retail hardware? This article explores an exciting, cost-effective alternative: leveraging Red Hat Enterprise Linux 9 on modern, multicore CPUs. We'll dive into various configurations and encryption methods, and reveal how this approach can match the performance of high-end industrial devices. Astonishingly, it's possible to achieve 50 Gbps of IPsec AES-GCM with multiple security associations on commodity hardware…read more

Ensure a scalable and performant environment for ROSA with hosted control planes

May 30, 2024 - Russell Zaleski, Murali Krishnasamy, David Sanz Moreno, Mohit Sheth

Ensuring that OpenShift is performant and scalable is a core tenet of the OpenShift Performance and Scale team at Red Hat. Prior to its release (and still to this day), ROSA undergoes a vast array of performance and scale testing to ensure that it delivers industry leading performance. These tests run the gamut from control plane and data path focus, to upgrades, to network performance. These tests have been used to help measure and better the performance of “classic” ROSA, but what happens when we move to hosted control planes?…read more

Accelerating generative AI adoption: Red Hat OpenShift AI achieves impressive results in MLPerf inference benchmarks with vLLM runtime

April 24, 2024 - Mustafa Eyceoz, Michey Mehta, Diane Feddema, Ashish Kamra

Large Language Model (LLM) inference has emerged as a crucial technology lately, influencing how enterprises approach AI-driven solutions and driving new interest in integrating LLMs into enterprise applications. But when deploying LLMs in production environments, performance becomes paramount, with throughput (measured in tokens generated per second) on a GPU serving as a key metric. In principle, a model with higher throughput can accommodate a larger user base for a given hardware infrastructure, while meeting specific latency and accuracy requirements, which ultimately reduces the cost of model deployment for end-users…read more
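The throughput metric mentioned above is simple arithmetic: total tokens generated divided by elapsed wall-clock time. The sketch below shows it in Python. The function name and the sample numbers are hypothetical placeholders, not benchmark results.

```python
def throughput_tokens_per_second(tokens_per_request: list[int], elapsed_seconds: float) -> float:
    """Aggregate throughput: all generated tokens divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return sum(tokens_per_request) / elapsed_seconds


# Hypothetical run: three concurrent requests finishing within a 4-second window.
tokens = [128, 256, 128]
rate = throughput_tokens_per_second(tokens, elapsed_seconds=4.0)
print(f"{rate:.1f} tokens/s")
```

Because the tokens of all concurrent requests are summed, this is an aggregate figure for the GPU rather than a per-user speed. Per-request latency has to be tracked separately.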

Red Hat Enterprise Linux Performance Results on 5th Gen Intel® Xeon® Scalable Processors

April 4, 2024 - Bill Gray, David Dumas, Douglas Shakshober, Michey Mehta

Intel recently launched the 5th generation of Intel® Xeon® Scalable processors (Intel Xeon SP), code-named Emerald Rapids, a family of high-end, enterprise-focused processors targeted at a diverse range of workloads. To explore how Intel’s new chips measure up, we’ve worked with Intel and others to run benchmarks with Red Hat Enterprise Linux 8.8 / 9.2 and greater…read more

Optimizing Quay/Clair: Database profiling results

March 19, 2024 - Vishnu Challa

Welcome to the second part of our exploration. In this continuation from our previous article, we will delve deeper into the results of our database profiling efforts and discuss strategies for further optimizing overall application performance…read more

Optimizing Quay/Clair: Profiling, performance, and efficiency

March 19, 2024 - Vishnu Challa

Red Hat Quay (also offered as a service via Quay.io) is a cloud-based container registry service that allows users to store, manage, and distribute container images. It provides a platform for hosting, sharing, and securing container images across multiple environments, including on-premise data centers, public cloud platforms, and hybrid cloud deployments…read more

Save memory with OpenShift Virtualization using Free Page Reporting

March 13, 2024 - Robert Krawitz

OpenShift Virtualization, a feature of Red Hat OpenShift, allows running virtual machines (VMs) alongside containers on the same platform, simplifying management. It allows using VMs in containerized environments by running VMs the same way as any other pod, so that organizations with significant investment in virtualization or who desire the greater isolation provided by VMs with legacy workloads can use them in an orchestrated containerized environment…read more

Test Kubernetes performance and scale with kube-burner

March 4, 2024 - Sai Sindhur Malleni, Vishnu Challa, Raul Sevilla Canavate

Three years ago, we introduced kube-burner to the Kubernetes performance and scale communities. Since then, kube-burner has steadily continued its journey, adding a diverse range of features that help solve unique challenges in performing and analyzing results from performance and scale tests on Kubernetes and Red Hat OpenShift.

Over the last few years, multiple new features and usability improvements were added to kube-burner. In this article, we will go beyond the basics, exploring some new bells and whistles added to the tool recently and laying out our vision for the future…read more

5 ways we work to optimize Red Hat Satellite

March 4, 2024 - Imaanpreet Kaur, Jan Hutař, Pablo Mendez Hernandez

In the ever-evolving landscape of technology, the hunt for optimal performance and scale is a constant challenge. With Red Hat Satellite 6.13 and 6.14 versions, we have embarked on an exciting journey to push the boundaries and elevate our capabilities. In this article, we'll take you behind the scenes to explore how we work to enhance performance and scale our operations…read more

Best practices for OpenShift Data Foundation disaster recovery resource planning

February 22, 2024 - Elvir Kuric

Red Hat OpenShift Data Foundation is a key storage component of Red Hat OpenShift. It offers unified block, file, and object storage capabilities to support a wide range of applications.

One of the exciting new features in OpenShift Data Foundation 4.14 is OpenShift Data Foundation Regional Disaster Recovery (RDR), which offers RDR capabilities for Rados Block Device pools (RBD pools) and Ceph File System (CephFS) pools (via volsync replication). With RBD images replicated between clusters, OpenShift Data Foundation RDR protects customers from catastrophic failures. With OpenShift Data Foundation RDR, we can…read more

DPDK latency in OpenShift - Part II

February 20, 2024 - Rafael Folco, Karl Rister, Andrew Theurer

In a previous article, we shared the results of the DPDK latency tests conducted on a single node OpenShift (SNO) cluster. We were able to demonstrate that a packet can be transmitted and received back in as little as 3 µs, mostly under 7 µs and, in the worst case, 12 µs. These numbers represent the round trip latencies in OpenShift for a single queue transmission of a 64 byte packet, forwarding packets using an Intel E810 dual port adapter…read more

Correlating QPS rate with resource utilization in self-managed Red Hat OpenShift with Hosted Control Planes

January 23, 2024 - Guoqing Li

The general availability of hosted control planes (HCP) for self-managed Red Hat OpenShift Virtualization (KubeVirt) is an exciting milestone. However, the true test lies in system performance and scalability, which are both crucial factors that determine success. Understanding and pushing these limits is essential for making informed decisions. This article offers a comprehensive analysis and general sizing insights for consolidating existing bare metal resources using hosted control planes for self-managed OpenShift Virtualization. It delves into the resource usage patterns of the hosted control planes, examining their relationship with the KubeAPIServer QPS rate. Through various experiments, we established a linear regression model between the KubeAPIServer QPS rate and CPU, memory and etcd storage utilization, providing valuable insights for efficient resource consolidation and node capacity planning…read more
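For readers unfamiliar with the technique, the following is a minimal sketch of fitting a straight line between QPS and CPU usage with ordinary least squares. The data points are made-up placeholders, not measurements from the article, and the helper name `fit_line` is hypothetical. The article's actual model and coefficients may differ.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Placeholder samples: API server QPS versus CPU cores consumed (invented values).
qps = [100.0, 200.0, 300.0, 400.0]
cpu_cores = [1.5, 2.5, 3.5, 4.5]
slope, intercept = fit_line(qps, cpu_cores)

# Extrapolate CPU demand for a planned load of 500 QPS.
predicted = slope * 500.0 + intercept
print(f"cpu = {slope:.4f} * qps + {intercept:.2f}; at 500 QPS: {predicted:.2f} cores")
```

The same fit can be repeated with memory or etcd storage as the dependent variable. Extrapolating far beyond the measured QPS range should be treated with caution.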

Continuous performance and scale validation of Red Hat OpenShift AI model-serving stack

January 17, 2024 - Kevin Pouget

The model serving stack included in OpenShift AI is generally available (GA) as of December 2023 (release 2.5), meaning that OpenShift AI is fully operational to deploy and serve inference models via the KServe API. You may have read my colleague David Gray's article about the performance of this model serving stack for large language models (LLMs). This article provides a different look at that same model serving stack. It discusses how we stress tested the model deployment and model serving controllers to confirm that they perform and scale well in single-model, multi-model and many-model environments…read more

Kube-burner: Fanning the flames of innovation in the CNCF Sandbox

January 16, 2024 - Sai Sindhur Malleni

We are thrilled to share some exciting news with you all – kube-burner, the robust performance and scale testing tool for Kubernetes and Red Hat OpenShift, has officially achieved the CNCF Sandbox status! In the evolving landscape of cloud-native technologies, the Cloud Native Computing Foundation (CNCF) serves as a hub for incubating and nurturing innovative projects. Kube-burner has become the first and only performance and scale testing tool to attain this recognition, thereby elevating the importance of performance and scale in the cloud native landscape…read more

Evaluating LLM inference performance on Red Hat OpenShift AI

January 16, 2024 - David Gray

The generative artificial intelligence (AI) landscape has undergone rapid evolution over the past year. As the power of generative large language models (LLMs) grows, organizations increasingly seek to harness their capabilities to meet business needs. Because of the intense computational demands of running LLMs, deploying them on a performant and reliable platform is critical to making cost-effective use of the underlying hardware, especially GPUs.

This article introduces the methodology and results of performance testing the Llama-2 models deployed on the model serving stack included with Red Hat OpenShift AI. OpenShift AI is a flexible, scalable MLOps platform with tools to build, deploy and manage AI-enabled applications. Built using open source technologies, it provides trusted, operationally consistent capabilities for teams to experiment, serve models and deliver innovative apps…read more

Operating Tekton at scale: 10 lessons learned from Red Hat Trusted Application Pipeline

January 11, 2024 - Pradeep Surisetty, Gabe Montero, Pavel Macik, Ann Marie Fred, Jan Hutař

Red Hat Trusted Application Pipeline is built on top of Red Hat OpenShift Pipelines and its upstream project, Tekton. We use Tekton’s Pipelines as Code, Tekton Chains and Tekton Results capabilities to provide a more scalable build environment with an enhanced security posture to power Red Hat's next-generation build systems.

This blog shares some generic learnings applicable to Trusted Application Pipeline or OpenShift Pipelines and any large workload distributing system on OpenShift or Kubernetes…read more

Behind the scenes: Introducing OpenShift Virtualization Performance and Scale

January 9, 2024 - Jenifer Abrams

Red Hat OpenShift Virtualization helps to remove workload barriers by unifying virtual machine (VM) deployment and management alongside containerized applications in a cloud-native manner. As part of the larger Performance and Scale team, we have been deeply involved in the measurement and analysis of VMs running on OpenShift since the early days of the KubeVirt open source project and have helped to drive product maturity through new feature evaluation, workload tuning and scale testing. This article dives into several of our focus areas and shares additional insights into running and tuning VM workloads on OpenShift…read more

KrknChaos is joining CNCF Sandbox

January 9, 2024 - Naga Ravi Chaitanya Elluri, Brian Riordan, Pradeep Surisetty

We are excited to announce that krknChaos, a chaos engineering tool for Kubernetes focused on improving resilience and performance, has been accepted as a Sandbox project by the Cloud Native Computing Foundation (CNCF). Additional details can be found in the proposal. We would like to thank the TAG App Delivery team (to name a few, Josh Gavant, Karena Angell and team) and the CNCF Technical Oversight Committee (TOC) for their invaluable guidance and support throughout the process, and of course the team and community for their contributions, which are key to making this happen…read more

Supercharging chaos testing using AI

January 8, 2024 - Naga Ravi Chaitanya Elluri, Mudit Verma, Sandeep Hans

There has been a huge increase in demand for running complex systems with tens to hundreds of microservices at massive scale. End users expect 24/7 availability of services they depend on, so even a few minutes of downtime matters. A proactive chaos engineer helps meet user expectations by identifying bottlenecks and hardening services before downtime occurs in a production environment. Chaos engineering is vital to avoid losing trust with your end users…read more

Quantifying performance of Red Hat OpenShift for Machine Learning (ML) Training on Supermicro A+ Servers with MLPerf Training v3.1

November 23, 2023 - Diane Feddema

We are proud to announce the first MLPerf Training submission on Red Hat OpenShift, which is also the first MLPerf training submission on a variant of Kubernetes. Red Hat collaborated with Supermicro on this submission and ran the benchmarks on a Supermicro GPU A+ Server with 8x H100 NVIDIA GPUs. Red Hat OpenShift helps make it easier to run, reproduce and monitor AI/ML workloads, while adding minimal overhead to your training jobs. In this blog we provide the performance numbers of our recent submission to MLPerf v3.1 training...read more

OpenShift Cluster Manager API: Load-testing, breaking, and improving it

October 26, 2023 - Vicente Zepeda Mas

Red Hat OpenShift Cluster Manager (OCM) is a managed service where you can install, modify, operate, and upgrade your Red Hat OpenShift clusters. Because OCM is a cornerstone of Red Hat’s hybrid cloud strategy, we strive to make sure that it is scalable enough to handle peak traffic and find bottlenecks that can be fixed to present a satisfying experience for our customers. The Performance & Scale team discovered a performance problem that affected a core component of the API. In this blog post, we discuss how we identified the problem, how we worked as a cross-functional team to identify and fix it, and the measures we implemented to prevent similar incidents from happening in the future...read more

Data Plane Development Kit (DPDK) latency in Red Hat OpenShift - Part I

October 11, 2023 - Rafael Folco, Karl Rister, Andrew Theurer

In this article, we present the results of DPDK latency tests conducted on a single node OpenShift (SNO) cluster. The tests were performed using the traffic generator MoonGen, which utilizes the hardware timestamping support for measuring packet latencies as they pass through the network adapters. The results of these tests provide insights into the performance of DPDK in a real-world environment and offer guidance for network architects and administrators seeking to optimize network latency…read more

Running 2500 pods per node on OCP 4.13

August 22, 2023 - Andrew Collins

The default pods-per-node (PPN) limit is 250. Customers who exceed 250 ask whether they can scale beyond the published maximum of 500. They ask, "How can we better utilize the capacity of our large bare metal machines?"...read more

Bulk API in Automation Controller

August 9, 2023 - Nikhil Jain

Automation controller has a rich ReSTful API. REST stands for Representational State Transfer and is sometimes spelled as “ReST”. It relies on a stateless, client-server, and cacheable communications protocol, usually the HTTP protocol. REST APIs provide access to resources (data entities) via URI paths. You can visit the automation controller REST API in a web browser at: http://<server name>/api/...read more

Red Hat Enterprise Linux achieves significant performance gains with Intel's 4th Generation Xeon Scalable Processors

April 20, 2023 - Michey Mehta, Bill Gray, David Dumas, Douglas Shakshober

Intel recently launched the 4th generation of Intel® Xeon® Scalable processors, a family of high-end, enterprise-focused processors targeted at a diverse range of workloads. To explore how Intel’s new chips measure up, we’ve worked with Intel and others to run benchmarks with Red Hat Enterprise Linux 8.4, 8.5, 8.6, 9.0 and 9.1, as well as CentOS Stream 9.2 (which will become Red Hat Enterprise Linux 9.2)...read more

OpenShift/Kubernetes Chaos Stories

March 15, 2023 - Naga Ravi Chaitanya Elluri

With the increase in adoption and reliance on digital technology and microservices architecture, the uptime of an application has never been more important. Downtime of even a few minutes can lead to huge revenue loss and, most importantly, loss of trust. This is exactly why we proactively focus on identifying bottlenecks and improving the resilience and performance of OpenShift under chaotic conditions…read more

Enhancing/Maximizing your Scaling capability with Automation Controller 2.3

March 13, 2023 - Nikhil Jain

Red Hat Ansible Automation Platform 2 is the next generation automation platform from Red Hat’s trusted enterprise technology experts. We are excited to announce that the Ansible Automation Platform 2.3 release includes automation controller 4.3…read more

Red Hat new Benchmark results on AMD EPYC4 (Genoa) processors

January 6, 2023 - Red Hat Performance Team

Red Hat has continued to work with our partners to better enable world class performance. Recently, AMD released its EPYC "Genoa" 4th Gen Data Center CPU, known as the AMD EPYC 9004 Series. Built on a 5 nm process, it increases the core count to 96 cores and 192 threads per socket, with 384 MB of L3 cache…read more

A Guide to Scaling OpenShift Data Science to Hundreds of Users and Notebooks

December 13, 2022 - by Kevin Pouget

Red Hat OpenShift Data Science provides a fully managed cloud service environment for data scientists and developers of intelligent applications. It offers a fully supported environment in which to rapidly develop, train, and test machine learning (ML) models before deploying in production…read more

Run Windows workloads on OpenShift Container Platform

November 30, 2022 - by Krishna Harsha Voora, Venkata Anil Kommaddi, Sai Sindhur Malleni

OpenShift helps bring the power of cloud-native and containerization to your applications, no matter what underlying operating systems they rely on. For use cases that require both Linux and Windows workloads, Red Hat OpenShift allows you to deploy Windows workloads running on Windows server while also supporting traditional Linux workloads…read more

A Guide to Functional and Performance Testing of the NVIDIA DGX A100

June 23, 2022 - by Kevin Pouget

This blog post, part of a series on the DGX-A100 OpenShift launch, presents the functional and performance assessment we performed to validate the behavior of the DGX™ A100 system, including its eight NVIDIA A100 GPUs. This study was performed on OpenShift 4.9 with the GPU computing stack deployed by NVIDIA GPU Operator v1.9…read more

Scaling Automation Controller for API Driven Workloads

June 20, 2022 - Elijah Delee

When scaling automation controller in an enterprise organization, administrators are faced with more clients automating their interactions with its REST API. As with any web application, automation controller has a finite capacity to serve web requests, and web clients can…read more

Performance Improvements in Automation Controller 4.1

February 28, 2022 - Nikhil Jain

With the release of Ansible Automation Platform 2.1, users now have access to the latest control plane – automation controller 4.1. Automation controller 4.1 provides significant performance improvements when compared to its predecessor Ansible Tower 3.8. To put this into context, we used Ansible Tower 3.8 to run jobs, capture various metrics…read more

The Curious Case of the CPU Eating Gunicorn

June 2, 2022 - Gonza Rafuls

We decided to take a first-try, hands-on approach following the future QUADS roadmap and re-architect our legacy landing/requests portal application, previously a trusty LAMP stack, into a completely rewritten Flask / SQLAlchemy / Gunicorn / Nginx next-gen platform…read more

Entitlement-Free Deployment of the NVIDIA GPU Operator on OpenShift

December 14, 2021 - Kevin Pouget

Version 1.9.0 of the GPU Operator has just landed in OpenShift OperatorHub, with many different updates. We're proud to announce that this version comes with the support of the entitlement-free deployment of NVIDIA GPU Driver…read more

Red Hat collaborates with NVIDIA to deliver record-breaking STAC-A2 Market Risk benchmark

November 9, 2021 - Sebastian Jug

We are happy to announce a record-breaking performance with NVIDIA in the STAC-A2 benchmark, affirming Red Hat OpenShift's ability to run compute heavy, high performance workloads. The Securities Technology Analysis Center (STAC®) facilitates a large group of financial firms and technology vendors…read more

Red Hat Satellite 6.9 with Puma Web Server

September 15, 2021 - Imaanpreet Kaur

Until Red Hat Satellite 6.8, the Passenger web/app server was a core component of Red Hat Satellite. Satellite used Passenger to run Ruby applications such as Foreman. Satellite 6.9 is no longer using the Passenger web server. The Foreman application (main UI and API server) was ported to use the Puma project…read more

Using NVIDIA A100’s Multi-Instance GPU to Run Multiple Workloads in Parallel on a Single GPU

August 26, 2021 - Kevin Pouget

The new Multi-Instance GPU (MIG) feature lets GPUs based on the NVIDIA Ampere architecture run multiple GPU-accelerated CUDA applications in parallel in a fully isolated way. The compute units of the GPU, as well as its memory, can be partitioned into multiple MIG instances…read more

Multi-Instance GPU Support with the GPU Operator v1.7.0

June 15, 2021 - Kevin Pouget

Version 1.7.0 of the GPU Operator has just landed in OpenShift OperatorHub, with many different updates. We are proud to announce that this version comes with the support of the NVIDIA Multi-Instance GPU (MIG) feature for the A100 and A30 Ampere cards…read more

Making Chaos Part of Kubernetes/OpenShift Performance and Scalability Tests

March 17, 2021 - Naga Ravi Chaitanya Elluri

While we know how important performance and scale are, how can we engineer for them when chaos becomes common in complex systems? What role does Chaos/Resiliency testing play during Performance and Scalability evaluation? Let’s look at the methodology that we need to embrace to mimic a real world production environment to find the bottlenecks and fix them before they impact users and customers…read more

Demonstrating Performance Capabilities of Red Hat OpenShift for Running Scientific HPC Workloads

November 11, 2020 - David Gray and Kevin Pouget

This blog post is a follow-up to the previous blog post on running GROMACS on Red Hat OpenShift Container Platform (OCP) using the Lustre filesystem. In this post, we will show how we ran two scientific HPC workloads on a 38-node OpenShift cluster using CephFS with OpenShift Container Storage in external mode…read more

A Complete Guide for Running Specfem Scientific HPC Workload on Red Hat OpenShift

November 11, 2020 - Kevin Pouget

Specfem3D_Globe is a scientific high-performance computing (HPC) code that simulates seismic wave propagation, at a global or regional scale (website and repository). It relies on a 3D crustal model and takes into account parameters such as the Earth density, topography/bathymetry, rotation, oceans, or self-gravitation…read more

Running HPC workloads with Red Hat OpenShift Using MPI and Lustre Filesystem

October 29, 2020 - David Gray

The requirements associated with data science and AI/ML applications have pushed organizations toward using highly parallel and scalable hardware that often resembles high performance computing (HPC) infrastructure. HPC has been around for a while and has evolved to include ultra large supercomputers that run massively parallel tasks and operate at exascale (able to perform a billion billion operations per second)…read more

Introduction to Kraken, a Chaos Tool for OpenShift/Kubernetes

October 8, 2020 - Yashashree Suresh and Paige Rubendall

Chaos engineering helps in boosting confidence in a system's resilience by “breaking things on purpose.” While it may seem counterintuitive, it is crucial to deliberately inject failures into a complex system like OpenShift/Kubernetes and check whether the system recovers gracefully…read more

About the author

Red Hat Performance Team

Red Hat Performance and Scale Engineering pushes Red Hat products to their limits. Every day we strive to reach greater performance for our customer workloads and scale the products to new levels. Our performance engineers benchmark configurations that range from far edge telco use cases to large scale cloud environments.

We work closely with developers early in the development process to validate that their software design will perform and scale well. We also collaborate with hardware and software partners to ensure that our software is performing and scaling well with their technology, and we engage with customers on innovative deployments where we can apply our expertise to help them get the best performance and scale for their workloads.

We work across the Red Hat product portfolio on a multitude of product configurations and use cases for large scale hybrid cloud environments—including edge-enabled solutions and products, next-generation 5G networks, software-defined vehicles and more.
