Red Hat's most recent posts about Performance, Scale, Chaos and more.
A step-by-step guide to setting up OCP Virtualization on hyperconverged ODF and deploying 10K VMs
October 4, 2024 - Guoqing Li
OpenShift Virtualization enables running VMs alongside pods within the same cluster, which paves a path for infrastructure modernization. Using hyperconverged ODF as the backend storage for OpenShift Virtualization allows your VMs to live on the same nodes where storage is attached, maximizing hardware resource utilization and cost savings. This document provides a detailed step-by-step guide to setting up OCP-Virt (OpenShift Virtualization) VMs backed by a hyperconverged ODF (OpenShift Data Foundation) storage system. At the end of this guide, we show you how to create thousands of VMs using a one-line bash command, and we also include some scale data comparing PVC and VolumeSnapshot cloning as well as VM boot times for both Windows and RHEL 9 VMs...read more
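Bulk VM creation of this kind boils down to stamping out many copies of a templated VirtualMachine manifest. A minimal Python sketch of the idea (the manifest fields and VM names here are illustrative placeholders, not the article's actual template):

```python
# Sketch: render N minimal KubeVirt VirtualMachine manifests as one
# YAML stream. The template below is illustrative, not the article's.

TEMPLATE = """\
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: rhel9-vm-{i}
spec:
  running: true
"""

def render(n):
    # Join the per-VM manifests with YAML document separators.
    return "---\n".join(TEMPLATE.format(i=i) for i in range(n))

print(render(2))
```

The generated stream could then be piped to `oc apply -f -`; the article itself achieves the same effect with a one-line bash command.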
Virtualized database I/O performance improvements in RHEL 9.4
September 10, 2024 - Sanjay Rao, Stefan Hajnoczi
Databases are sensitive to disk I/O performance. The IOThread Virtqueue Mapping feature, introduced in the QEMU virtual machine monitor in Red Hat Enterprise Linux (RHEL) 9.4, is designed to improve disk I/O performance for workloads that submit I/O from many vCPUs. In this article, we will look at how the IOThread Virtqueue Mapping feature can boost performance for database workloads running in QEMU/KVM guests. While RHEL guests have supported multi-queue virtio-blk devices for some time, the QEMU virtual machine monitor handled I/O requests in a single thread. This limits the performance benefit of having multiple queues because the single thread can become a bottleneck…read more
Use kube-burner to measure Red Hat OpenShift VM and storage deployment at scale
September 4, 2024 - Jenifer Abrams
Scale testing is critical for understanding how a cluster will hold up under production load. Generally, you may want to scale test to reach a certain maximum density as the end goal, but it is often also useful to scale up from smaller batch sizes to observe how performance changes as the overall cluster becomes more loaded. Those of us who work in the area of performance analysis know there are many ways to measure a workload, and standardizing on a tool can help provide more comparable results across different configurations and environments... read more
Scaling virtio-blk disk I/O with IOThread Virtqueue Mapping
September 5, 2024 - Stefan Hajnoczi, Kevin Wolf, Emanuele Giuseppe Esposito, Paolo Bonzini, and Peter Krempa
Modern storage evolved to keep pace with growing numbers of CPUs by providing multiple queues through which I/O requests can be submitted. This allows CPUs to submit I/O requests and handle completion interrupts locally. The result is good performance and scalability on machines with many CPUs.
Although virtio-blk devices in KVM guests have multiple queues by default, they do not take advantage of multi-queue on the host. I/O requests from all queues are processed in a single thread on the host for guests with the <driver io='native' …> libvirt domain XML setting. This single thread can become a bottleneck for I/O bound workloads…read more
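As a rough illustration of what the full article covers, IOThread Virtqueue Mapping spreads a disk's queues across several IOThreads in the libvirt domain XML. The fragment below is a hedged sketch only; consult the libvirt domain XML documentation for the exact elements supported by your libvirt version:

```xml
<!-- Illustrative sketch: map a 4-queue virtio-blk disk across
     dedicated IOThreads. Element names per recent libvirt releases;
     verify against your version's domain XML reference. -->
<domain>
  <iothreads>4</iothreads>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' queues='4'>
        <iothreads>
          <iothread id='1'/>
          <iothread id='2'/>
          <iothread id='3'/>
          <iothread id='4'/>
        </iothreads>
      </driver>
    </disk>
  </devices>
</domain>
```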
Generative AI fine-tuning of LLMs: Red Hat and Supermicro showcase outstanding results for efficient Llama-2-70b fine tuning using LoRA in MLPerf Training v4.0
July 26, 2024 - Diane Feddema, Dr Nikola Nikolov
New generative AI (gen AI) training results were recently released by MLCommons in MLPerf Training v4.0. Red Hat, in collaboration with Supermicro, published outstanding MLPerf v4.0 Training results for fine-tuning of large language model (LLM) llama-2-70b with LoRA.
LoRA (Low-Rank Adaptation of LLMs) is a cost-saving, parameter-efficient fine-tuning method that can save many hours of training time and reduce compute requirements. LoRA allows you to fine-tune a large model for your specific use case while updating only a small subset of parameters. Red Hat's llama-2-70b with LoRA submission on Supermicro hardware demonstrates strong performance, within 3.5% to 8.6% of submissions on similar hardware, while providing an improved developer, user and DevOps experience…read more
Unleashing 100GbE network efficiency: SR-IOV in Red Hat OpenShift on OpenStack
July 25, 2024 - Pradipta Sahoo
Single Root I/O Virtualization (SR-IOV) is a technology that allows the isolation of PCI Express (PCIe) resources for better network performance. In this article, we explore a recent study by the Red Hat OpenStack Performance and Scale team, which demonstrated the capabilities of SR-IOV using 100GbE NVIDIA ConnectX-6 adapters within a Red Hat OpenShift on Red Hat OpenStack (ShiftonStack) setup…read more
Scaling Red Hat OpenStack Platform 17.1 to more than 1,000 virtual nodes
July 9, 2024 - Asma Suhani Syed Hameed, Rajesh Pulapakula
As Red Hat OpenStack Platform has evolved in recent years to accommodate a diverse range of customer needs, the demand for scalability has become increasingly vital. Customers depend on Red Hat OpenStack Platform to deliver a resilient and adaptable cloud infrastructure, and as its usage expands, so does the necessity for deploying more extensive clusters.
Over the past years we have undertaken efforts to scale Red Hat OpenStack Platform 16.1 to more than 700 bare metal nodes. This year, the Red Hat Performance & Scale team has dedicated itself to pushing Red Hat OpenStack Platform's scalability to unprecedented heights. As demand for scaling Red Hat OpenStack Platform increased, we conducted an exercise to test scalability with more than 1,000 virtual computes. Testing at such large scales typically requires substantial hardware resources for bare metal setups. In our endeavor, we achieved a new milestone by successfully scaling to more than 1,000 overcloud nodes on Red Hat OpenStack Platform 17.1…read more
Sharing is caring: How to make the most of your GPUs (part 1 - time-slicing)
July 2, 2024 - Carlos Camacho, Kevin Pouget, David Gray, Will McGrath
As artificial intelligence (AI) applications continue to advance, organizations often face a common dilemma: a limited supply of powerful graphics processing unit (GPU) resources, coupled with an increasing demand for their utilization. In this article, we'll explore various strategies for optimizing GPU utilization via oversubscription across workloads in Red Hat OpenShift AI clusters. OpenShift AI is an integrated MLOps platform for building, training, deploying and monitoring predictive and generative AI (GenAI) models at scale across hybrid cloud environments…read more
Scale testing image-based upgrades for single node OpenShift
June 28, 2024 - Alex Kros
Image-based upgrades (IBU) are a developer preview feature in Red Hat OpenShift Container Platform 4.15 that reduces the time required to upgrade a single node OpenShift cluster. An image-based upgrade can perform both Z and Y stream upgrades, include operator upgrades in the image, and roll back to the previous version manually or automatically upon failure. It can also directly upgrade OpenShift Container Platform 4.y to 4.y+2, whereas a traditional OpenShift upgrade would require two separate upgrades to achieve the same end result (4.y to 4.y+1 to 4.y+2)…read more
How to create and scale 6,000 virtual machines in 7 hours with Red Hat OpenShift Virtualization
June 25, 2024 - Boaz Ben Shabat
In the world of organizational infrastructure, sometimes there’s an urgent need to rapidly scale up. Organizations may have a limited amount of time to stand up new infrastructure, a problem which is compounded by the size of the services in question.
In this learning path, we will explore a large-scale deployment scenario enabling 6,000 virtual machines (VMs) and 15,000 pods. This involves a 132-node Red Hat OpenShift® Virtualization cluster integrated with an external 12-node Red Hat® Ceph® Storage cluster…read more
Egress IP Scale Testing in OpenShift Container Platform
June 21, 2024 - Venkata Anil Kommaddi
This blog post explores how kube-burner-ocp, an opinionated wrapper built on top of kube-burner, can be used to simplify performance and scale testing, and how we leveraged it to evaluate egress IP scalability in OpenShift's default CNI plugin, OVN-Kubernetes. We'll delve into the intricacies of the egress IP feature, its role in traffic management, and how kube-burner and kube-burner-ocp are helping us understand its behavior under load. This blog also serves as a classic example of how the Red Hat Performance and Scale team works with the development team to understand, test, characterize and improve features with a holistic approach, for the benefit of our customers who rely on OpenShift for their mission-critical workloads on-prem and in the cloud…read more
IPsec Performance on Red Hat Enterprise Linux 9: A Performance Analysis of AES-GCM
June 13, 2024 - Otto Sabart, Adam Okuliar
In today's digital landscape, securing information over insecure channels is more crucial than ever. Traditionally, this vital task has been handled by specialized hardware devices known as concentrators, which come with considerable price tags. But what if you could achieve the same level of security and performance using readily available retail hardware? This article explores an exciting, cost-effective alternative: leveraging Red Hat Enterprise Linux 9 on modern, multicore CPUs. We'll dive into various configurations and encryption methods, and reveal how this approach can match the performance of high-end industrial devices. Astonishingly, it's possible to achieve 50 Gbps of IPsec AES-GCM with multiple security associations on commodity hardware…read more
Ensure a scalable and performant environment for ROSA with hosted control planes
May 30, 2024 - Russell Zaleski, Murali Krishnasamy, David Sanz Moreno, Mohit Sheth
Ensuring that OpenShift is performant and scalable is a core tenet of the OpenShift Performance and Scale team at Red Hat. Prior to its release (and still to this day), ROSA undergoes a vast array of performance and scale testing to ensure that it delivers industry-leading performance. These tests run the gamut from control plane and data path focus, to upgrades, to network performance. They have been used to help measure and improve the performance of “classic” ROSA, but what happens when we move to hosted control planes?... read more
Accelerating generative AI adoption: Red Hat OpenShift AI achieves impressive results in MLPerf inference benchmarks with vLLM runtime
April 24, 2024 - Mustafa Eyceoz, Michey Mehta, Diane Feddema, Ashish Kamra
Large Language Model (LLM) inference has emerged as a crucial technology lately, influencing how enterprises approach AI-driven solutions and driving new interest in integrating LLMs into enterprise applications. But when deploying LLMs in production environments, performance becomes paramount, with throughput (measured in tokens generated per second) on a GPU serving as a key metric. In principle, a model with higher throughput can accommodate a larger user base on a given hardware infrastructure while meeting specific latency and accuracy requirements, which ultimately reduces the cost of model deployment for end users…read more
Red Hat Enterprise Linux Performance Results on 5th Gen Intel® Xeon® Scalable Processors
April 4, 2024 - Bill Gray, David Dumas, Douglas Shakshober, Michey Mehta
Intel recently launched the 5th generation of Intel® Xeon® Scalable processors (Intel Xeon SP), code-named Emerald Rapids, a family of high-end, enterprise-focused processors targeted at a diverse range of workloads. To explore how Intel’s new chips measure up, we’ve worked with Intel and others to run benchmarks with Red Hat Enterprise Linux 8.8 / 9.2 and greater…read more
Optimizing Quay/Clair: Database profiling results
March 19, 2024 - Vishnu Challa
Welcome to the second part of our exploration. In this continuation from our previous article, we will delve deeper into the results of our database profiling efforts and discuss strategies for further optimizing overall application performance…read more
Optimizing Quay/Clair: Profiling, performance, and efficiency
March 19, 2024 - Vishnu Challa
Red Hat Quay (also offered as a service via Quay.io) is a cloud-based container registry service that allows users to store, manage, and distribute container images. It provides a platform for hosting, sharing, and securing container images across multiple environments, including on-premise data centers, public cloud platforms, and hybrid cloud deployments…read more
Save memory with OpenShift Virtualization using Free Page Reporting
March 13, 2024 - Robert Krawitz
OpenShift Virtualization, a feature of Red Hat OpenShift, allows running virtual machines (VMs) alongside containers on the same platform, simplifying management. It runs VMs the same way as any other pod, so organizations with significant investments in virtualization, or those that want the greater isolation VMs provide for legacy workloads, can use them in an orchestrated containerized environment…read more
Test Kubernetes performance and scale with kube-burner
March 4, 2024 - Sai Sindhur Malleni, Vishnu Challa, Raul Sevilla Canavate
Three years ago, we introduced kube-burner to the Kubernetes performance and scale communities. Since then, kube-burner has steadily continued its journey, adding a diverse range of features that help solve unique challenges in performing and analyzing results from performance and scale tests on Kubernetes and Red Hat OpenShift.
Over the last few years, multiple new features and usability improvements were added to kube-burner. In this article, we will go beyond the basics, exploring some new bells and whistles added to the tool recently and laying out our vision for the future…read more
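kube-burner runs are driven by a declarative YAML configuration that describes which objects to create and at what request rate. A minimal sketch of such a config, assuming kube-burner's documented job fields (the job name and template path are hypothetical):

```yaml
# Hedged sketch of a kube-burner config: create 100 iterations of a
# templated object at a controlled API request rate. Verify field
# names against the kube-burner configuration reference.
jobs:
  - name: api-intensive          # hypothetical job name
    jobIterations: 100
    qps: 20                      # sustained requests per second
    burst: 20
    namespacedIterations: true
    namespace: kube-burner-test
    objects:
      - objectTemplate: templates/deployment.yml   # hypothetical path
        replicas: 1
```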
5 ways we work to optimize Red Hat Satellite
March 4, 2024 - Imaanpreet Kaur, Jan Hutař, Pablo Mendez Hernandez
In the ever-evolving landscape of technology, the hunt for optimal performance and scale is a constant challenge. With Red Hat Satellite 6.13 and 6.14 versions, we have embarked on an exciting journey to push the boundaries and elevate our capabilities. In this article, we'll take you behind the scenes to explore how we work to enhance performance and scale our operations…read more
Best practices for OpenShift Data Foundation disaster recovery resource planning
February 22, 2024 - Elvir Kuric
Red Hat OpenShift Data Foundation is a key storage component of Red Hat OpenShift. It offers unified block, file, and object storage capabilities to support a wide range of applications.
One of the new exciting features in OpenShift Data Foundation 4.14 is OpenShift Data Foundation Regional Disaster Recovery (RDR), which offers RDR capabilities for Rados Block Device pools (RBD pools) and Ceph File System (CephFS) (via volsync replication) pools. With RBD images replicated between clusters, OpenShift Data Foundation RDR protects customers from catastrophic failures. With OpenShift Data Foundation RDR, we can…read more
DPDK latency in OpenShift - Part II
February 20, 2024 - Rafael Folco, Karl Rister, Andrew Theurer
In a previous article, we shared the results of the DPDK latency tests conducted on a Single Node OpenShift (SNO) cluster. We were able to demonstrate that a packet can be transmitted and received back in as little as 3 µs, mostly under 7 µs and, in the worst case, 12 µs. These numbers represent the round trip latencies in OpenShift for a single queue transmission of a 64 byte packet, forwarding packets using an Intel E810 dual port adapter…read more
Correlating QPS rate with resource utilization in self-managed Red Hat OpenShift with Hosted Control Planes
January 23, 2024 - Guoqing Li
The general availability of hosted control planes (HCP) for self-managed Red Hat OpenShift Virtualization (KubeVirt) is an exciting milestone. However, the true test lies in system performance and scalability, which are both crucial factors that determine success. Understanding and pushing these limits is essential for making informed decisions. This article offers a comprehensive analysis and general sizing insights for consolidating existing bare metal resources using hosted control planes for self-managed OpenShift Virtualization. It delves into the resource usage patterns of the hosted control planes, examining their relationship with the KubeAPIServer QPS rate. Through various experiments, we established a linear regression model between the KubeAPIServer QPS rate and CPU/memory/etcd storage utilization, providing valuable insights for efficient resource consolidation and node capacity planning…read more
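A linear model like the one described can be fit with ordinary least squares. The sketch below uses synthetic placeholder data, not the article's measurements:

```python
# Fit CPU utilization as a linear function of KubeAPIServer QPS.
# The data points are synthetic placeholders, not measurements
# from the article.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

qps = [50, 100, 200, 400]     # requests per second (synthetic)
cpu = [1.2, 2.1, 3.9, 7.5]    # control plane cores used (synthetic)
slope, intercept = fit_line(qps, cpu)

# Use the model to size for a target QPS rate.
print(round(slope * 300 + intercept, 2))  # → 5.7
```

Once fitted, the slope directly answers capacity-planning questions such as "how many extra cores per 100 QPS?".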
Continuous performance and scale validation of Red Hat OpenShift AI model-serving stack
January 17, 2024 - Kevin Pouget
The model serving stack included in OpenShift AI is generally available (GA) as of December 2023 (release 2.5), meaning that OpenShift AI is fully operational to deploy and serve inference models via the KServe API. You may have read my colleague David Gray's article about the performance of this model serving stack for large language models (LLMs). This article provides a different look at that same model serving stack. It discusses how we stress tested the model deployment and model serving controllers to confirm that they perform and scale well in single-model, multi-model and many-model environments…read more
Kube-burner: Fanning the flames of innovation in the CNCF Sandbox
January 16, 2024 - Sai Sindhur Malleni
We are thrilled to share some exciting news with you all – kube-burner, the robust performance and scale testing tool for Kubernetes and Red Hat OpenShift, has officially achieved the CNCF Sandbox status! In the evolving landscape of cloud-native technologies, the Cloud Native Computing Foundation (CNCF) serves as a hub for incubating and nurturing innovative projects. Kube-burner has become the first and only performance and scale testing tool to attain this recognition, thereby elevating the importance of performance and scale in the cloud native landscape…read more
Evaluating LLM inference performance on Red Hat OpenShift AI
January 16, 2024 - David Gray
The generative artificial intelligence (AI) landscape has undergone rapid evolution over the past year. As the power of generative large language models (LLMs) grows, organizations increasingly seek to harness their capabilities to meet business needs. Because of the intense computational demands of running LLMs, deploying them on a performant and reliable platform is critical to making cost-effective use of the underlying hardware, especially GPUs.
This article introduces the methodology and results of performance testing the Llama-2 models deployed on the model serving stack included with Red Hat OpenShift AI. OpenShift AI is a flexible, scalable MLOps platform with tools to build, deploy and manage AI-enabled applications. Built using open source technologies, it provides trusted, operationally consistent capabilities for teams to experiment, serve models and deliver innovative apps…read more
Operating Tekton at scale: 10 lessons learned from Red Hat Trusted Application Pipeline
January 11, 2024 - Pradeep Surisetty, Gabe Montero, Pavel Macik, Ann Marie Fred, Jan Hutař
Red Hat Trusted Application Pipeline is built on top of Red Hat OpenShift Pipelines and its upstream project, Tekton. We use Tekton’s Pipelines as Code, Tekton Chains and Tekton Results capabilities to provide a more scalable build environment with an enhanced security posture to power Red Hat's next-generation build systems.
This blog shares some generic learnings applicable to Trusted Application Pipeline or OpenShift Pipelines and any large workload distributing system on OpenShift or Kubernetes…read more
Behind the scenes: Introducing OpenShift Virtualization Performance and Scale
January 9, 2024 - Jenifer Abrams
Red Hat OpenShift Virtualization helps to remove workload barriers by unifying virtual machine (VM) deployment and management alongside containerized applications in a cloud-native manner. As part of the larger Performance and Scale team, we have been deeply involved in the measurement and analysis of VMs running on OpenShift since the early days of the KubeVirt open source project and have helped to drive product maturity through new feature evaluation, workload tuning and scale testing. This article dives into several of our focus areas and shares additional insights into running and tuning VM workloads on OpenShift…read more
KrknChaos is joining CNCF Sandbox
January 9, 2024 - Naga Ravi Chaitanya Elluri, Brian Riordan, Pradeep Surisetty
We are excited to announce that krknChaos, a chaos engineering tool for Kubernetes focused on improving resilience and performance, has been accepted as a Sandbox project by the Cloud Native Computing Foundation (CNCF). Additional details can be found in the proposal. We would like to thank the TAG App Delivery team (Josh Gavant, Karena Angell, and team, to name a few) and the CNCF Technical Oversight Committee (TOC) for their invaluable guidance and support throughout the process, and of course the team and community for their contributions, which are key to making this happen…read more
Supercharging chaos testing using AI
January 8, 2024 - Naga Ravi Chaitanya Elluri, Mudit Verma, Sandeep Hans
There has been a huge increase in demand for running complex systems with tens to hundreds of microservices at massive scale. End users expect 24/7 availability of services they depend on, so even a few minutes of downtime matters. A proactive chaos engineer helps meet user expectations by identifying bottlenecks and hardening services before downtime occurs in a production environment. Chaos engineering is vital to avoid losing trust with your end users…read more
Quantifying performance of Red Hat OpenShift for Machine Learning (ML) Training on Supermicro A+ Servers with MLPerf Training v3.1
November 23, 2023 - Diane Feddema
We are proud to announce the first MLPerf Training submission on Red Hat OpenShift, which is also the first MLPerf training submission on a variant of Kubernetes. Red Hat collaborated with Supermicro on this submission and ran the benchmarks on a Supermicro GPU A+ Server with 8 NVIDIA H100 GPUs. Red Hat OpenShift helps make it easier to run, reproduce and monitor AI/ML workloads, while adding minimal overhead to your training jobs. In this blog we provide the performance numbers of our recent submission to MLPerf v3.1 training...read more
OpenShift Cluster Manager API: Load-testing, breaking, and improving it
October 26, 2023 - Vicente Zepeda Mas
Red Hat OpenShift Cluster Manager (OCM) is a managed service where you can install, modify, operate, and upgrade your Red Hat OpenShift clusters. Because OCM is a cornerstone of Red Hat’s hybrid cloud strategy, we strive to make sure that it is scalable enough to handle peak traffic and find bottlenecks that can be fixed to present a satisfying experience for our customers. The Performance & Scale team discovered a performance problem that affected a core component of the API. In this blog post, we discuss how we identified the problem, how we worked as a cross-functional team to identify and fix it, and the measures we implemented to prevent similar incidents from happening in the future...read more
Data Plane Development Kit (DPDK) latency in Red Hat OpenShift - Part I
October 11, 2023 - Rafael Folco, Karl Rister, Andrew Theurer
In this article, we present the results of DPDK latency tests conducted on a single node OpenShift (SNO) cluster. The tests were performed using the traffic generator MoonGen, which utilizes the hardware timestamping support for measuring packet latencies as they pass through the network adapters. The results of these tests provide insights into the performance of DPDK in a real-world environment and offer guidance for network architects and administrators seeking to optimize network latency…read more
Running 2500 pods per node on OCP 4.13
August 22, 2023 - Andrew Collins
OpenShift ships with a default pods-per-node (PPN) limit of 250. Customers who exceed 250 ask whether they can scale beyond the published maximum of 500. They ask, "How can we better utilize the capacity of our large bare metal machines?"...read more
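On OpenShift, the per-node pod limit is typically raised through a `KubeletConfig` custom resource applied to a machine config pool; a hedged sketch (the pool label is illustrative):

```yaml
# Sketch: raise the kubelet's maxPods on nodes in a labeled
# MachineConfigPool. The label below is illustrative; per-core pod
# limits and node resources also constrain the achievable density.
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: large-pods
  kubeletConfig:
    maxPods: 2500
```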
Bulk API in Automation Controller
August 9, 2023 - Nikhil Jain
Automation controller has a rich RESTful API. REST stands for Representational State Transfer and is sometimes spelled "ReST". It relies on a stateless, client-server, cacheable communications protocol, usually HTTP. REST APIs provide access to resources (data entities) via URI paths. You can visit the automation controller REST API in a web browser at: http://<server name>/api/...read more
Red Hat Enterprise Linux achieves significant performance gains with Intel's 4th Generation Xeon Scalable Processors
April 20, 2023 - Michey Mehta, Bill Gray, David Dumas, Douglas Shakshober
Intel recently launched the 4th generation of Intel® Xeon® Scalable processors, a family of high-end, enterprise-focused processors targeted at a diverse range of workloads. To explore how Intel’s new chips measure up, we’ve worked with Intel and others to run benchmarks with Red Hat Enterprise Linux 8.4, 8.5, 8.6, 9.0 and 9.1, as well as CentOS Stream 9.2 (which will become Red Hat Enterprise Linux 9.2)...read more
OpenShift/Kubernetes Chaos Stories
March 15, 2023 - Naga Ravi Chaitanya Elluri
With the increase in adoption and reliance on digital technology and microservices architecture, the uptime of an application has never been more important. Downtime of even a few minutes can lead to huge revenue loss and most importantly trust. This is exactly why we proactively focus on identifying bottlenecks and improving the resilience and performance of OpenShift under chaotic conditions…read more
Enhancing/Maximizing your Scaling capability with Automation Controller 2.3
March 13, 2023 - Nikhil Jain
Red Hat Ansible Automation Platform 2 is the next generation automation platform from Red Hat’s trusted enterprise technology experts. We are excited to announce that the Ansible Automation Platform 2.3 release includes automation controller 4.3…read more
Red Hat's new benchmark results on 4th Gen AMD EPYC (Genoa) processors
January 6, 2023 - Red Hat Performance Team
Red Hat has continued to work with our partners to enable world-class performance. Recently, AMD released its 4th Gen EPYC "Genoa" data center CPU, known as the AMD EPYC 9004 Series. Built on a 5nm process, AMD increased the core count to 96 cores and 192 threads per socket, with a 384 MB L3 cache…read more
A Guide to Scaling OpenShift Data Science to Hundreds of Users and Notebooks
December 13, 2022 - Kevin Pouget
Red Hat OpenShift Data Science provides a fully managed cloud service environment for data scientists and developers of intelligent applications. It offers a fully supported environment in which to rapidly develop, train, and test machine learning (ML) models before deploying in production…read more
Run Windows workloads on OpenShift Container Platform
November 30, 2022 - Krishna Harsha Voora, Venkata Anil Kommaddi, Sai Sindhur Malleni
OpenShift helps bring the power of cloud-native and containerization to your applications, no matter what underlying operating systems they rely on. For use cases that require both Linux and Windows workloads, Red Hat OpenShift allows you to deploy Windows workloads running on Windows server while also supporting traditional Linux workloads…read more
A Guide to Functional and Performance Testing of the NVIDIA DGX A100
June 23, 2022 - Kevin Pouget
This blog post, part of a series on the DGX-A100 OpenShift launch, presents the functional and performance assessment we performed to validate the behavior of the DGX™ A100 system, including its eight NVIDIA A100 GPUs. This study was performed on OpenShift 4.9 with the GPU computing stack deployed by NVIDIA GPU Operator v1.9…read more
Scaling Automation Controller for API Driven Workloads
June 20, 2022 - Elijah Delee
When scaling automation controller in an enterprise organization, administrators are faced with more clients automating their interactions with its REST API. As with any web application, automation controller has a finite capacity to serve web requests, and web clients can…read more
Performance Improvements in Automation Controller 4.1
February 28, 2022 - Nikhil Jain
With the release of Ansible Automation Platform 2.1, users now have access to the latest control plane – automation controller 4.1. Automation controller 4.1 provides significant performance improvements when compared to its predecessor Ansible Tower 3.8. To put this into context, we used Ansible Tower 3.8 to run jobs, capture various metrics…read more
The Curious Case of the CPU Eating Gunicorn
June 2, 2022 - Gonza Rafuls
We decided to take a first hands-on try at the future QUADS roadmap and re-architect our legacy landing/requests portal application, previously a trusty LAMP stack, into a completely rewritten Flask / SQLAlchemy / Gunicorn / Nginx next-gen platform…read more
Entitlement-Free Deployment of the NVIDIA GPU Operator on OpenShift
December 14, 2021 - Kevin Pouget
Version 1.9.0 of the GPU Operator has just landed in OpenShift OperatorHub, with many different updates. We're proud to announce that this version comes with the support of the entitlement-free deployment of NVIDIA GPU Driver…read more
Red Hat collaborates with NVIDIA to deliver record-breaking STAC-A2 Market Risk benchmark
November 9, 2021 - Sebastian Jug
We are happy to announce a record-breaking performance with NVIDIA in the STAC-A2 benchmark, affirming Red Hat OpenShift's ability to run compute heavy, high performance workloads. The Securities Technology Analysis Center (STAC®) facilitates a large group of financial firms and technology vendors…read more
Red Hat Satellite 6.9 with Puma Web Server
September 15, 2021 - Imaanpreet Kaur
Until Red Hat Satellite 6.8, the Passenger web/app server was a core component of Red Hat Satellite. Satellite used Passenger to run Ruby applications such as Foreman. Satellite 6.9 is no longer using the Passenger web server. The Foreman application (main UI and API server) was ported to use the Puma project…read more
Using NVIDIA A100’s Multi-Instance GPU to Run Multiple Workloads in Parallel on a Single GPU
August 26, 2021 - Kevin Pouget
The new Multi-Instance GPU (MIG) feature lets GPUs based on the NVIDIA Ampere architecture run multiple GPU-accelerated CUDA applications in parallel in a fully isolated way. The compute units of the GPU, as well as its memory, can be partitioned into multiple MIG instances…read more
Multi-Instance GPU Support with the GPU Operator v1.7.0
June 15, 2021 - Kevin Pouget
Version 1.7.0 of the GPU Operator has just landed in OpenShift OperatorHub, with many different updates. We are proud to announce that this version comes with the support of the NVIDIA Multi-Instance GPU (MIG) feature for the A100 and A30 Ampere cards…read more
Making Chaos Part of Kubernetes/OpenShift Performance and Scalability Tests
March 17, 2021 - Naga Ravi Chaitanya Elluri
While we know how important performance and scale are, how can we engineer for it when chaos becomes common in complex systems? What role does Chaos/Resiliency testing play during Performance and Scalability evaluation? Let’s look at the methodology that we need to embrace to mimic a real world production environment to find the bottlenecks and fix them before it impacts the users and customers…read more
Demonstrating Performance Capabilities of Red Hat OpenShift for Running Scientific HPC Workloads
November 11, 2020 - David Gray and Kevin Pouget
This blog post is a follow-up to the previous blog post on running GROMACS on Red Hat OpenShift Container Platform (OCP) using the Lustre filesystem. In this post, we will show how we ran two scientific HPC workloads on a 38-node OpenShift cluster using CephFS with OpenShift Container Storage in external mode…read more
A Complete Guide for Running Specfem Scientific HPC Workload on Red Hat OpenShift
November 11, 2020 - Kevin Pouget
Specfem3D_Globe is a scientific high-performance computing (HPC) code that simulates seismic wave propagation at a global or regional scale (website and repository). It relies on a 3D crustal model and takes into account parameters such as Earth's density, topography/bathymetry, rotation, oceans, and self-gravitation…read more
Running HPC workloads with Red Hat OpenShift Using MPI and Lustre Filesystem
October 29, 2020 - David Gray
The requirements associated with data science and AI/ML applications have pushed organizations toward highly parallel and scalable hardware that often resembles high performance computing (HPC) infrastructure. HPC has been around for a while and has evolved to include ultra-large supercomputers that run massively parallel tasks and operate at exascale (able to perform a billion billion operations per second)...read more
Introduction to Kraken, a Chaos Tool for OpenShift/Kubernetes
October 8, 2020 - Yashashree Suresh and Paige Rubendall
Chaos engineering helps boost confidence in a system's resilience by “breaking things on purpose.” While it may seem counterintuitive, it is crucial to deliberately inject failures into a complex system like OpenShift/Kubernetes and check whether the system recovers gracefully…read more
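The “breaking things on purpose” idea can be sketched with plain `kubectl`, independent of Kraken itself. This is a minimal, hedged illustration: the `my-app` namespace is hypothetical, and it assumes the workload is managed by a controller (e.g. a Deployment) that should replace the killed pod.

```shell
# Hypothetical target namespace for the chaos experiment
NS=my-app

# Pick one pod at random from the namespace
POD=$(kubectl get pods -n "$NS" -o name | shuf -n 1)

# Inject the failure: delete the pod on purpose
kubectl delete -n "$NS" "$POD"

# Watch whether the controller recovers gracefully by
# scheduling a replacement pod
kubectl get pods -n "$NS" -w
```

Tools like Kraken automate this pattern: they inject failures (pod kills, node outages, network disruption) on a schedule and then verify that the cluster returns to a healthy state.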
About the author
Red Hat Performance and Scale Engineering pushes Red Hat products to their limits. Every day we strive to reach greater performance for our customer workloads and scale the products to new levels. Our performance engineers benchmark configurations that range from far edge telco use cases to large scale cloud environments.
We work closely with developers early in the development process to validate that their software design will perform and scale well. We also collaborate with hardware and software partners to ensure that our software is performing and scaling well with their technology, and we engage with customers on innovative deployments where we can apply our expertise to help them get the best performance and scale for their workloads.
We work across the Red Hat product portfolio on a multitude of product configurations and use cases for large scale hybrid cloud environments—including edge-enabled solutions and products, next-generation 5G networks, software-defined vehicles and more.