Red Hat Performance and Scale Engineering

Analyses from Red Hat Performance and Scale Engineering validate scalability, expose bottlenecks, and optimize Kubernetes, AI, and hybrid cloud workloads.
Red Hat AI

Benchmarking AI inference on CPUs: A transparent blueprint for the enterprise

May 28, 2026    |    Maryam Tahhan, John Harrigan, Anton Ivanov, Paul Power, Luigi Mario Zuccarelli

As enterprises look to optimize the total cost of ownership (TCO) of large language model (LLM) deployment, using existing enterprise CPU infrastructure alongside GPU resources for specific inference workloads has become a strategic initiative. However, infrastructure teams attempting to validate this approach face a chaotic benchmarking landscape.

AI inference, Artificial intelligence

LogAn: Large-scale log analysis with small language models

May 28, 2026    |    Rahul Shetty, Aman Vishwakarma

Where Large Language Models (LLMs) meet logs, things can break down. Language models are remarkably good at understanding text. So the natural instinct when debugging a production outage is to dump the logs into an LLM and ask, "what went wrong?" It doesn't scale. This article explains why.

Artificial intelligence, Automation and management

Performance improvements with speculative decoding in vLLM for gpt-oss

April 16, 2026    |    Harshith Umesh

Learn how speculative decoding in vLLM boosts AI inference throughput without impacting output quality. This benchmark of gpt-oss-120B with Eagle3 shows lower latency, scalable performance gains, and up to 19% enterprise cost savings across multiple workloads.

Artificial intelligence

Red Hat and NVIDIA: Setting standards for high-performance AI inference

April 2, 2026
Discover how Red Hat and NVIDIA drove industry-leading AI inference results in the MLPerf Inference v6.0 benchmarks through deep engineering co-design. Learn more about our top-tier throughput and latency results across vision, reasoning, and speech models on NVIDIA AI infrastructure.

Red Hat AI tops MLPerf Inference v6.0 with vLLM on Qwen3-VL, Whisper, and GPT-OSS-120B

Red Hat AI posted top MLPerf Inference v6.0 scores on Whisper, Qwen3-VL, and GPT-OSS-120B using vLLM, llm-d, and OpenShift AI across NVIDIA and AMD GPUs.

Configure NVIDIA Blackwell GPUs for Red Hat AI workloads

March 16, 2026    |    Erwan Gallen, Tarun Kumar, Antonin Stefanutti, Selbi Nuryyeva, Michey Mehta

Learn how to enable the NVIDIA RTX PRO 4500 Blackwell Server Edition on Red Hat AI for compact, power-efficient AI deployments. This hardware offers inference performance without adding unnecessary operational complexity for Red Hat AI users.

Artificial intelligence, Containers, Edge computing

5 steps to triage vLLM performance

March 9, 2026    |    David Whyte-Gray, Thameem Abbas Ibrahim Bathusha, Michael Goin, Ashish Kamra

Learn how to improve the performance of your vLLM deployments with a diagnostic workflow that isolates latency issues and server saturation. Discover the key metrics to monitor and techniques to alleviate memory pressure.

Red Hat AI, Inference Server

Estimate GPU memory for LLM fine-tuning with Red Hat AI

March 4, 2026    |    Mohib Azam

Learn how to estimate memory requirements for your LLM fine-tuning experiments using Red Hat Training Hub's memory_estimator.py API. This guide covers the memory components, adjusting training setups for specific GPU specifications, and using the memory estimator in your code. Streamline your model fine-tuning process with runtime estimates and automated hyperparameter suggestions.

Artificial intelligence, Data science, Python

How to deploy and benchmark vLLM with GuideLLM on Kubernetes

December 24, 2025    |    Harshith Umesh

Learn how to deploy and test the inference capabilities of vLLM on OpenShift using GuideLLM, a specialized performance benchmarking tool.

Artificial intelligence, Kubernetes, Virtualization

Integrate a custom AI service with Red Hat Ansible Lightspeed

December 10, 2025    |    Riya Sharma, Elijah DeLee

Get a step-by-step guide to integrating a custom AI service with Red Hat Ansible Lightspeed.

Artificial intelligence, Automation and management, Operators

Autoscaling vLLM with OpenShift AI model serving: Performance validation

November 26, 2025    |    Alberto Perdomo

This performance analysis compares KServe's SLO-driven KEDA autoscaling approach against Knative's concurrency-based autoscaling for vLLM inference.

Artificial intelligence

Deploy an LLM inference service on OpenShift AI

November 3, 2025    |    Riya Sharma, Elijah DeLee

Learn how to deploy LLMs on Red Hat OpenShift AI for Ansible Lightspeed, enabling on-premise inference and optimizing resource utilization.

Artificial intelligence, Automation and management

Efficient and reproducible LLM inference: Inside Red Hat’s MLPerf Inference v5.1 submissions

As generative AI (gen AI) workloads become central to enterprise applications, benchmarking their inference performance has never been more critical for understanding the limits of their capabilities.

Krkn-AI: A feedback-driven approach to chaos engineering

October 20, 2025    |    Rahul Shetty, Naga Ravi, Chaitanya Elluri

Krkn-AI automates AI-assisted, objective-driven chaos testing for Kubernetes. Discover how it addresses the challenges of reliability in modern systems.

Artificial intelligence, Automation and management, DevOps, Kubernetes, Microservices

Network performance in distributed training: Maximizing GPU utilization on OpenShift

October 16, 2025    |    Tanya Osokin, Kevin Pouget, Michey Mehta

Maximize return on investment in GPU hardware by investing in the appropriate network infrastructure for high-performance distributed training on OpenShift.

Artificial intelligence, Containers, Kubernetes

vLLM or llama.cpp: Choosing the right LLM inference engine for your use case

September 30, 2025    |    Harshith Umesh

See how vLLM’s throughput and latency compare to llama.cpp's and discover which tool is right for your specific deployment needs on enterprise-grade hardware.

Artificial intelligence

How to set up KServe autoscaling for vLLM with KEDA

September 23, 2025    |    Alberto Perdomo

Walk through how to set up KServe autoscaling for vLLM using KEDA and the custom metrics autoscaler operator in Open Data Hub.

Artificial intelligence, Automation and management, Kubernetes, Open source

Ollama vs. vLLM: A deep dive into performance benchmarking

August 8, 2025    |    Harshith Umesh

Learn how vLLM outperforms Ollama in high-performance production deployments, delivering significantly higher throughput and lower latency.

Artificial intelligence, Open source

How we improved AI inference on macOS Podman containers

June 5, 2025    |    Kevin Pouget

Podman enables developers to run Linux containers on macOS within virtual machines, including GPU acceleration for improved AI inference performance.

Containers, Developer tools, Virtualization

How to run performance and scale validation for OpenShift AI

April 30, 2025    |    Alberto Perdomo, Kevin Pouget

Learn about the Red Hat OpenShift AI model fine-tuning stack and how to run performance and scale validation.

Artificial intelligence

Performance boosts in vLLM 0.8.1: Switching to the V1 engine

April 28, 2025    |    Robert Shaw, Thameem Abbas Ibrahim Bathusha

Artificial intelligence, Open source

Supercharge your AI with OpenShift AI and Redis: Unleash speed and scalability

April 4, 2025
Since the birth of large language models (LLMs) and the release of ChatGPT, artificial intelligence (AI) has gone from being an out-of-reach concept to showing real promise in the business landscape for every industry and business.

Unlocking the Effective Context Length: Benchmarking the Granite-3.1-8b Model

February 12, 2025
Exploring the effective context length of the Granite-3.1-8b instruct model and validating its capabilities across various tasks.

RoCE multi-node AI training on Red Hat OpenShift

January 30, 2025    |    Boaz Ben Shabat

Learn how to run distributed AI training on Red Hat OpenShift using RoCE with this step-by-step guide from manual setup to fully automated training.

Artificial intelligence, Operators

Red Hat Enterprise Linux

Red Hat Enterprise Linux Performance Results on Intel® Xeon® 6 processors

May 21, 2025
Red Hat Enterprise Linux 10 leverages features of Intel® Xeon® 6 processors, including higher CPU core counts and faster DDR5 memory.

RHEL for Real Time: CPU throttling and risks

March 26, 2026    |    Dustin Black

This article discusses CPU throttling and its risks on RHEL for Real Time, a platform for deadline-oriented applications and time-sensitive workloads.

Developer productivity, Linux, Observability

Red Hat OpenShift

Best Practice Configuration and Tuning for Linux and Windows VMs

May 6, 2026 | Jenifer Abrams, Joe Mario

In this guide we’ll walk through some critical configuration details and some further “last mile” tuning options when running workloads in both Linux and Windows VMs on OpenShift Virtualization.

Virtualization

A deep dive into OpenShift Container Platform 4.20 performance

January 15, 2026    |    Simone Ferlin-Reiter, Raviteja Sahukari

Compare OVN-K, MACVLAN, and SR-IOV performance on OpenShift 4.20. See how control plane churn impacts data plane throughput and stability in telco environments.

APIs, Containers, Kubernetes

How to run performance tests using benchmark-runner

November 18, 2025    |    Robert Krawitz, Jenifer Abrams, Guoqing Li, Eli Battat

Learn how to run performance tests using benchmark-runner on Kubernetes and OpenShift pods and virtual machines.

Automation and management, Containers, Kubernetes, Virtualization

High Scale Performance Testing: Virt Density

November 17, 2025    |    Jenifer Abrams

Learn how the OpenShift Virtualization Performance & Scale team tests large-scale clusters, uncovers bottlenecks, and validates performance with real-world scenarios and examples.

APIs, Containers, Kubernetes

How to run I/O workloads on OpenShift Virtualization VMs

October 22, 2025    |    Elvir Kuric

Learn how to run and analyze FIO I/O tests at scale on OpenShift Virtualization VMs to determine optimal storage backend performance.

Linux, Kubernetes, Virtualization

A case study in Kubelet regression in OpenShift

October 20, 2025    |    Vishnu Challa

This article discusses the findings of a case study of a Kubelet regression in Red Hat OpenShift during the 1.33 rebase.

GitOps, Kubernetes

How Red Hat has redefined continuous performance testing

October 15, 2025    |    Joe Talerico

Learn more about the Red Hat OpenShift continuous performance testing journey, and why it’s important to integrate CPT into CI/CD pipelines.

Automation and management, CI/CD

Scaling OpenShift Network Policies: Results and Takeaways

August 11, 2025    |    Venkata Anil Kommaddi

This post presents the results of OpenShift network policy scale testing, evaluating how scaling network policies affects OVS flow programming latency, system resources, and overall performance.

Kubernetes

Improving performance of multiple I/O threads for OpenShift Virtualization

July 31, 2025
Red Hat OpenShift Virtualization 4.19 significantly improves performance and speed for I/O intensive workloads like databases.

OpenShift LACP bonding performance expectations

July 17, 2025    |    Joe Talerico

Before you pick the NIC bond configuration for your on-premise OpenShift deployments, consider the Link Aggregation Control Protocol (LACP) bonding performance.

Kubernetes

Feature Introduction: Multiple IOthreads for OpenShift Virtualization

June 23, 2025    |    Jenifer Abrams

Spread VM disk I/O across multiple threads and queues to better use vCPUs and host CPUs during heavy workloads, improving throughput and overall VM performance.

Virtualization

Dynamic VM CPU Workload Rebalancing with Load Aware Descheduler

June 3, 2025    |    Guoqing Li

Evaluate Load Aware Descheduler on OpenShift Virtualization in OCP 4.19, showing how CPU-based VM rebalancing improves cluster performance and resolves node utilization imbalances.

Virtualization

Evaluating memory overcommitment in OpenShift Virtualization

April 24, 2025    |    Robert Krawitz

Explore how to configure and tune wasp-agent for controlled use of swap with VMs in Red Hat OpenShift, and the performance implications of memory overcommit.

Virtualization

Boost OpenShift database VM density with memory overcommit

April 28, 2025    |    Sanjay Rao, Jenifer Abrams, Douglas Shakshober

Examine OpenShift Virtualization's ability to maintain workload continuity even in an oversubscribed environment, based on a study of 4 popular databases.

Databases, Virtualization

Scalable Database Performance with OpenShift Virtualization, Out-of-the-Box

February 25, 2025    |    Jenifer Abrams, Robert Krawitz, Peter Lauterbach, Sanjay Rao, Douglas Shakshober

This study shows that database throughput in VMs on OpenShift with “out-of-the-box” defaults approaches bare-metal performance, without any tuning required.

Databases, Virtualization

Red Hat Ansible Automation Platform

How to enable Ansible Lightspeed intelligent assistant

September 16, 2025    |    Riya Sharma, Elijah DeLee

Artificial intelligence, Automation and management, Serverless

Monitoring Red Hat Ansible Automation Platform using Performance Co-Pilot

January 30, 2025
In this article, you’ll learn about the Performance Co-Pilot (PCP) tool and how we take advantage of it to implement system and application monitoring for Red Hat Ansible Automation Platform.
Hybrid Cloud Management

Why your database benchmarking data is probably wrong (and how I fixed mine)

June 5, 2026    |    Krishna Magar

We've all been there. You've spent hours architecting a performance test, convinced you're about to uncover groundbreaking insights. Here's how I identified and eliminated the hidden bottlenecks that were sabotaging my data.

Kubernetes

Extending the Chaos: A Guide to Building Custom Scenarios for Krkn

October 5, 2025    |    Abhinav Sharma

Go beyond default tests. This guide teaches you how to build custom chaos engineering plugins for Krkn to find and fix hidden weaknesses in Kubernetes.

Kubernetes

BGP dynamic routing with Fast Data Path on RHOSO 18

August 27, 2025    |    Pradipta Sahoo, Spoorthi K, Haresh Khandelwal

This is a performance evaluation of dynamic routing using OVN-BGP-Agent with Fast Data Path on Red Hat OpenStack Services on OpenShift v.18.

Edge computing

Unleash controlled chaos with krknctl

August 21, 2025    |    Tullio Sebastiani

Discover how krknctl simplifies chaos engineering and empowers users to effectively test and build more resilient systems.

Automation and management, CI/CD, Containers, DevOps, DevSecOps, Linux, Kubernetes, Microservices, Security

Enhancing system resilience with Krkn chaos dashboard

August 14, 2025    |    Varshini M

Discover how the chaos dashboard is pivotal to chaos engineering, enabling teams to build, test, and improve overall system resilience.

Kubernetes

Scaling OpenShift Network Policies: Our Journey in Developing a Robust Workload Testing Tool

August 11, 2025    |    Venkata Anil Kommaddi

Test OpenShift network policy scalability with a workload that generates ACL flows and measures enforcement latency by comparing connection success times to policy creation.

Developer productivity, Developer tools, Kubernetes