Red Hat Performance and Scale Engineering
Benchmarking AI inference on CPUs: A transparent blueprint for the enterprise
May 28, 2026 | Maryam Tahhan, John Harrigan, Anton Ivanov, Paul Power, Luigi Mario Zuccarelli
As enterprises look to optimize the total cost of ownership (TCO) of Large Language Model deployment, utilizing existing enterprise CPU infrastructure alongside GPU resources for specific inference workloads has become a strategic initiative. However, infrastructure teams attempting to validate this face a chaotic benchmarking landscape.
LogAn: Large-scale log analysis with small language models
May 28, 2026 | Rahul Shetty, Aman Vishwakarma
Where Large Language Models (LLMs) meet logs, things can break down. Language models are remarkably good at understanding text. So the natural instinct when debugging a production outage is to dump the logs into an LLM and ask, "what went wrong?" It doesn't scale. This article explains why.
Performance improvements with speculative decoding in vLLM for gpt-oss
April 16, 2026 | Harshith Umesh
Learn how speculative decoding in vLLM boosts AI inference throughput without impacting output quality. This benchmark of gpt-oss-120B with Eagle3 shows lower latency, scalable performance gains, and up to 19% enterprise cost savings across multiple workloads.
Red Hat and NVIDIA: Setting standards for high-performance AI inference
Red Hat AI tops MLPerf Inference v6.0 with vLLM on Qwen3-VL, Whisper, and GPT-OSS-120B
Configure NVIDIA Blackwell GPUs for Red Hat AI workloads
March 16, 2026 | Erwan Gallen, Tarun Kumar, Antonin Stefanutti, Selbi Nuryyeva, Michey Mehta
Learn how to enable the NVIDIA RTX PRO 4500 Blackwell Server Edition on Red Hat AI for compact, power-efficient AI deployments. This hardware offers inference performance without adding unnecessary operational complexity for Red Hat AI users.
5 steps to triage vLLM performance
March 9, 2026 | David Whyte-Gray, Thameem Abbas Ibrahim Bathusha, Michael Goin, Ashish Kamra
Learn how to improve the performance of your vLLM deployments with a diagnostic workflow that isolates latency issues and server saturation. Discover the key metrics to monitor and techniques to alleviate memory pressure.
Estimate GPU memory for LLM fine-tuning with Red Hat AI
March 4, 2026 | Mohib Azam
Learn how to estimate memory requirements for your LLM fine-tuning experiments using Red Hat Training Hub's memory_estimator.py API. This guide covers the memory components, adjusting training setups for specific GPU specifications, and using the memory estimator in your code. Streamline your model fine-tuning process with runtime estimates and automated hyperparameter suggestions.
How to deploy and benchmark vLLM with GuideLLM on Kubernetes
December 24, 2025 | Harshith Umesh
Learn how to deploy and test the inference capabilities of vLLM on OpenShift using GuideLLM, a specialized performance benchmarking tool.
Integrate a custom AI service with Red Hat Ansible Lightspeed
December 10, 2025 | Riya Sharma, Elijah DeLee
Get a step-by-step guide to integrating a custom AI service with Red Hat Ansible Lightspeed.
Artificial intelligence, Automation and management, Operators
Autoscaling vLLM with OpenShift AI model serving: Performance validation
November 26, 2025 | Alberto Perdomo
This performance analysis compares KServe's SLO-driven KEDA autoscaling approach against Knative's concurrency-based autoscaling for vLLM inference.
Deploy an LLM inference service on OpenShift AI
November 3, 2025 | Riya Sharma, Elijah DeLee
Learn how to deploy LLMs on Red Hat OpenShift AI for Ansible Lightspeed, enabling on-premise inference and optimizing resource utilization.
Efficient and reproducible LLM inference: Inside Red Hat’s MLPerf Inference v5.1 submissions
Krkn-AI: A feedback-driven approach to chaos engineering
October 20, 2025 | Rahul Shetty, Naga Ravi, Chaitanya Elluri
Krkn-AI automates AI-assisted, objective-driven chaos testing for Kubernetes. Discover how it addresses the challenges of reliability in modern systems.
Artificial intelligence, Automation and management, DevOps, Kubernetes, Microservices
Network performance in distributed training: Maximizing GPU utilization on OpenShift
October 16, 2025 | Tanya Osokin, Kevin Pouget, Michey Mehta
Maximize return on investment in GPU hardware by investing in the appropriate network infrastructure for high-performance distributed training on OpenShift.
vLLM or llama.cpp: Choosing the right LLM inference engine for your use case
September 30, 2025 | Harshith Umesh
See how vLLM’s throughput and latency compare to llama.cpp's and discover which tool is right for your specific deployment needs on enterprise-grade hardware.
How to set up KServe autoscaling for vLLM with KEDA
September 23, 2025 | Alberto Perdomo
Walk through how to set up KServe autoscaling by leveraging the power of vLLM, KEDA, and the custom metrics autoscaler operator in Open Data Hub.
Artificial intelligence, Automation and management, Kubernetes, Open source
Ollama vs. vLLM: A deep dive into performance benchmarking
August 8, 2025 | Harshith Umesh
Learn how vLLM outperforms Ollama in high-performance production deployments, delivering significantly higher throughput and lower latency.
How we improved AI inference on macOS Podman containers
June 5, 2025 | Kevin Pouget
Podman enables developers to run Linux containers on macOS within virtual machines, including GPU acceleration for improved AI inference performance.
How to run performance and scale validation for OpenShift AI
April 30, 2025 | Alberto Perdomo, Kevin Pouget
Learn about the Red Hat OpenShift AI model fine-tuning stack and how to run performance and scale validation.
Performance boosts in vLLM 0.8.1: Switching to the V1 engine
April 28, 2025 | Robert Shaw, Thameem Abbas Ibrahim Bathusha
Supercharge your AI with OpenShift AI and Redis: Unleash speed and scalability
Unlocking the Effective Context Length: Benchmarking the Granite-3.1-8b Model
RoCE multi-node AI training on Red Hat OpenShift
January 30, 2025 | Boaz Ben Shabat
Learn how to run distributed AI training on Red Hat OpenShift using RoCE with this step-by-step guide from manual setup to fully automated training.
Red Hat Enterprise Linux Performance Results on Intel® Xeon® 6 processors
RHEL for Real Time: CPU throttling and risks
March 26, 2026 | Dustin Black
This article discusses CPU throttling and its risks on RHEL for Real Time, a platform for deadline-oriented applications and time-sensitive workloads.
Best Practice Configuration and Tuning for Linux and Windows VMs
In this guide we’ll walk through some critical configuration details and some further “last mile” tuning options when running workloads in both Linux and Windows VMs on OpenShift Virtualization.
A deep dive into OpenShift Container Platform 4.20 performance
January 15, 2026 | Simone Ferlin-Reiter, Raviteja Sahukari
Compare OVN-K, MACVLAN, and SR-IOV performance on OpenShift 4.20. See how control plane churn impacts data plane throughput and stability in telco environments.
How to run performance tests using benchmark-runner
November 18, 2025 | Robert Krawitz, Jenifer Abrams, Guoqing Li, Eli Battat
Learn how to run performance tests using benchmark-runner on Kubernetes and OpenShift pods and virtual machines.
Automation and management, Containers, Kubernetes, Virtualization
High Scale Performance Testing: Virt Density
November 17, 2025 | Jenifer Abrams
Learn how the OpenShift Virtualization Performance & Scale team tests large-scale clusters, uncovers bottlenecks, and validates performance with real-world scenarios and examples.
How to run I/O workloads on OpenShift Virtualization VMs
October 22, 2025 | Elvir Kuric
Learn how to run and analyze FIO I/O tests at scale on OpenShift Virtualization VMs to determine optimal storage backend performance.
A case study in Kubelet regression in OpenShift
October 20, 2025 | Vishnu Challa
This article discusses the findings of a case study of a Kubelet regression in Red Hat OpenShift during the 1.33 rebase.
How Red Hat has redefined continuous performance testing
October 15, 2025 | Joe Talerico
Learn more about the Red Hat OpenShift continuous performance testing journey, and why it’s important to integrate CPT into CI/CD pipelines.
Scaling OpenShift Network Policies: Results and Takeaways
August 11, 2025 | Venkata Anil Kommaddi
This blog post will delve into the results of the OpenShift network policies scale testing, evaluating the scalability of network policies and how scaling affects OVS flow programming latency, system resources, and overall performance.
Improving performance of multiple I/O threads for OpenShift Virtualization
OpenShift LACP bonding performance expectations
July 17, 2025 | Joe Talerico
Before you pick the NIC bond configuration for your on-premise OpenShift deployments, consider the Link Aggregation Control Protocol (LACP) bonding performance.
Feature Introduction: Multiple IOthreads for OpenShift Virtualization
June 23, 2025 | Jenifer Abrams
Spread VM disk I/O across multiple threads and queues to better use vCPUs and host CPUs during heavy workloads, improving throughput and overall VM performance.
Dynamic VM CPU Workload Rebalancing with Load Aware Descheduler
June 3, 2025 | Guoqing Li
Evaluate Load Aware Descheduler on OpenShift Virtualization in OCP 4.19, showing how CPU-based VM rebalancing improves cluster performance and resolves node utilization imbalances.
Evaluating memory overcommitment in OpenShift Virtualization
April 24, 2025 | Robert Krawitz
Explore how to configure and tune wasp-agent for controlled use of swap with VMs in Red Hat OpenShift, and the performance implications of memory overcommit.
Boost OpenShift database VM density with memory overcommit
April 28, 2025 | Sanjay Rao, Jenifer Abrams, Douglas Shakshober
Examine OpenShift Virtualization's ability to maintain workload continuity even in an oversubscribed environment, based on a study of 4 popular databases.
Scalable Database Performance with OpenShift Virtualization, Out-of-the-Box
February 25, 2025 | Jenifer Abrams, Robert Krawitz, Peter Lauterbach, Sanjay Rao, Douglas Shakshober
This study shows that database throughput in VMs on OpenShift with “out-of-the-box” defaults approaches bare-metal performance, without any tuning required.
How to enable Ansible Lightspeed intelligent assistant
September 16, 2025 | Riya Sharma, Elijah DeLee
Artificial intelligence, Automation and management, Serverless
Monitoring Red Hat Ansible Automation Platform using Performance Co-Pilot
Why your database benchmarking data is probably wrong (and how I fixed mine)
June 5, 2026 | Krishna Magar
We've all been there. You've spent hours architecting a performance test, convinced you're about to uncover groundbreaking insights. Here's how I identified and eliminated the hidden bottlenecks that were sabotaging my data.
Kubernetes
Extending the Chaos: A Guide to Building Custom Scenarios for Krkn
October 5, 2025 | Abhinav Sharma
Go beyond default tests. This guide teaches you how to build custom chaos engineering plugins for Krkn to find and fix hidden weaknesses in Kubernetes.
Kubernetes
BGP dynamic routing with Fast Data Path on RHOSO 18
August 27, 2025 | Pradipta Sahoo, Spoorthi K, Haresh Khandelwal
This is a performance evaluation of dynamic routing using OVN-BGP-Agent with Fast Data Path on Red Hat OpenStack Services on OpenShift v.18.
Unleash controlled chaos with krknctl
August 21, 2025 | Tullio Sebastiani
Discover how krknctl simplifies chaos engineering and empowers users to effectively test and build more resilient systems.
Automation and management, CI/CD, Containers, DevOps, DevSecOps, Linux, Kubernetes, Microservices, Security
Enhancing system resilience with Krkn chaos dashboard
August 14, 2025 | Varshini M
Discover how the chaos dashboard is pivotal to chaos engineering, enabling teams to build, test, and improve overall system resilience.
Scaling OpenShift Network Policies: Our Journey in Developing a Robust Workload Testing Tool
August 11, 2025 | Venkata Anil Kommaddi
Test OpenShift network policy scalability with a workload that generates ACL flows and measures enforcement latency by comparing connection success times to policy creation.