We are excited to introduce our most recent validated models, designed to empower your deployments. At Red Hat, our goal is to provide the confidence, predictability, and flexibility organizations need to deploy third-party gen AI models across the Red Hat AI platform. This release expands our collection of performance-benchmarked and accuracy-evaluated optimized models, helping you accelerate time to value and select the perfect fit for your enterprise use case.

Red Hat AI’s validated models go beyond a simple list, providing efficient, enterprise-ready AI. We combine rigorous performance benchmarking and accuracy testing with a comprehensive packaging process designed to deploy with security and simplicity in mind. Each model is scanned for vulnerabilities and integrated into a managed software lifecycle, helping ensure you receive a high-performing, resource-optimized asset that is focused on security, easy to manage, and ready for long-term updates.

What are validated models?

The world of large language models (LLMs) is expanding rapidly, making it difficult for enterprises to choose the right one. Organizations often struggle with AI resource capacity planning and ensuring that a model's performance can be reliably reproduced.

That's where Red Hat's validated models come in. We provide access to a set of ready-to-use, third-party models that run efficiently on vLLM within our platform. We simplify the selection process by performing extensive testing for you. Our model validation process includes:

  • Performance benchmarking using GuideLLM to assess resource requirements and cost on various hardware configurations.
  • Accuracy evaluations using the Language Model Evaluation Harness (LM Eval Harness) to measure how well models perform on standard benchmark tasks.
  • Reproducible deployments on vLLM, the high-throughput inference engine, to ensure you can achieve the same results.
  • Security-focused, enterprise-ready packaging using standardized container formats in our production registry to create a version-controlled asset—scanned for vulnerabilities—that simplifies deployment and lifecycle management.

This process provides clear capacity planning guidance, empowering you to right-size deployments, select the optimal hardware, and get to production faster with confidence.
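
To make the accuracy-evaluation step concrete, here is a minimal sketch of the kind of spot check you can run yourself with the open source lm-evaluation-harness (the LM Eval Harness mentioned above) through its vLLM backend. The repo ID and task below are illustrative placeholders, not Red Hat's official evaluation recipe.

```python
# Minimal accuracy spot check with lm-evaluation-harness
# (pip install "lm-eval[vllm]"). The repo ID and task are
# placeholders for illustration only.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",  # run the checkpoint through vLLM for evaluation
    model_args="pretrained=RedHatAI/Qwen3-8B-FP8-dynamic,dtype=auto",
    tasks=["gsm8k"],  # one example benchmark task
    num_fewshot=5,
)

# Report the metrics the harness computed for each task
for task, metrics in results["results"].items():
    print(task, metrics)
```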

Red Hat’s model optimization capabilities

Deploying powerful LLMs is often limited by the high cost and scarcity of specialized hardware, such as high-VRAM GPUs. To help democratize access and enable enterprises to run these models more affordably—even on smaller or fewer GPUs—Red Hat applies advanced model compression techniques.

This critical optimization process, driven by technologies like LLM Compressor, involves techniques such as quantization (e.g., converting models to INT4, INT8, or FP8 Dynamic formats) that greatly reduce the memory footprint and compute requirements of LLMs while carefully preserving their output quality and accuracy.
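
As a rough illustration of what that looks like in practice, the sketch below applies an FP8 Dynamic quantization recipe with the open source LLM Compressor library. It follows the library's documented one-shot pattern, but import paths and recipe options have shifted between releases, so treat it as a starting point rather than the exact pipeline behind the validated models.

```python
# Hedged sketch: FP8 dynamic quantization with LLM Compressor
# (pip install llmcompressor). Import paths vary across releases.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "Qwen/Qwen3-8B"  # example base model, chosen for illustration

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8 dynamic: weights are quantized ahead of time, activations are
# quantized on the fly at inference, so no calibration dataset is needed.
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])
oneshot(model=model, recipe=recipe)

# Save in a compressed format that vLLM can load directly
SAVE_DIR = "Qwen3-8B-FP8-Dynamic"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```

INT4 and INT8 schemes follow the same one-shot pattern but typically require a small calibration dataset to set quantization scales.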

The validated models you see in our collection—many of which are pre-compressed and ready for deployment—are examples of this optimization in action. By taking advantage of these assets, you can:

  • Reduce VRAM usage, making it possible to serve larger models on less expensive or fewer GPU resources.
  • Decrease operational costs by maximizing hardware utilization.
  • Achieve higher throughput and lower latency during the critical inference phase.

These optimized, validated assets are readily available on our public Red Hat AI Hugging Face repository and within the Red Hat container registry at registry.redhat.io, providing a trusted source for deploying high-performance, cost-effective AI.
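
For example, because the checkpoints live in a public Hugging Face organization, loading one for offline inference is only a few lines with vLLM's Python API. The repo ID below is an assumed example of the naming pattern; browse huggingface.co/RedHatAI for the current list.

```python
# Offline inference with a validated checkpoint via vLLM's Python API.
# The repo ID is an assumed example, not a confirmed model name.
from vllm import LLM, SamplingParams

llm = LLM(model="RedHatAI/Qwen3-8B-FP8-dynamic")
sampling = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(
    ["Explain in one sentence why quantized models cut GPU costs."],
    sampling,
)
print(outputs[0].outputs[0].text)
```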

Meet the most recent validated models

The latest release features a powerful and diverse lineup of validated models, each optimized and ready for your enterprise workloads.

  • DeepSeek-R1 INT4: An elite reasoning and coding model, ideal for generating, completing, and debugging complex code across multiple programming languages.
  • Qwen 3 8B FP8 Dynamic: A versatile and powerful multilingual model from Alibaba, designed for global chatbot applications and content creation.
  • Kimi K2 Quantized INT4: A model known for its exceptionally large context window, making it a powerhouse for Retrieval-Augmented Generation (RAG) and for analyzing lengthy documents like legal contracts or research papers.
  • Gemma-3n 4B FP8 Dynamic: Google's newest efficient model, offering a balance of performance and size for summarization tasks and on-device applications.
  • openai/gpt-oss-120b and openai/gpt-oss-20b: Large and smaller general-purpose foundation models capable of complex reasoning, nuanced content generation, and advanced problem-solving.
  • Qwen3 Coder 480B-A35B-Instruct-FP8: A massive, enterprise-grade coding assistant designed for the most demanding software development and automation pipelines.
  • Voxtral-Mini-3B-2507 FP8 Dynamic: A nimble and responsive model focused on voice and speech, excellent for building real-time, voice-enabled applications and interactive agents.
  • whisper-large v3 INT4: A state-of-the-art speech-to-text model from OpenAI, designed for highly accurate audio transcription, creating meeting minutes, and enabling voice commands.
  • NVIDIA-Nemotron-Nano-9B-v2: A new general-purpose reasoning and chat model from NVIDIA that uses a hybrid architecture for AI agent systems, chatbots, and RAG, and is commercially usable.

Get started today

You can access these powerful, deployment-ready AI models today in 2 ways:

  • Hugging Face: Explore the validated models and their details on the Red Hat AI repository.
  • Red Hat Container Registry: Pull the container images to deploy immediately on RHOAI 2.25 or RHAIIS 3.2.2. See the docs for details.

Note: All models are optimized for deployment on vLLM (version 0.10.1.1 or later).
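
Because vLLM exposes an OpenAI-compatible endpoint, a deployed validated model can be queried with any standard OpenAI client. Here is a minimal client-side sketch, assuming you have already started a server (for example with `vllm serve <model>`) on localhost:8000; the model name is a placeholder.

```python
# Querying a vLLM-served model through its OpenAI-compatible API.
# Assumes a server is already running on localhost:8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="RedHatAI/Qwen3-8B-FP8-dynamic",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize your capabilities in one line."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```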

Coming soon

For even tighter integration, these models will be featured in the Red Hat OpenShift AI catalog beginning with the 3.0 release, with its general availability (GA) scheduled for November.

To view complete performance and evaluation data, please connect with your sales representative.

Resource

The adaptable enterprise: Why AI readiness is disruption readiness

This e-book, written by Michael Ferris, Red Hat COO and CSO, helps IT leaders navigate the pace of change and technological disruption that AI presents today.

About the author

My name is Rob Greenberg, Principal Product Manager for Red Hat AI, and I came over to Red Hat with the Neural Magic acquisition in January 2025. Prior to joining Red Hat, I spent 3 years at Neural Magic building and delivering tools that accelerate AI inference with optimized, open-source models. I've also had stints as a Digital Product Manager at Rocketbook and as a Technology Consultant at Accenture.
