Efficiently running powerful AI and large language models (LLMs) is a critical challenge, often leading to high operational costs and performance hurdles. Red Hat AI Inference Server gives organizations effective strategies to optimize these demanding workloads and serve optimized models faster. IT operations leaders, platform engineers, AI developers, and data scientists can now leverage this enterprise-grade solution, powered by vLLM, the de facto open-source engine for high-performance inference, to maximize throughput for any model on any accelerator, across the hybrid cloud.
Sign up for this webinar to learn how to enhance inference performance and accelerate your organization's AI ambitions across the hybrid cloud, including:
- How to achieve consistent, fast, and cost-effective inference at scale with an enterprise-grade version of vLLM and built-in model optimization capabilities.
- How to quickly and efficiently serve models by leveraging Red Hat AI’s repository of third-party validated and optimized models on Hugging Face.
- How to optimize your custom models with the exact LLM Compressor tool that Red Hat uses to offer optimized versions of third-party models.
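As a small preview of the second point above, the sketch below shows what serving a pre-quantized model with vLLM's offline Python API can look like. The model ID is illustrative only; any validated, optimized model from the RedHatAI organization on Hugging Face could be substituted.

```python
from vllm import LLM, SamplingParams

# Illustrative example: load a pre-quantized model published on Hugging Face.
# The model ID below is a placeholder; pick any validated model from the
# RedHatAI organization that fits your accelerator and use case.
llm = LLM(model="RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16")

# Generate a completion with standard sampling parameters.
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Why does quantization speed up LLM inference?"], params)

print(outputs[0].outputs[0].text)
```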
Carlos Condado
Sr. Product Marketing Manager, Red Hat AI Business Unit
Erwan Gallen
Sr. Principal Product Manager, Red Hat AI Business Unit