OVERVIEW
Efficiently running powerful AI and large language models (LLMs) is a critical challenge, often leading to high operational costs and performance hurdles. Red Hat AI Inference Server gives organizations effective strategies to optimize these demanding workloads and serve optimized models faster. IT operations leaders, platform engineers, AI developers, and data scientists can now leverage this enterprise-grade solution, powered by vLLM, the de facto open-source engine for high-performance inference, to maximize throughput for any model, on any accelerator, across the hybrid cloud.
Sign up for this webinar to learn:
- How to achieve consistent, fast, and cost-effective inference at scale with an enterprise-grade version of vLLM and built-in model optimization capabilities
- How to quickly and efficiently serve models from Red Hat AI’s Hugging Face repository of validated and optimized third-party models (illustrated in the serving sketch below)
- How to optimize your custom models with the same LLM Compressor tool that Red Hat uses to offer optimized versions of third-party models (illustrated in the quantization sketch below)
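As a taste of what the session covers, here is a minimal sketch of loading one of these pre-optimized checkpoints with vLLM's Python API. The model ID is illustrative, assuming a quantized checkpoint published under the RedHatAI organization on Hugging Face; any validated model from that repository could be substituted.

```python
# Minimal sketch: serve a pre-quantized checkpoint with vLLM's Python API.
# The model ID is illustrative; substitute any validated model from the
# RedHatAI organization on Hugging Face.
from vllm import LLM, SamplingParams

# vLLM detects the quantization scheme (e.g. FP8) from the checkpoint's config.
llm = LLM(model="RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Summarize the benefits of quantized inference."], params)

for output in outputs:
    print(output.outputs[0].text)
```

In production the same checkpoint would typically sit behind vLLM's OpenAI-compatible server, started with `vllm serve <model-id>`, rather than the offline API shown here.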
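For custom models, the LLM Compressor workflow the webinar touches on can be sketched as a one-shot quantization run. This is a hedged example: import paths and scheme names vary across llm-compressor releases, and the model ID and output directory are illustrative.

```python
# Minimal sketch: one-shot FP8 quantization of a custom model with LLM Compressor.
# Import paths and scheme names may differ between llm-compressor releases;
# the model ID and output directory are illustrative.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# FP8 dynamic quantization requires no calibration data: Linear layers are
# quantized while the lm_head is left in its original precision.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],
)

# Apply the recipe and write out a checkpoint that vLLM can load directly.
oneshot(
    model=MODEL_ID,
    recipe=recipe,
    output_dir="Meta-Llama-3.1-8B-Instruct-FP8-Dynamic",
)
```

The saved directory can then be pushed to Hugging Face or served with vLLM, mirroring how Red Hat produces the optimized third-party models mentioned above.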
Any questions? Please contact Elisa Navarro.
Carlos Condado
Senior Product Marketing Manager, Red Hat AI
Carlos Condado, Senior Product Marketing Manager for Red Hat AI, has over a decade of experience bridging business value and technical innovation. His deep understanding of the practical challenges of operationalizing enterprise solutions will give you a clear view of how optimized inference can positively impact your organization's AI strategy.
Erwan Gallen
Senior Principal Product Manager, Generative AI, Red Hat
Erwan Gallen is Senior Principal Product Manager, Generative AI, at Red Hat, where he oversees the Red Hat AI Inference Server product and manages hardware-accelerator enablement across OpenShift, RHEL AI, and OpenShift AI. His remit covers strategy, roadmap, and lifecycle management for GPUs, NPUs, and emerging silicon, ensuring customers can run state-of-the-art generative workloads seamlessly in hybrid clouds.