Virtual event

vLLM Office Hours #27: Intro to llm-d for scaling LLM inference on Kubernetes


In this session, we’ll introduce llm-d, a new Kubernetes-native framework for distributed LLM inference, co-designed with Inference Gateway (IGW) and built on vLLM. Learn how llm-d simplifies horizontally scaling LLMs across multiple GPUs and nodes, supports efficient model sharding and routing, and enables dynamic workload distribution. We’ll walk through its architecture, how it integrates with vLLM, and what makes it well suited to production-scale AI systems. Join us to explore how llm-d unlocks the next level of LLM serving performance and flexibility.
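For context on the vLLM layer that llm-d builds on, here is a minimal sketch of the single-node building block it scales out: sharding one model across GPUs with vLLM's tensor parallelism. The model name, GPU count, and prompt are illustrative choices, not taken from the session.

```python
from vllm import LLM, SamplingParams

# Illustrative: shard a model's weights across 2 GPUs on one node using
# vLLM's tensor parallelism. llm-d's role (not shown here) is to scale
# and route across many such replicas on a Kubernetes cluster.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # hypothetical model choice
    tensor_parallel_size=2,                     # split weights across 2 GPUs
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What is distributed LLM inference?"], params)
print(outputs[0].outputs[0].text)
```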

Agenda

Time          Session
2:00 - 3:00   vLLM Office Hours #27: Intro to llm-d for scaling LLM inference on Kubernetes

Michael Goin

vLLM Committer and Principal Software Engineer, Red Hat

Robert Shaw

vLLM Committer and Director of Engineering, Red Hat