In this session, we’ll introduce llm-d, a new Kubernetes-native framework for distributed LLM inference, co-designed with the Inference Gateway (IGW) and built on vLLM. Learn how llm-d simplifies horizontally scaling LLMs across multiple GPUs and nodes, supports efficient model sharding and routing, and enables dynamic workload distribution. We’ll walk through its architecture, how it integrates with vLLM, and what makes it ideal for production-scale AI systems. Join us to explore how llm-d unlocks the next level of LLM serving performance and flexibility.
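For a feel of what serving through such a stack looks like from the client side, here is a minimal sketch. It assumes an llm-d deployment whose vLLM backends are reached through an OpenAI-compatible endpoint exposed by the gateway; the gateway address and model name below are placeholders, not values from this session.

```python
# Minimal sketch: sending a request to a model served behind an llm-d / Inference
# Gateway deployment. The vLLM backends speak the OpenAI-compatible API, so a
# standard OpenAI client pointed at the gateway endpoint is enough.
from openai import OpenAI

client = OpenAI(
    base_url="http://<gateway-address>/v1",  # placeholder: your gateway / llm-d endpoint
    api_key="EMPTY",  # vLLM's OpenAI-compatible server typically ignores the key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder: whichever model you deployed
    messages=[{"role": "user", "content": "Summarize what llm-d does in one sentence."}],
)
print(response.choices[0].message.content)
```

The point of the sketch is that scaling decisions (sharding, routing, workload distribution) happen behind the gateway, so client code stays unchanged as the deployment grows.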
Agenda
| Time | Session |
|---|---|
| 2:00 - 3:00 | vLLM Office Hours #27: Intro to llm-d for scaling LLM inference on Kubernetes |
Speakers
- Michael Goin, vLLM Committer and Principal Software Engineer, Red Hat
- Robert Shaw, vLLM Committer and Director of Engineering, Red Hat