What is llm-d?

llm-d is a Kubernetes-native, open source framework that speeds up distributed large language model (LLM) inference at scale. 

This means that when an AI model receives complex queries involving large amounts of data, llm-d provides a framework that makes processing faster. 

llm-d was created by Red Hat together with Google, NVIDIA, IBM Research, and CoreWeave. Its open source community contributes updates to improve the technology.

How llm-d speeds up inference

LLM prompts can be complex and nonuniform. They typically require extensive computational resources and storage to process large amounts of data. 

llm-d has a modular architecture that can support the increasing resource demands of sophisticated, ever-larger models such as reasoning LLMs.

A modular architecture allows all the different parts of the AI workload to work either together or separately, depending on the model's needs. This helps the model run inference faster.

Imagine llm-d as a marathon: each runner controls their own pace. You may cross the finish line at a different time than others, but everyone finishes when they’re ready. If everyone had to cross the finish line at the same time, you’d be tied to the unique needs of the other runners, such as endurance, water breaks, or time spent training. That would make things complicated. 

A modular architecture lets pieces of the inference process work at their own pace to reach the best result as quickly as possible. It makes it easier to fix or update specific processes independently, too.

This way of processing allows llm-d to handle the demands of LLM inference at scale. It also empowers users to go beyond single-server deployments and run generative AI (gen AI) inference across the enterprise.

How does distributed inference work?  

The llm-d modular architecture is made up of: 

  • Kubernetes: an open source container-orchestration platform that automates many of the manual processes involved in deploying, managing, and scaling containerized applications.
  • vLLM: an open source inference server that speeds up the outputs of gen AI applications (see the sketch just after this list).
  • Inference Gateway (IGW): a Kubernetes Gateway API extension that hosts features like model routing, serving priority, and “smart” load-balancing capabilities. 
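
To make the vLLM piece concrete, here is a minimal sketch of vLLM's offline Python API loading a model and generating text. The model ID and sampling settings are illustrative assumptions, not llm-d defaults; in an llm-d deployment, vLLM runs as a long-lived server behind the Inference Gateway rather than in-process like this.

    # Minimal vLLM sketch: load a model in-process and generate a completion.
    # The model ID is an arbitrary small example, not an llm-d default.
    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")  # any Hugging Face model ID works
    sampling = SamplingParams(temperature=0.8, max_tokens=64)

    outputs = llm.generate(["What is distributed inference?"], sampling)
    for output in outputs:
        print(output.outputs[0].text)  # first generated candidate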

This accessible, modular architecture makes llm-d an ideal platform for distributed LLM inference at scale.
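
Once these pieces are deployed, a client reaches the whole stack through a single gateway endpoint. The sketch below assumes a hypothetical gateway address and model name; the request shape follows the OpenAI-compatible completions API that vLLM serves.

    # Hypothetical request through an llm-d Inference Gateway endpoint.
    # The URL and model name are illustrative assumptions, not real defaults.
    import requests

    resp = requests.post(
        "http://llm-d-gateway.example.com/v1/completions",  # assumed address
        json={
            "model": "meta-llama/Llama-3.1-8B-Instruct",  # example model ID
            "prompt": "Explain distributed inference in one sentence.",
            "max_tokens": 64,
        },
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["text"])

The gateway applies the routing, priority, and load-balancing logic described above, so the client never needs to know which vLLM instance actually served the request.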

Additional resources

What is AI security?

AI security defends AI applications against malicious attacks that aim to compromise AI workloads, tamper with data, or steal sensitive information.

What is vLLM?

vLLM is a collection of open source code that helps language models perform calculations more efficiently.

What is explainable AI?

Explainable AI (XAI) techniques are applied during the machine learning (ML) lifecycle to produce AI outputs that are transparent and easier for people to understand.
