This session breaks down the journey from fragmented deployments to a scalable, centralized AI service using Red Hat OpenShift AI (RHOAI). We will go behind the scenes of a real-world proof of concept (PoC) to show how to deliver high-performance models efficiently across multiple departments, balancing developer velocity with enterprise-grade governance.
Key Takeaways:
- The Model Decision Framework: How to identify and validate general-purpose models (e.g., Mistral 24B vs. 7B) using data-driven benchmarks to meet diverse business needs.
- Optimizing Infrastructure: Leveraging GPUs and vLLM to maximize performance through techniques like quantization and memory allocation tuning.
- Resource Sharing at Scale: Best practices for configuring OpenShift namespaces, RBAC, and quotas to allow multiple teams to share GPU resources.
- Enterprise Integration: Strategies for exposing optimized models via secure APIs.
Join us to learn how to transform your AI strategy from a series of experiments into a highly available service that teams across your organization can rely on.
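As a taste of the resource-sharing pattern covered in the session, a per-namespace GPU quota on OpenShift can be expressed as a standard Kubernetes ResourceQuota. The sketch below is illustrative only: the namespace name and GPU limit are assumptions, not details from the PoC.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: team-a            # hypothetical team namespace
spec:
  hard:
    requests.nvidia.com/gpu: "2"   # cap this team's total GPU requests
```

Combined with namespace-scoped RBAC, quotas like this let several teams draw from a shared GPU pool without any one team monopolizing it.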
Speaker
Anne Faulhaber
Technical Account Manager, Red Hat