What is Kubeflow?

Published May 22, 2023 • 4-minute read

Kubeflow is a Kubernetes-native, open-source framework for developing, managing, and running machine learning (ML) workloads. It is an AI/ML platform that brings together several tools covering the main AI/ML use cases: data exploration, data pipelines, model training, and model serving.

Data scientists access these capabilities through a portal that provides high-level abstractions over the tools, so they do not need to learn the low-level details of how Kubernetes plugs into each one. Kubeflow itself is designed specifically to run on Kubernetes and embraces many of its key concepts, including the operator model.

Kubeflow solves many of the challenges involved in orchestrating machine learning pipelines by providing a set of tools and APIs that simplify training and deploying ML models at scale. A “pipeline” denotes an ML workflow, including the components of the workflow and how those components interact. Kubeflow can accommodate the needs of multiple teams in one project and lets those teams work on any infrastructure. This means data scientists can train and serve ML models on the cloud of their choice, such as IBM Cloud, Google Cloud, Amazon Web Services (AWS), or Microsoft Azure.

Overall, Kubeflow standardizes machine learning operations (MLOps) by organizing ML projects while taking advantage of cloud computing resources. Key use cases for Kubeflow include data preparation, model training, evaluation, optimization, and deployment.

Kubernetes is key to accelerating the ML lifecycle because it gives data scientists the agility, flexibility, portability, and scalability they need to train, test, and deploy ML models.

Scalability: Kubernetes allows users to scale ML workloads up or down, depending on demand. This ensures that machine learning pipelines can accommodate large-scale processing and training without interfering with other elements of the project.
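Demand-based scaling is commonly handled by the Kubernetes Horizontal Pod Autoscaler (HPA); this is one typical mechanism, not something the article prescribes. The Kubernetes documentation gives the HPA's core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below reproduces only that arithmetic plus the default 10% tolerance and min/max bounds; the function name and the bound defaults are illustrative assumptions, and the real controller also applies stabilization windows and other scaling policies.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA scaling rule (illustrative; not the real Kubernetes controller)."""
    ratio = current_metric / target_metric
    # Inside the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Keep the result within the configured (hypothetical default) bounds.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(2, current_metric=90, target_metric=60))  # 3: scale up
print(desired_replicas(4, current_metric=20, target_metric=60))  # 2: scale down
```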

Efficiency: Kubernetes optimizes resource allocation by scheduling workloads onto nodes based on their availability and capacity. Because computing resources are assigned deliberately rather than left idle, users can reduce costs and improve performance.
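The Kubernetes scheduler works in two broad phases: it filters out nodes that cannot satisfy a pod's resource requests, then scores the remaining nodes and places the pod on the highest-scoring one. The following is a heavily simplified, hypothetical model of that idea using a "least allocated" score (prefer the node with the most capacity left), which resembles one scoring strategy of the real scheduler. The `Node` class and `schedule` function are illustrative names; the real scheduler runs many more plugins and works from declared resource requests rather than live usage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node model: total capacity and already-requested resources."""
    name: str
    cpu_capacity: float          # cores
    mem_capacity: float          # GiB
    cpu_requested: float = 0.0
    mem_requested: float = 0.0

    def fits(self, cpu: float, mem: float) -> bool:
        # Filter phase: the pod's requests must fit in the unrequested capacity.
        return (self.cpu_requested + cpu <= self.cpu_capacity
                and self.mem_requested + mem <= self.mem_capacity)

    def least_allocated_score(self, cpu: float, mem: float) -> float:
        # Score phase: average fraction of CPU and memory left free after placement.
        cpu_free = (self.cpu_capacity - self.cpu_requested - cpu) / self.cpu_capacity
        mem_free = (self.mem_capacity - self.mem_requested - mem) / self.mem_capacity
        return (cpu_free + mem_free) / 2


def schedule(nodes: List[Node], cpu: float, mem: float) -> Optional[str]:
    """Place a pod with the given requests; return the node name, or None if none fits."""
    feasible = [n for n in nodes if n.fits(cpu, mem)]
    if not feasible:
        return None  # no node has room, so the pod would stay Pending
    best = max(feasible, key=lambda n: n.least_allocated_score(cpu, mem))
    # Reserve the requested resources on the chosen node.
    best.cpu_requested += cpu
    best.mem_requested += mem
    return best.name


nodes = [Node("node-a", 16, 64, cpu_requested=12, mem_requested=48),
         Node("node-b", 8, 32)]
print(schedule(nodes, cpu=4, mem=8))  # node-b: both fit, node-b has more room left
print(schedule(nodes, cpu=6, mem=8))  # None: neither node has 6 free cores
```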

Portability: Kubernetes provides a standardized, platform-agnostic environment that allows data scientists to develop one ML pipeline and deploy it across multiple environments and cloud platforms. This reduces compatibility issues and the risk of vendor lock-in.

Fault tolerance: Kubernetes has built-in fault tolerance and self-healing capabilities. It restarts failed containers and reschedules workloads away from failed nodes, which helps keep ML pipelines running through hardware or software failures.
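Self-healing comes from Kubernetes controllers, and the operators Kubeflow builds on, running a reconciliation loop: they repeatedly compare the desired state (for example, "three training pods") with the observed state and act to close the gap. The toy controller below shows only that loop. `ReplicaController` and the pod names are hypothetical, and a real controller watches the Kubernetes API server rather than an in-memory set.

```python
from typing import List, Set


class ReplicaController:
    """Toy controller that keeps `desired` pods running (hypothetical; not a Kubernetes API)."""

    def __init__(self, desired: int) -> None:
        self.desired = desired
        self.running: Set[str] = set()  # observed state: names of running pods
        self._next_id = 0

    def reconcile(self) -> List[str]:
        """Compare desired and observed state, then act to close the gap."""
        actions: List[str] = []
        # Too few pods (e.g. a container crashed or its node failed): create replacements.
        while len(self.running) < self.desired:
            pod = f"trainer-{self._next_id}"
            self._next_id += 1
            self.running.add(pod)
            actions.append(f"create {pod}")
        # Too many pods (e.g. the desired count was lowered): delete the surplus.
        while len(self.running) > self.desired:
            pod = sorted(self.running)[-1]
            self.running.remove(pod)
            actions.append(f"delete {pod}")
        return actions


ctrl = ReplicaController(desired=3)
print(ctrl.reconcile())            # ['create trainer-0', 'create trainer-1', 'create trainer-2']
ctrl.running.discard("trainer-1")  # simulate a pod lost to a node failure
print(ctrl.reconcile())            # ['create trainer-3']
print(ctrl.reconcile())            # []: observed state already matches desired state
```

Because the loop is driven by the difference between desired and observed state, the same code path handles crashes, node failures, and deliberate changes to the desired count.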

The Kubeflow Central Dashboard offers an authenticated web interface for accessing Kubeflow and its ecosystem components. Serving as a centralized hub, it aggregates the user interfaces of various tools and services within the cluster, providing a unified access point for managing your machine learning platform.
Kubeflow integrates with Jupyter Notebooks, providing an interactive environment for data exploration, experimentation, and model development. Notebooks support various programming languages, including Python, R, and Scala, and allow users to create and execute ML workflows in a collaborative and reproducible manner.
Kubeflow Pipelines lets users define and execute complex ML workflows as directed acyclic graphs (DAGs). It provides a way to orchestrate and automate the end-to-end process of data preprocessing, model training, evaluation, and deployment, which promotes reproducibility, scalability, and collaboration in ML projects. The Kubeflow Pipelines SDK is a collection of Python packages that users can use to define and run their machine learning workflows.
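The core DAG idea can be sketched without a cluster: each step runs only after all of its upstream steps have finished, and cycles are rejected. The runner below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The step names and toy logic are hypothetical, and this is not the Kubeflow Pipelines SDK, where steps are declared with `kfp.dsl` decorators and compiled into a pipeline definition that runs on the cluster.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Set

Step = Callable[[Dict[str, Any]], Any]


def run_pipeline(steps: Dict[str, Step], deps: Dict[str, Set[str]]) -> Dict[str, Any]:
    """Run every step after its upstream steps (toy DAG runner; not the KFP SDK)."""
    sorter = TopologicalSorter()
    for name in steps:
        sorter.add(name, *deps.get(name, ()))
    results: Dict[str, Any] = {}
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    for name in sorter.static_order():
        upstream = {d: results[d] for d in deps.get(name, ())}
        results[name] = steps[name](upstream)
    return results


# Hypothetical four-step pipeline: preprocess -> train -> evaluate -> deploy.
steps: Dict[str, Step] = {
    "preprocess": lambda up: [x / 10 for x in (10, 20, 30)],
    "train": lambda up: sum(up["preprocess"]) / len(up["preprocess"]),  # "model" = mean
    "evaluate": lambda up: abs(up["train"] - 2.0),                      # error vs. a reference
    "deploy": lambda up: "deployed" if up["evaluate"] < 0.5 else "rejected",
}
deps = {"train": {"preprocess"}, "evaluate": {"train"}, "deploy": {"train", "evaluate"}}

print(run_pipeline(steps, deps)["deploy"])  # deployed
```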
The Kubeflow Training Operator provides tools for training machine learning models at scale. This includes support for distributed training using frameworks like TensorFlow, PyTorch, and XGBoost. Users can leverage Kubernetes' scalability and resource management capabilities to train models efficiently across clusters of machines.
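With the Training Operator, a distributed job is declared as a Kubernetes custom resource. As a sketch, assuming the Training Operator's `kubeflow.org/v1` `PyTorchJob` resource, the function below builds a manifest with one master and several workers; the operator then creates the pods and sets distributed-training environment variables such as `MASTER_ADDR`, `WORLD_SIZE`, and `RANK`. The job name, container image, and `train.py` entry point are hypothetical placeholders.

```python
import json
from typing import Any, Dict


def pytorch_job(name: str, image: str, workers: int) -> Dict[str, Any]:
    """Build a PyTorchJob manifest (kubeflow.org/v1) with one master and `workers` workers."""

    def replica_spec(replicas: int) -> Dict[str, Any]:
        return {
            "replicas": replicas,
            "restartPolicy": "OnFailure",
            "template": {
                "spec": {
                    "containers": [{
                        # The Training Operator looks for a container named "pytorch".
                        "name": "pytorch",
                        "image": image,
                        "command": ["python", "train.py"],  # hypothetical entry point
                    }]
                }
            },
        }

    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "PyTorchJob",
        "metadata": {"name": name},
        "spec": {
            "pytorchReplicaSpecs": {
                "Master": replica_spec(1),
                "Worker": replica_spec(workers),
            }
        },
    }


if __name__ == "__main__":
    manifest = pytorch_job("mnist-ddp", "registry.example.com/mnist-train:latest", workers=3)
    print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed manifest can be saved to a file and submitted with `kubectl apply -f`.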
Kubeflow's model serving component, KServe (formerly KFServing), allows users to deploy trained ML models as scalable, production-ready services. It provides a consistent interface for serving models with runtimes such as TensorFlow Serving or custom inference servers, and Kubeflow can also be used with other serving tools such as Seldon Core. Models can be deployed for real-time or batch inference, serving predictions over HTTP endpoints.
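On the HTTP side, KServe's V1 inference protocol (modeled on TensorFlow Serving's REST API) accepts `POST /v1/models/<name>:predict` with a JSON body of the form `{"instances": [...]}` and returns `{"predictions": [...]}`. The standard-library sketch below imitates that request and response shape with a stand-in model that sums each input row. It is not KServe itself; `MODEL_NAME`, `predict`, and the helper functions are hypothetical.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Any, Dict, List

MODEL_NAME = "demo-model"  # hypothetical model name


def predict(instances: List[List[float]]) -> List[float]:
    # Stand-in for a trained model: returns the sum of each input row.
    return [sum(row) for row in instances]


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != f"/v1/models/{MODEL_NAME}:predict":
            self.send_error(404, "unknown model")
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        payload = json.dumps({"predictions": predict(body["instances"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format: str, *args: Any) -> None:
        pass  # keep the demo output quiet


def start_server(port: int = 0) -> ThreadingHTTPServer:
    # Port 0 lets the OS pick a free port; read it back from server_address.
    server = ThreadingHTTPServer(("127.0.0.1", port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request_prediction(port: int, instances: List[List[float]]) -> Dict[str, Any]:
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/v1/models/{MODEL_NAME}:predict",
        data=json.dumps({"instances": instances}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    server = start_server()
    port = server.server_address[1]
    print(request_prediction(port, [[1.0, 2.0], [3.0, 4.0]]))  # {'predictions': [3.0, 7.0]}
    server.shutdown()
    server.server_close()
```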
Kubeflow Metadata is a centralized repository for tracking and managing metadata associated with ML experiments, runs, and artifacts. It provides a consistent view of ML metadata across the entire workflow, enabling reproducibility, collaboration, and governance in ML projects.
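Lineage tracking is the core of such a metadata store: each run records which artifacts it consumed and produced, so any artifact can be traced back to the runs behind it. The in-memory sketch below is loosely modeled on the artifact and execution concepts of ML Metadata (MLMD), the library Kubeflow Pipelines uses for this purpose. The `MetadataStore` class, run names, and artifact paths are hypothetical, and this is not the Kubeflow Metadata API.

```python
from typing import Dict, List, Tuple


class MetadataStore:
    """Toy in-memory lineage store (hypothetical; not the Kubeflow Metadata or MLMD API)."""

    def __init__(self) -> None:
        # execution name -> (input artifacts, output artifacts, parameters)
        self.executions: Dict[str, Tuple[List[str], List[str], dict]] = {}

    def record_run(self, execution: str, inputs: List[str], outputs: List[str],
                   params: dict) -> None:
        self.executions[execution] = (inputs, outputs, params)

    def lineage(self, artifact: str) -> List[str]:
        """Executions that (transitively) produced `artifact`, upstream first.

        Assumes the recorded runs form a DAG (no run consumes its own output).
        """
        for name, (inputs, outputs, _params) in self.executions.items():
            if artifact in outputs:
                chain: List[str] = []
                for source in inputs:
                    for step in self.lineage(source):
                        if step not in chain:
                            chain.append(step)
                return chain + [name]
        return []  # raw data: no recorded execution produced it


store = MetadataStore()
store.record_run("preprocess-1", ["data/raw.csv"], ["data/clean.csv"], {"dropna": True})
store.record_run("train-1", ["data/clean.csv"], ["models/v1"], {"learning_rate": 0.01})
store.record_run("eval-1", ["models/v1", "data/clean.csv"], ["metrics/v1.json"], {})
print(store.lineage("metrics/v1.json"))  # ['preprocess-1', 'train-1', 'eval-1']
print(store.executions["train-1"][2])    # {'learning_rate': 0.01}
```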

In addition, Kubeflow provides web-based user interfaces (UIs) for monitoring and managing ML experiments, model training jobs, and inference services. These UIs offer visualizations, metrics, and logs to help users track the progress of their ML workflows, troubleshoot issues, and make informed decisions.

Because it embraces the Kubernetes operator model, Kubeflow is extensible and can be customized for specific use cases and environments. Users can integrate additional components, such as data preprocessing tools, feature stores, monitoring solutions, and external data sources, to extend the capabilities of their ML workflows.

Red Hat is closely involved with the Kubeflow community, from managing new releases and upgrading security features to developing designs and future code contributions. This involvement helps us build better, stronger solutions that work in practice.

Red Hat® AI is our portfolio of AI products built on solutions our customers already trust. This foundation helps our products remain reliable, flexible, and scalable.

Red Hat AI can help organizations:

Adopt and innovate with AI quickly.
Break down the complexities of delivering AI solutions.
Deploy anywhere.

Bridging Kubeflow and AI capabilities

Red Hat AI offers the Granite family of large language models (LLMs), the flexibility to bring your own model, and an integrated MLOps platform for managing the AI/ML lifecycle across hybrid cloud environments and the edge.

Red Hat OpenShift AI's data science pipelines are based on Kubeflow Pipelines, and OpenShift AI provides a visual editor for creating and automating data science pipelines and experiments. This helps developers streamline data exploration, model training, validation, and model storage.

The combination of data science pipelines and MLOps increases teams’ operational efficiency when delivering models and AI-enabled applications to development, testing, and production environments across the organization.
