E-book

Open source data pipelines for intelligent applications

Learn how open source platforms, including Kubernetes and Ceph, can help you increase data agility for AI/ML

Scalable data processing has allowed organizations to derive value from their data in ways that were unimaginable only a decade ago. Recent advances in artificial intelligence and machine learning are poised to provide organizations with benefits of a similar or greater magnitude. The platforms that lend themselves to effective processing of data at scale are evolving to give organizations greater flexibility and reduce the time to meaningful insights. Organizations today are already reaping the benefits of these approaches for applications such as clickstream analysis, recommendation engines, fraud detection, and genome sequencing.

Hybrid cloud platforms are quickly replacing vertically integrated frameworks, while specialized data processing infrastructure is giving way to more general-purpose infrastructure that solves similar problems for a broader application landscape. For example, scheduling is shifting from YARN to Kubernetes, storage is shifting from HDFS to distributed logs and shared object storage, and instead of contorting every problem to fit MapReduce, we now have a veritable cornucopia of data processing, analytic, and machine learning engines, frameworks, and programming models.

This report gives data engineers and data scientists insight into how Kubernetes serves as a foundation for building data platforms that increase an organization’s data agility. The execution environment for today’s applications and application architectures is not a single system; instead, it is a system of systems. Kubernetes tackles many of the challenges inherent to deploying applications and application architectures in a distributed way, including but not limited to service scheduling and discovery, batch execution, load balancing, and self-healing. With data continuity ensured from the device edge through core sites, from datacenter to cloud, data scientists are able to create, update, and act upon data throughout its life cycle, reducing time to meaningful insight and driving more business value.

Readers will discover:

  • How data platforms are evolving to meet the flexibility and agility required by today’s organizations.
  • How Kubernetes has changed the way we process big data and why businesses must adapt.
  • How to design scalable data storage and artificial intelligence applications for private, public, and multicloud infrastructures.

Download the e-book to learn more.