Webinar

Large-scale data processing with Docling and Ray Data on Red Hat OpenShift AI

Building generative AI applications like retrieval-augmented generation (RAG) seems straightforward in demos, but production reality is harsher: teams must process tens of thousands of complex PDF documents packed with tables, multi-column layouts, and charts before a single question can be answered. 

Many AI projects spend the majority of their time wrestling with data preparation rather than building intelligent applications. In this webinar, we show how Red Hat OpenShift AI provides a unified platform for large-scale document processing by combining two powerful open source projects: Docling for high-fidelity document parsing and Ray Data for distributed streaming execution. 

OpenShift AI orchestrates Ray clusters through KubeRay, dynamically scaling from a handful of nodes to hundreds while coordinating CPU-intensive parsing and GPU-accelerated embedding in a single, streamlined pipeline. 
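The key idea behind that single pipeline is streaming execution: parsed batches flow straight from CPU workers into the embedding stage instead of the whole corpus being materialized between steps. The standard-library sketch below illustrates only that pattern, not the actual Ray Data API; on a real cluster the equivalent would be built with Ray Data operations such as `read_binary_files` and `map_batches`, with Docling doing the parsing. All function names here are illustrative stand-ins.

```python
# Standard-library sketch of the streaming pattern Ray Data applies at
# cluster scale: CPU-bound parsing overlaps with embedding, batch by batch.
from concurrent.futures import ThreadPoolExecutor


def parse_batch(batch):
    # Stand-in for Docling parsing: raw documents -> text chunks.
    return [f"parsed:{doc}" for doc in batch]


def embed_batch(chunks):
    # Stand-in for GPU embedding: each chunk -> (chunk, vector) pair.
    return [(chunk, [float(len(chunk))]) for chunk in chunks]


def stream_pipeline(docs, batch_size=2, cpu_workers=4):
    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    results = []
    with ThreadPoolExecutor(max_workers=cpu_workers) as pool:
        # Batches parse concurrently; embedding consumes each parsed batch
        # as it arrives rather than waiting for the full corpus.
        for parsed in pool.map(parse_batch, batches):
            results.extend(embed_batch(parsed))
    return results


if __name__ == "__main__":
    pairs = stream_pipeline([f"doc{i}.pdf" for i in range(5)])
    print(len(pairs))  # one (chunk, vector) pair per input document
```

In Ray Data this batching, scheduling, and backpressure handling is managed by the framework across distributed workers; the sketch only shows why the stages can overlap.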

We explore real-world scenarios, from interactive "chat with my docs" applications to batch processing of organizational knowledge bases with thousands of documents, and show how this architecture supports autonomous, multi-step agentic AI workflows at enterprise scale. The session includes a live demo covering the end-to-end flow: ingesting raw documents, parsing with Docling across distributed Ray workers, generating embeddings, and querying results through a RAG application, all running on OpenShift AI. Whether you are modernizing document-heavy workflows in financial services, healthcare, legal, or government, you will leave with a clear picture of what is possible when open source innovation meets enterprise-grade AI infrastructure.

In this demo session, our Red Hat experts will walk through how to: 

  • Simplify large-scale distributed document processing with Ray Data and Docling on OpenShift AI.
  • Orchestrate Ray clusters with KubeRay for dynamic scaling across CPU and GPU workloads.
  • Build both real-time and batch document ingestion pipelines using proven architectural patterns.
  • Extend this foundation to support agentic AI and retrieval-augmented fine-tuning (RAFT) workflows. 
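To make the KubeRay orchestration point concrete: separate worker groups can be declared for CPU parsing and GPU embedding, each with its own autoscaling bounds. The fragment below is a hypothetical sketch of a KubeRay `RayCluster` resource; the names, image tags, and resource sizes are illustrative, not taken from the webinar.

```yaml
# Hypothetical RayCluster sketch: distinct CPU and GPU worker groups,
# each autoscaled independently by the KubeRay autoscaler.
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: docling-pipeline
spec:
  enableInTreeAutoscaling: true
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0
  workerGroupSpecs:
    - groupName: cpu-parsers        # Docling parsing (CPU-bound)
      replicas: 4
      minReplicas: 1
      maxReplicas: 100
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.9.0
              resources:
                limits:
                  cpu: "8"
                  memory: 32Gi
    - groupName: gpu-embedders      # embedding generation (GPU-accelerated)
      replicas: 1
      minReplicas: 0
      maxReplicas: 8
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.9.0-gpu
              resources:
                limits:
                  cpu: "8"
                  memory: 32Gi
                  nvidia.com/gpu: "1"
```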

See a live end-to-end demo from raw document ingestion to RAG-powered querying, all running on OpenShift AI.

 

Live event date: Wednesday, June 3, 2026 | 10:00 a.m. ET

On-demand event: Available for one year afterward


 

Speakers


Ana Biazetti

Senior Principal Software Engineer, Red Hat

Ana Biazetti is a senior architect in the Red Hat OpenShift AI product organization, focusing on model customization, fine-tuning, and distributed training.

Robbie Jerrom

Senior Principal Technologist AI, Red Hat

As a principal technologist for AI at Red Hat with over 30 years of experience, Robbie works to support enterprise AI adoption through open source innovation. His focus is on cloud-native technologies, Kubernetes, and AI platforms, helping to deliver scalable and secure solutions using Red Hat AI.

Robbie is deeply committed to open source, open source AI, and open data, believing in the power of transparency, collaboration, and inclusivity to advance technology in meaningful ways. His work involves exploring private generative AI, traditional machine learning, and enhancing platform capabilities to support open and hybrid cloud solutions for AI. His focus is on helping organizations adopt ethical and sustainable AI technologies that make a real impact.