Order from chaos: Red Hat and Starburst come together to simplify data access

November 23, 2020 · Michael St-Jean · 4-minute read

Enterprises rely on data to bring order to their organizations through automation, business process management and optimization, and increased intelligence that leads to better decision making. Yet data can be difficult to access, especially when it exists in many places.

Today, data lives in data centers, in the cloud, in vendor environments, and in both traditional and software-defined data sources. Data ingested at the network edge may be aggregated at remote locations; transactional databases and data warehouses typically live in the core data center; and cloud-native applications generally store data in a private cloud, a public cloud, or both. Distributed, hybrid cloud, traditional, and modern applications each bring their own data stores, often within the same organization.

These organizations want to be able to access and analyze this data, from any source, at any scale, and across any infrastructure. They want a distributed query engine that allows them to access data no matter where it resides, whatever form it takes, and whenever they need it.

That’s why we’re thrilled to announce that Starburst Enterprise for Presto, built on the open source Presto distributed SQL query engine, can now be deployed on Red Hat OpenShift, Red Hat’s industry-leading enterprise Kubernetes platform. You can now run fast queries across your data lake and even federate queries across different data sources.

Accessing data wherever, whatever, whenever

Traditional data warehouse products tackle data silos with outdated, monolithic solutions that breed inefficiency and ultimately keep business analysts from running analytics on their data quickly. The resulting delays prevent organizations from making better, more timely decisions that can improve performance and competitiveness.

Starburst Enterprise for Presto addresses both data silos and slow data access. It builds on open source Presto and adds enterprise-grade tools and 24-hour support to help organizations meet their big data needs at scale. It provides distributed query support for popular data sources, including Apache Cassandra, Apache Hive (HDFS), S3-compatible object storage, Microsoft SQL Server, MySQL, and PostgreSQL.

Let’s say your billing app uses PostgreSQL, but your general ledger is consolidated in SQL Server, and you need to query both. Normally, you would extract, transform, and load (ETL) the data, building a pipeline that copies everything you need into a data warehouse.

With Starburst Enterprise for Presto, you can instead run federated queries across your different data sources, whether the data is structured, semi-structured, or unstructured, and even when those sources use different protocols. You can also analyze data in place across the file systems, databases, and object stores provided by storage platforms such as Red Hat OpenShift Container Storage, Red Hat Ceph Storage, and more.
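
As a rough illustration of the billing and general-ledger scenario, the Python sketch below builds one SQL statement that joins a PostgreSQL table with a SQL Server table. Presto addresses tables as `catalog.schema.table`, so a single query can name tables served by different connectors. The catalog names (`billing_pg`, `ledger_mssql`), schemas, tables, and columns are hypothetical placeholders for whatever catalogs an administrator has configured, and the code only builds the query text; running it would require a Presto client and a live cluster.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TableRef:
    """A fully qualified Presto table name: catalog.schema.table."""
    catalog: str
    schema: str
    table: str

    def qualified(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.table}"


def federated_join_sql(invoices: TableRef, ledger: TableRef, since: date) -> str:
    """Build one SQL statement joining tables that live in two different catalogs.

    Column names (invoice_id, customer_id, amount, issued_on, entry_id,
    posted_amount) are hypothetical placeholders.
    """
    return (
        "SELECT i.invoice_id, i.customer_id, i.amount, l.posted_amount\n"
        f"FROM {invoices.qualified()} AS i\n"
        f"JOIN {ledger.qualified()} AS l\n"
        "  ON l.entry_id = i.invoice_id\n"
        # date.isoformat() yields YYYY-MM-DD, the form a SQL DATE literal expects.
        f"WHERE i.issued_on >= DATE '{since.isoformat()}'"
    )


# Hypothetical catalogs: billing_pg (PostgreSQL connector), ledger_mssql (SQL Server connector).
sql = federated_join_sql(
    TableRef("billing_pg", "public", "invoices"),
    TableRef("ledger_mssql", "dbo", "gl_entries"),
    date(2020, 1, 1),
)
print(sql)
```

The FROM and JOIN clauses point at two catalogs, so in this setup the query engine, rather than an ETL job, would read from each source at query time.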

A single way to manage all data

[Figure 1 (Starburst)]

With the combination of Starburst Enterprise for Presto and Red Hat OpenShift, organizations have a powerful architecture for managing their data in a straightforward, cost-effective way. Operators delivered with Red Hat OpenShift Container Platform provide greater agility through automated installation, upgrades, and lifecycle management throughout the container stack. Starburst has delivered an Operator that simplifies the administration of a Presto cluster. With it, you can:

Auto-configure, tune, and manage Presto clusters based on the hardware you’ve provisioned. You no longer need to size and configure your environment by hand or work out how many Java Virtual Machines (JVMs) to set up given your hardware and compute constraints. Starburst’s Red Hat OpenShift Operator determines what to deploy, identifies the appropriate hardware, provisions instances, and manages updates automatically.

Elastically scale your Presto Worker cluster based on query load. You can set scaling thresholds and replica limits for Presto Worker pods with the Kubernetes Horizontal Pod Autoscaler (HPA). As query load increases, the HPA adds Worker pods, up to the configured maximum, so that Presto does not take too many resources from other applications.

Create a load balancer for high availability. The Presto coordinator parses statements and coordinates query execution. By creating a load balancer in Red Hat OpenShift, you can specify virtual IP addresses so that if a service fails in one pod, it restarts automatically in another, minimizing downtime.

Monitor hardware and software layers. Red Hat OpenShift integrates with Prometheus, the open source project for cluster monitoring and alerting. Prometheus collects metrics from each pod and can raise an alert when a pod goes down. With this information, pods can be commissioned or decommissioned automatically as needed.

Scale down and close queries. The HPA lets you gradually decommission underutilized Presto Worker pods and free up system resources for other tasks without interrupting service.
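
The scale-up and scale-down behavior described in this list rests on the Horizontal Pod Autoscaler's documented replica calculation: the desired count is ceil(currentReplicas × currentMetricValue / targetMetricValue), no change is made when that ratio is within a tolerance of 1.0 (0.1 by default), the result is clamped to the configured minimum and maximum, and scale-downs are smoothed by applying the highest recommendation seen during a stabilization window (300 seconds by default). The Python sketch below reimplements that arithmetic for illustration only. It is a simplification that ignores pod readiness, missing metrics, and scaling policies, and using CPU utilization as the metric for Presto Worker pods is an assumption.

```python
import math
from collections import deque


def desired_replicas(current: int, metric: float, target: float,
                     min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA rule: ceil(current * metric / target), clamped to [min, max]."""
    ratio = metric / target
    if abs(ratio - 1.0) <= tolerance:
        # Within tolerance of the target: keep the current count (still clamped).
        proposed = current
    else:
        proposed = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, proposed))


class ScaleDownStabilizer:
    """Apply the highest recommendation seen within the stabilization window.

    A new, higher recommendation wins immediately (scale up), while a lower one
    takes effect only once every higher recommendation has aged out (scale down).
    """

    def __init__(self, window_seconds: float = 300.0):
        self.window = window_seconds
        self.history = deque()  # (timestamp_seconds, recommended_replicas)

    def stabilize(self, now: float, recommendation: int) -> int:
        self.history.append((now, recommendation))
        # Forget recommendations older than the window.
        while self.history and self.history[0][0] < now - self.window:
            self.history.popleft()
        return max(r for _, r in self.history)


# 4 Worker pods averaging 90% CPU against a 60% target: ceil(4 * 1.5) = 6.
print(desired_replicas(4, 90.0, 60.0, min_replicas=2, max_replicas=10))  # 6

stabilizer = ScaleDownStabilizer(window_seconds=300)
print(stabilizer.stabilize(0, 6))    # 6
print(stabilizer.stabilize(60, 3))   # 6: the earlier 6 is still inside the window
print(stabilizer.stabilize(400, 3))  # 3: older recommendations have expired
```

Because the stabilizer takes a maximum, a spike in query load raises the replica count at once, while a drop takes effect only after the window has passed; that delay is what lets underutilized Worker pods be retired gradually rather than on every brief dip.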
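
Prometheus records an `up` series for each scrape target, set to 1 when the last scrape succeeded and 0 when it failed. An alerting rule with the expression `up == 0` and a `for` clause (say, `5m`) goes pending as soon as the expression is true and fires only if it stays true for the whole duration; notifications are then routed by Alertmanager. The Python sketch below mimics that pending-then-firing logic over a list of samples. The sample data, the 300-second duration, and the `alert_state` helper are illustrative assumptions, not part of Prometheus or OpenShift.

```python
from typing import Iterable, Optional, Tuple

Sample = Tuple[float, int]  # (timestamp_seconds, value of the `up` series)


def alert_state(samples: Iterable[Sample], for_seconds: float) -> str:
    """Evaluate a rule like `up == 0` with a `for` duration over ordered samples.

    Returns "inactive", "pending" (condition true, but not yet for long enough),
    or "firing" (condition true continuously for at least `for_seconds`).
    """
    down_since: Optional[float] = None
    last_ts: Optional[float] = None
    for ts, up in samples:
        last_ts = ts
        if up == 0:
            if down_since is None:
                down_since = ts  # the condition has just become true
        else:
            down_since = None  # a successful scrape resets the timer
    if down_since is None or last_ts is None:
        return "inactive"
    return "firing" if last_ts - down_since >= for_seconds else "pending"


# Hypothetical scrapes of one Presto Worker pod.
print(alert_state([(0, 1), (60, 0), (120, 0)], for_seconds=300))   # pending
print(alert_state([(0, 0), (150, 0), (300, 0)], for_seconds=300))  # firing
```

Requiring the condition to hold for a duration keeps a single failed scrape from triggering action on a pod that is actually healthy.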

Starburst Enterprise for Presto can also use the data storage platforms typically deployed with OpenShift. For example, your application developers may be running SQL and NoSQL databases on Red Hat OpenShift Container Storage, or you might be pulling data from data lakes on Red Hat Ceph Storage. Archives stored as Parquet files on Red Hat Ceph Storage are another useful source for data analytics.
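
Red Hat Ceph Storage exposes object storage through an S3-compatible gateway (RGW), so one way to query Parquet archives there is through Presto's Hive connector pointed at that endpoint. The Python sketch below renders a catalog properties file and an external-table definition. The endpoint URL, metastore URI, catalog, bucket, schema, and columns are hypothetical; credentials are omitted; and the property names, including the `hive-hadoop2` connector name, should be checked against the Hive connector documentation for the Starburst release in use.

```python
def render_properties(props: dict) -> str:
    """Render key=value lines in the Java .properties style Presto catalog files use."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


# Hypothetical catalog file, e.g. etc/catalog/ceph_archive.properties.
# Property names follow the Presto Hive connector's S3 settings; verify them
# against the documentation for the release you run.
CEPH_CATALOG = {
    "connector.name": "hive-hadoop2",
    "hive.metastore.uri": "thrift://hive-metastore:9083",    # assumed metastore service
    "hive.s3.endpoint": "http://ceph-rgw.example.com:8080",  # assumed Ceph RGW endpoint
    "hive.s3.path-style-access": "true",  # address buckets by path, not virtual host
}


def parquet_table_ddl(catalog: str, schema: str, table: str, location: str) -> str:
    """Declare a table over Parquet files that already exist at `location`."""
    return (
        f"CREATE TABLE {catalog}.{schema}.{table} (\n"
        "  event_id BIGINT,\n"
        "  event_time TIMESTAMP,\n"
        "  payload VARCHAR\n"
        ")\n"
        f"WITH (format = 'PARQUET', external_location = '{location}')"
    )


print(render_properties(CEPH_CATALOG))
print(parquet_table_ddl("ceph_archive", "archive", "events", "s3a://archive/events/"))
```

The table definition only records where the files live and how to read them; Presto then reads the Parquet files in place when they are queried, so the archive does not have to be copied first.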

In addition, Starburst Enterprise for Presto makes data held in traditional data stores, such as transactional relational databases, data warehouses, and Hadoop, accessible to applications built on modern software-defined and cloud-native data stores.

The result is a faster, more efficient, and more cost-effective approach to data management.

Giving stakeholders fast, easy, efficient data access

While your business saves time and money, your users can also benefit from the integration of Starburst Enterprise for Presto and Red Hat OpenShift:

Data scientists can execute high-speed queries to get to the data they need, regardless of where it rests. They can more easily join datasets via parallel connections, connect multiple sources, and analyze data more quickly.

Data directors and data architects can run high-speed queries to give business analysts the information they need right away. They can optimize their cloud spend by separating compute from storage and eliminate the time spent on complex ETL processes and data preparation. And they can build modern data architectures that are open, flexible, and able to fit within any stack.
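
The idea of joining datasets from several sources in parallel can be shown in miniature: read each input concurrently, index one side in a hash table, and probe it with the other. The Python sketch below does this in a single process with threads. The two fetch functions and their rows are hypothetical stand-ins for connector reads, and Presto itself distributes this kind of work across Worker nodes rather than across threads in one process.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Row = Dict[str, object]


def fetch_customers() -> List[Row]:
    # Hypothetical stand-in for rows read from one source, such as PostgreSQL.
    return [{"customer_id": 1, "name": "Acme"}, {"customer_id": 2, "name": "Globex"}]


def fetch_orders() -> List[Row]:
    # Hypothetical stand-in for rows read from another source, such as object storage.
    return [
        {"order_id": 10, "customer_id": 1, "total": 250.0},
        {"order_id": 11, "customer_id": 2, "total": 99.5},
        {"order_id": 12, "customer_id": 1, "total": 12.0},
    ]


def hash_join(build: List[Row], probe: List[Row], key: str) -> List[Row]:
    """Inner join: index the build side by `key`, then look up each probe row."""
    index: Dict[object, List[Row]] = {}
    for row in build:
        index.setdefault(row[key], []).append(row)
    return [{**b, **p} for p in probe for b in index.get(p[key], [])]


def parallel_join() -> List[Row]:
    # Start both reads at once; .result() blocks until each input has arrived.
    with ThreadPoolExecutor(max_workers=2) as pool:
        customers = pool.submit(fetch_customers)
        orders = pool.submit(fetch_orders)
        return hash_join(customers.result(), orders.result(), key="customer_id")


rows = parallel_join()
print([r["name"] for r in rows])  # ['Acme', 'Globex', 'Acme']
```

A hash join usually builds its table on the smaller input and streams the larger one past it, which keeps memory use down; here the two customer rows form the build side.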

Modernizing data infrastructures

With Starburst Enterprise for Presto on Red Hat OpenShift, you can modernize your data infrastructure at your own pace while still answering your most critical business questions. You can deploy compute resources cost-effectively for rapid reporting, business intelligence analysis, and SQL queries, and get answers quickly without worrying about where data is stored, what format it is in, or whether it is in flight or at rest.

To learn more about Red Hat OpenShift and Starburst Enterprise for Presto, review the overview document and register for our webinar to see a demo and hear how Red Hat and Starburst are simplifying data access. Red Hat will also be a Platinum sponsor of Starburst’s Datanova event.

About the author

Michael St-Jean

Global Technical Alliance Executive

Michael St-Jean is a Technical Alliance executive focused on building joint solutions with partners that accelerate time to value for organizations' strategic technology initiatives. For over two decades, Michael has worked with cross-functional teams helping organizations solve complex business challenges with innovative technology solutions and strategies.
