Distributed cloud and hybrid cloud applications are increasingly popular, but the move to the cloud cannot happen all at once. As a result, many organizations rely on a combination of traditional and modern applications to run their business and make critical decisions. Likewise, most use both traditional and modern data sources, with data scattered across datacenters, cloud, and vendor environments.
For example, an analyst might need to combine data from a PostgreSQL application on a Kubernetes persistent volume (PV), a general ledger running in Microsoft SQL Server, and archived client data in an object store. Enterprises need a reliable and consistent user and operational experience that lets them rapidly develop applications and analyze data from diverse sources while managing infrastructure effectively.
Starburst Enterprise provides a modern solution built on Trino (formerly known as PrestoSQL), an open source distributed SQL query engine. Trino was designed for interactive analytic queries against data sources of all sizes, from gigabytes to petabytes. It approaches the speed of commercial data warehouses while scaling to meet the needs of large organizations. Starburst Enterprise adds the tools and 24x7 support that organizations need for big data access at scale.
The Starburst Enterprise platform provides distributed query support for varied data sources, including:
- NoSQL systems (MongoDB, Cassandra, Redis).
- SQL databases (Microsoft SQL Server, MySQL, PostgreSQL).
- Data warehouses (IBM Db2 Warehouse, Teradata, Oracle Exadata, Snowflake).
- Hive (HDFS, Cloudera, MapR).
- Data services (Kafka, Elasticsearch, OpenShift Data Foundation).
- Cloud object storage (AWS S3, ADLS, Azure Blob, Ceph, IBM COS).
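The following Python sketch makes the analyst's example concrete. It composes a single Trino SQL statement that joins tables from three different sources using Trino's `catalog.schema.table` naming. All catalog, schema, table, and column names are hypothetical. The sketch assumes a `postgresql` catalog for the application database, a `sqlserver` catalog for the general ledger, and an `objstore` catalog for archived client data in object storage (for example, one configured with the Hive connector).

```python
"""Compose one Trino SQL statement that spans three data sources.

Assumption: the catalog names (postgresql, sqlserver, objstore), schemas,
tables, and columns are hypothetical. In a real cluster, an administrator
defines each catalog and points it at one backing data source.
"""


def table_ref(catalog: str, schema: str, table: str) -> str:
    """Return a catalog.schema.table reference with ANSI double-quoted
    identifiers; embedded double quotes are escaped by doubling them."""
    parts = (catalog, schema, table)
    return ".".join('"' + p.replace('"', '""') + '"' for p in parts)


# Hypothetical mapping from the analyst's example to Trino catalogs.
APP_ORDERS = table_ref("postgresql", "public", "orders")       # PostgreSQL app on a PV
LEDGER = table_ref("sqlserver", "dbo", "general_ledger")       # Microsoft SQL Server
CLIENT_ARCHIVE = table_ref("objstore", "archive", "clients")   # object store archive


def federated_query() -> str:
    # Aggregate each source before joining so that rows from one source
    # do not multiply the totals computed from another source.
    return f"""
SELECT c.client_id,
       c.client_name,
       l.ledger_total,
       o.order_count
FROM {CLIENT_ARCHIVE} AS c
JOIN (SELECT client_id, SUM(amount) AS ledger_total
      FROM {LEDGER}
      GROUP BY client_id) AS l
  ON l.client_id = c.client_id
JOIN (SELECT client_id, COUNT(*) AS order_count
      FROM {APP_ORDERS}
      GROUP BY client_id) AS o
  ON o.client_id = c.client_id
""".strip()


if __name__ == "__main__":
    print(federated_query())
```

The resulting statement could be submitted through any Trino client, such as the Trino CLI or the `trino` Python package. Trino then reads from each source and performs the joins itself, so the analyst does not have to copy data into a single system first.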
Starburst on Red Hat OpenShift Container Platform
Operating Trino at scale can be challenging, especially for teams that size and configure their environments manually. Organizations need to reach petabyte scale through autoscaling and to decommission resources gracefully when they are no longer needed. Combining Starburst Enterprise with Red Hat OpenShift Container Platform addresses these needs by providing automation, high availability, elasticity, and monitoring for Trino clusters.
Kubernetes operators delivered with Red Hat OpenShift Container Platform automate installation, upgrades, and life-cycle management throughout the container stack. A Kubernetes operator for Starburst Enterprise greatly simplifies the administration of a Starburst Enterprise cluster on Red Hat OpenShift. Together, these operators offer benefits that include:
Automation. Red Hat OpenShift and Starburst Enterprise operators provide automatic configuration, tuning, and management of Starburst Enterprise clusters. Red Hat OpenShift operators determine what to deploy, including identifying the hardware and provisioning new instances. Starburst Enterprise operators manage updates to the environment.
High availability. Continuous operation of the Trino coordinator is essential. Liveness probes let Red Hat OpenShift detect an unresponsive Trino coordinator and restart it automatically. Readiness probes ensure that traffic is routed to the coordinator only when it is ready to accept queries.
Elastic scalability. Red Hat OpenShift can automatically scale the Trino worker cluster as load changes. With the Kubernetes Horizontal Pod Autoscaler (HPA), organizations specify target thresholds for Trino worker pods. As query load increases, the HPA automatically adds worker pods, up to the configured maximum.
Graceful scale-down and decommissioning. With Red Hat OpenShift, reduced load does not have to mean system downtime or killed queries. When load drops, the HPA removes unneeded Trino worker pods. The workers can shut down gracefully, finishing their running tasks before exiting, which frees system resources for other work without interrupting service.
Monitoring for all hardware and software layers. Prometheus, the cluster monitoring service for Red Hat OpenShift, collects metrics and raises alerts that inform Kubernetes orchestration and populate the Red Hat OpenShift dashboard. It reports pods that go offline and can supply the metrics the HPA uses to decide whether to add or remove Trino worker pods.
Support for Red Hat Data Services. Starburst Enterprise lets organizations use data platforms associated with Red Hat OpenShift, such as Red Hat OpenShift Data Foundation and Red Hat OpenShift Data Science. Application developers can use SQL and NoSQL databases backed by OpenShift-compatible data services. Businesses can query data lakes stored in Red Hat Ceph Storage, including Parquet files used for data analytics.
Figure 1 illustrates how Starburst Enterprise works with Red Hat OpenShift Container Platform and related Red Hat Data Services platforms.
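Both the elastic scaling and the graceful scale-down described above depend on how the Kubernetes HPA computes a replica count. The Python sketch below reproduces the HPA rule documented by Kubernetes, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). It also applies the default 10% tolerance and the configured minimum and maximum replica counts. The metric (average CPU utilization across Trino worker pods) and the numbers used are illustrative assumptions, not Starburst or OpenShift defaults.

```python
"""Sketch of the Kubernetes HPA replica calculation applied to Trino workers.

Assumption: the metric is average CPU utilization (percent) across worker
pods; the targets and replica bounds are hypothetical example values.
"""
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Return the worker count the HPA rule would request."""
    if current_replicas <= 0 or target_metric <= 0:
        raise ValueError("current_replicas and target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: the HPA skips scaling.
        proposed = current_replicas
    else:
        # Ratio > 1 scales up; ratio < 1 scales down.
        proposed = math.ceil(current_replicas * ratio)
    # Never go outside the configured bounds.
    return max(min_replicas, min(max_replicas, proposed))


if __name__ == "__main__":
    print(desired_replicas(4, 90, 60))  # busy cluster: scale up
    print(desired_replicas(8, 30, 60))  # quiet cluster: scale down
```

For example, four workers averaging 90% CPU against a 60% target produce a request for six workers. Eight workers at 30% produce a request for four. The real HPA also applies a scale-down stabilization window (five minutes by default) and optional scaling policies, which this sketch leaves out.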