Red Hat Blog
You’ve heard the adage that every company is now a software company. The fuel that drives it is data.
By the same token, many enterprises are considering cloud-native technologies based on Kubernetes and microservices for business innovation. However, many enterprises dealing with extremely large data sets have not been able to run data analytics applications on the same IT infrastructure that runs the rest of their workloads.
With the Red Hat data analytics infrastructure, analytics workloads can join other enterprise applications on Red Hat infrastructure, whether virtualized or containerized.
The core premise of the solution is to release the lock on data by making it accessible to more data engineers, with the agility of dynamic cluster provisioning and without duplicating, collecting, and de-staging data. This helps data retain its integrity and provides a single source of truth.
What is different about our approach, you ask? We disaggregate compute from storage, which enables dynamic provisioning of compute clusters that rely on a shared data repository.
Data analytics, and the infrastructure that supports it, is a topic of discussion for many different teams within an organization.
The most common complaint we hear from data engineering teams is the lack of agility. Specifically, they cannot dynamically provision clusters with the right resources, versions, and data, and they face the delays inherent in sharing the same static analytics clusters with other teams that have different objectives.
Interestingly, the most common complaint we hear from data platform teams is the difficulty of maintaining consistency across the multiple analytics cluster silos they’ve deployed to meet the differing needs of separate data engineering teams.
And finally, the most common complaint we hear from IT infrastructure teams is that analytics clusters cannot use the general-purpose IT infrastructure they’ve deployed for their other workloads. The end result can be delayed and inaccurate business insights. By separating the compute and storage layers, organizations can give analytics teams their own clusters, tailored to their needs, while still letting them share common data sets.
To the cloud and beyond
As enterprises grapple with the effects of data gravity and lock-in, they can look beyond public cloud vendors to meet their analytics needs and still maintain a cloud-like experience for their internal stakeholders.
But what if we could offer customers a solution designed to deliver the best of both worlds? By decoupling analytics compute servers from analytics storage, enterprises can deploy on-premises infrastructure that gives users a cloud-like, on-demand experience for provisioning analytics clusters.
Red Hat-based private cloud platforms enjoy widespread deployment across enterprises. Accordingly, several organizations have moved their analytics workloads to Red Hat OpenStack Platform with shared object storage provided by Red Hat Ceph Storage underneath.
This approach to building data infrastructure can give data analytics teams the agility to spin up their own clusters on a common private cloud infrastructure, without the unnecessary cost and complexity of duplicating data sets in non-shared HDFS silos.
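As a rough illustration of what "compute clusters that rely on a shared data repository" can look like, here is a minimal sketch. It assumes Spark jobs reach Red Hat Ceph Storage through Ceph's S3-compatible Object Gateway using Hadoop's S3A connector; the helper names `shared_store_conf` and `build_session`, the endpoint, and the credentials are hypothetical placeholders, not part of any Red Hat product API.

```python
from typing import Dict


def shared_store_conf(endpoint: str, access_key: str, secret_key: str) -> Dict[str, str]:
    """Hypothetical helper: Spark properties that point the S3A connector at an
    S3-compatible object store (assumed here to be a Ceph Object Gateway)."""
    return {
        # The "spark.hadoop." prefix forwards each setting into Hadoop's configuration.
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Path-style URLs (endpoint/bucket/key) are commonly used with on-premises gateways.
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


def build_session(app_name: str, conf: Dict[str, str]):
    """Hypothetical helper: create a SparkSession for one short-lived compute cluster.
    pyspark is imported lazily so the config helper above works without it."""
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

In this sketch, two teams could each provision their own cluster, call `build_session` with the same configuration, and read the same `s3a://...` paths with `spark.read.parquet(...)` instead of copying data into separate HDFS silos. The S3A connector also requires the `hadoop-aws` module and a compatible AWS SDK on Spark's classpath.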
The Kubernetes factor
Conversations about IT infrastructure are rarely complete without a mention of Kubernetes. Analytics workloads have historically run on their own dedicated hardware, outside the enterprise's general-purpose private cloud or Kubernetes infrastructure.
Our customers often want the agility of running analytics workloads in Kubernetes-orchestrated containers, for the same reasons they're migrating their other workloads to Red Hat OpenShift Container Platform.
One way to enable agility and help prevent data set duplication is to provide on-demand analytics cluster provisioning based on a shared storage repository.
Additionally, OpenShift offers an innovative platform for intelligent applications, which collect and learn from data so that their functionality improves the longer they run and the more widely they are used. The radanalytics.io open-source community enables intelligent applications on OpenShift by providing images and tooling to manage Apache Spark, Jupyter notebooks, and TensorFlow training and serving. Combined with Red Hat OpenShift Container Platform and high-performance, scalable storage in Red Hat Ceph Storage, radanalytics.io helps developers and operators deploy and manage intelligent applications that scale elastically as compute or storage demands increase.
Building a data ecosystem
Red Hat builds platforms designed to be reliable and long-lasting, with strong security features. It’s what we do. Our work to make analytics workloads first-class citizens in our infrastructure platforms extends the work we have done over the past 25 years.
The most recent example of this was the news from Hortonworks, IBM, and Red Hat earlier this week. Hortonworks and IBM announced that they will support products, including Hortonworks Data Platform (HDP) and IBM Cloud Private for Data, on Red Hat OpenShift Container Platform.
The success of our platform depends on partners like IBM and Hortonworks that not only open their solution architectures to our technologies but also share our view of disaggregating compute and storage for analytics workloads.
Come see us at the Strata conference this week, or learn more at redhat.com/bigdata.