Big Data: An Ideal Workload for Open Hybrid Cloud

February 20, 2013

Bryan Che, general manager, Cloud Business, Red Hat

Big Data Is More Than Hadoop

Many people today associate big data with Hadoop analytics – and, to be sure, Hadoop is an important technology for this. However, just as the Linux operating system is so much more than just the Linux kernel, a big data environment is so much more than just a Hadoop cluster.

 

In fact, from a Red Hat perspective, you need three primary things for a robust big data deployment:

  • Big data infrastructure

  • Big data analytics tools

  • Big data application platform

 

Red Hat is working to deliver enterprise big data solutions that integrate these three areas across an open hybrid cloud. In this post, I'll focus on how Red Hat plans to deliver a scalable, cloud-based big data infrastructure.

 

Big Data Infrastructure

Big data infrastructure needs to provide scalable compute and storage infrastructure, and an open hybrid Infrastructure-as-a-Service (IaaS) cloud provides an ideal architecture for this. Here are the elements Red Hat is building out to run big data workloads in the cloud:

 

Scalable Storage

Spinning up compute capacity in the cloud is important to big data – and I’ll explain more about this below – but first and foremost, big data requires scalable data storage that grows alongside compute.

 

Red Hat Storage provides scale-out storage that can extend into an open hybrid cloud. It is built on the GlusterFS distributed filesystem, and here is how it does so:

  • Red Hat Storage is a pure software solution that runs on top of standard Red Hat Enterprise Linux with the XFS filesystem. This means Red Hat Storage can run anywhere Red Hat Enterprise Linux runs, including across physical systems, virtualized infrastructure and private or public clouds

  • Red Hat Storage provides a global namespace, even across multiple data centers and across hybrid clouds. This allows a hybrid cloud in which virtual machines in a public cloud can operate on the exact same data as virtual machines in a private cloud

  • Red Hat is working on a Red Hat Storage Hadoop plugin that it will contribute to the Apache Hadoop community. As a result, big data workloads with Hadoop analytics will be able to leverage Red Hat Storage as the underlying data store and span across hybrid clouds

 

Scalable Compute

Red Hat is also a leader in the OpenStack IaaS project and is working to deliver an enterprise OpenStack distribution to market (currently available as a preview to anyone with a Red Hat Enterprise Linux subscription). OpenStack aims to provide the ability to build a large private cloud that can host big data compute workloads. As the big data compute needs of an organization grow, OpenStack will be able to elastically expand cloud-based computing capacity by provisioning new virtual machines.

Image1.png

For OpenStack compute capacity to adjust dynamically according to big data needs and policy, though, it needs cloud operations management tools. Red Hat's recent acquisition, ManageIQ, provides these capabilities. ManageIQ includes rich monitoring and analytics tools to determine what is happening to cloud infrastructure. For example, it can determine when a particular cloud provider is saturated in certain resources. ManageIQ also lets you create policies and provides orchestration tools to automate responses to events and policies. Combined, these capabilities can enable an enterprise to auto-flex OpenStack-based capacity for big data computations.
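To make the auto-flex idea concrete, here is a minimal sketch in Python of how a monitoring reading and a policy could combine into a scaling decision. It is an illustration only, not ManageIQ's API: `FlexPolicy`, `desired_workers`, and the threshold values are all hypothetical names and numbers chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class FlexPolicy:
    """Hypothetical scaling policy; all values are illustrative assumptions."""
    scale_out_above: int = 80   # percent CPU utilization that triggers growth
    scale_in_below: int = 30    # percent CPU utilization that triggers shrinking
    target: int = 60            # percent utilization to aim for after resizing
    min_workers: int = 2
    max_workers: int = 20


def desired_workers(current_workers: int, utilization_pct: int,
                    policy: FlexPolicy) -> int:
    """Return how many worker VMs the cluster should have.

    Inside the band between the two thresholds the cluster is left alone.
    Outside it, the worker count is resized so that the same load would
    sit at the target utilization, then clamped to the policy limits.
    """
    if policy.scale_in_below <= utilization_pct <= policy.scale_out_above:
        return current_workers

    # Integer ceiling division avoids floating-point rounding surprises.
    load = current_workers * utilization_pct
    wanted = -(-load // policy.target)
    return max(policy.min_workers, min(policy.max_workers, wanted))


if __name__ == "__main__":
    policy = FlexPolicy()
    for workers, util in [(10, 90), (10, 50), (10, 20), (16, 95)]:
        print(workers, util, "->", desired_workers(workers, util, policy))
```

In a real deployment the utilization figure would come from the monitoring layer, and the difference between the current and desired worker counts would be handed to an orchestration step that provisions or retires virtual machines.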

Image2.png

As large as today's data centers are, a single one is often not enough for big data workloads. Data can also reside in more than one place, which requires the associated computing to do so as well. As a result, many enterprises span multiple data centers as well as private and public clouds. Red Hat's CloudForms product aggregates multiple, disparate providers into uniform hybrid clouds. By leveraging CloudForms on top of OpenStack as well as public clouds, enterprises can deploy a big data compute platform that scales, not just within one OpenStack deployment, but across an entire hybrid cloud spanning multiple data centers. Furthermore, because CloudForms aggregates capacity across a variety of different cloud technology providers such as Red Hat, VMware, and Amazon AWS, enterprises can use both existing and new compute capacity without being locked into a single technology provider or platform. Red Hat is in the process of integrating ManageIQ and CloudForms into a next-generation version of CloudForms. This single cloud management platform is designed to be able to aggregate and operate across open hybrid clouds in one interface.
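The aggregation idea can be sketched in a few lines of Python: treat every provider as a pool of free capacity and spread a request across the pools. This is a simplified illustration under stated assumptions, not how CloudForms is implemented. The `Provider` type, the `place` function, the provider names, and the "fill private capacity first" rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Provider:
    """Hypothetical view of one provider in a hybrid cloud."""
    name: str
    kind: str       # "private" or "public"
    free_vms: int   # how many more VMs this provider can host


def place(requested: int, providers: List[Provider]) -> Dict[str, int]:
    """Spread a VM request across providers, filling private capacity first.

    Returns a mapping of provider name to the number of VMs to start there.
    Raises RuntimeError if the combined capacity is not enough.
    """
    plan: Dict[str, int] = {}
    remaining = requested

    # sorted() is stable, so providers keep their listed order within
    # the private group and within the public group.
    for provider in sorted(providers, key=lambda p: p.kind != "private"):
        if remaining == 0:
            break
        take = min(remaining, provider.free_vms)
        if take > 0:
            plan[provider.name] = take
            remaining -= take

    if remaining > 0:
        raise RuntimeError(f"hybrid cloud is short by {remaining} VMs")
    return plan


if __name__ == "__main__":
    cloud = [
        Provider("public-cloud", "public", 100),
        Provider("openstack-dc1", "private", 6),
        Provider("vmware-dc2", "private", 4),
    ]
    print(place(12, cloud))
```

Because the caller sees only the combined plan, existing capacity in one data center and new capacity at a public provider can serve the same big data job without the job being tied to either one.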

Image3.png

 

Open Hybrid Cloud Infrastructure for Big Data

Now let's bring it all together. Here's how Red Hat plans to bring its scalable compute and storage capabilities together in one open hybrid cloud:

  • Because Red Hat Storage can run in a virtual machine, we can make it available as a resource both in OpenStack and in a public cloud

  • As CloudForms and ManageIQ orchestrate the scaling out of compute capacity in an open hybrid cloud, they can simultaneously do so for storage capacity as well by spinning up additional virtual machines running Red Hat Storage

  • All this compute and storage can work seamlessly together across data center and firewall boundaries, because Red Hat Storage provides a global namespace

Image4.png

 

Big Data: An Ideal Workload for Open Hybrid Cloud

Big data, by its very nature, requires big, scalable infrastructure to run. An open hybrid cloud that spans multiple resource providers in private and public clouds, while simultaneously scaling out both compute and storage capacity, provides an extremely powerful platform for big data workloads. Red Hat is focused on delivering this type of infrastructure for enterprises to run big data, and all their other workloads, across an open hybrid cloud.

 

In follow-up posts, I'll discuss why an open hybrid cloud makes sense for big data analytics and big data application platforms and how Red Hat is working to deliver those as well.
