Containers have grabbed so much attention because they demonstrated a way to solve the software packaging problem that the IT industry had been poking and prodding at for a very long time. Linux package management, application virtualization (in all its myriad forms), and virtual machines had all taken cuts at making it easier to bundle and install software along with its dependencies. But it was the container image format and runtime that is now standardized under the Open Container Initiative (OCI) that made real headway toward making applications portable across different systems and environments.
Containers have also both benefited from and helped reinforce the shift toward cloud-native application patterns such as microservices. However, because the most purist approaches to cloud-native architectures de-emphasized stateful applications, the benefits of containers for storage portability haven’t received as much attention. That’s an oversight. Because it turns out that the ability to persist storage and make it portable matters, especially in the hybrid cloud environments spanning public clouds, private clouds, and traditional IT that are increasingly the norm.
One important reason that data portability matters is "data gravity," a term coined by Dave McCrory. He’s since fleshed it out in more detail, but the basic concept is pretty simple. Because of network bandwidth limits, latency, costs, and other considerations, data "wants" to be near the applications analyzing, transforming, or otherwise working on it. This is a familiar idea in computer science. Non-Uniform Memory Access (NUMA) architectures—which describes pretty much all computer systems today to a greater or lesser degree—have similarly had to manage the physical locality of memory relative to the processors accessing that memory.
Likewise, especially for applications that need fast access to data or that need to operate on large data sets, you need to think about where the data is sitting relative to the application using that data. And, if you decide to move an application from on-premise to a public cloud for rapid scalability or other reasons, you may find you need to move the data as well.
But moving data runs into some roadblocks. Networking limits and costs were and are one limitation; they’re a big part of data gravity in the first place. However, traditional proprietary data storage imposed its own restrictions. You can’t just fire up a storage array at an arbitrary public cloud provider to match the one in your own datacenter.
Enter software-defined storage.
As the name implies, software-defined storage decouples storage software from hardware. It lets you abstract and pool storage capacity across on-premise and cloud environments to scale independently of specific hardware components. Fundamentally, traditional storage was built for applications developed in the 1970s and 1980s. Software-defined storage is geared to support the applications of today and tomorrow, applications that look and behave nothing like the applications of the past. Among these are rapid scalability, especially for high volume unstructured data that may need to expand rapidly.
However, with respect to data portability specifically, one of the biggest benefits of software-defined storage like Gluster is that the storage software itself runs on generic industry standard hardware and virtualized infrastructure. This means that you can spin up storage wherever it makes the most sense for reasons of cost, performance, or flexibility.
Containerizing the storage
What remains is to simplify the deployment of persistent software-defined storage. It turns out that containers are the answer to this as well. In effect, storage can be treated just like a containerized application within a Kubernetes cluster—Kubernetes being the orchestration tool that groups containerized application components into a complete application.
With this approach, storage containers are deployed alongside other containers within the Kubernetes nodes. Rather than simply accessing ephemeral storage from within the container, this model deploys storage in its own containers, alongside the containerized application. For example, storage containers can implement a Red Hat Gluster Storage brick to create a highly-available GlusterFS volume that handles the storage resources present on each server node.
Depending on system configuration, some nodes might only run storage containers, some might only run containerized applications, and some nodes might run a mixture of both. Using Kubernetes with its support for persistent storage as the overall coordination tool, additional storage containers could be easily started to accommodate storage demand, or to recover from a failed node. For instance, Kubernetes might start additional containerized web servers in response to demand or load, but might restart both application and storage containers in the event of a hardware failure.
Kubernetes manages this through Persistent Volumes (PV). PV is a resource in the cluster just like a node is a cluster resource. PVs are a plugin related to the Kubernetes Volumes abstraction, but have a lifecycle independent of any individual pod that uses the PV. This allows for clustered storage that doesn’t depend on the availability or health of any specific application container.
Modular apps plus data
The emerging model for cloud-native application designs is one in which components communicate through well-documented interfaces. Whether or not a given project adopts a pure "microservices" approach, applications are generally becoming more modular and service-oriented. Dependencies are explicitly declared and isolated. Scaling is horizontal. Configurations are stored in the environment. Components can be started quickly and gracefully shutdown.
Containers are a great match with these cloud-native requirements. However, they’re also a great match for providing persistent storage for hybrid cloud-native applications. Not all cloud apps are stateless or can depend on a fixed backing store that’s somewhere else. Data has gravity and it matters where the data lives.
About the authors
Gordon Haff is a technology evangelist and has been at Red Hat for more than 10 years. Prior to Red Hat, as an IT industry analyst, Gordon wrote hundreds of research notes, was frequently quoted in publications such as The New York Times on a wide range of IT topics, and advised clients on product and marketing strategies.