Data is often the elephant in the room. It is obvious that applications are useless without data, that data is no less important now than it was at the dawn of computing, and that there’s no end in sight to the exponential growth of data. The term “exponential” is tossed about rather flippantly these days — it’s easy to lose sight of its basic mathematical implications — but some analysts suggest that more data will be created in the next three years than has been created in the last thirty.
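To make the mathematical implication of "exponential" concrete, here is a minimal sketch. It assumes, purely for illustration, that the rate of data creation doubles every fixed number of years. The doubling periods tried and the function name `created_ratio` are assumptions, not figures from the analysts mentioned above. Under that assumption, the sketch compares the data created in the next three years with the data created in the previous thirty.

```python
import math

def created_ratio(doubling_years: float, ahead: float = 3.0, back: float = 30.0) -> float:
    """Ratio of data created in the next `ahead` years to data created in the
    previous `back` years, assuming the creation rate doubles every
    `doubling_years` years (an illustrative assumption, not a measured figure)."""
    k = math.log(2) / doubling_years        # continuous growth constant
    future = (math.exp(k * ahead) - 1) / k  # integral of e^(k*t) from 0 to ahead
    past = (1 - math.exp(-k * back)) / k    # integral of e^(k*t) from -back to 0
    return future / past

# Try a few hypothetical doubling periods.
for years in (2, 3, 5):
    print(f"doubling every {years} years -> ratio {created_ratio(years):.2f}")
```

A ratio above 1 means the next three years outweigh the last thirty. In this toy model that happens once the doubling period is roughly three years or shorter, which is the kind of growth the "three years versus thirty" claim implies.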

Most people in technology are familiar with Moore’s Law, originally an observation that the number of transistors on a chip doubles every two years, which roughly translates into compute capability doubling commensurately. The specific phenomenon of transistor density doubling held for many years but eventually flattened as various physical asymptotes were approached. However, pulling the camera back and looking at the bigger picture, compute capability continued its trajectory thanks to other contributing factors such as better parallelization.

So what does this mean for data and storage? One important analogy to consider is that, just as we couldn’t continue to get exponential growth in compute by simply increasing transistor density, a given enterprise is unlikely to get sustained value from its growing data simply by adding more storage arrays to its network. Different ways of dealing with data, analogous to parallelization and other ways of accelerating compute despite flattened transistor density, are needed going forward.

Data Services in the Cloud-Native World

Enter the world of cloud-native data services. “Cloud-native” is perhaps a bit of an overloaded term in industry buzz-speak, but it is at this point reasonably well established as implying the use of fine-grained modularization (containers) and a means of automating the orchestration of large numbers of modules (Kubernetes). 

Containers have enabled developers to structure applications as composites of many small modules (microservices), bringing benefits such as easier, more rapid incremental innovation with less risk and disruption, as well as greater operational flexibility and resilience when capacity and placement needs evolve. Red Hat OpenShift brings all this together in a Kubernetes-based enterprise cloud platform for development and operations.

With large numbers of small immutable workloads being constantly spun up and down in a microservices environment, the assumption of static, long-running data connections becomes problematic.

In the old world of monolith-to-monolith, application-to-database architectures, the overhead of establishing a connection wasn’t a big deal. Now there is an impedance mismatch between monolithic data stores and distributed, fine-grained workloads.

Technologies like Ceph (and its enterprise counterpart Red Hat OpenShift Container Storage) bridge this gap by presenting existing and new storage hardware through a software-defined abstraction, giving microservices the fast, automatic attach and detach they need.

Data at Rest, Data in Motion, and Data in Action

But it’s not just about connecting to simple storage. Of course, the need for traditional storage functions such as backups, replication, and security doesn’t go away in a cloud-native data services world; these functions are just initiated and managed in new ways, in many cases much more automatically.

This is where Ceph’s software-defined storage capabilities are a powerful complement to Kubernetes’ machinery for dynamically provisioning workloads with the right persistence functionality. Many of these capabilities are about data “at rest.”
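As a rough illustration of the dynamic provisioning described above, a workload typically asks for storage by submitting a PersistentVolumeClaim that names a storage class, and the cluster provisions a matching volume. The sketch below builds such a claim as a plain Python dictionary. The claim name `orders-db-data` and the storage class `example-ceph-block` are hypothetical placeholders; the actual class name depends on how the cluster’s storage is configured.

```python
import json

def build_pvc(name: str, size: str, storage_class: str) -> dict:
    """Return a Kubernetes PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteOnce: the volume is mounted read-write by a single node.
            "accessModes": ["ReadWriteOnce"],
            # The storage class selects which provisioner creates the volume.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }

# Hypothetical names, used only for illustration.
pvc = build_pvc("orders-db-data", "10Gi", "example-ceph-block")
print(json.dumps(pvc, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts manifests in JSON as well as YAML.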

Applications often pull data from multiple sources to carry out a task, and increasingly such aggregation is expected on demand — last night’s batch job is already stale. This is an area where the data services approach really shines — developers can rely on Kubernetes automation to dynamically connect data sources, sometimes streaming with Apache Kafka, sometimes triggering serverless functions with events, to handle data “in motion.”
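The event-driven pattern behind data “in motion” can be sketched in a few lines. The example below is a toy, in-memory stand-in for an event stream and is not a Kafka client or a serverless framework. The `EventBus` class, the `orders` topic, and the handler name are all hypothetical, chosen only to show how a function runs as each event arrives instead of waiting for a nightly batch.

```python
from collections import defaultdict
from typing import Callable, Dict, List

class EventBus:
    """Tiny in-memory stand-in for an event stream (not a real Kafka client)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        """Register a function to be called for every event on `topic`."""
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        """Deliver `event` immediately to every handler subscribed to `topic`."""
        for handler in self._handlers[topic]:
            handler(event)

# Running totals per customer, updated as events arrive.
totals: Dict[str, float] = defaultdict(float)

def update_customer_total(event: dict) -> None:
    """Hypothetical 'function' triggered by each order event."""
    totals[event["customer"]] += event["amount"]

bus = EventBus()
bus.subscribe("orders", update_customer_total)
bus.publish("orders", {"customer": "alice", "amount": 20.0})
bus.publish("orders", {"customer": "alice", "amount": 5.5})
print(dict(totals))
```

In a real deployment the bus would be replaced by a streaming platform and the handler by a deployed function or service; the point of the sketch is only that the aggregate stays current with every event.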

When that disparate data has been brought together, it can have impact. A data service can populate a list of recommended next actions. A trained model can help identify whether a lung X-ray indicates potential cancer. A continuously learning model can help a self-driving car avoid a pedestrian. This is data “in action.”

The Future: AI/ML

Even in our COVID-impacted reality, machine learning continues to be a strong driver of the expanding need for data capabilities, in terms of both raw capacity and new functionality. Model training entails aggregating large amounts of data (the larger the better) in a temporary structure. A mature learning environment likely has a sophisticated data pipeline that feeds a training regimen executed on a regular basis for continuous model refinement. All of this motivates the need for a new sort of data processing platform.

Red Hat has been incubating such a platform in the Open Data Hub open source project. Open Data Hub combines Ceph, Kubeflow, Apache Spark, Jupyter, Kafka, Seldon, Argo CD, and other open source projects to create a comprehensive yet pluggable and configurable environment to support a variety of machine learning use cases. We use it today underneath Red Hat Insights, and it has been used by Red Hat Consulting in a number of customer deployments. Look for continued development in this area!

Conclusion

For operations folks, storage has long been a critical infrastructure element to get right. That is even more true today. For developers, storage has long been something buried deep in the infrastructure that they probably didn’t care about (until it broke). Today, the mutually reinforcing drivers of microservices and machine learning demand a new approach, with data capabilities expressed as cloud-native data services that empower the developer and delight the operator. 

Onward to the open hybrid cloud!


About the Author

Imaginative but reality-grounded product exec with a passion for surfacing the relevant essence of complex technology. Strong technical understanding complemented by ability to explain, excite, and lead. Driven toward challenge and the unknown.
