Data is often the elephant in the room. It is obvious that applications are useless without data, that data is no less important now than it was at the dawn of computing, and that there’s no end in sight to the exponential growth of data. The term “exponential” is tossed about rather flippantly these days — it’s easy to lose sight of its basic mathematical implications — but some analysts suggest that more data will be created in the next three years than has been created in the last thirty.
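To make the arithmetic behind that claim concrete, here is a minimal sketch. It assumes, purely for illustration, that the rate of data creation doubles at a fixed interval; the doubling times tried below are hypothetical and do not come from any analyst report.

```python
def created_between(start_year: float, end_year: float, doubling_years: float) -> float:
    """Data created between two points in time (years relative to now).

    Assumes the creation rate doubles every `doubling_years`. The integral of
    2**(t/T) is proportional to 2**(t/T), so constant factors cancel in ratios
    and the result is in arbitrary units.
    """
    return 2 ** (end_year / doubling_years) - 2 ** (start_year / doubling_years)


def next_vs_past_ratio(doubling_years: float, next_years: float = 3, past_years: float = 30) -> float:
    """Ratio of data created in the coming window to data created in the past window."""
    future = created_between(0, next_years, doubling_years)
    past = created_between(-past_years, 0, doubling_years)
    return future / past


# Hypothetical doubling times, chosen only to show how sensitive the ratio is.
for t in (2, 3, 5):
    print(f"doubling every {t} years -> ratio {next_vs_past_ratio(t):.2f}")
```

Under this simple model, a ratio above 1 means the next three years outproduce the last thirty. That happens once the doubling time drops to roughly three years, which is a useful reminder of how quickly exponential growth dwarfs everything that came before.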
Most people in technology are familiar with Moore’s Law, originally an observation that the number of transistors on a chip doubles every two years, which roughly translates into compute capability doubling commensurately. The specific phenomenon of transistor density doubling held for many years but eventually flattened as various physical asymptotes were approached. However, pulling the camera back and looking at the bigger picture, compute capability continued its trajectory thanks to other contributing factors such as better parallelization.
So what does this mean for data and storage? One important analogy to consider is that, just as we couldn’t continue to get exponential growth in compute by simply increasing transistor density, a given enterprise is unlikely to get sustained value from its growing data simply by adding more storage arrays to its network. Different ways of dealing with data, analogous to parallelization and other ways of accelerating compute despite flattened transistor density, are needed going forward.
Data Services in the Cloud-Native World
Enter the world of cloud-native data services. “Cloud-native” is perhaps a bit of an overloaded term in industry buzz-speak, but it is at this point reasonably well established as implying the use of fine-grained modularization (containers) and a means of automating the orchestration of large numbers of modules (Kubernetes).
Containers have enabled developers to structure applications as composites of many small modules (microservices), bringing benefits such as easier, more rapid incremental innovation with less risk and disruption, as well as greater operational flexibility and resilience when capacity and placement needs evolve. Red Hat OpenShift brings all this together in a Kubernetes-based enterprise cloud platform for development and operations.
With large numbers of small immutable workloads being constantly spun up and down in a microservices environment, the assumption of static, long-running data connections becomes problematic.
In the old world of monolith-to-monolith, application-to-database connections, the overhead of establishing a connection wasn’t a big deal. Now there is an impedance mismatch between monolithic data stores and distributed, fine-grained workloads.
Technologies like Ceph (and its enterprise counterpart Red Hat OpenShift Container Storage) bridge this gap and match existing and new storage hardware through a software-defined abstraction that enables microservices to get the fast, automatic attach and detach they need.
Data at Rest, Data in Motion, and Data in Action
But it’s not just about connecting to simple storage. Of course, the need for traditional storage functions such as backups, replication, and security doesn’t go away in a cloud-native data services world. These functions are just initiated and managed in new ways, in many cases much more automatically.
This is where Ceph’s software-defined storage capabilities are a powerful complement to Kubernetes’ machinery for dynamically provisioning workloads with the right persistence functionality. Many of these capabilities are about data “at rest.”
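As a rough illustration of what a dynamic persistence request can look like, the sketch below builds a Kubernetes PersistentVolumeClaim manifest as a Python dictionary and prints it as JSON. The claim name `orders-data`, the storage class name `ceph-block`, and the size are hypothetical placeholders. The storage class names actually available depend on how a given cluster is set up.

```python
import json


def build_pvc(name: str, storage_class: str, size: str) -> dict:
    """Return a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # One node at a time may mount the volume read-write.
            "accessModes": ["ReadWriteOnce"],
            # Names the storage class whose provisioner should create the volume.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical names: "orders-data" and "ceph-block" are placeholders.
pvc = build_pvc("orders-data", "ceph-block", "10Gi")
print(json.dumps(pvc, indent=2))
```

The point of the claim is that the workload states what it needs (capacity, access mode, class of storage) and leaves the provisioning and attachment of the actual volume to the platform.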
Applications often pull data from multiple sources to carry out a task, and increasingly such aggregation is expected on demand — last night’s batch job is already stale. This is an area where the data services approach really shines — developers can rely on Kubernetes automation to dynamically connect data sources, sometimes streaming with Apache Kafka, sometimes triggering serverless functions with events, to handle data “in motion.”
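The difference between a nightly batch and on-demand aggregation can be sketched with nothing but the Python standard library. The example below is an in-memory stand-in and does not use Kafka or any Kubernetes API. The event shape, the source names, and the sample values are all hypothetical. It merges two time-ordered sources lazily and keeps a running result that is current after every event.

```python
import heapq
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[int, str, float]  # (timestamp, source name, value)


def merge_streams(*streams: Iterable[Event]) -> Iterator[Event]:
    """Lazily interleave time-ordered streams into one time-ordered stream."""
    return heapq.merge(*streams, key=lambda event: event[0])


def running_totals(events: Iterable[Event]) -> Iterator[Dict[str, float]]:
    """Yield an up-to-date per-source total after every event."""
    totals: Dict[str, float] = {}
    for _, source, value in events:
        totals[source] = totals.get(source, 0.0) + value
        yield dict(totals)  # copy, so earlier snapshots are not mutated


# Hypothetical sample data standing in for two live sources.
orders = [(1, "orders", 20.0), (4, "orders", 5.0)]
refunds = [(2, "refunds", 3.0), (3, "refunds", 1.5)]

snapshots = list(running_totals(merge_streams(orders, refunds)))
print(snapshots[-1])
```

Each snapshot is available as soon as its event arrives, rather than after a scheduled job has run. In a real deployment the lists would be replaced by consumers reading from a streaming platform.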
When that disparate data has been brought together it can have impact. A data service can populate that list of recommended next actions. A trained model can help identify whether a lung X-ray indicates potential cancer. A continuously learning model can help a self-driving car avoid a pedestrian. This is data “in action.”
The Future: AI/ML
Even in our COVID-impacted reality, machine learning continues to be a strong driver of expansion in the need for data capabilities, both in terms of raw capacity and in new functionality. Model training entails aggregating large amounts of data (the larger the better) in a temporary structure. A mature learning environment likely has a sophisticated data pipeline that feeds a training regimen executed on a regular basis for continuous model refinement. All of this motivates the need for a new sort of data processing platform.
Red Hat has been incubating such a platform in the Open Data Hub open source project. Open Data Hub combines Ceph, Kubeflow, Apache Spark, Jupyter, Kafka, Seldon, Argo CD, and other open source projects to create a comprehensive yet pluggable and configurable environment to support a variety of machine learning use cases. We use it today underneath Red Hat Insights, and it has been used by Red Hat Consulting in a number of customer deployments. Look for continued development in this area!
Conclusion
For operations folks, storage has long been a critical infrastructure element to get right. That is even more true today. For developers, storage has long been something buried deep in the infrastructure that they probably didn’t care about (until it broke). Today, the mutually reinforcing drivers of microservices and machine learning demand a new approach, with data capabilities expressed as cloud-native data services that empower the developer and delight the operator.
Onward to the open hybrid cloud!
About the author
Imaginative but reality-grounded product exec with a passion for surfacing the relevant essence of complex technology. Strong technical understanding complemented by ability to explain, excite, and lead. Driven toward challenge and the unknown.