This Storage Tutorial was filmed live at Spark Summit East.
Our host, Brian Chang, is joined by Peter Wang, president of Continuum, along with show regulars Irshad Raihan and Greg Kleiman of Red Hat Big Data. Peter fills the group in about what buzz he is hearing at the conference as well as what sorts of big data use cases he’s seeing best supported on Spark. Read on for an excerpt of the conversation, but check out the video for the full discussion.
What is Continuum analytics?
Continuum Analytics supports the use of open source data science tools, primarily around the python programming language. Many of the core libraries in Python for data and scientific computing were written by principles at Continuum, and we’ve been heavily involved in PyData and promoting the use of Python for data and analytics
Tell our viewers about what is Spark and what are you hearing about Spark at this conference?
It’s very exciting, this is my first time at the summit! I’m really excited to see the energy around the technology stack and around the things happening with Spark. The most interesting thing for me is, the Python world has been involved in high end, very large data science and data analytics workloads for a long time, but the rise of Hadoop was a separate sort of thing. Python and R were outsiders in the Hadoop ecosystem. What we’re seeing with Spark that’s interesting, is that they are working really hard to ensure Python and R are native in the technology stack. It goes to the design of the underlying components in Spark even, whether it is a scheduler, or the resilient data structure, all these things in Spark are exposed nicely in Python.
There’s great energy, great buzz here. The show floor is certainly smaller than Strata+Hadoop, so you feel like this is an event that will grow as time goes on. A lot of the energy behind Spark is, because it has taken the storage efficiencies of Hadoop and made that more accessible to a wider audience. A lot of people were not thrilled about doing MapReduce jobs in Java, they’d rather do them in Python, but that connection was tenuous. But now with Spark and with Python behind a first class citizen in the Spark ecosystem, a lot of people, at least the Python folks I’ve talked to with Hadoop workloads, they are excited about that.
Tell us about those high-end workloads you talked about?
There are a lot of people doing traditional cluster level workloads using Red Hat in the cluster, and use Python to drive the computation. As Hadoop has emerged and Spark has emerged on top of Hadoop, we’re seeing a lot of these people doing exploratory data science and analytics with Python on a workstation, but then they have to port to larger scale equipment. There’s a workflow impediment, a mismatch there, between the workload they can do on their machine, which doesn’t have a petabyte of storage attached to it. After they do the work on the subset and do the work on scale, that moving back and forth, we’ve built tools like Anaconda cluster that eases the transitions, but the actual storage of the bits….at the end of the day, we all know that when you do computation at scale you have to move code to data.
So where the data sits, that’s an important place. How the data is formatted, what file systems, what walled gardens are built around it, those limit what you can do. It’s unfortunate. Your storage should be flexible, it should give you scale and resiliency without limiting what you can do.
Watch the video for more of the conversation!
저자 소개
유사한 검색 결과
채널별 검색
오토메이션
기술, 팀, 인프라를 위한 IT 자동화 최신 동향
인공지능
고객이 어디서나 AI 워크로드를 실행할 수 있도록 지원하는 플랫폼 업데이트
오픈 하이브리드 클라우드
하이브리드 클라우드로 더욱 유연한 미래를 구축하는 방법을 알아보세요
보안
환경과 기술 전반에 걸쳐 리스크를 감소하는 방법에 대한 최신 정보
엣지 컴퓨팅
엣지에서의 운영을 단순화하는 플랫폼 업데이트
인프라
세계적으로 인정받은 기업용 Linux 플랫폼에 대한 최신 정보
애플리케이션
복잡한 애플리케이션에 대한 솔루션 더 보기
오리지널 쇼
엔터프라이즈 기술 분야의 제작자와 리더가 전하는 흥미로운 스토리
제품
- Red Hat Enterprise Linux
- Red Hat OpenShift Enterprise
- Red Hat Ansible Automation Platform
- 클라우드 서비스
- 모든 제품 보기
툴
체험, 구매 & 영업
커뮤니케이션
Red Hat 소개
Red Hat은 Linux, 클라우드, 컨테이너, 쿠버네티스 등을 포함한 글로벌 엔터프라이즈 오픈소스 솔루션 공급업체입니다. Red Hat은 코어 데이터센터에서 네트워크 엣지에 이르기까지 다양한 플랫폼과 환경에서 기업의 업무 편의성을 높여 주는 강화된 기능의 솔루션을 제공합니다.