
by Steve Watt, Chief Architect, Big Data, Red Hat

Red Hat and Continuum Analytics are pleased to announce a new solution that allows customers to deploy PySpark on top of Red Hat Storage GlusterFS. If you're attending Strata, swing by the Red Hat booth to grab a solution brief that describes how the solution is put together and how you can set it up. For those of you who are not at Strata, here's the overview, and be sure to check out the technology brief as well.

Continuum Analytics are the makers of Anaconda, a leading Python distribution. At Strata, Continuum Analytics are announcing a new product, Anaconda Cluster, which is a highly scalable cluster resource management tool. Red Hat Storage GlusterFS is a cost-effective, easily scalable, POSIX-compliant distributed filesystem that runs on industry-standard servers. Given that accessing data in HDFS from Python can be cumbersome, Red Hat and Continuum Analytics have built a solution that enables Anaconda Cluster to deploy PySpark on GlusterFS. This co-located solution keeps life simple for Python developers by providing a Python interface to Apache Spark that can read and write data on a distributed filesystem that looks and works like the local filesystems they are used to. Furthermore, because both the reference Python interpreter (CPython) and GlusterFS are written in C, Python applications get easy access to the data whether they are running on-premises or in the cloud.
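To make the "looks and works like a local filesystem" point concrete, here is a minimal sketch using only the Python standard library. It assumes the GlusterFS volume is mounted somewhere on the machine; the mount path mentioned in the comments (`/mnt/glusterfs/demo`) and the helper names `write_lines` and `count_words` are hypothetical and not taken from the solution brief. A temporary directory stands in for the mount so the sketch runs anywhere.

```python
import tempfile
from collections import Counter
from pathlib import Path


def write_lines(directory, name, lines):
    """Write lines to a text file using ordinary POSIX file I/O."""
    path = Path(directory) / name
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def count_words(directory):
    """Count whitespace-separated words across every .txt file in a directory."""
    counts = Counter()
    for path in sorted(Path(directory).glob("*.txt")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                counts.update(line.split())
    return counts


# On a real deployment this could be a directory on the mounted volume,
# e.g. Path("/mnt/glusterfs/demo") (hypothetical path, an assumption here).
# A temporary directory is used instead so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    write_lines(data_dir, "a.txt", ["spark on gluster", "python on spark"])
    write_lines(data_dir, "b.txt", ["gluster is posix"])
    word_counts = count_words(data_dir)

print(word_counts.most_common(3))
```

Nothing in the code is specific to GlusterFS, which is the point: a POSIX mount is read and written with the same calls as a local disk. In Spark, a path on such a mount can likewise be passed as a `file://` URI to `SparkContext.textFile`, provided the mount exists at the same location on every worker; whether the announced solution wires things up this way is not stated here.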

If you'd like to try it out, please check out the demo video and its accompanying tutorial: https://github.com/wattsteve/pyspark-tutorial

