This Storage Tutorial was filmed live at Spark Summit East.
Our host, Brian Chang, is joined by Peter Wang, president of Continuum, along with show regulars Irshad Raihan and Greg Kleiman of Red Hat Big Data. Peter fills the group in on the buzz he is hearing at the conference, as well as the sorts of big data use cases he’s seeing best supported on Spark. Read on for an excerpt of the conversation, but check out the video for the full discussion.
What is Continuum Analytics?
Continuum Analytics supports the use of open source data science tools, primarily around the Python programming language. Many of the core libraries in Python for data and scientific computing were written by principals at Continuum, and we’ve been heavily involved in PyData and in promoting the use of Python for data and analytics.
Tell our viewers what Spark is. What are you hearing about Spark at this conference?
It’s very exciting. This is my first time at the summit! I’m really excited to see the energy around the technology stack and around the things happening with Spark. The most interesting thing for me is that the Python world has been involved in high-end, very large data science and data analytics workloads for a long time, but the rise of Hadoop was a separate sort of thing. Python and R were outsiders in the Hadoop ecosystem. What’s interesting with Spark is that they are working really hard to ensure Python and R are native in the technology stack. It goes down to the design of the underlying components in Spark: whether it is the scheduler or the resilient data structure, all these things in Spark are exposed nicely in Python.
There’s great energy, great buzz here. The show floor is certainly smaller than Strata+Hadoop, so you feel like this is an event that will grow as time goes on. A lot of the energy behind Spark is because it has taken the storage efficiencies of Hadoop and made them more accessible to a wider audience. A lot of people were not thrilled about doing MapReduce jobs in Java. They’d rather do them in Python, but that connection was tenuous. Now, with Python being a first-class citizen in the Spark ecosystem, a lot of people, at least the Python folks I’ve talked to with Hadoop workloads, are excited about that.
Tell us about those high-end workloads you mentioned.
There are a lot of people doing traditional cluster-level workloads, using Red Hat in the cluster and Python to drive the computation. As Hadoop has emerged, and Spark has emerged on top of Hadoop, we’re seeing a lot of these people doing exploratory data science and analytics with Python on a workstation, but then they have to port to larger-scale equipment. There’s a workflow impediment, a mismatch, between the workload they can do on their own machine, which doesn’t have a petabyte of storage attached to it, and the work they do at scale. They do the work on a subset, then do the work at scale, and keep moving back and forth. We’ve built tools like Anaconda Cluster that ease those transitions, but the actual storage of the bits… at the end of the day, we all know that when you do computation at scale you have to move code to data.
So where the data sits, that’s an important place. How the data is formatted, what file systems, what walled gardens are built around it, those limit what you can do. It’s unfortunate. Your storage should be flexible, it should give you scale and resiliency without limiting what you can do.
Watch the video for more of the conversation!