Artificial intelligence (AI) and machine learning (ML) drive much of the world around us, from the apps on our phones to the electric cars on the highway. Making these systems as accurate as possible requires collecting and understanding huge amounts of data. At the helm of that critical information are data scientists. So, what does a day on the job look like for data scientists at Red Hat?
Don Chesworth, Principal Data Scientist, gives you a glimpse into his day-to-day in a short video (aptly named “A Day in the Life of a Red Hat Data Scientist”) that’s now available on our website. Isabel Zimmerman, Data Science Intern, provides a look at some of the tools she uses on the job in “Using Open Data Hub as a Red Hat Data Scientist.” We’ll cover some of the highlights in this post.
Data scientists turn data into business insights
It’s been nearly a decade since Harvard Business Review identified data science as one of the hottest jobs of the 21st century, and the technology supporting people in this role has come a long way. Back then, data scientists not only had to come to the table with an innate curiosity, but also had to “fashion their own tools” to analyze data and visualize it for stakeholders.
Today, tools available in the Open Data Hub and Red Hat OpenShift help data experts focus on understanding and analyzing data instead of managing infrastructure.
Zimmerman explains that a data scientist isn't just someone who trains models; data scientists also turn data into business insights. “Businesses don't have a one-size-fits-all method for machine learning systems,” she says.
“A well architected model may be useful for getting insights into data, but oftentimes in order to gain business value, models have to be deployed as part of a larger intelligent application that's constantly learning from data and making inferences on dynamic data streams.”
Data scientists can find a one-stop, end-to-end platform with Open Data Hub
Open Data Hub is an AI/ML platform that brings together different open source AI tools into a one-stop install. The click of a button starts Red Hat OpenShift with the Open Data Hub Operator already installed.
Within the platform, data scientists can create models in Jupyter notebooks and choose from popular tools like Apache Spark. Although the data science workflow often ends once the model is built and validated, it's still important to monitor the model to make sure it stays healthy. Prometheus, another tool available in Open Data Hub, collects metrics from the deployed model, and Grafana reads those metrics so data scientists can build dashboards that track the model’s health and performance.
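To make the monitoring step more concrete, here is a minimal, standard-library-only Python sketch of the plain-text format that Prometheus scrapes from a model server. The metric names, the `Metric` class, and the `render_metrics` helper are hypothetical illustrations. A real model server would more likely use the official `prometheus_client` library and serve this text at a `/metrics` endpoint for Prometheus to collect and Grafana to chart.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    """A single Prometheus sample plus its metadata (hypothetical helper)."""
    name: str
    kind: str  # Prometheus metric type, e.g. "counter" or "gauge"
    help_text: str
    value: float
    labels: dict[str, str] = field(default_factory=dict)


def _escape_label(value: str) -> str:
    # Label values must escape backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(metrics: list[Metric]) -> str:
    """Render metrics in the Prometheus text exposition format."""
    lines = []
    for m in metrics:
        lines.append(f"# HELP {m.name} {m.help_text}")
        lines.append(f"# TYPE {m.name} {m.kind}")
        if m.labels:
            # Sort label names so the output is stable between scrapes.
            pairs = ",".join(
                f'{k}="{_escape_label(v)}"' for k, v in sorted(m.labels.items())
            )
            lines.append(f"{m.name}{{{pairs}}} {m.value}")
        else:
            lines.append(f"{m.name} {m.value}")
    # The text format is newline-terminated.
    return "\n".join(lines) + "\n"


# Example: metrics a hypothetical recommender model server might expose.
sample = [
    Metric("model_predictions_total", "counter",
           "Predictions served by the model.", 1250, {"model": "recommender"}),
    Metric("model_last_latency_seconds", "gauge",
           "Latency of the most recent prediction.", 0.087, {"model": "recommender"}),
]
print(render_metrics(sample), end="")
```

A counter such as `model_predictions_total` only ever increases, so a dashboard would typically chart its rate over time. A gauge like the latency value can go up or down and is charted directly.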
In her video, Zimmerman demonstrates how to build, deploy, and monitor ML models using Open Data Hub. Open Data Hub can also host the model outside the Jupyter notebook, making it easy to access for both the data scientist and the rest of the team, which may include software engineers or front-end developers.
The tools available on Open Data Hub help data scientists like Zimmerman deploy models, for example through the Seldon operator, without having to be front-end developers. From data ingestion to model creation, testing, and visualization, Open Data Hub makes it easier for data scientists to do their jobs.
Open Data Hub also provides data scientists an opportunity to contribute upstream
Since the platform is open source, anybody can contribute code. Chesworth notes that part of what’s exciting about being a data scientist at Red Hat is that contributing code upstream and focusing on hybrid, containerized code are highly encouraged.
He has built a recommender system and containerized its code. The container is portable: it can run on his local machine, on a bare metal server, in the cloud, and on Red Hat OpenShift. He also runs it with Open Data Hub.
His code is set up so that it can use a CPU, a single GPU, or multiple GPUs. When containerizing and distributing ML workloads, Chesworth noticed that containers are built to be nimble, and as a result they get very little shared memory space by default. “You have to jump through quite a few hoops to increase that shared memory size,” he says.
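You can check the shared memory limit from inside a running container. One widely used Kubernetes workaround is to mount a memory-backed `emptyDir` volume over `/dev/shm`. The following Python sketch shows both, under stated assumptions. The helper names, pod name, image, and the `8Gi` default are hypothetical. This is a generic pattern, not necessarily what Chesworth did or what the CRI-O change in this post implements.

```python
import json
import shutil
from typing import Optional


def shm_size_bytes(path: str = "/dev/shm") -> Optional[int]:
    """Return the total size of the shared-memory mount, or None if it is missing."""
    try:
        return shutil.disk_usage(path).total
    except OSError:
        return None


def pod_with_larger_shm(name: str, image: str, shm_limit: str = "8Gi") -> dict:
    """Build a Pod manifest that mounts a memory-backed emptyDir at /dev/shm."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Replace the container's small default /dev/shm.
                    "volumeMounts": [{"name": "dshm", "mountPath": "/dev/shm"}],
                }
            ],
            "volumes": [
                {
                    "name": "dshm",
                    # medium "Memory" makes the volume a tmpfs backed by RAM;
                    # sizeLimit caps how much of it the pod may use.
                    "emptyDir": {"medium": "Memory", "sizeLimit": shm_limit},
                }
            ],
        },
    }


if __name__ == "__main__":
    print("Current /dev/shm size in bytes:", shm_size_bytes())
    # Hypothetical pod and image names, for illustration only.
    manifest = pod_with_larger_shm("recommender-train", "quay.io/example/recommender:latest")
    print(json.dumps(manifest, indent=2))
```

Because `kubectl` and `oc` accept JSON as well as YAML, the printed manifest could be piped to `oc apply -f -`. Keep in mind that files written to a memory-backed volume count against the container's memory limit, so the limit should leave room for it.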
Working with the Open Data Hub team, he pushed for a simpler way to change the shared memory size on Red Hat OpenShift for multi-GPU workloads. Chesworth explains, “I worked with the Open Data Hub team, and they contributed upstream to CRI-O and made a change to make it a lot easier to change your shared memory size. That change went into CRI-O 1.20, which then went into Kubernetes 1.20.”
As an open source company, Red Hat supports community projects like Open Data Hub, and many Red Hatters contribute to them. Open Data Hub lays the foundation for our internal data science and AI platform.
A day in the life, and more
Time is valuable for data scientists. Tools available through the Open Data Hub help them do data science without also having to take on the role of cloud architect or front-end developer. This can free up more time to address critical business needs.
“The Open Data Hub simplifies the end-to-end machine learning workflow, and gives me the tools I need to put my model into production,” says Zimmerman.
To learn more about what a Red Hat data scientist does, we invite you to check out these two recently released videos. From AI/ML to containers, there’s even more to discover from our subject matter experts. Just stop by the Red Hat video library and have a look, and be sure to subscribe to the Red Hat channel on YouTube for more!
About the author
As the Managing Editor of the Red Hat Blog, Thanh Wong works with technical subject matter experts to develop and edit content for publication. She is fascinated with learning about new technologies and processes, and she's invested in sharing how they can help solve problems for enterprise environments. Outside of Red Hat, Wong hears a lot about the command line from her system administrator husband. Together, they're raising a young daughter and live in Maryland.