Artificial intelligence (AI) and machine learning (ML) drive much of the world around us, from the apps on our phones to electric cars on the highway. Making these systems as accurate as possible requires collecting and understanding huge amounts of data. At the helm of that critical information are data scientists. So, what does a day on the job look like for data scientists at Red Hat?
Don Chesworth, Principal Data Scientist, gives you a glimpse into his day-to-day in a short video (aptly named "A Day in the Life of a Red Hat Data Scientist") that’s now available on our website. Isabel Zimmerman, Data Science Intern, provides a look at some of the tools she uses on the job in "Using Open Data Hub as a Red Hat Data Scientist." We’ll cover some of the highlights in this post.
Data scientists turn data into business insights
It’s been nearly a decade since Harvard Business Review identified data science as one of the hottest jobs of the 21st century, and the technology supporting people in this role has come a long way. Back then, data scientists not only had to come to the table with an innate curiosity, but they also had to "fashion their own tools" to analyze data and visualize it for stakeholders.
Today, tools available in the Open Data Hub and Red Hat OpenShift help data experts focus on understanding and analyzing data instead of managing infrastructure.
Zimmerman explains that a data scientist isn't just someone who trains models; they also turn data into business insights. "Businesses don't have a one-size-fits-all method for machine learning systems," she says.
"A well architected model may be useful for getting insights into data, but oftentimes in order to gain business value, models have to be deployed as part of a larger intelligent application that's constantly learning from data and making inferences on dynamic data streams."
Data scientists can find a one-stop, end-to-end platform with Open Data Hub
Open Data Hub is an AI/ML platform that brings together different open source AI tools into a one-stop install. The click of a button starts Red Hat OpenShift with the Open Data Hub Operator already installed.
Within the platform, data scientists can create models using Jupyter Notebooks and select from popular tools like Apache Spark for developing models. The data science workflow is often treated as finished once the model is built and validated, but it's still important to monitor the model to make sure that it stays healthy. Prometheus, another tool available in Open Data Hub, collects metrics from the model and serves as a data source for Grafana, so data scientists can build dashboards to keep an eye on the model’s health and performance.
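To make the monitoring step more concrete, here is a minimal sketch of how a model service might publish health metrics in the plain-text format that Prometheus scrapes. It uses only the Python standard library. The class and the metric names (`model_predictions_total`, `model_errors_total`, `model_last_latency_seconds`) are hypothetical and chosen for illustration; they are not taken from Open Data Hub or from Zimmerman's video.

```python
from dataclasses import dataclass


@dataclass
class ModelMetrics:
    """In-memory health counters for a deployed model (hypothetical names)."""

    predictions_total: int = 0
    errors_total: int = 0
    last_latency_seconds: float = 0.0

    def record(self, latency_seconds: float, ok: bool = True) -> None:
        """Record one prediction request and whether it succeeded."""
        self.predictions_total += 1
        if not ok:
            self.errors_total += 1
        self.last_latency_seconds = latency_seconds

    def render(self) -> str:
        """Return the metrics in the Prometheus text exposition format."""
        lines = [
            "# HELP model_predictions_total Number of predictions served.",
            "# TYPE model_predictions_total counter",
            f"model_predictions_total {self.predictions_total}",
            "# HELP model_errors_total Number of failed predictions.",
            "# TYPE model_errors_total counter",
            f"model_errors_total {self.errors_total}",
            "# HELP model_last_latency_seconds Latency of the latest prediction.",
            "# TYPE model_last_latency_seconds gauge",
            f"model_last_latency_seconds {self.last_latency_seconds}",
        ]
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    metrics = ModelMetrics()
    metrics.record(latency_seconds=0.12)
    metrics.record(latency_seconds=0.30, ok=False)
    print(metrics.render())
```

In a real deployment, this text would typically be served from an HTTP endpoint such as `/metrics`, most likely through an existing Prometheus client library rather than hand-written formatting, and Grafana would then chart the values it queries from Prometheus.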
In her video, Zimmerman demonstrates how to build, deploy and monitor ML models using Open Data Hub. Open Data Hub can also host the model outside of the Jupyter Notebook, giving easy access to both the data scientist and the rest of the team, which may include software engineers or front end developers.
The tools available on Open Data Hub help data scientists like Zimmerman deploy models without having to be a front end developer or having to start a data science workflow with a model deployed through the Seldon operator. From data ingestion to model creation, testing, and visualization, Open Data Hub makes it easier for data scientists to do their jobs.
Open Data Hub also provides data scientists an opportunity to contribute upstream
Since the platform is open source, anybody can contribute code. Chesworth notes that one exciting part of being a data scientist at Red Hat is that "things like contributing code upstream and focusing on the hybrid and containerized in your code is highly encouraged."
He has a recommender system whose code he containerized. It's portable and can be run on his local machine, on a bare metal server, on the cloud, and on Red Hat OpenShift. He also runs it with Open Data Hub.
His code is set up so that it can use a CPU, a GPU or multiple GPUs. While containerizing and distributing ML workloads, Chesworth noticed a trade-off: containers are built to be nimble, and because of that, there's very little shared memory space in a container. "You have to jump through quite a few hoops to increase that shared memory size," he says.
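To see how much shared memory a container actually has, a quick check like the sketch below can help. It assumes a Linux container where shared memory is mounted at `/dev/shm`; Docker, for example, documents a default of 64 MB for that mount. The function names and the 1 GiB threshold are hypothetical, not part of Chesworth's code.

```python
import os
import shutil
from typing import Optional


def shared_memory_bytes(path: str = "/dev/shm") -> Optional[int]:
    """Return the total size of the shared memory mount, or None if absent."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).total


def has_enough_shared_memory(required_bytes: int, path: str = "/dev/shm") -> bool:
    """Check whether the mount at `path` is at least `required_bytes` large."""
    total = shared_memory_bytes(path)
    return total is not None and total >= required_bytes


if __name__ == "__main__":
    one_gib = 1024 ** 3  # illustrative threshold, not a recommendation
    print("shared memory bytes:", shared_memory_bytes())
    print("at least 1 GiB:", has_enough_shared_memory(one_gib))
```

Running a check like this at startup lets a training job fail fast with a clear message instead of crashing later, for instance when worker processes try to exchange large batches through shared memory.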
Working with the Open Data Hub team, he submitted improvements for changing Red Hat OpenShift shared memory size across multiple GPUs. Chesworth explains, "I worked with the Open Data Hub team, and they contributed upstream to CRI-O and made a change to make it a lot easier to change your shared memory size. That change went into CRI-O 1.20, which then went into Kubernetes 1.20."
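For context on the kind of hoops involved, one widely used Kubernetes workaround is to mount a memory-backed `emptyDir` volume over `/dev/shm`. The sketch below builds such a pod manifest as a Python dictionary. It is not the CRI-O change Chesworth describes, and the pod name, image, and size are hypothetical placeholders.

```python
import json


def pod_with_shared_memory(name: str, image: str, shm_size: str = "2Gi") -> dict:
    """Build a pod manifest that mounts a memory-backed volume at /dev/shm."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Mount the in-memory volume over the default /dev/shm.
                    "volumeMounts": [{"name": "dshm", "mountPath": "/dev/shm"}],
                }
            ],
            "volumes": [
                {
                    "name": "dshm",
                    # medium "Memory" backs the volume with tmpfs (RAM).
                    "emptyDir": {"medium": "Memory", "sizeLimit": shm_size},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical name and image, for illustration only.
    manifest = pod_with_shared_memory("recommender", "example.com/recommender:latest")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with the usual cluster tooling. Having to add an extra volume and mount to every workload is part of what makes a simpler runtime-level setting attractive.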
Red Hat is an open source company, and many Red Hatters work to support and contribute to community projects like the Open Data Hub, which lays the foundation for our internal data science and AI platform.
A day in the life, and more
Time is valuable for data scientists. Tools available through the Open Data Hub help them do data science without also taking on the role of cloud architect or front end developer. This can free up more time to solve critical business needs.
"The Open Data Hub simplifies the end-to-end machine learning workflow, and gives me the tools I need to put my model into production," says Zimmerman.
To learn more about what a Red Hat data scientist does, we invite you to check out these two recently released videos. From AI/ML to containers, there’s even more to discover from our subject matter experts. Just stop by the Red Hat video library and have a look, and be sure to subscribe to the Red Hat channel on YouTube for more!
About the author
As the Managing Editor of the Red Hat Blog, Thanh Wong works with technical subject matter experts to develop and edit content for publication. She is fascinated with learning about new technologies and processes, and she's vested in sharing how they can help solve problems for enterprise environments. Outside of Red Hat, Wong hears a lot about the command line from her system administrator husband. Together, they're raising a young daughter and live in Maryland.