Machine Learning Model Drift & MLOps Pipelines
Say you're a ride-sharing company and you wanna create an ML model to get an idea of how many people you'll be serving on a given day. You start collecting data on January 1st. You pull 100 days' worth of data and train your model on that. After 300 days, you've tripled your data set, so you retrain on all of it and make a prediction for December 31st. That should be your most optimized model. More data always means a better model, right? Not necessarily.
00:27 — INTRO ANIMATION
When it comes to training machine learning models, it doesn't matter that you've tripled your data if you've missed the context of seasonal usage patterns. It would be like trying to predict how much snow you're gonna get in the winter by looking at how much snow fell in the summer. Model drift, often referred to as model decay, is the gradual loss of a model's predictive power due to changes in context. Data can and will change over time. A machine learning model is at its most accurate when its training data matches real-world data, and it deteriorates as the world it was trained to predict changes. Now, think about operating ML models at the edge. When you have a large amount of anything, it's hard to keep track of everything. With edge devices, you may have a model deployed thousands of times, and you have to update and redeploy thousands of instances, so things quickly become complicated. We rely on software pipelines to operate at scale. ML systems are software systems, and to realize their full potential, we need ML pipelines. To talk about the implications of model drift and operating machine learning at scale, let's talk with Kavitha Prasad from Intel. Hey, Kavitha! Thanks for chatting with me today.
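One common way to put a number on the drift described above is to compare the distribution of a feature at training time with what the deployed model sees now. The sketch below computes a Population Stability Index (PSI) in plain Python. PSI is our choice of metric, not something the episode prescribes, and the equal-width binning over the combined range and the ride counts in the usage lines are simplifying assumptions made up for illustration.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between a training sample and a live sample.

    Simplification: equal-width bins over the combined range of both samples.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # avoid dividing by zero if all values are equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)  # clamp the max value into the last bin
            counts[idx] += 1
        # A tiny floor keeps log() defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical daily ride counts: a spring training window vs. a busier December.
train_rides = [100 + (i % 20) for i in range(100)]
live_rides = [150 + (i % 20) for i in range(30)]

print(psi(train_rides, train_rides))  # 0.0: identical distributions, no drift
print(psi(train_rides, live_rides) > 0.25)  # True: the live data has shifted
```

A widely used rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift, but those thresholds are conventions rather than guarantees, and production setups often bin on quantiles of the training data instead of equal widths.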
Hey, Chris, it's great to be here and thanks for this opportunity, appreciate it.
Well, I've been thinking about the way we train machine learning models and the challenge of model drift. I've learned a lot about how DevOps improves software by tracking changes in the source code, the build process, and the deployment process. What benefits do we get from applying that kind of life cycle management to machine learning?
So, MLOps is in a lot of ways similar to DevOps, Chris, but there are a few differences that need to be kept in mind, right? Besides the code versioning that you talked about, you need a place to save the data and also to save the model versions, and that is different from DevOps. The other key thing is that your software code does not degrade the way your machine learning models degrade. So you have to take that into account, which again is slightly different from DevOps. There are a lot of advantages to deploying MLOps, right? Data scientists can focus more on model development, while the IT resources focus on deployment, because there is a fixed MLOps pipeline. The quality of predictions also improves, because you've brought the concept of retraining into the entire system. And last but not least, ease of use for the customer is very, very critical, and MLOps makes the path from development to deployment much easier. It also takes care of data and model validation and evaluates the model's performance in production, wherever you have deployed it. So you get meaningful insights. There are a lot of advantages to deploying MLOps in AI use cases.
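The point that MLOps has to version data and models, not just code, can be made concrete with a small registry. The sketch below is a hypothetical, in-memory stand-in for a real model registry (it is not the API of MLflow or any other tool): each registered model records a SHA-256 fingerprint of its training data and the source revision of its training code, so a retrained model can be traced back to exactly what produced it. `ModelRegistry`, `ModelRecord`, `code_rev`, and the sample rows are all made up for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ModelRecord:
    version: int
    data_hash: str  # fingerprint of the exact training data
    code_rev: str   # source-control revision of the training code
    metrics: dict   # evaluation results recorded at training time


@dataclass
class ModelRegistry:
    records: list = field(default_factory=list)

    @staticmethod
    def fingerprint(rows):
        """Stable SHA-256 of the training rows, so the data is versioned alongside the model."""
        payload = json.dumps(rows, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def register(self, rows, code_rev, metrics):
        """Record a newly trained model and return its lineage entry."""
        record = ModelRecord(
            version=len(self.records) + 1,
            data_hash=self.fingerprint(rows),
            code_rev=code_rev,
            metrics=metrics,
        )
        self.records.append(record)
        return record

    def latest(self):
        return self.records[-1] if self.records else None


registry = ModelRegistry()
v1 = registry.register([{"day": 1, "rides": 120}], code_rev="a1b2c3", metrics={"mae": 12.0})
# Same training code, more data: a retrain produces a new version with new lineage.
v2 = registry.register(
    [{"day": 1, "rides": 120}, {"day": 2, "rides": 135}],
    code_rev="a1b2c3",
    metrics={"mae": 9.5},
)
print(v2.version, v1.data_hash != v2.data_hash)  # 2 True
```

Because the fingerprint uses `sort_keys=True`, reordering keys inside a row does not change the hash, while changing any value does; that is what lets a pipeline tell a code change apart from a data change.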
That makes sense, and as you were saying that, I was thinking I still see some connections: for example, data as the source, and models as what you deploy. I feel like this becomes really important as we scale out and start thinking about deploying machine learning models to the edge, where inference happens. So, do you see more ML use cases moving to the edge, or is that just Chris's fantasy land?
Oh, no, it is absolutely moving to the edge, Chris. It is going to be a hybrid world where your ML or AI is developed in the cloud, deployed on the edge, monitored on the edge, and maintained in the cloud. Or it could be a completely cloud-native development and deployment. So there are many scenarios, but to your point, a lot of factors are driving ML inference to the edge. Think about a few conditions, right? Number one, the amount of data being generated is huge, and 5G is enabling even more of it, because every sensor, every cell phone, every device is generating a lot of data. It's not easy to move all that data to the cloud, do your inference there, and send the results back, because that's a huge cost. Two, there are applications that are latency sensitive. You wanna make decisions very close to where the data is generated. There is also a need for IT/OT convergence: decisions about how a robot moves or how sorting and picking happens are your OT (operational technology), and now you need to combine that with IT. And last but not least is the privacy and security of the data being generated, right? In that context, you wanna do inference on the edge, and not just inference: you wanna implement concepts like federated learning on the edge, where you retrain on the data generated on the edge and send only the necessary data back to the cloud, so you can retrain the larger models. But to your point, a lot of workflows are moving to the edge. So you will see this hybrid platform going forward, which is what we call the edge-to-cloud continuum.
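The federated learning idea described here, training where the data lives and sending only what is needed to the cloud, is often implemented as federated averaging (FedAvg). The sketch below is a toy version under strong assumptions: a one-feature linear model `y ≈ w*x + b`, plain gradient descent on each device, and an aggregator that averages parameters weighted by each device's sample count. In this scheme devices share model parameters, not raw data. The function names and device data are hypothetical, and a real system would add secure aggregation, client sampling, and communication handling.

```python
def local_update(weights, data, lr=0.01, epochs=50):
    """Run gradient descent on one device's private (x, y) pairs; return new params and sample count."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of mean squared error for y ≈ w*x + b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b), n


def federated_average(updates):
    """Aggregate (params, n) pairs into one model, weighting each device by its sample count."""
    total = sum(n for _, n in updates)
    w = sum(params[0] * n for params, n in updates) / total
    b = sum(params[1] * n for params, n in updates) / total
    return (w, b)


def federated_round(global_weights, devices):
    """One round: every device trains locally from the global model, then the cloud averages."""
    updates = [local_update(global_weights, data) for data in devices]
    return federated_average(updates)


# Two hypothetical edge devices whose private data both follow y = 2x + 1.
devices = [[(0, 1), (1, 3), (2, 5)], [(3, 7), (4, 9)]]
weights = (0.0, 0.0)
for _ in range(30):
    weights = federated_round(weights, devices)
print(weights)  # approaches (2, 1) without any raw (x, y) pair leaving a device
```

Weighting by sample count means a device with more local data pulls the global model harder, which is the usual FedAvg choice; an unweighted mean would be simpler but would over-represent devices that saw very little data.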
I love that you're drawing in future thinking around not just the deployment model but federated learning, and I know tools like homomorphic encryption can become really important here as we think about the next generation of generating insights from data. And I'm struck by one thought you mentioned: the difference between machine learning models, which drift, and the more static nature of software. I thought, well, maybe there's one place where bit rot was real: the Y2K bug. There, we had a real change in the environment that meant software actually had to change to match the new environment. And, you know, maybe that was the beginning of DevOps for us. But I really appreciate your time today, Kavitha, giving us a sense of where model drift happens, why it happens, how we can manage it, and the importance of inference at the edge. Thank you.
Thank you so much, Chris.
You can't expect to train an ML model and then just walk away from it, especially when models are deployed into the physical world, where conditions can change rapidly. And as we expand the use of predictive models built on real-time data, the ability to monitor models for drift and to manage your data and ML pipeline, by evolving DevOps practices into MLOps, is fundamental to success, especially when you consider distributed deployments all the way to the edge.
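Monitoring for drift can start as simply as comparing a model's recent error against the error it had when it was deployed. The sketch below is a hypothetical `DriftMonitor` (not part of any library) that keeps a rolling window of absolute errors as ground truth arrives (for the ride-sharing example, the actual number of rides each day) and flags the model for retraining when the rolling mean absolute error exceeds the baseline by a chosen tolerance. The 20% tolerance and 7-day window are arbitrary illustrative defaults.

```python
from collections import deque


class DriftMonitor:
    """Flags a model for retraining when its recent error drifts above a baseline.

    baseline_mae: error measured on validation data at deployment time.
    tolerance: allowed relative degradation before alerting (illustrative default).
    window: number of recent predictions to average over.
    """

    def __init__(self, baseline_mae, tolerance=0.2, window=7):
        self.baseline_mae = baseline_mae
        self.tolerance = tolerance
        self.errors = deque(maxlen=window)  # old errors fall off automatically

    def observe(self, predicted, actual):
        self.errors.append(abs(predicted - actual))

    def rolling_mae(self):
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def needs_retraining(self):
        # Only judge once the window is full, to avoid alerting on a single outlier.
        if len(self.errors) < self.errors.maxlen:
            return False
        return self.rolling_mae() > self.baseline_mae * (1 + self.tolerance)


monitor = DriftMonitor(baseline_mae=10.0)
for _ in range(7):
    monitor.observe(predicted=100, actual=108)  # error 8: within the baseline
print(monitor.needs_retraining())  # False
for _ in range(7):
    monitor.observe(predicted=100, actual=120)  # error 20: the world has changed
print(monitor.needs_retraining())  # True
```

On an edge fleet, each instance could run a check like this locally and report only the flag upstream, which fits the hybrid edge-to-cloud pattern discussed in the episode; what to do when the flag fires (retrain centrally, roll back, or alert a human) is a pipeline design decision.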
07:38 — OUTRO ANIMATION
Meet the guests
Kavitha Prasad, Vice President & General Manager, Datacenter, AI, and Cloud Execution and Strategy, Intel
Presented by Red Hat
For 25 years, Red Hat has been bringing open source technologies to the enterprise. From the operating system to containers, we believe in building better technology together–and celebrating the unsung heroes who are remaking our world from the command line up.