You Need Ops to AIOps
Say someone offered you a million dollars or a magical penny that doubles in value for 31 days. The immediate gain is appealing, but the long-term reward is ultimately the better choice. Technology and innovation can help us create scalable and resilient systems more efficiently. And with every dollar we save, we create an opportunity to invest in our business. But without having to rely on magic, what's going to create our next wave of exponential efficiency?
00:32 — INTRO ANIMATION
We can collect and analyze data about our systems to make decisions. We can create self-healing infrastructure and event-driven automation. And we can adopt a DevOps mindset to gain agility and rapidly deliver services to customers without compromising quality. But to tap into that exponential efficiency, obviously we turn our attention to AIOps.
[Background voice] AIOps.
What? Come on, of course, it's AI that breaks. All right, let's break this down. By AIOps we mean AI plus DevOps, or the next step in how we manage our systems and the services running on them. By AI we mean intelligence, and by intelligence we mean data. Well, that's a machine, right? But we can't just start throwing data into models and expect them to be intelligent. We need clean and curated data, and not just data: our human expertise has to guide the training and refinement of models to ensure they're giving the best recommendations. It's easy to get caught up in the hype of AI, but we can't approach it as a plug-and-play technology with immediate returns. There are a few things we really have to consider and understand to set AIOps up for success. To hear more about what we need to anticipate, let's talk to Marcel Hild. Hey Marcel, how you doing?
Hey Chris, what's up?
So there are a lot of definitions of AIOps, but we know what we're trying to achieve. It's about helping teams and systems operate better through the use of data and data analysis, very much in the way humans do today, but leveraging AI. So what are the things we need to do just to get started?
The first thing that you need to understand is that before you can do AI, you need to Ops. So understand the problem domain really well, understand what SRE means, what SLIs and SLOs mean, understand how you operate your systems. In the end, AIOps is not a product that you buy off the shelf, it's more like a capability you build in your teams. So look at the data science tooling that is applied here: baselining, which is finding a common baseline in the time series data from your monitoring; correlation, which is how A correlates with B; and predicting the future. And you use that for anomaly detection, where you say, "Oh, I predicted the future of this, but it didn't turn out that way, so it must be an anomaly." That's all table stakes for data scientists. The tools that you get are just tools. They don't come with any embedded intelligence. So you will always train those tools on the observations, on the data that you are collecting in your own data center.
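To make the baselining and "predicted, but it didn't turn out that way" idea concrete, here is a minimal sketch in Python. It is not the method used by the speakers or by any particular product: it assumes a rolling mean over the previous few observations as both the baseline and a naive one-step prediction, and it flags a point when it misses that prediction by more than a chosen number of standard deviations. The function name `detect_anomalies`, the `window` and `threshold` parameters, and the latency values are all hypothetical.

```python
from statistics import mean, stdev


def detect_anomalies(series, window=5, threshold=3.0):
    """Return the indices of points that deviate from a rolling baseline.

    Assumption for illustration: the mean of the previous `window`
    observations serves as the baseline and as a naive forecast of
    the next point. A point is flagged when it differs from that
    forecast by more than `threshold` standard deviations of the
    same window.
    """
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        predicted = mean(history)   # baseline / naive prediction
        spread = stdev(history)     # how noisy the recent past was
        tolerance = threshold * spread
        if abs(series[i] - predicted) > tolerance:
            anomalies.append(i)
    return anomalies


# Hypothetical request latencies in milliseconds, with one spike.
latencies_ms = [100, 102, 99, 101, 100, 103, 98, 250, 101, 100]
print(detect_anomalies(latencies_ms))
```

Note that the baseline here is learned entirely from the series it is given, which matches the point above: the tool carries no built-in knowledge and only becomes useful once it has seen data from your own environment.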
That's an important insight. And I know we have expert systems today, leveraging automation, so that we can take events from event-driven automation and do remediation in a self-healing infrastructure. But as we go to AIOps, I feel like there's a leap there. We've seen advances in other parts of AI to create foundational models for whole portions of AI, like GPT-3 for natural language processing, or image processing with ImageNet. What are we doing for IT systems?
So I think the current state of the art is that we get better at using the data at hand, the data we collect in our own environment. Maybe we get more input features and we get faster and more accurate. But what is lacking is building the knowledge that is derived maybe at the vendor, maybe at some other site, and taking that knowledge and making it accessible to the community, to other sites, to other customers, so that not everybody has to learn from scratch what a database failure looks like or what a cluster outage looks like. It's like if you train an ImageNet model to identify cats, and now you want to identify your own cat at home, which the model hasn't seen yet: it will still identify this cat.
That's a great analogy. And I see open source and community collaboration as a fantastic way to build that collective knowledge and then distribute it through open source projects. And the transition from people doing all this work to including machine learning models and AI to help, that's a cultural shift. That's a fundamental change in how we do the people and process part of any technology, which is always a hard part of the transition. So how do you see that impacting AIOps, and what are the things that you look out for?
Going through every revolution, people fear that their jobs will be gone, but in the end, we ended up with machines doing the chores for us and us being more in the driver's seat. And the same is true for the operational domain. So if you look at root cause analysis, the machine will not actually tell you what the root cause is; you will still need engineers to find the root cause. But now you have an AI which actually remembers all the thousands of cases it's solved before and tells you, maybe look there, maybe the root cause is over there. So I think our life will be more fun. And you have better tools to command. I mean, who doesn't like powerful tools?
Well, having personally sat through hours of sifting through logs with grep, sed and awk to find root causes, I love leveraging machines to really help get work done rapidly and focus on the key areas where human creativity comes to bear. And I think it's that machine-augmented human intelligence that helps us do a better job of operating systems. This has been great, Marcel. Thank you so much for your time.
It's been a pleasure.
It's important to understand that AIOps is not a replacement for DevOps. It's an evolution of operations with all the same responsibilities, but it augments what we do with automation and machine learning. It's not just about collecting more data or faster processing. It's about applying the right tools to the right problems in the right ways, and making sure that the skills of the operations teams are put to the best use. It's not the machines that are intelligent, it's humans: machine-augmented human intelligence.
07:48 — OUTRO ANIMATION
Meet the guest
Senior Manager AIOps, AI CoE, Office of the CTO Red Hat
Presented by Red Hat
For 25 years, Red Hat has been bringing open source technologies to the enterprise. From the operating system to containers, we believe in building better technology together–and celebrating the unsung heroes who are remaking our world from the command line up.