
Red Hat is continually innovating, and part of that innovation involves researching and solving the problems our customers face. That work is driven in part by the Office of the CTO and spans Red Hat OpenShift, Red Hat OpenShift Container Storage, and use cases such as the open hybrid cloud, artificial intelligence, and machine learning. We recently interviewed Michael Clifford, Data Scientist in the Office of the CTO at Red Hat, about these very topics.

Your title is Data Scientist, right?

That's correct.

What does that mean in terms of working with OpenShift 4, and with the hybrid cloud?

Working in this domain is really twofold.

If we want to provide infrastructure for other companies that want to do machine learning workloads, we're working as the beta testers.

Then, on the other side of it is a question: how do we actually implement some kind of intelligence into the applications that are running on the OpenShift Container Platform?

For example, one of the main, cool features of OpenShift 4 is the automatic updates that happen. But, how do you actually know when an update is happening automatically on hundreds of thousands of servers at a time? You need some kind of intelligent automation to manage that process.

That's one of the things we worked on early: both testing how other users would use our infrastructure from the data scientist's perspective and implementing the intelligent applications that run behind some of that infrastructure.

So your role is to analyze what's happening, then come up with ways to make it less disruptive. Is that accurate?

Exactly. During an update, we say, “Oh, something strange is happening during this update, let's roll back before anything breaks.”

What are some of the data science tools you use to detect that?

Basically, you're ingesting all the data from all the updates that have occurred in the past. Then the machine learning model essentially learns what it looks like when a thing is updating normally. As a new update happens we continually compare it to our model, and if something starts to really deviate in any significant way, we say, “okay, let's flag this, roll it back.”
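The compare-against-normal idea described above can be sketched in a few lines of Python. This is a minimal illustration and not the model Red Hat actually runs: it assumes each update is summarized by a handful of numeric metrics, and the metric names (`duration_s`, `pod_restarts`), the z-score approach, and the threshold of 3.0 are all assumptions made for the example.

```python
from statistics import mean, stdev

def fit_baseline(history):
    """Learn the mean and standard deviation of each metric from past updates."""
    baseline = {}
    for metric in history[0]:
        values = [update[metric] for update in history]
        baseline[metric] = (mean(values), stdev(values))
    return baseline

def deviation(baseline, update):
    """Return the largest absolute z-score across all metrics."""
    scores = []
    for metric, (mu, sigma) in baseline.items():
        sigma = sigma or 1e-9  # avoid dividing by zero for constant metrics
        scores.append(abs(update[metric] - mu) / sigma)
    return max(scores)

def should_roll_back(baseline, update, threshold=3.0):
    """Flag an update whose metrics deviate strongly from the learned baseline."""
    return deviation(baseline, update) > threshold

# Hypothetical summaries of past updates that completed normally.
history = [
    {"duration_s": 300, "pod_restarts": 2},
    {"duration_s": 320, "pod_restarts": 3},
    {"duration_s": 310, "pod_restarts": 2},
    {"duration_s": 290, "pod_restarts": 1},
]
baseline = fit_baseline(history)

typical = {"duration_s": 305, "pod_restarts": 2}
strange = {"duration_s": 900, "pod_restarts": 15}
print(should_roll_back(baseline, typical))
print(should_roll_back(baseline, strange))
```

A real system would likely use a richer model and evaluate metrics continuously while the update is in progress, but the shape is the same: fit on history, score the live update, and flag it when the score crosses a threshold.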

And you're talking about hundreds of thousands of updates to monitor.

One of the things about working in the AIOps area is that even though there's a lot of data, it's sometimes not very clean data. With a lot of data science projects, people have this idea that you get a file that's very cleanly defined, and you can do your exploratory analysis and all kinds of other stuff on it. With these live, machine-generated, real-time data sets, things can be all over the place.

So the bigger challenge with this particular project is less the machine learning algorithm that's put into place than the infrastructure required to parse the data: to get enough data that's meaningful, and to convert it to a format that is actually usable and ingestible by machine learning tools.
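To make the parsing problem concrete, here is a small sketch of turning inconsistent, machine-generated lines into uniform records. The two input formats (JSON objects and `key=value` pairs) and the field names `cluster`, `metric`, and `value` are hypothetical, chosen only to illustrate the kind of cleanup involved.

```python
import json

def parse_line(line):
    """Return a uniform record, or None if the line cannot be used."""
    line = line.strip()
    if not line:
        return None
    try:
        if line.startswith("{"):
            # Assumed format 1: one JSON object per line.
            fields = json.loads(line)
        else:
            # Assumed format 2: whitespace-separated key=value pairs.
            fields = dict(part.split("=", 1) for part in line.split())
        return {
            "cluster": str(fields["cluster"]),
            "metric": str(fields["metric"]),
            "value": float(fields["value"]),
        }
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing fields, or non-numeric values are dropped.
        return None

def clean(lines):
    """Keep only the lines that parse into complete records."""
    return [record for record in map(parse_line, lines) if record is not None]

raw_lines = [
    '{"cluster": "a1", "metric": "update_duration_s", "value": 312}',
    "cluster=b2 metric=update_duration_s value=298.5",
    "cluster=c3 metric=update_duration_s value=n/a",
    "### truncated",
    "",
]
records = clean(raw_lines)
print(records)
```

Dropping bad lines silently is the simplest policy; in practice you would probably also count what was discarded, since a sudden rise in unparseable lines is itself a signal worth watching.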

What's generating this hard-to-manage data?

The data wasn't generated with machine learning in mind. There's a lot of pre-processing and post-processing that has to happen between capturing this massive amount of data and turning it into a format that can actually be used for intelligence.

With that kind of data, is it harder to decide what is useful data for machine learning, and what is just something that has to be managed?

A lot of times, especially with this type of stuff, you will have to go back and talk to a subject matter expert, like somebody who's actually working on the OpenShift 4 updates. You can say, “This variable seems like it would be very informative. Is this something that we should use?” And they'll say, “No, this is generated by something that you're trying to predict anyway, so it'll be a circular prediction.”
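The "circular prediction" problem is often called target leakage. One rough way to build a list of variables worth raising with a subject matter expert is to flag any feature that tracks the target almost perfectly. The sketch below assumes that heuristic; the column names and the 0.95 cutoff are made up for illustration, and a high correlation is only a prompt for a conversation, not proof of leakage.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)

def suspicious_features(columns, target, cutoff=0.95):
    """Names of features that correlate with the target at or above the cutoff."""
    return sorted(
        name
        for name, values in columns.items()
        if abs(pearson(values, target)) >= cutoff
    )

# Hypothetical data: 1 means the update failed.
update_failed = [0, 0, 1, 0, 1, 1]
columns = {
    "rollback_triggered": [0, 0, 1, 0, 1, 1],  # produced by the failure itself
    "node_count": [3, 5, 4, 6, 3, 5],
}
to_review = suspicious_features(columns, update_failed)
print(to_review)
```

Here `rollback_triggered` is flagged because it is a consequence of the outcome being predicted, which is exactly the kind of variable the expert in the example would tell you to leave out.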

I think that's just a big part of the practice of data science — a lot of looking at the data, but also talking with subject matter experts to determine the right thing to do.

Thanks, Michael.

Thank you.

