Red Hat is continually innovating, and part of that innovation is researching and working to solve the problems our customers face. Much of that work is driven by the Office of the CTO and spans Red Hat OpenShift, Red Hat OpenShift Container Storage, and use cases such as the open hybrid cloud, artificial intelligence, and machine learning. We recently interviewed Michael Clifford, a data scientist in Red Hat's Office of the CTO, about these very topics.
Your title is Data Scientist, right?
That's correct.
What does that mean in terms of working with OpenShift 4 and the hybrid cloud?
Working in this domain is really twofold.
On one side, because we want to provide infrastructure for other companies that run machine learning workloads, we act as the beta testers.
On the other side is the question: how do we actually build some kind of intelligence into the applications running on OpenShift Container Platform?
For example, one of the coolest features of OpenShift 4 is automatic updates. But when an update is rolling out automatically across hundreds of thousands of servers at a time, how do you actually know whether it's going well? You need some kind of intelligent automation to manage that process.
That's one of the things we worked on early: testing how other users would use our infrastructure from a data scientist's perspective, and implementing the intelligent applications that run behind some of that infrastructure.
So your role is to analyze what's happening, then come up with ways to make it less disruptive. Is that accurate?
Exactly. During an update, we can say, "Oh, something strange is happening during this update, let's roll back before anything breaks."
What are some of the data science tools you use to detect that?
Basically, you ingest all the data from all the updates that have occurred in the past. The machine learning model then learns what a normal update looks like. As a new update happens, we continually compare it to the model, and if something starts to deviate significantly, we say, "Okay, let's flag this and roll it back."
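The idea of learning what normal updates look like and flagging a new one that deviates can be illustrated with a simple per-step z-score baseline. This is a minimal sketch, not the model Red Hat actually uses: it assumes each past update is recorded as an equal-length series of one hypothetical numeric metric (for example, an error count per step), and the 3-standard-deviation threshold is an arbitrary illustrative choice.

```python
import statistics
from typing import Optional, Sequence

Baseline = list[tuple[float, float]]  # (mean, standard deviation) per step


def fit_baseline(history: Sequence[Sequence[float]]) -> Baseline:
    """Learn per-step mean and standard deviation from past, normal updates.

    history[i][t] is a hypothetical metric observed at step t of past update i.
    All past updates are assumed to have the same number of steps.
    """
    baseline: Baseline = []
    for step_values in zip(*history):  # iterate step by step across updates
        baseline.append((statistics.fmean(step_values), statistics.pstdev(step_values)))
    return baseline


def first_anomaly(
    update: Sequence[float],
    baseline: Baseline,
    threshold: float = 3.0,
    eps: float = 1e-9,
) -> Optional[int]:
    """Return the first step whose z-score exceeds `threshold`, or None.

    `eps` avoids division by zero; note that at a step where history never
    varied, any deviation at all produces a very large z-score.
    """
    for t, (value, (mean, stdev)) in enumerate(zip(update, baseline)):
        z = abs(value - mean) / (stdev + eps)
        if z > threshold:
            return t  # a caller could trigger a rollback here
    return None


history = [[1, 2, 3, 2], [1, 2, 4, 2], [2, 2, 3, 3]]
baseline = fit_baseline(history)
print(first_anomaly([1, 2, 3, 30], baseline))  # → 3
print(first_anomaly([1, 2, 3, 2], baseline))   # → None
```

In practice you would monitor many metrics at once and might replace the fixed statistics with a learned model; the fit, score, and flag structure stays the same.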
And you're talking about hundreds of thousands of updates to monitor.
One of the things about working in the AIOps area is that even though there's a lot of data, it's sometimes not very clean data. With a lot of data science projects, people have this idea that you get a file that's very cleanly defined, and you can do your exploratory analysis and all kinds of other stuff on it. With these live, machine-generated, real-time data sets, things can be all over the place.
So the bigger challenge in this particular project is less the machine learning algorithm itself than the infrastructure required to parse the data: getting enough meaningful data and converting it into a format that machine learning tools can actually ingest.
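As a rough illustration of that parsing step, the sketch below turns loosely structured, machine-generated log lines into clean rows, then groups them into per-cluster series that a model could consume. The `key=value` record format and the field names `cluster`, `step`, and `errors` are hypothetical; real OpenShift update telemetry looks different. The pattern is what matters: extract what you can, validate types, and drop records you cannot trust rather than guessing.

```python
import re
from collections import defaultdict
from typing import Iterable

# Hypothetical "key=value" record format, used only for illustration.
_PAIR = re.compile(r"(\w+)=(\S+)")
_REQUIRED = ("cluster", "step", "errors")


def parse_records(lines: Iterable[str]) -> list[dict]:
    """Convert raw log lines into typed rows, skipping unusable records."""
    rows = []
    for line in lines:
        fields = dict(_PAIR.findall(line))
        if not all(key in fields for key in _REQUIRED):
            continue  # incomplete or unrelated line
        try:
            rows.append({
                "cluster": fields["cluster"],
                "step": int(fields["step"]),
                "errors": float(fields["errors"]),
            })
        except ValueError:
            continue  # e.g. "errors=N/A" cannot be converted to a number
    return rows


def to_series(rows: list[dict]) -> dict[str, list[float]]:
    """Group rows into one step-ordered metric series per cluster."""
    series: dict[str, list[float]] = defaultdict(list)
    for row in sorted(rows, key=lambda r: (r["cluster"], r["step"])):
        series[row["cluster"]].append(row["errors"])
    return dict(series)


raw = [
    "ts=1 cluster=a step=1 errors=2",
    "garbage line",
    "cluster=a step=2 errors=N/A",
    "cluster=a step=0 errors=0",
    "cluster=b step=0 errors=3",
]
print(to_series(parse_records(raw)))  # → {'a': [0.0, 2.0], 'b': [3.0]}
```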
What's generating this hard-to-manage data?
The data wasn't generated with machine learning in mind. A lot of pre-processing and post-processing has to happen between capturing this massive amount of data and turning it into a format that can actually be used for intelligence.
With that kind of data, is it harder to decide what is useful data for machine learning, and what is just something that has to be managed?
A lot of times, especially with this type of data, you have to go back and talk to a subject matter expert, such as somebody who's actually working on the OpenShift 4 updates. You can say, "This variable seems like it would be very informative. Is this something we should use?" And they'll say, "No, this is generated by the very thing you're trying to predict, so it would be a circular prediction."
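One way to prepare for that kind of conversation is to shortlist variables that track the target suspiciously closely, since a near-perfect correlation is a common symptom of the circular prediction (target leakage) described above. This is a hypothetical screen, not a step described in the interview: it uses plain Pearson correlation with an arbitrary 0.95 cutoff, and the feature names are made up. A flagged feature is a question for the subject matter expert, not an automatic rejection.

```python
import math
from typing import Mapping, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have the same length, at least 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def suspicious_features(
    features: Mapping[str, Sequence[float]],
    target: Sequence[float],
    cutoff: float = 0.95,
) -> list[str]:
    """Names of features whose |correlation| with the target reaches `cutoff`."""
    return sorted(
        name for name, values in features.items()
        if abs(pearson(values, target)) >= cutoff
    )


# Hypothetical data: 1 = update had a problem, 0 = normal update.
target = [0, 0, 1, 1, 0, 1]
features = {
    "rollback_triggered": [0, 0, 1, 1, 0, 1],  # produced by the outcome itself
    "cpu_load": [0.5, 0.7, 0.6, 0.4, 0.8, 0.5],
}
print(suspicious_features(features, target))  # → ['rollback_triggered']
```

Here `rollback_triggered` is flagged because it is a consequence of the outcome being predicted, exactly the kind of variable an expert would rule out.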
I think that's a big part of the practice of data science: a lot of looking at the data, but also talking with subject matter experts to determine the right thing to do.
Thanks, Michael.
Thank you.