Red Hat Insights powers a number of services that improve your operational, business and security experiences with Red Hat products, including Red Hat OpenShift. In this article, we’ll discuss Upgrade Risks, a new preview feature in Insights Advisor that helps you upgrade your OpenShift fleet more safely.
Managing upgrades in complex production Kubernetes environments is challenging. More than 60 independently working components typically make up the infrastructure of such an environment. Each component has its own operational state and configuration, either of which may cause a minor or major version upgrade to fail.
The feature uses machine learning (ML) techniques to compare the last two hours of a cluster’s state against the known history of failed upgrade conditions observed across the fleet of all clusters connected to Red Hat via the OpenShift Remote Health monitoring feature.
Upgrade Risks will show you a checklist of known risks present in your cluster including failing operator conditions, alerts and other metrics, and also provide instructions on how to remove these blockers on your path to a smoother upgrade.
How does the feature work under the hood?
The Upgrade Risks feature uses data sent to Red Hat via the Remote Health monitoring feature. Prometheus ingests this data in a cleaned-up form that limits the data set to only the necessary information, which is then used for continuous, real-time ML model training.
The validated model is then used to display most recent results to Red Hat customers.
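To make the idea of learning from fleet history concrete, here is a minimal sketch. It is not Red Hat's model, whose details are not described in this article. Instead, it swaps in a simple frequency baseline: for each signal (an alert or a failing operator condition), it computes the share of past upgrades that failed while that signal was present, then flags the signals on the current cluster that exceed a threshold. The function names, the `min_support` and `threshold` parameters, and the sample signal names are all assumptions made for illustration.

```python
from collections import Counter


def failure_rates(history, min_support=2):
    """Estimate, per signal, the fraction of past upgrades that failed.

    history: iterable of (signals, failed) pairs, where signals is a set of
    signal names seen before an upgrade and failed is True if it failed.
    Signals seen fewer than min_support times are dropped (assumed cutoff).
    """
    seen, failed = Counter(), Counter()
    for signals, did_fail in history:
        for name in signals:
            seen[name] += 1
            if did_fail:
                failed[name] += 1
    return {name: failed[name] / n for name, n in seen.items() if n >= min_support}


def upgrade_checklist(current_signals, rates, threshold=0.5):
    """Return (signal, rate) pairs for current signals at or above threshold,
    ordered from highest to lowest historical failure rate."""
    risky = [(s, rates[s]) for s in current_signals if rates.get(s, 0.0) >= threshold]
    return sorted(risky, key=lambda item: (-item[1], item[0]))


# Hypothetical fleet history: signal names here are made up for the example.
history = [
    ({"etcd:Degraded", "AlertA"}, True),
    ({"etcd:Degraded"}, True),
    ({"AlertA"}, False),
    ({"AlertA", "AlertB"}, False),
    (set(), False),
]
rates = failure_rates(history)

# Signals observed on the cluster we want to upgrade.
checklist = upgrade_checklist({"etcd:Degraded", "AlertA"}, rates)
for name, rate in checklist:
    print(f"Fix before upgrading: {name} (historical failure rate {rate:.0%})")
```

A per-signal frequency like this ignores interactions between signals, which is one reason a trained and validated model is preferable in practice.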
[Figure: high-level architecture diagram of the Upgrade Risks feature]
The information includes blocking conditions for your upgrade as well as proposed actions, and is available in Insights Advisor. The same information is also available to Red Hat Technical Account Managers, who can assist you with upgrades of a larger fleet, including planning remediation steps for blocking conditions.
What data do Red Hat and IBM Research use?
The dataset is built by joining three sources:
- The upgrades attempted by all connected clusters and their outcomes (succeeded or failed).
- The alerts fired by Operators.
- Failing Operator Conditions (FOCs) [1] firing in clusters right before these attempts, including the cluster version (we are mostly interested in the y-version [2], such as 4.10, 4.11, etc.).
1. FOCs are conditions in which OpenShift Operators report being unavailable or degraded. More info about operator conditions can be found here.
2. OpenShift versions take the form x.y.z, so a y-version (or y-upgrade) would be 4.10 or 4.11, for example, and a z-version would be 4.10.31 or 4.11.2.
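The join described above can be sketched as follows. This is an illustration under stated assumptions, not the actual pipeline: the record types, field names and `build_dataset` function are hypothetical, the sample alert and FOC names are placeholder values, and the two-hour lookback before each attempt is borrowed from the window the article mentions for live cluster state. The `y_version` helper simply keeps the x.y part of an x.y.z version.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed lookback before each upgrade attempt.
WINDOW = timedelta(hours=2)


def y_version(version: str) -> str:
    """Return the x.y part of an x.y.z version, e.g. '4.10.31' -> '4.10'."""
    x, y, *_ = version.split(".")
    return f"{x}.{y}"


@dataclass
class UpgradeAttempt:  # hypothetical record shape
    cluster_id: str
    started_at: datetime
    from_version: str
    succeeded: bool


@dataclass
class Signal:  # an alert or a failing operator condition (hypothetical shape)
    cluster_id: str
    observed_at: datetime
    name: str


def build_dataset(attempts, alerts, focs):
    """Join attempts with the alerts and FOCs seen on the same cluster
    within WINDOW before the attempt started."""
    rows = []
    for attempt in attempts:
        def recent(signals):
            return sorted({
                s.name for s in signals
                if s.cluster_id == attempt.cluster_id
                and attempt.started_at - WINDOW <= s.observed_at <= attempt.started_at
            })
        rows.append({
            "cluster_id": attempt.cluster_id,
            "y_version": y_version(attempt.from_version),
            "alerts": recent(alerts),
            "focs": recent(focs),
            "failed": not attempt.succeeded,
        })
    return rows


# Sample data: one failed upgrade with signals inside and outside the window.
t0 = datetime(2023, 1, 10, 12, 0)
attempts = [UpgradeAttempt("c1", t0, "4.10.31", succeeded=False)]
alerts = [
    Signal("c1", t0 - timedelta(minutes=30), "KubePodCrashLooping"),
    Signal("c1", t0 - timedelta(hours=5), "SomeOlderAlert"),  # too old, ignored
]
focs = [Signal("c1", t0 - timedelta(minutes=5), "authentication:Degraded")]
dataset = build_dataset(attempts, alerts, focs)
print(dataset[0])
```

Each resulting row pairs one upgrade attempt and its outcome with the signals that preceded it, which is the shape a training step would consume.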
What should you do?
We recommend using the Upgrade Risks feature to generate a checklist of things to fix before any cluster upgrade.
As mentioned above, this feature is available as a preview in Insights Advisor on the Hybrid Cloud Console. If your clusters are connected to Red Hat infrastructure, the Red Hat Insights services are all available as part of your subscription. This feature is automatically available, and you can start using it today.
Please send us your suggestions through the feedback form within Insights (the purple feedback button on the lower right, visible in the screenshot at the top).
About the author
A Red Hatter since 2010, Dosek started his professional career in virtualization technologies and moved through a variety of roles at Red Hat to hybrid cloud. His focus is on improving the product experience with the assistance of Red Hat Insights.