How do you know if something bad is happening in your cluster? How do you know that a node is down, an application isn’t responding, or the storage backing a PVC has “disappeared”? If your answer to any of those is “when the users tell us there’s an error”, then it may be time to reevaluate your monitoring and alerting strategy.
Fortunately, OpenShift has built-in tools for doing just this. With only a small amount of work you can ensure that you’re receiving the proper alerts and warnings so that you can, hopefully, avoid any sticky situations. This week we are joined by Brian Gottfried, from Red Hat Consulting, to focus on Alertmanager: how to configure it and how to tune its settings so that you receive neither too many alerts nor too few.
As always, please see the list below for additional links to specific topics, questions, and supporting materials for the episode!
If you’re interested in more streaming content, please subscribe to the OpenShift.tv streaming calendar to see the upcoming episode topics and to receive any schedule changes. If you have questions or topic suggestions for the Ask an OpenShift Admin Office Hour, please contact us via Discord, Twitter, or come join us live, Wednesdays at 11am EDT / 1500 UTC, on YouTube and Twitch.
Episode 31 recorded stream:
Use this link to jump directly to where we start talking about today’s topic.
Supporting links for today:
- A question from Twitter about disconnected installs and using an ImageContentSourcePolicy (ICSP). While the ICSP is necessary to map image locations from their original, connected locations to the new, disconnected ones, there are some things on the roadmap to make disconnected installs a better overall experience. We also talked about disconnected installs in episode 13 if you want more information.
- Another Twitter-inspired topic this week: load balancers for OpenShift. There are a number of options available for load balancing OpenShift API and Ingress traffic; the best one is the one that works for you!
Questions answered during the stream:
- Can the timezone be set for the cluster? Unfortunately not, but you can track the RFE here.
- What are the architecture and components of the OpenShift Monitoring service? The architecture diagram used can be found in the docs here. There are multiple components, including Prometheus for data export and collection, Thanos for aggregation and reduction, and, when using Advanced Cluster Management, Observatorium for historical views.
- Alert fatigue is real and you should be careful of it when configuring your system!
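- How do I enable user workload monitoring? It’s done by adding a ConfigMap to the openshift-monitoring namespace (in OpenShift 4.6 and later, the cluster-monitoring-config ConfigMap with enableUserWorkload set to true); see the docs here.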
- What role does Thanos play? It’s important for aggregating metrics across multiple Prometheus instances, for example when user workload monitoring is enabled in the cluster.
- What persistent storage should I use for long-term data retention? Brian answers this during the stream, and the docs explain how to configure persistent storage.
- Is user workload monitoring with Istio and mTLS going to be supported? This is most likely because you cannot modify alerts and monitoring in the system namespaces.
- Is there a way to get individual container metrics from a Pod with multiple containers? You would need to configure each container to expose a different metrics endpoint, then configure ServiceMonitors for each of them.
- How do I configure Alertmanager to send to Mattermost? You’ll need to use a webhook, either via a community plugin for Alertmanager or one you create yourself.
- Is it possible to configure per-namespace alert addresses? Yes, this should be possible using labels. There’s an example here.
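
To make the label-based routing from the last answer a little more concrete, here is a minimal sketch of an Alertmanager routing fragment. It is not the example linked above: the namespace names, receiver names, and email addresses are all hypothetical, and it assumes that SMTP settings are already configured in the `global` section of the same file.

```yaml
# Fragment of an Alertmanager configuration (alertmanager.yaml).
# All namespace names, receiver names, and addresses are hypothetical.
# Assumes smtp_smarthost and smtp_from are already set under "global".
route:
  receiver: default              # fallback for alerts that match no child route
  routes:
    - match:
        namespace: team-a        # alerts carrying the label namespace="team-a"
      receiver: team-a-email
    - match:
        namespace: team-b        # alerts carrying the label namespace="team-b"
      receiver: team-b-email
receivers:
  - name: default                # no notification targets; alerts are routed here and dropped
  - name: team-a-email
    email_configs:
      - to: team-a-oncall@example.com
  - name: team-b-email
    email_configs:
      - to: team-b-oncall@example.com
```

In OpenShift, the platform Alertmanager configuration is stored in the `alertmanager-main` secret in the `openshift-monitoring` namespace, so a fragment like this would be merged into that file rather than applied on its own. Newer Alertmanager releases prefer the `matchers` syntax over `match`, so check which form your cluster's version expects.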