How do you know if something bad is happening in your cluster? How do you know that a node is down, an application isn’t responding, or the storage backing a PVC has “disappeared”? If your answer to any of those is “when the users tell us there’s an error”, then it may be time to reevaluate your monitoring and alerting strategy.

Fortunately, OpenShift has built-in tools for doing just this. With only a small amount of work, you can ensure that you receive the right alerts and warnings and, hopefully, avoid any sticky situations. This week we are joined by Brian Gottfried, from Red Hat Consulting, to focus on Alertmanager: how to configure it, and how to customize its settings so that you get neither too many alerts nor too few.

As always, please see the list below for additional links to specific topics, questions, and supporting materials for the episode!

If you’re interested in more streaming content, please subscribe to the OpenShift.tv streaming calendar to see the upcoming episode topics and to receive any schedule changes. If you have questions or topic suggestions for the Ask an OpenShift Admin Office Hour, please contact us via Discord or Twitter, or come join us live on YouTube and Twitch, Wednesdays at 11am EDT / 1500 UTC.

Episode 31 recorded stream:

Use this link to jump directly to where we start talking about today’s topic.

Supporting links for today:

Questions answered during the stream:

