
Topology Spread Constraints

OpenShift Monitoring is a platform for monitoring and observability that is built on top of the Kubernetes container orchestration platform. It provides a comprehensive set of monitoring and alerting capabilities that allow you to monitor the health and performance of your applications running on OpenShift.

Since OpenShift 4.10, replicas of the monitoring components are deployed with hard (required) pod anti-affinity, so no two replicas of the same component run on the same node. This prevents a single node outage from disrupting the cluster's monitoring functionality.
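To make the hard anti-affinity behavior concrete, here is a minimal Python sketch of the check a scheduler performs for a required anti-affinity rule keyed on the node hostname. The plain-dict data model, the node names, and the helpers `violates_hard_anti_affinity` and `feasible_nodes` are hypothetical illustrations, not part of any OpenShift or Kubernetes API.

```python
# Hypothetical model: each running pod is a dict with the node it runs on
# and its labels. A required (hard) anti-affinity rule keyed on the
# hostname forbids placing a replica on a node that already hosts a pod
# with the same label.

def violates_hard_anti_affinity(node_name, running_pods, label_key, label_value):
    """Return True if node_name already hosts a pod whose label matches."""
    return any(
        pod["node"] == node_name and pod["labels"].get(label_key) == label_value
        for pod in running_pods
    )


def feasible_nodes(nodes, running_pods, label_key, label_value):
    """Nodes a new replica could still be scheduled on under the hard rule."""
    return [
        n for n in nodes
        if not violates_hard_anti_affinity(n, running_pods, label_key, label_value)
    ]


nodes = ["worker-0", "worker-1", "worker-2"]
running = [
    {"node": "worker-0", "labels": {"app": "prometheus"}},
    {"node": "worker-1", "labels": {"app": "alertmanager"}},
]

print(feasible_nodes(nodes, running, "app", "prometheus"))  # ['worker-1', 'worker-2']
```

If every node already runs a replica of the component, the returned list is empty and, because the rule is hard, the new replica stays Pending until a suitable node becomes available.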

In OpenShift Monitoring 4.12, you can specify topology spread constraints for Prometheus, Alertmanager, and Thanos Ruler in addition to the existing hard anti-affinity settings. Topology spread constraints let you define more complex rules that control where these components are placed in your cluster. For example, you might want to distribute Prometheus instances across different failure domains to further reduce the risk of a single point of failure. You can specify topology spread constraints using the openshift_monitoring_prometheus_topology_spread_constraints, openshift_monitoring_alertmanager_topology_spread_constraints, and openshift_monitoring_thanos_ruler_topology_spread_constraints variables in your OpenShift Monitoring installation configuration.

Overall, the ability to specify topology spread constraints can help improve the resiliency and availability of your monitoring and alerting infrastructure.

By using topology spread constraints, you can control how pods are placed across your cluster. For example, you can distribute pods evenly across failure domains, such as zones or regions, so that the loss of a single domain does not take down every replica. This improves the resiliency of your applications and infrastructure.

Placement rules can also help reduce network latency. For example, if applications exchange a lot of traffic, you can keep the relevant pods in the same zone or region. Note that co-locating pods is normally expressed with pod affinity, while topology spread constraints are designed to spread matching pods apart; both can be set in the same pod specification.

In short, topology spread constraints give you fine-grained control over where pods are placed in your cluster, which helps you balance the performance and reliability of your applications.

Affinity

In OpenShift Monitoring, you can use affinity rules and topology spread constraints together to control where pods are placed in your cluster.

The central element of a topology spread constraint is the topology key: a node label that associates a node with a particular part of the cluster's topology. We recommend well-known labels such as kubernetes.io/hostname, topology.kubernetes.io/zone, and topology.kubernetes.io/region, but any node label works. All nodes that share the same value for the topology key belong to the same topology domain.
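The following Python sketch shows how a topology key partitions nodes into domains. The node names, the zone values (`zone-a`, `zone-b`), and the `topology_domains` helper are hypothetical; in a real cluster the scheduler reads these labels from the Node objects.

```python
from collections import defaultdict


def topology_domains(nodes, topology_key):
    """Group node names by the value of the topology-key label.

    Nodes that lack the label belong to no domain in this sketch.
    """
    domains = defaultdict(list)
    for name, labels in nodes.items():
        if topology_key in labels:
            domains[labels[topology_key]].append(name)
    return dict(domains)


# Hypothetical nodes with their labels.
nodes = {
    "worker-0": {"kubernetes.io/hostname": "worker-0", "topology.kubernetes.io/zone": "zone-a"},
    "worker-1": {"kubernetes.io/hostname": "worker-1", "topology.kubernetes.io/zone": "zone-a"},
    "worker-2": {"kubernetes.io/hostname": "worker-2", "topology.kubernetes.io/zone": "zone-b"},
    "edge-0": {"kubernetes.io/hostname": "edge-0"},  # no zone label
}

print(topology_domains(nodes, "topology.kubernetes.io/zone"))
# {'zone-a': ['worker-0', 'worker-1'], 'zone-b': ['worker-2']}
```

With `kubernetes.io/hostname` as the key, every labeled node forms its own domain, so the same call returns four single-node domains for these nodes.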

The labelSelector field specifies which existing pods are counted when a new pod is scheduled. Beyond that, you only need to specify two more details: how much imbalance the scheduler may tolerate (maxSkew) and what it should do if it cannot satisfy the constraint (whenUnsatisfiable).
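The interplay of maxSkew and whenUnsatisfiable can be sketched in Python. The skew calculation mirrors, in simplified form, how the Kubernetes scheduler evaluates it: matching pods in the candidate domain, plus one for the incoming pod, minus the lowest count across domains. It assumes the incoming pod matches its own labelSelector, as in the Alertmanager example. The rest is a simplification: the real scheduler filters nodes rather than whole domains, and the hypothetical `has_capacity` set stands in for all other scheduling filters.

```python
def skew_if_placed(counts, domain):
    """Skew that would result from placing one matching pod in `domain`.

    counts maps each topology domain to the number of existing pods that
    match the constraint's labelSelector. The incoming pod is assumed to
    match the selector too, so it adds 1 to the candidate domain.
    """
    return counts[domain] + 1 - min(counts.values())


def place(counts, max_skew, when_unsatisfiable, has_capacity):
    """Pick a domain for one incoming pod, or return None if it stays Pending.

    has_capacity is a hypothetical stand-in for every other scheduler
    filter (CPU, memory, or taints): domains outside it cannot host the pod.
    """
    candidates = [d for d in counts if d in has_capacity]
    if when_unsatisfiable == "DoNotSchedule":
        # Hard constraint: discard domains that would exceed maxSkew.
        candidates = [d for d in candidates if skew_if_placed(counts, d) <= max_skew]
    if not candidates:
        return None
    # ScheduleAnyway never filters; it only prefers low skew. Both modes
    # pick the lowest-skew candidate here (ties go to the first domain).
    return min(candidates, key=lambda d: skew_if_placed(counts, d))


# Spread four replicas over three zones with maxSkew=1.
counts = {"zone-a": 0, "zone-b": 0, "zone-c": 0}
for _ in range(4):
    counts[place(counts, 1, "DoNotSchedule", set(counts))] += 1
print(counts)  # {'zone-a': 2, 'zone-b': 1, 'zone-c': 1}

# zone-b has the fewest matching pods but no free capacity.
counts = {"zone-a": 1, "zone-b": 0}
print(place(counts, 1, "DoNotSchedule", {"zone-a"}))   # None (pod stays Pending)
print(place(counts, 1, "ScheduleAnyway", {"zone-a"}))  # zone-a
```

In the second scenario, placing the pod in zone-a would create a skew of 2, so DoNotSchedule leaves it Pending, while ScheduleAnyway places it and only uses the skew to rank candidates.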

# Spread Alertmanager pods evenly across zones (for example, two zones), with at most one pod of difference between zones
openshift_monitoring_alertmanager_topology_spread_constraints:
- topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  maxSkew: 1
  labelSelector:
    matchExpressions:
    - key: app
      operator: In
      values:
      - alertmanager

# Spread Thanos Ruler pods evenly across regions (for example, three regions), with at most one pod of difference between regions
openshift_monitoring_thanos_ruler_topology_spread_constraints:
- topologyKey: topology.kubernetes.io/region
  whenUnsatisfiable: DoNotSchedule
  maxSkew: 1
  labelSelector:
    matchExpressions:
    - key: app
      operator: In
      values:
      - thanos-ruler

In these examples, the topologyKey field sets the infrastructure level at which the constraint is applied (for example, hostname, zone, or region). The whenUnsatisfiable field defines what happens when the constraint cannot be satisfied: DoNotSchedule leaves the pod unscheduled (Pending), while ScheduleAnyway schedules it regardless and only prefers placements that reduce the skew. The maxSkew field sets the maximum allowed difference between the number of matching pods in a topology domain and the lowest count in any domain. Finally, the labelSelector field selects the pods that are counted in each domain when the spread is calculated.
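The labelSelector used in both examples can be sketched the same way. This Python function evaluates matchLabels and matchExpressions with the four standard operators (In, NotIn, Exists, DoesNotExist), following Kubernetes label-selector semantics as we understand them; the pod names and `app` label values are illustrative only, and the helper names are hypothetical.

```python
def matches_expression(labels, expr):
    """Evaluate one matchExpressions entry against a pod's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {op}")


def matches_selector(labels, selector):
    """Every matchLabels entry and every matchExpressions entry must hold (AND)."""
    for k, v in selector.get("matchLabels", {}).items():
        if labels.get(k) != v:
            return False
    return all(matches_expression(labels, e) for e in selector.get("matchExpressions", []))


# Same selector as the Alertmanager example above.
selector = {"matchExpressions": [{"key": "app", "operator": "In", "values": ["alertmanager"]}]}

pods = {
    "alertmanager-main-0": {"app": "alertmanager"},
    "prometheus-k8s-0": {"app": "prometheus"},
    "unlabeled-pod": {},
}
print([name for name, labels in pods.items() if matches_selector(labels, selector)])
# ['alertmanager-main-0']
```

Only pods selected this way contribute to the per-domain counts, so Prometheus pods in the same zones do not affect how Alertmanager replicas are spread.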

Other updates for OpenShift Monitoring 4.12

In OpenShift Monitoring 4.12, administrators can create new alerting rules based on platform metrics. This feature is available as a Technology Preview, which means that it is still under development and may change in future releases.

Creating alerting rules based on platform metrics gives administrators more control over alerting. They can define alerts that fire on specific metric values, which helps them detect and troubleshoot issues more quickly, especially when monitoring the health and performance of applications running on OpenShift.

For more information, check out the OpenShift Container Platform 4.12 release notes.


About the author

Roger Florén, a dynamic and forward-thinking leader, currently serves as the Principal Product Manager at Red Hat, specializing in Observability. His journey in the tech industry is marked by high performance and ambition, transitioning from a senior developer role to a principal product manager. With a strong foundation in technical skills, Roger is constantly driven by curiosity and innovation. At Red Hat, Roger leads the Observability platform team, working closely with in-cluster monitoring teams and contributing to the development of products like Prometheus, Alertmanager, Thanos and Observatorium. His expertise extends to coaching, product strategy, interpersonal skills, technical design, IT strategy and agile project management.
