피드 구독

Overview

The policy framework in Red Hat Advanced Cluster Management for Kubernetes (RHACM) is designed to help you to determine what clusters each policy is distributed to, and see what the current compliance status is on each cluster. In addition, the framework stores a limited amount of compliance history for each distributed policy, allowing you to see when the compliance may have changed. However, the limited amount of history and the format of the data can make it difficult to view compliance trends and compare them across multiple policies.

In the 2.4 release of RHACM, the framework supports the new metric, policy_governance_info on the hub cluster which is used as an additional way to record compliance and distribution information. This new metric is collected by the default OpenShift monitoring stack, which exposes Prometheus queries to view the data.

In this blog we discuss the capabilities of this new metric, and show example queries to demonstrate its usefulness.

Metric overview

When a user creates a policy in the framework, they are creating a root policy. The framework watches for root policies, as well as PlacementRules and PlacementBindings, to determine where to create propagated policies in order to distribute the policy to managed clusters. For each active policy, both root and propagated, the framework reports a metric of 0 if the policy is compliant, and 1 if it is noncompliant. The metric uses the following labels:

  • type: The values for this label can be either root or propagated.
  • policy: The name of the associated root policy.
  • policy_namespace: The namespace on the hub cluster where the root policy was defined.
  • cluster_namespace: The namespace for the cluster where the policy is distributed.

These labels and values are simple, but they enable queries that can show many things happening in the cluster that are otherwise difficult to track.

Note that if the metrics are not needed, or there are any concerns about performance or security, this feature can be turned off. Just set the DISABLE_REPORT_METRICS environment variable to true in the propagator deployment.

Visualizing New Policies

Let's imagine a situation where there are many new policies added over a period of time. Not all of the clusters in the fleet are compliant with every new policy; the goal is to have a picture of the cluster compliance over time. In particular, let's see how the total number of distributed policies have changed, compared to how many non-compliant policies are present.

Since the metric is 1 when the policy is non-compliant, let's compare with == 1 in the query to restrict the count to just the non-compliant policies. With that in mind, these queries compare the non-compliant policies with the total number of policies, over time.

  • count(policy_governance_info{type="propagated"} == 1)
  • count(policy_governance_info{type="propagated"})

In a fleet where this is simulated, you might receive the following plot:

all_policies_over_time

Notice that the number of policies regularly increased over time due to our example simulation. However, there is a sudden spike in policies around 4:45pm on the graph. What happened there? In our simulation, we added a new cluster to the fleet at that time, and existing policies were suddenly distributed to it.

Some of this information can be inferred from other data provided by the governance policy framework. However, to observe the spike without these metrics, it is required to look through all of the policies and examine many timestamps. This is tedious, to say the least, but with this graph it is very straightforward.

Watching a Specific Cluster

The previous plot displays a view of all the distributed policies in the fleet. It might be beneficial to watch a specific cluster more closely. For example, let's monitor a high-priority production cluster that must be completely compliant. The metric label cluster_namespace supports filtering the view, similar to the following queries:

  • count(policy_governance_info{cluster_namespace="cluster1"} == 1)
  • count(policy_governance_info{cluster_namespace="cluster1"})

Applying this to the simulated fleet, the following plot is displayed:

cluster_policies_over_time

Since the queries are limited to one cluster, the spike in policies as a result from the cluster being added is not displayed. You can also observe more easily that some of the policies were remediated near the end. In this simulation, the policies were manually remediated, but it is also possible to resolve them through other integrations like Ansible playbooks. Again, this information is available in other ways using the framework, but it is more tedious to compile it into one picture as mentioned.

Policy Trends

Let's simulate something more complicated: a situation where two policies are somehow interfering with each other.

Maybe to remediate one policy, an administrator is taking an action that causes another policy to become non-compliant. Then, an automated process might "fix" the newly non-compliant policy in a way that reverts the administrator's change. The administrator might notice this situation and guess that the two policies are interfering, but how can they confirm this?

The following queries count the compliance of two policies and put them on one plot:

  • count(policy_governance_info{type="root",policy="mypolicy-1"} == 0)
  • count(policy_governance_info{type="root",policy="mypolicy-2"} == 0)

interfering_policies

In this case it becomes clear that the policies are never both compliant, which indicates that there is a strong correlation between the queries. In other cases, a plot like this can help observe that a policy is always non-compliant at a certain time of day, or a specific day of the week. These trends can point to any further investigation in the right direction.

Conclusion

The policy_governance_info metric from the policy framework (added in RHACM 2.4) provides a new way to view policy compliance and distribution in a fleet of clusters. The data that it provides can be used to examine compliance trends in a way that was not previously possible. By enabling queries to filter by cluster, policy, and compliance, questions that users have about the cluster compliance can be answered more quickly than ever. For more information about the policy_governance_info, see Governance metric from the product documentation. We hope that this blog is helpful and enables users to view a multitude of things happening in their clusters.


저자 소개

UI_Icon-Red_Hat-Close-A-Black-RGB

채널별 검색

automation icon

오토메이션

기술, 팀, 인프라를 위한 IT 자동화 최신 동향

AI icon

인공지능

고객이 어디서나 AI 워크로드를 실행할 수 있도록 지원하는 플랫폼 업데이트

open hybrid cloud icon

오픈 하이브리드 클라우드

하이브리드 클라우드로 더욱 유연한 미래를 구축하는 방법을 알아보세요

security icon

보안

환경과 기술 전반에 걸쳐 리스크를 감소하는 방법에 대한 최신 정보

edge icon

엣지 컴퓨팅

엣지에서의 운영을 단순화하는 플랫폼 업데이트

Infrastructure icon

인프라

세계적으로 인정받은 기업용 Linux 플랫폼에 대한 최신 정보

application development icon

애플리케이션

복잡한 애플리케이션에 대한 솔루션 더 보기

Original series icon

오리지널 쇼

엔터프라이즈 기술 분야의 제작자와 리더가 전하는 흥미로운 스토리