Introduction

Red Hat Advanced Cluster Management (RHACM) provides a policy framework that lets users check the compliance of their managed clusters and, optionally, remediate many violations automatically. In previous RHACM versions, users could check which policies were generating violations by viewing the status of each policy template. In RHACM 2.4, violations are collected and aggregated by a PolicyReport, a custom resource that is created on every managed cluster. These resources were introduced in RHACM 2.3 to store violations created by the insights client, and they now automatically pull in governance violations as well. A PolicyReport gives a quick overview of all violations found in a managed cluster, and it also produces metrics and alerts that can be used to send those violations to incident management systems.

PolicyReport Integration with RHACM Governance

Let's take a look at a sample PolicyReport that is created on a cluster with one policy that has a violation. In the following example, the policy-pod policy is configured to look for an nginx-pod pod, which is not present on the cluster. The policy is available as a template in the Specifications drop-down menu on the policy creation page:

apiVersion: wgpolicyk8s.io/v1alpha2
kind: PolicyReport
metadata:
  name: local-cluster-policyreport
  namespace: local-cluster
results:
- category: PR.PT Protective Technology
  message: 'NonCompliant; violation - pods not found: [nginx-pod] in namespace default missing'
  policy: default.policy-pod
  properties:
    created_at: "2021-10-07T17:37:13Z"
    total_risk: "1"
  result: fail
  source: grc
  timestamp:
    nanos: 1666061482
    seconds: 1633628707
scope:
  kind: cluster
  name: local-cluster
  namespace: local-cluster
summary:
  error: 0
  fail: 1
  pass: 0
  skip: 0
  warn: 0

Every violation on the cluster generates an item in the results list, as seen in the example. There is one entry for each non-compliant policy on the cluster; compliant policies are omitted to avoid cluttering the PolicyReport. Let's dive deeper into the specific fields of the result object:

  • category: The category specified in the original parent policy where the violation is coming from.
  • message: The violation that is causing the policy to be flagged as non-compliant.
  • policy: The name and namespace of the parent policy that generated the violation.
  • properties.created_at: The creation timestamp of the policy, not the time when the violation occurred.
  • properties.total_risk: This field is used by the PolicyReport metrics collector to determine the severity of a policy, which can be set with the following values:
    • low severity: total_risk=1
    • medium severity: total_risk=2
    • high severity: total_risk=3
    • critical severity: total_risk=4
  • result: This field can be ignored. Because the PolicyReport only picks up policies that are generating violations, it is expected to be set to fail.
  • source: This is used to distinguish whether a violation is coming from Insights (expected value is insights) or RHACM Governance (expected value is grc).
  • timestamp: The time when the violation was added to the PolicyReport.
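To make the field descriptions above concrete, here is a minimal Python sketch that reads the results list of a PolicyReport (written as a plain dict that mirrors the sample) and maps each total_risk value to the severity names listed above. The helper names governance_violations and summarize are hypothetical, and splitting the policy field on the first dot to separate namespace from name is an assumption based on the sample value default.policy-pod.

```python
from collections import Counter

# Severity names keyed by total_risk, as listed in the field descriptions.
SEVERITY_BY_RISK = {"1": "low", "2": "medium", "3": "high", "4": "critical"}

# A trimmed copy of the sample PolicyReport, written as a Python dict.
report = {
    "results": [
        {
            "category": "PR.PT Protective Technology",
            "policy": "default.policy-pod",
            "properties": {
                "created_at": "2021-10-07T17:37:13Z",
                "total_risk": "1",
            },
            "result": "fail",
            "source": "grc",
        }
    ],
}


def governance_violations(report):
    """Return (namespace, name, severity) for each result whose source is grc."""
    rows = []
    for item in report.get("results", []):
        if item.get("source") != "grc":
            continue  # skip results that came from Insights
        # Assumption: the policy field has the form <namespace>.<name>.
        namespace, _, name = item["policy"].partition(".")
        risk = item.get("properties", {}).get("total_risk")
        rows.append((namespace, name, SEVERITY_BY_RISK.get(risk, "unknown")))
    return rows


def summarize(report):
    """Count results by their result value, similar to the summary block."""
    return dict(Counter(item["result"] for item in report.get("results", [])))


print(governance_violations(report))
print(summarize(report))
```

In practice the dict would come from parsing the YAML or JSON output of the PolicyReport resource rather than being written by hand.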

By default, the insights client is set to poll for violations every 30 minutes, so policy violations may take up to 30 minutes to appear in the PolicyReport results field. If you want the violations for a cluster to be processed more frequently, you can specify an integer in the POLL_INTERVAL environment variable in the insights client deployment (policyreport-xxxxx-insights-client).

Configuring Violations to be Sent to Incident Management Systems

Description of policyreport_info metric

In addition to viewing the PolicyReport object from the CLI, the insights client exposes the items in the results list as a metric called policyreport_info. This metric can be viewed from the Metrics tab in the OpenShift console for a cluster, and in Prometheus. View the following sample of the metric:

[Image: acmpolicyreportmetricblog]

The insights client passes the following fields to the metric:

  • managed_cluster_id: The name of the managed cluster where the violation is reported.
  • category: The category of the policy.
  • policy: The name of the policy that reported the violation.
  • result: As with the result field in the PolicyReport results, this value is always fail.
  • severity: This is mapped to the total_risk field in the PolicyReport results.
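As a rough illustration of the mapping described in the list above, the following Python sketch turns one PolicyReport result into the set of labels the metric carries, and builds a PromQL-style selector string from a set of labels. The function names are hypothetical, and the exact label values the insights client emits (in particular the severity strings and whether policy includes the namespace prefix) are assumptions here, so check them against the metric on your own cluster.

```python
# Assumed severity strings, following the total_risk mapping in the PolicyReport.
SEVERITY_BY_RISK = {"1": "low", "2": "medium", "3": "high", "4": "critical"}


def metric_labels(cluster_name, result):
    """Build the label set described above from one PolicyReport result."""
    risk = result.get("properties", {}).get("total_risk")
    return {
        "managed_cluster_id": cluster_name,
        "category": result["category"],
        "policy": result["policy"],
        "result": result["result"],
        "severity": SEVERITY_BY_RISK.get(risk, "unknown"),
    }


def selector(metric, labels):
    """Format a metric name and labels as a PromQL instant-vector selector.

    Label values are inserted as-is; this sketch does not escape quotes.
    """
    parts = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"{metric}{{{parts}}}"


sample_result = {
    "category": "PR.PT Protective Technology",
    "policy": "default.policy-pod",
    "properties": {"total_risk": "1"},
    "result": "fail",
}

print(metric_labels("local-cluster", sample_result))
print(selector("policyreport_info", {"severity": "critical"}))
```

A selector such as the last one can be pasted into the Metrics tab to narrow the view to violations of a single severity, provided the label values match what your cluster reports.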

Integration with RHACM Observability and Incident Management Systems

The RHACM observability component is already set up to process PolicyReports; the alert feature for PolicyReports was initially added in RHACM 2.3. With the integration of the governance framework and PolicyReports in 2.4, users can now configure alerts on policy violations and send them to incident management systems or messaging tools such as Slack. To alert on any violations produced by a policy, set the severity of the policy to critical and follow the steps outlined in the alerting blog to set up alerting on your cluster.

Conclusion

In this blog, we explored how to view governance violations in PolicyReports, and how those PolicyReport violations generate metrics that can be picked up by an Alertmanager. With the information in this blog and alerting set up on your cluster, you can create policies and have their violations sent to incident management systems.

