Red Hat Advanced Cluster Management (RHACM) provides users with a policy framework to check the compliance of their managed clusters, with the option to automatically remediate many violations. In previous RHACM versions, users could check which policies were generating violations by viewing the status of each policy template. In RHACM 2.4, violations are collected and aggregated by a PolicyReport, a custom resource that is created on every managed cluster. These resources were introduced in RHACM 2.3 to store violations created by the insights client, but they now also pull in governance violations automatically. Policy reports provide a quick overview of all violations found in a managed cluster, and also produce metrics and alerts that can be used to send those violations to incident management systems.
PolicyReport Integration with RHACM Governance
Let's take a look at a sample PolicyReport that is created on a cluster with one policy that has a violation. In the following example, the policy-pod policy is configured to look for an nginx-pod pod that is not present on the cluster. The policy is available as a template in the Specifications drop-down menu on the policy creation page:
- category: PR.PT Protective Technology
message: 'NonCompliant; violation - pods not found: [nginx-pod] in namespace default missing'
All violations on the cluster generate an item in the results list, as seen in the example. There is one entry for each non-compliant policy on the cluster; entries for compliant policies are omitted to avoid cluttering the PolicyReport. Let's dive deeper into the specific fields of each entry:
category: The category specified in the original parent policy where the violation is coming from.
message: The violation that is causing the policy to be flagged as non-compliant.
policy: The name and namespace of the parent policy that generated the violation.
properties.created_at: The creation timestamp of the policy, not the time when the violation occurred.
properties.total_risk: This field is used by the PolicyReport metrics collector to determine the severity of a policy, which can be set with the following values:
result: This field can be ignored. Because the PolicyReport only picks up policies that are generating violations, it is expected to be set to
source: This is used to distinguish whether a violation is coming from Insights (expected value is
insights) or RHACM Governance (expected value is
timestamp: The time when the violation was added to the
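As a rough illustration of how these fields can be consumed, the following Python sketch groups the entries of a results list by their source field. The sample data is hypothetical: it reuses the category and message from the example above, but the function name, the second entry, and the governance source value are placeholders chosen for illustration. Only the insights source value comes from the text.

```python
from collections import defaultdict


def group_by_source(results):
    """Group PolicyReport result entries by their `source` field."""
    grouped = defaultdict(list)
    for entry in results:
        # Entries without a source are kept under "unknown" rather than dropped.
        grouped[entry.get("source", "unknown")].append(entry)
    return dict(grouped)


# Hypothetical sample entries shaped like the fields described above.
sample_results = [
    {
        "category": "PR.PT Protective Technology",
        "message": "NonCompliant; violation - pods not found: "
                   "[nginx-pod] in namespace default missing",
        "policy": "policy-pod",  # assumed format; the real value also carries the namespace
        "source": "example-governance-source",  # placeholder, not the real value
    },
    {
        "category": "example-category",
        "message": "example insights finding",
        "policy": "example-insights-rule",
        "source": "insights",
    },
]

grouped = group_by_source(sample_results)
for source, entries in grouped.items():
    print(source, [entry["policy"] for entry in entries])
```

Grouping by source keeps governance and Insights findings apart, which is the distinction the source field exists to make.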
By default, the insights client is set to poll for violations every 30 minutes, so policy violations may take up to 30 minutes to appear in the
results field. If you want the violations for a cluster to be processed more frequently, you can specify an integer in the
POLL_INTERVAL environment variable in the insights client deployment (
Configuring Violations to be Sent to Incident Management Systems
Description of policyreport_info metric
The PolicyReport object can be viewed from the CLI. In addition, the insights client exposes the items in the results list as a metric called policyreport_info. This metric can be viewed from the Metrics tab in the OpenShift console for a cluster, and in Prometheus. View the following policy report sample:
The insights client passes the following fields to the metric:
managed_cluster_id: The name of the managed cluster where the violation is reported.
category: The category of the policy.
policy: The name of the policy that reported the violation.
result: Similar to the results field of the PolicyReport, the parameter value is always
severity: This is mapped to the total_risk field in the
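To make the label set concrete, here is a minimal Python sketch that builds a PromQL selector for policyreport_info from the labels listed above. The helper name and the cluster value are hypothetical; the critical severity is the value used for alerting later in this post. The resulting string is ordinary PromQL, so it could be pasted into the Metrics tab or a Prometheus query.

```python
def build_selector(metric, labels):
    """Return a PromQL instant-vector selector such as metric{key="value"}."""
    parts = []
    for key in sorted(labels):
        # Escape backslashes and double quotes so the value stays a valid PromQL string.
        value = str(labels[key]).replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{key}="{value}"')
    return metric + "{" + ", ".join(parts) + "}"


# "example-cluster" is a placeholder cluster value, not one taken from the post.
query = build_selector(
    "policyreport_info",
    {"severity": "critical", "managed_cluster_id": "example-cluster"},
)
print(query)
```

Sorting the label keys keeps the generated query stable between runs, which makes it easier to compare or store.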
Integration with RHACM Observability and Incident Management Systems
The RHACM observability component is already set up to process PolicyReports; the alert feature for PolicyReports was initially added in RHACM 2.3. With the integration of the governance framework and PolicyReports in 2.4, users can now configure alerts on policy violations that are sent to incident management systems, such as Slack. To alert on any violations produced by a policy, set the severity of the policy to critical and follow the steps outlined in the alerting blog to set up alerting on your cluster.
In this blog, we explored how to view governance violations in PolicyReports, and how those PolicyReport violations generate metrics that can be picked up by an Alertmanager. With the information in this blog and alerting set up on your cluster, you can create policies and have their violations sent to incident management systems.