Red Hat Advanced Cluster Management for Kubernetes (RHACM) Governance provides an extensible framework for enterprises to introduce their own security and configuration policies and apply them to managed OpenShift or Kubernetes clusters. For more information on RHACM policies, I recommend reading the "Applying Policy-Based Governance at Scale Using Templates" and "Comply to standards using policy-based governance" blogs.

This multi-part blog series will showcase several techniques you can apply when using templates in your RHACM Policies. In part one, I reviewed practices you can use to make your templates more readable and easier to maintain.

Part two of this series will discuss more advanced template functionality and extended use cases for using Policies to manage clusters.

Validate the cluster state

Users typically view RHACM Policies as the mechanism to apply day-2 configuration to a cluster: configuring authentication, creating infra nodes, configuring cluster workloads, and installing operators, among numerous other day-2 tasks. In part one of this series, I discussed using templating to make these configurations more dynamic and the policies easier to maintain.

A policy to install an Operator using the Operator Lifecycle Manager (OLM) might consist of a Namespace definition, an OperatorGroup, and a Subscription. Applying these three objects will result in OLM installing the specified Operator. Once those three objects exist, the Policy will report a compliant status. Compliance is only an indicator that the objects have been created as specified, not that the Operator has successfully installed and is running.
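
For reference, here is a minimal sketch of the install Policy's ConfigurationPolicy template for OpenShift GitOps. The channel and catalog source values are assumptions; adjust them for your environment. Because openshift-operators already provides a cluster-wide OperatorGroup, only the Subscription is needed in this case; an Operator installed into its own namespace would also need the Namespace and OperatorGroup objects. The acm-policy label is included so the health check shown later can match the Subscription.

apiVersion: policy.open-cluster-management.io/v1
kind: ConfigurationPolicy
metadata:
 name: gitops-operator-install
spec:
 remediationAction: enforce
 severity: low
 object-templates:
 - complianceType: musthave
   objectDefinition:
     apiVersion: operators.coreos.com/v1alpha1
     kind: Subscription
     metadata:
       labels:
         acm-policy: gitops-operator
       name: openshift-gitops-operator
       namespace: openshift-operators
     spec:
       channel: latest
       installPlanApproval: Automatic
       name: openshift-gitops-operator
       source: redhat-operators
       sourceNamespace: openshift-marketplace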

RHACM Policies can also run in an "inform" mode, in which you can extend the Policy to validate the state of objects in the cluster. This additional functionality opens up a very powerful set of tools, setting RHACM apart from other GitOps cluster management tooling. As a cluster manager, you can ensure all components are healthy across your entire fleet of clusters by viewing the status in RHACM.

I'll review how to implement this when installing an Operator like OpenShift GitOps. In addition to the Policy to enforce creating the Subscription, you can add a Policy to verify the health of the Operator. The example below will validate the health of the Subscription, the Operator Deployment, and the ArgoCD instance itself.

apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
 name: gitops-operator-health
 namespace: bry-tam-policies
spec:
 disabled: false
 policy-templates:
 - objectDefinition:
    apiVersion: policy.open-cluster-management.io/v1
    kind: ConfigurationPolicy
    metadata:
      name: gitops-operator-health
    spec:
      object-templates:
      - complianceType: musthave
        objectDefinition:
          apiVersion: operators.coreos.com/v1alpha1
          kind: Subscription
          metadata:
            labels:
              acm-policy: gitops-operator
            namespace: openshift-operators
          status:
            state: AtLatestKnown
      - complianceType: musthave
        objectDefinition:
          apiVersion: apps/v1
          kind: Deployment
          metadata:
            labels:
              olm.owner: '{{ (lookup "operators.coreos.com/v1alpha1" "Subscription" "openshift-operators" "openshift-gitops-operator").status.currentCSV }}'
            namespace: openshift-operators
          status:
            availableReplicas: 1
            conditions:
            - status: "True"
              type: Available
            readyReplicas: 1
            replicas: 1
            updatedReplicas: 1
      - complianceType: musthave
        objectDefinition:
          apiVersion: argoproj.io/v1alpha1
          kind: ArgoCD
          metadata:
            namespace: openshift-gitops
          status:
            applicationController: Running
            applicationSetController: Running
            dex: Running
            notificationsController: Running
            phase: Available
            redis: Running
            repo: Running
            server: Running
            ssoConfig: Success
      remediationAction: inform
      severity: high


With this Policy, you can determine not only that the objects required to install the Operator were created, but also that the Operator installed and is running successfully. When combined with Policy Dependencies, you can confirm the Operator requirements are met before creating CustomResources.
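
As a sketch of that pattern, the Policy below declares a dependency on the gitops-operator-health Policy above and will not apply its templates until that Policy reports Compliant. The AppProject it creates is only a placeholder CustomResource for illustration.

apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
 name: gitops-custom-resources
 namespace: bry-tam-policies
spec:
 disabled: false
 dependencies:
 - apiVersion: policy.open-cluster-management.io/v1
   compliance: Compliant
   kind: Policy
   name: gitops-operator-health
   namespace: bry-tam-policies
 policy-templates:
 - objectDefinition:
     apiVersion: policy.open-cluster-management.io/v1
     kind: ConfigurationPolicy
     metadata:
       name: gitops-custom-resources
     spec:
       remediationAction: enforce
       severity: low
       object-templates:
       - complianceType: musthave
         objectDefinition:
           apiVersion: argoproj.io/v1alpha1
           kind: AppProject
           metadata:
             name: cluster-config
             namespace: openshift-gitops
           spec:
             sourceRepos:
             - '*'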

RHACM inform policies can identify other cluster issues, not just the health of day-2 configurations. Common cluster health problems, such as the one described in kcs-645901, can be identified in Policies, making cluster administrators aware of potential problems before users are impacted. The example below becomes non-compliant if an openshift-marketplace Job or InstallPlan contains the indicated status conditions. An upcoming addition to this series will look at how to use policies to correct issues such as this automatically.

---
kind: Job
apiVersion: batch/v1
metadata:
 namespace: openshift-marketplace
status:
 conditions:
  - type: Failed
    status: 'True'
    reason: DeadlineExceeded
    message: Job was active longer than specified deadline
 failed: 1

---
apiVersion: operators.coreos.com/v1alpha1
kind: InstallPlan
metadata:
 generateName: install-
status:
 bundleLookups:
  - conditions:
      - reason: JobIncomplete
        status: 'True'
        type: BundleLookupPending
      - message: Job was active longer than specified deadline
        reason: DeadlineExceeded
        status: 'True'
        type: BundleLookupFailed
 conditions:
  - message: >-
      bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job was active longer than specified deadline
    reason: InstallCheckFailed
    status: 'False'
    type: Installed
 phase: Failed


Note that the Job and InstallPlan definitions above are bare manifests rather than complete Policies; to apply them to your clusters, you must create a PolicyGenerator configuration.
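
A minimal sketch of that configuration follows, assuming the Job and InstallPlan snippets above are saved as job-failed.yaml and installplan-failed.yaml alongside the generator (the file and policy names are illustrative). The mustnothave complianceType makes the generated inform policy non-compliant whenever a matching object exists on the cluster.

apiVersion: policy.open-cluster-management.io/v1
kind: PolicyGenerator
metadata:
 name: marketplace-health
policyDefaults:
 namespace: bry-tam-policies
 remediationAction: inform
 severity: high
 complianceType: mustnothave
policies:
- name: marketplace-install-health
  manifests:
  - path: job-failed.yaml
  - path: installplan-failed.yaml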

Enabling new capabilities with object-templates-raw

A new capability was added to ConfigurationPolicies in RHACM 2.7.2 and 2.8: object-templates-raw. This new feature allows you to use if statements, assign values to variables, and make use of ranges.

All of the templating discussed to this point has returned a string or a single value. object-templates-raw supports advanced templating use cases by allowing a policy to generate the entire list of object templates as a YAML string.

The example from part one included setting the default value for the number of replicas on the IngressController based on the number of infra nodes found. However, it did not configure the nodeSelector or tolerations to support running on the infra nodes. Consider how using raw templates allows you to solve this fully.

apiVersion: policy.open-cluster-management.io/v1
kind: ConfigurationPolicy
metadata:
 name: ingressoperator-default
spec:
 remediationAction: enforce
 severity: low
 object-templates-raw: |
  - complianceType: musthave
    objectDefinition:
      apiVersion: operator.openshift.io/v1
      kind: IngressController
      metadata:
        name: default
        namespace: openshift-ingress-operator
      spec:
        httpEmptyRequestsPolicy: Respond
  {{- $infraCount := (len (lookup "v1" "Node" "" "" "node-role.kubernetes.io/infra").items) }}
  {{- if ne $infraCount 0 }}
        nodePlacement:
          nodeSelector: 
            matchLabels:
              node-role.kubernetes.io/infra: ""
          tolerations:
          - operator: Exists
            key: node-role.kubernetes.io/infra
  {{- end }}
        replicas: {{ ($infraCount | default 2) | toInt }}


When you apply the policy to the cluster, if there are zero infra nodes ($infraCount is 0), the whole spec.nodePlacement block will not be part of the IngressController configuration. Once infra nodes are added to the cluster, the policy is reevaluated and the configuration is updated.

Raw templating also allows you to create more advanced objects where some information processing must be completed before generating the objectDefinition. In the example below, I create a multiline string for the Thanos configuration using information from the OpenShift Data Foundation deployment on the cluster. The Thanos configuration is then processed and encoded for storage in the thanos.yaml key of the Secret generated by the policy.

apiVersion: policy.open-cluster-management.io/v1
kind: ConfigurationPolicy
metadata:
 name: thanos-secret
spec:
 remediationAction: enforce
 severity: high
 object-templates-raw: |
  {{- /* read the bucket data and noobaa endpoint access data */ -}}
  {{- $objBucket := (lookup "objectbucket.io/v1alpha1" "ObjectBucket" "" "obc-openshift-storage-obc-observability") }}
  {{- $awsAccess := (lookup "v1" "Secret" "openshift-storage" "noobaa-admin") }}
  {{- /* create the thanos config file as a template */ -}}
  {{- $thanosConfig := `
    type: s3
    config:
      bucket: %[1]s
      endpoint: %[2]s
      insecure: true
      access_key: %[3]s
      secret_key: %[4]s`
  }}
  {{- /* create the secret using the thanos configuration template created above. */ -}}
  - complianceType: mustonlyhave
    objectDefinition:
      apiVersion: v1
      kind: Secret
      metadata:
        name: thanos-object-storage
        namespace: open-cluster-management-observability
      type: Opaque
      data:
        thanos.yaml: {{ (printf $thanosConfig $objBucket.spec.endpoint.bucketName 
                                              $objBucket.spec.endpoint.bucketHost 
                                              ($awsAccess.data.AWS_ACCESS_KEY_ID | base64dec) 
                                              ($awsAccess.data.AWS_SECRET_ACCESS_KEY | base64dec)
                          ) | base64enc }}


Using range to generate objects in policies

The range function creates a loop over an array, slice, map, or channel. Use this feature to loop through a list of static values, the return of the lookup function, or parts of an object, such as the labels on a Deployment. Each iteration of the loop can be assigned to a variable using the form {{ range $myItem := $list }}{{ $myItem.property }}{{ end }}, or accessed through the dot (.) context variable using the form {{ range $list }}{{ .property }}{{ end }}. An optional else clause, as in {{ range $myItem := $list }}{{ $myItem.property }}{{ else }}empty list{{ end }}, executes when $list is empty.
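
As a brief sketch of the variable form, the following policy loops over every Namespace carrying a hypothetical team=platform label and stamps out a ConfigMap in each one:

apiVersion: policy.open-cluster-management.io/v1
kind: ConfigurationPolicy
metadata:
 name: team-namespace-defaults
spec:
 remediationAction: enforce
 severity: low
 object-templates-raw: |
  {{- range $ns := (lookup "v1" "Namespace" "" "" "team=platform").items }}
  - complianceType: musthave
    objectDefinition:
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: team-defaults
        namespace: {{ $ns.metadata.name }}
      data:
        owner: platform
  {{- end }}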

This approach can be useful for creating policies that would generate many objectDefinitions, as in the ConfigMap sketch above. The next example loops through all Pods in the portworx namespace and identifies failed pods whose names contain kvdb. Pods matching this condition are removed from the cluster.

apiVersion: policy.open-cluster-management.io/v1
kind: ConfigurationPolicy
metadata:
 name: portworx-failed-pod-cleaner
spec:
 remediationAction: enforce
 severity: low
 object-templates-raw: |
  {{- /* find Portworx pods in terminated state */ -}}
  {{- range $pp := (lookup "v1" "Pod" "portworx" "").items }}
    {{- /* if the pod is blocked because it is in node shutdown we should delete the pod */ -}}
    {{- if and (eq $pp.status.phase "Failed") 
               (contains "kvdb" $pp.metadata.name) }}
  - complianceType: mustnothave
    objectDefinition:
      apiVersion: v1
      kind: Pod
      metadata:
        name: {{ $pp.metadata.name }}
        namespace: {{ $pp.metadata.namespace }}
    {{- end }}
  {{- end }}


Expanding on the earlier example that checks the health of OpenShift GitOps, the policy below uses range to check every ArgoCD instance on a cluster, along with a range over a list of label selectors to validate each Deployment, verifying that all components are healthy and contain the expected number of replicas.

apiVersion: policy.open-cluster-management.io/v1
kind: ConfigurationPolicy
metadata:
 name: argocd-instance-status
spec:
 remediationAction: inform
 severity: high
 object-templates-raw: |
  ## Get all the ArgoCD instances we are checking health for  
  {{- range $argo := (lookup "argoproj.io/v1alpha1" "ArgoCD" "" "").items }}
    ## list all of the lookups for Argo deployments  
    {{- $selectors := list "app.kubernetes.io/name=argocd-applicationset-controller"
                          (printf "app.kubernetes.io/name=%s-dex-server" $argo.metadata.name)
                          (printf "app.kubernetes.io/name=%s-notifications-controller" $argo.metadata.name)
                          (printf "app.kubernetes.io/name=%s-redis" $argo.metadata.name)
                          (printf "app.kubernetes.io/name=%s-repo-server" $argo.metadata.name)
                          (printf "app.kubernetes.io/name=%s-server" $argo.metadata.name)
    }}
    
    ## ensure ArgoCD is reporting healthy 
  - complianceType: musthave
    objectDefinition:
      apiVersion: argoproj.io/v1alpha1
      kind: ArgoCD
      metadata:
        namespace: {{ $argo.metadata.namespace }}
      status:
        server: Running
        notificationsController: Running
        applicationController: Running
        applicationSetController: Running
        ssoConfig: Success
        repo: Running
        dex: Running
        phase: Available
        redis: Running

    ## ensure all deployments are healthy in each argo instance 
    {{- range $sel := $selectors }}
      {{- $dep := (lookup "apps/v1" "Deployment" $argo.metadata.namespace "" $sel).items }}
  - complianceType: musthave
    objectDefinition:
      kind: Deployment
      apiVersion: apps/v1
      metadata:
        namespace: {{ $argo.metadata.namespace }}
        labels:
          {{ $sel | replace "=" ": " }}
      status:
      {{- if gt (len $dep) 0 }}
        {{- $dp := (index $dep 0) }}
        replicas: {{ $dp.spec.replicas }}
        updatedReplicas: {{ $dp.spec.replicas }}
        readyReplicas: {{ $dp.spec.replicas }}
        availableReplicas: {{ $dp.spec.replicas }}
        conditions:
          - type: Available
            status: 'True'
      {{- end }}
    {{- end }}
    {{- end }}


Wrap up

In part one of this series, I outlined several template functions and examples to make your templates easier to read and maintain. In part two, I looked at validating cluster health with policies and using object-templates-raw to expand templates for more complex use cases.