Introduction

GitOps is the combination of two winning approaches: infrastructure as code and Git-based workflows to accept configuration changes and deploy them. In Kubernetes, GitOps is generally implemented with a set of manifests (or manifest templates) stored in a Git repository and a Kubernetes operator that deploys these manifests into a Kubernetes cluster. At the time of writing, the two most popular operators in this space are Argo CD and Flux (see also this article for an in-depth comparison).

I have been using Argo CD recently, both with my customers and for my personal experiments, and I believe that GitOps is definitely an approach that everyone working with Kubernetes should adopt.

That being said, I have found two limitations that have kept me from building robust, reusable Git workflows.

The first limitation is that, as a de facto rule, GitOps operators can only manipulate resources that they create. They will not change pre-existing resources. However, any Kubernetes distribution will most likely include a significant amount of pre-configured settings and components. As consumers of these environments, we typically need to perform some customizations on these pre-existing components to suit our needs. This is certainly true for OpenShift, which, as a sophisticated Kubernetes distribution, includes a vast number of additional features and configuration options.

The second limitation is that it is hard to create configurations that depend on the current state of a cluster. Existing GitOps operators expect all of the parameters to be passed from the source repositories without the ability to discover values based on the state of existing objects within the cluster (this is not fully accurate in the case of Flux and Helm, because Flux supports using the Helm lookup function). One such example that highlights this issue is that most Kubernetes clusters (including OpenShift) have the concept of a base domain (the domain that all FQDNs of endpoints exposed by the cluster share). This naturally becomes a parameter that would be needed across a variety of cluster and user level configurations, including applications deployed to the cluster.

Templated Patches

It turns out that by using templated patches, we can solve both of the limitations of the current GitOps operator implementations called out previously.

Templated patches are patches whose actual value is realized by merging a template with a set of parameters. Consider, for example, this patch:

spec:
  base_domain: {{ (lookup "config.openshift.io/v1" "DNS" "" "cluster").spec.baseDomain }}

Here we use the Go template notation and processing engine. This patch sets the spec.base_domain field of a fictitious object to the current base domain of an OpenShift cluster, which can be found in a cluster-level object named cluster of kind DNS and apiVersion config.openshift.io/v1. The lookup function retrieves the value of a field from an existing object in the running cluster.
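To make this concrete, suppose the DNS object of a given cluster reports a base domain of mycluster.example.com (a made-up value used only for illustration). Once the template has been processed, the patch would read as follows:

```yaml
# Rendered form of the templated patch above, assuming the lookup
# returned "mycluster.example.com" (hypothetical value).
spec:
  base_domain: mycluster.example.com
```

The same template rendered against a different cluster would yield that cluster's own base domain, which is what makes the patch reusable.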

With this primitive we can solve both of the issues stated above. If we have a way to declare the intention to make a patch to an existing object, then we can use a templated patch to describe what kind of change we need.

The Patch Operator

We created the patch-operator to support both of the above use cases. Instructions on how to install and use it can be found in the project repository.

While the patch-operator supports a number of use cases, I will start off by discussing creation time injection.

Creation time injection

This operator allows you to install a mutating webhook that intercepts the creation of any object type and applies a patch to it. Resource updates can be handled as well, but the use case described here favors patching at creation time. Instructions on how to create the webhook can be found in the operator documentation.

This feature, like most, is best illustrated with an example. Let’s say we want to configure a cert-manager Issuer to integrate with Let’s Encrypt. An Issuer is a Kubernetes custom resource, included as part of cert-manager, that represents a Certificate Authority able to generate signed certificates. This is handy for both demonstration and production scenarios, as the endpoints exposed by your cluster will be served with certificates from a trusted Certificate Authority.

The following object represents the resource that we need to create:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-issuer
spec:
  acme:
    server: 'https://acme-v02.api.letsencrypt.org/directory'
    email: user@example.com
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
    - dns01:
        route53:
          accessKeyID: << access_key >>
          secretAccessKeySecretRef:
            name: cert-manager-dns-credentials
            key: aws_secret_access_key
          region: << region >>
          hostedZoneID: << hosted_zone_id >>

As you can see, three of the fields depend on the current configuration of the cluster, and in most cases, cannot be known beforehand.

The value of these fields can be discovered from the running cluster and injected with a templated patch.

The patch-operator creation time injection can be activated by an annotation, so an updated manifest would be as follows:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-issuer
  namespace: {{ .Release.Namespace }}
  annotations:
    "redhat-cop.redhat.io/patch": |
      spec:
        acme:
          solvers:
          - dns01:
              route53:
                accessKeyID: {{ (lookup "v1" "Secret" .metadata.namespace "cert-manager-dns-credentials").data.aws_access_key_id | b64dec }}
                secretAccessKeySecretRef:
                  name: cert-manager-dns-credentials
                  key: aws_secret_access_key
                region: {{ (lookup "config.openshift.io/v1" "Infrastructure" "" "cluster").status.platformStatus.aws.region }}
                hostedZoneID: {{ (lookup "config.openshift.io/v1" "DNS" "" "cluster").spec.publicZone.id }}
spec:
  acme:
    server: 'https://acme-v02.api.letsencrypt.org/directory'
    email: {{ .Values.letsencrypt.email }}
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
    - dns01:
        route53:
          accessKeyID: dummy
          secretAccessKeySecretRef:
            name: cert-manager-dns-credentials
            key: aws_secret_access_key
          region: dummy
          hostedZoneID: dummy

The mutating webhook detects the presence of the injection annotation redhat-cop.redhat.io/patch on the incoming object, resolves the templated patch (including any calls to the lookup function), applies the patch to the object, and returns the mutated object.
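As a rough illustration of the outcome, assume a cluster running on AWS in the us-east-1 region, with made-up values for the access key and the hosted zone ID. After the webhook has applied the patch, the solver section of the stored object would be expected to look roughly like this:

```yaml
# Expected solver section after mutation. The three concrete values below are
# hypothetical stand-ins for what the lookups would return on a real cluster.
spec:
  acme:
    solvers:
    - dns01:
        route53:
          accessKeyID: AKIAIOSFODNN7EXAMPLE    # decoded from the cert-manager-dns-credentials Secret
          secretAccessKeySecretRef:
            name: cert-manager-dns-credentials
            key: aws_secret_access_key
          region: us-east-1                    # read from the Infrastructure object
          hostedZoneID: Z0123456789EXAMPLE     # read from the DNS object
```

The dummy placeholders in the manifest exist only so that the object is complete before the webhook replaces them.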

With this approach, we unlock the capability to reuse the same resource across any cluster and can store this manifest in our GitOps repository.

Patch enforcement

There are situations in which injection at creation time might not be enough. This might be because the resources we need values from may not exist yet, because the injected values might change over time, or because someone might inadvertently change the values in the patched object to something different from what we need. In any of these cases, as administrators, we need to make sure that our patch is always enforced.

To address these use cases, the patch-operator provides a way to declaratively express the need for a patch that will be permanently enforced. This is accomplished via the Patch CRD.

Let’s take a look at a few examples.

The Patch CRD allows for declaring the source objects (from which we need to look up values) and target object (to which we need to apply the patch), the patch template, and a few more fields to govern how the patch is handled (see the reference in the documentation for a field by field explanation). Here is an example:

apiVersion: redhatcop.redhat.io/v1alpha1
kind: Patch
metadata:
  name: test-complex-patch
spec:
  patches:
    test-complex-patch:
      targetObjectRef:
        apiVersion: v1
        kind: ServiceAccount
        name: test
        namespace: test-patch-operator
      patchTemplate: |
        metadata:
          annotations:
            {{ (index . 1).metadata.name }}: {{ (index . 2).metadata.name }}
      sourceObjectRefs:
      - apiVersion: v1
        kind: Namespace
        name: test-patch-operator
      - apiVersion: v1
        kind: ServiceAccount
        name: default
        namespace: test-patch-operator

This simple test patch adds an annotation to the test service account located in the test-patch-operator namespace. The annotation key and value are sourced, respectively, from the name of the test-patch-operator namespace and the name of the default service account.
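Working through the template with the two source objects named in the manifest, the target service account would be expected to end up like this (other fields that the cluster adds to the object are omitted):

```yaml
# Expected state of the target after the patch is applied.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: test
  namespace: test-patch-operator
  annotations:
    # key: name of the Namespace source object (index 1)
    # value: name of the ServiceAccount source object (index 2)
    test-patch-operator: default
```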

When this patch is created, the patch-operator watches the source and target objects, and if and when any of them changes over time, the patch is reapplied, thus enforcing the desired state.

Let’s go one step further and look at some real-world use cases.

Injecting the cluster default certificate

Often when configuring OpenShift, we need to replace the default certificate within the ingress controller (a pre-existing resource). The details associated with the certificate can be injected using the following Patch:

apiVersion: redhatcop.redhat.io/v1alpha1
kind: Patch
metadata:
  name: letsencrypt-ingress-operator
  namespace: openshift-config
spec:
  patches:
    letsencrypt-ingress-operator-patch:
      targetObjectRef:
        apiVersion: operator.openshift.io/v1
        kind: IngressController
        name: default
        namespace: openshift-ingress-operator
      patchTemplate: |
        spec:
          defaultCertificate:
            name: lets-encrypt-certs-tls

This configuration alone may not be enough to properly configure the ingress controller certificate, because the ingress controller expects the certificate and private key to be stored in a secret under specific keys. In fact, the ingress controller (weirdly) expects the certificate and key under the secret keys cert and key respectively, whereas the Kubernetes convention (which cert-manager follows) is to generate TLS secrets whose keys are tls.crt and tls.key. We can then use another patch to solve this challenge:

apiVersion: redhatcop.redhat.io/v1alpha1
kind: Patch
metadata:
  name: letsencrypt-certs
  namespace: openshift-config
spec:
  patch:
    targetObjectRef:
      apiVersion: v1
      kind: Secret
      name: lets-encrypt-certs-tls
      namespace: openshift-ingress
    patchTemplate: |
      data:
        cert: {{ (index (index . 0).data "tls.crt") }}
        key: {{ (index (index . 0).data "tls.key") }}
    patchType: application/merge-patch+json

Notice that we are patching the secret referenced within the IngressController resource by looking up fields from the secret itself and applying the content to the required keys. Since the operator is constantly enforcing the patch, when the certificate is rotated, the newly patched keys will also be updated.
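As a sketch of the result, the patched secret would be expected to carry both pairs of keys. The values are shown as placeholders, since the real ones are base64-encoded certificate material copied verbatim from the existing entries:

```yaml
# Expected shape of the secret after the patch is enforced.
# Angle-bracket values are placeholders, not real data.
apiVersion: v1
kind: Secret
metadata:
  name: lets-encrypt-certs-tls
  namespace: openshift-ingress
data:
  tls.crt: <base64-encoded certificate>   # original entry
  tls.key: <base64-encoded private key>   # original entry
  cert: <same value as tls.crt>           # added by the patch
  key: <same value as tls.key>            # added by the patch
```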

Service account pull secret

Often, when using an enterprise container registry, pull secrets need to be added to all the tenant namespaces so that they can properly retrieve content from it. While we could ask the tenant to specify the pull secret in every single manifest with a reference to this registry, a better alternative would be to bind the pull secret to the default Service Account such that it is available automatically. In order to accomplish this task, we need to patch the default service account in every tenant namespace. The default service account is generated by the Kubernetes control plane, so it is a resource that is not owned by the GitOps operator.
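For comparison, the per-manifest alternative means that every pod specification a tenant writes has to carry a reference like the one below. The pod name, image, and registry path are hypothetical; imagePullSecrets is the standard Kubernetes pod field, and ghcr-puller is the secret name used in the rest of this section.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app                            # hypothetical name
spec:
  imagePullSecrets:
  - name: ghcr-puller                          # has to be repeated in every workload manifest
  containers:
  - name: app
    image: ghcr.io/example-org/example-app:1.0 # hypothetical image
```

Binding the secret to the default service account removes the need for this stanza in tenant manifests.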

With the patch-operator, we can configure a single patch that targets multiple objects at the same time. The use case above can be modeled with the following patch:

apiVersion: redhatcop.redhat.io/v1alpha1
kind: Patch
metadata:
  name: puller-secret-service-account-patch
  namespace: openshift-config
spec:
  patch:
    targetObjectRef:
      apiVersion: v1
      kind: ServiceAccount
      name: default
    sourceObjectRefs:
    - apiVersion: v1
      kind: Namespace
      name: '{{ .metadata.namespace }}'
    # gives ghcr-puller to all default service accounts in namespaces with the app label.
    patchTemplate: |
      imagePullSecrets:
      {{- if and (hasKey (index . 1).metadata.labels "app") (not (has (dict "name" "ghcr-puller") (index . 0).imagePullSecrets)) }}
      {{ append (index . 0).imagePullSecrets (dict "name" "ghcr-puller") | toYaml | indent 2 }}
      {{- else }}
      {{ (index . 0).imagePullSecrets | toYaml | indent 2 }}
      {{- end }}

This patch will be applied to all of the default service accounts from namespaces that have the label app present regardless of the value. By targeting a specific label, it provides a sort of marker signifying the fact that this namespace is a tenant namespace and should have the desired configuration injected.

This patch adds a secret called ghcr-puller to the list of pull secrets automatically available to resources using this service account. As part of this scenario, we assume that the ghcr-puller secret was previously provisioned in all the required namespaces. In addition, the patch includes logic to check whether the ghcr-puller element already exists in the array of pull secrets, as it is challenging to patch arrays in JSON-formatted documents.
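To illustrate the effect, consider a tenant namespace named team-a that carries the app label (both the namespace name and the pre-existing pull secret name below are made up). Its default service account would be expected to look roughly like this once the patch has been applied:

```yaml
# Default service account in a hypothetical tenant namespace "team-a",
# which is assumed to carry the label "app".
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: team-a
imagePullSecrets:
- name: default-dockercfg-abcde   # entry that was already present (made-up name)
- name: ghcr-puller               # appended by the patch
```

On later reconciliations the template takes the else branch, because ghcr-puller is already in the list, so the entry is not appended a second time.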

The documentation has information on how to use patches that target multiple objects.

Conclusion

As stated at the outset, the primary reason the patch-operator was built is to enhance the experience of using GitOps methodologies and GitOps operators to manage Kubernetes clusters. This article has demonstrated several use cases in which the patch-operator has been useful in my workflows. I hope that you will find this operator useful as well.


About the author

Raffaele is a full-stack enterprise architect with 20+ years of experience. Raffaele started his career in Italy as a Java Architect then gradually moved to Integration Architect and then Enterprise Architect. Later he moved to the United States to eventually become an OpenShift Architect for Red Hat consulting services, acquiring, in the process, knowledge of the infrastructure side of IT.
