Operational resilience is becoming a boardroom concern, especially for organizations operating in industries that governments deem essential to the functioning of society. Organizations running IT systems that underpin Critical National Infrastructure (CNI) are required to implement capabilities to protect data and processes against hostile actions, human error, and natural disasters.

Often when organizations embark on a cloud journey they think of resilience within the context of a single cloud platform. But as events have borne out over time, a single cloud platform can also experience failures whose cause is only understood during post-mortem analysis, and which typically involve some form of cascading failure in which existing coping mechanisms and resources are overwhelmed by multiple classes of events occurring in short order. To protect against cloud platform failure, organizations must consider these scenarios more carefully and implement appropriate measures as part of an enhanced business continuity plan that addresses CNI mandates.

In this blog we will look at how the toolset available with OpenShift Platform Plus can help businesses orchestrate operational resilience and protect themselves against cloud platform failure. The specific focus is on stateful applications, which manage data that is "sticky" by nature and more difficult to move between cloud platforms given the proprietary nature of cloud platform APIs. Stateless applications, by comparison, are simpler to deal with and can be made resilient by deploying them across multiple cloud platforms fronted by a global load balancer that is itself decoupled from any of the cloud platforms.

Note that the techniques described below could also be applied to facilitate a migration of stateful applications across cloud platforms.

Architecture

To decouple a stateful application from the storage infrastructure exposed by the cloud platform, we will leverage Red Hat OpenShift Data Foundation (RHODF), which presents a cloud-agnostic Container Storage Interface (CSI) across all cloud platforms, as well as on premises, for block, file, and object storage types. In our solution architecture we will leverage block storage (based on Ceph RBD) and object storage (based on NooBaa) to demonstrate operational resilience capabilities across cloud platforms from AWS and GCP.

The other key component of the solution architecture is the Policy orchestration engine included with Red Hat Advanced Cluster Management (RHACM), which will manage all data movement operations performed by the OpenShift API for Data Protection (OADP) Operator. The following diagram captures the overall workflow.

Cluster Landing Zone - Migration

Red lines on the diagram indicate the flow of control, based on Kubernetes resource manifests being downloaded from Git repositories and processed by the Policy orchestration engine. A PolicyGenerator kustomize plugin, loaded when OpenShift GitOps (ArgoCD) starts, transforms these resource manifests into policy documents (steps 1 and 2). For more details on the PolicyGenerator kustomize plugin please refer to the documentation. The policy documents are responsible not only for configuring the RHODF and OADP Operators but also for scheduling the backup and restore workflows that underpin all data movement operations between the managed clusters (steps 3 and 4).

Blue lines on the diagram indicate the flow of data in response to Policies being enforced. Step 3a occurs whenever a Backup resource is scheduled and involves snapshotting the data in the Ceph RBD volume and uploading it to a hybrid cloud bucket accessible via the Multicloud Object Gateway (step 3b). Similarly, when a failover is required (or a backup validation test needs to be performed), a Restore resource is submitted, which results in data being downloaded from MCG and written to a Ceph RBD volume (steps 4a and 4b).

Note that in this blog we will be protecting a stateful application that has no built-in data replication capabilities and relies on the underlying platform for this. For an example of protecting an application that has built-in data replication capabilities across cloud platforms, please refer to this blog, which leverages Submariner, included with RHACM.

Prerequisites

  • One hub cluster located on premises with OpenShift 4.13, RHODF 4.13, RHACM 2.8, and OpenShift GitOps 1.9 installed.
  • One managed cluster located on AWS with OpenShift 4.13, RHODF 4.13, OADP 1.2, and VolSync 0.7 installed.
  • One managed cluster located on GCP with OpenShift 4.13, RHODF 4.13, OADP 1.2, and VolSync 0.7 installed.

Configuration on the Hub Cluster

As a first step we need to create two uniquely named object storage buckets, one in AWS and one in GCP. These will then be "fused" into a hybrid object bucket to enable seamless cross-cloud data transfers, thereby protecting the stored data from a cloud platform failure.

Create a bucket in AWS via the CLI:

REGION=<AWS REGION>
aws s3api create-bucket --bucket `uuid` \
--create-bucket-configuration LocationConstraint=$REGION \
--region $REGION

Create a bucket in GCP via the CLI:

REGION=<GCP REGION>
gsutil mb -l $REGION gs://`uuid`
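Both commands above rely on a `uuid` tool to generate a globally unique bucket name, which is not installed by default on many systems. A small portable sketch (assuming either `uuidgen` or the Linux kernel's random UUID file is available) achieves the same, lowercased to satisfy S3 bucket naming rules:

```shell
# Generate a globally unique, lowercase bucket name.
# Assumes either uuidgen (util-linux/macOS) or /proc/sys/kernel/random/uuid exists.
BUCKET_NAME=$(uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid)
# uuidgen emits uppercase hex on some platforms; S3 bucket names must be lowercase.
BUCKET_NAME=$(printf '%s' "$BUCKET_NAME" | tr '[:upper:]' '[:lower:]')
echo "$BUCKET_NAME"
```

The generated value can then be passed to `aws s3api create-bucket --bucket "$BUCKET_NAME" ...` or `gsutil mb ... gs://"$BUCKET_NAME"` in place of the backtick expansion.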

The object buckets must be registered with the hub cluster and configured into a hybrid object bucket class, which mirrors data between the underlying object buckets. The YAML manifests for this are presented below, followed by the PolicyGenerator configuration file. The first two manifests create the standalone Multicloud Object Gateway on the hub cluster.

apiVersion: odf.openshift.io/v1alpha1
kind: StorageSystem
metadata:
  name: ocs-storagecluster-storagesystem
  namespace: openshift-storage
spec:
  kind: storagecluster.ocs.openshift.io/v1
  name: ocs-storagecluster
  namespace: openshift-storage
---
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  annotations:
    uninstall.ocs.openshift.io/cleanup-policy: delete
    uninstall.ocs.openshift.io/mode: graceful
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  multiCloudGateway:
    dbStorageClassName: thin-csi
    reconcileStrategy: standalone
---
apiVersion: v1
kind: Secret
metadata:
  name: aws-creds
  namespace: openshift-storage
type: Opaque
data:
  AWS_ACCESS_KEY_ID: <AWS ACCESS KEY ID ENCODED IN BASE64>
  AWS_SECRET_ACCESS_KEY: <AWS SECRET ACCESS KEY ENCODED IN BASE64>
---
apiVersion: v1
kind: Secret
metadata:
  name: gcp-creds
  namespace: openshift-storage
type: Opaque
data:
  GoogleServiceAccountPrivateKeyJson: <GCP PRIVATE KEY ENCODED IN BASE64>
---
apiVersion: noobaa.io/v1alpha1
kind: BackingStore
metadata:
  labels:
    app: noobaa
  name: noobaa-aws-backing-store
  namespace: openshift-storage
spec:
  awsS3:
    region: <AWS REGION>
    secret:
      name: aws-creds
      namespace: openshift-storage
    targetBucket: <AWS BUCKET NAME>
  type: aws-s3
---
apiVersion: noobaa.io/v1alpha1
kind: BackingStore
metadata:
  labels:
    app: noobaa
  name: noobaa-gcp-backing-store
  namespace: openshift-storage
spec:
  googleCloudStorage:
    secret:
      name: gcp-creds
      namespace: openshift-storage
    targetBucket: <GCP BUCKET NAME>
  type: google-cloud-storage
---
apiVersion: noobaa.io/v1alpha1
kind: BucketClass
metadata:
  labels:
    app: noobaa
  name: noobaa-mirror-bucket-class
  namespace: openshift-storage
spec:
  placementPolicy:
    tiers:
    - backingStores:
      - noobaa-aws-backing-store
      - noobaa-gcp-backing-store
      placement: Mirror
---
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: migration-datastore
  namespace: openshift-storage
spec:
  generateBucketName: migration-datastore-bucket
  storageClassName: openshift-storage.noobaa.io
  additionalConfig:
    bucketclass: noobaa-mirror-bucket-class

The hybrid object bucket may take a few minutes to become available, and it is important not to proceed until it is ready. A blocking wait can be achieved by using Policy dependencies in the PolicyGenerator configuration file.

apiVersion: noobaa.io/v1alpha1
kind: BackingStore
metadata:
  name: noobaa-aws-backing-store
  namespace: openshift-storage
status:
  phase: Ready
---
apiVersion: noobaa.io/v1alpha1
kind: BackingStore
metadata:
  name: noobaa-gcp-backing-store
  namespace: openshift-storage
status:
  phase: Ready
---
apiVersion: noobaa.io/v1alpha1
kind: BucketClass
metadata:
  name: noobaa-mirror-bucket-class
  namespace: openshift-storage
status:
  phase: Ready
---
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: migration-datastore
  namespace: openshift-storage
status:
  phase: Bound

Once the hybrid object bucket is available, information about it must be transferred to each of the managed clusters, which will access the bucket remotely. This information includes the name of the object bucket, an object service endpoint, and credentials for accessing the bucket securely. It must first be staged in the policies namespace, from where it can be securely downloaded by the managed clusters.

apiVersion: v1
kind: ConfigMap
metadata:
  name: migration-datastore
  namespace: policies
data:
  s3Url: '{{ (lookup "route.openshift.io/v1" "Route" "openshift-storage" "s3").spec.host }}'
  bucketName: '{{ (lookup "objectbucket.io/v1alpha1" "ObjectBucket" "" "obc-openshift-storage-migration-datastore").spec.endpoint.bucketName }}'
---
apiVersion: v1
kind: Secret
metadata:
  name: migration-datastore
  namespace: policies
stringData:
  cloud: |
    [default]
    aws_access_key_id={{ fromSecret "openshift-storage" "migration-datastore" "AWS_ACCESS_KEY_ID" | base64dec }}
    aws_secret_access_key={{ fromSecret "openshift-storage" "migration-datastore" "AWS_SECRET_ACCESS_KEY" | base64dec }}
type: Opaque

The PolicyGenerator configuration file brings together all of the above and controls the execution workflow using Policy dependencies and remediation actions. Note that directories are used to segregate the three sets of YAML manifests so that they can be managed as separate Policies.

apiVersion: policy.open-cluster-management.io/v1
kind: PolicyGenerator
metadata:
  name: multicloudgateway
placementBindingDefaults:
  name: multicloudgateway
policyDefaults:
  namespace: policies
  complianceType: musthave
  remediationAction: enforce
  policySets:
  - multicloudgateway
policies:
- name: multicloudgateway-config
  manifests:
  - path: <DIRECTORY TO MANIFEST FILES>
- name: multicloudgateway-status
  remediationAction: inform
  manifests:
  - path: <DIRECTORY TO MANIFEST FILES>
- name: multicloudgateway-config-copy
  dependencies:
  - name: multicloudgateway-status
  manifests:
  - path: <DIRECTORY TO MANIFEST FILES>
policySets:
- name: multicloudgateway
  placement:
    placementName: multicloudgateway

The following Placement resource referenced in the PolicyGenerator file ensures that this set of Policies will be evaluated on the hub cluster only.

apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: multicloudgateway
  namespace: policies
spec:
  predicates:
  - requiredClusterSelector:
      labelSelector:
        matchExpressions:
        - {key: name, operator: In, values: ["local-cluster"]}

Configuration on the Managed Clusters

The next set of Policies is evaluated on the managed clusters in AWS and GCP. These deploy a stateful application using data volumes created from the Ceph RBD Storage Class, which abstracts the underlying cloud platform storage and presents a consistent CSI across all cloud platforms. Note the use of Policy template functions to map native cloud platform storage classes to a cloud-agnostic storage class.

apiVersion: odf.openshift.io/v1alpha1
kind: StorageSystem
metadata:
  name: ocs-storagecluster-storagesystem
  namespace: openshift-storage
spec:
  kind: storagecluster.ocs.openshift.io/v1
  name: ocs-storagecluster
  namespace: openshift-storage
---
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  annotations:
    uninstall.ocs.openshift.io/cleanup-policy: delete
    uninstall.ocs.openshift.io/mode: graceful
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  arbiter: {}
  encryption:
    kms: {}
  externalStorage: {}
  managedResources:
    cephBlockPools: {}
    cephCluster: {}
    cephConfig: {}
    cephDashboard: {}
    cephFilesystems: {}
    cephObjectStoreUsers: {}
    cephObjectStores: {}
    cephToolbox: {}
  mirroring: {}
  nodeTopologies: {}
  storageDeviceSets:
  - config: {}
    count: 1
    dataPVCTemplate:
      metadata: {}
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 0.5Ti
        storageClassName: '{{- if eq "AWS" (fromClusterClaim "platform.open-cluster-management.io") -}} gp3-csi {{- else if eq "GCP" (fromClusterClaim "platform.open-cluster-management.io") -}} standard-csi {{- end }}'
        volumeMode: Block
      status: {}
    name: 'ocs-deviceset-{{- if eq "AWS" (fromClusterClaim "platform.open-cluster-management.io") -}} gp3-csi {{- else if eq "GCP" (fromClusterClaim "platform.open-cluster-management.io") -}} standard-csi {{- end }}'
    placement: {}
    portable: true
    preparePlacement: {}
    replica: 3
    resources: {}
---
apiVersion: snapshot.storage.k8s.io/v1
deletionPolicy: Retain
driver: openshift-storage.rbd.csi.ceph.com
kind: VolumeSnapshotClass
metadata:
  labels:
    velero.io/csi-volumesnapshot-class: "true"
  name: ocs-storagecluster-rbdplugin-snapclass

Similar to the above, a blocking wait is introduced into the execution workflow to check the readiness of storage resources before proceeding further.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ocs-operator
  namespace: openshift-storage
status:
  conditions:
  - status: "True"
    type: Available
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: odf-operator-controller-manager
  namespace: openshift-storage
status:
  conditions:
  - status: "True"
    type: Available
---
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
status:
  phase: Ready

The next set of YAML configures the OADP data movers on each managed cluster to perform the heavy lifting of data from user application volumes to the remote hybrid cloud bucket. Note that the data mover in OADP 1.2 is still in Tech Preview and requires VolSync to move the data off of the volumes. For an overview of this technology please refer to this blog.

apiVersion: v1
kind: Secret
metadata:
  name: dm-restic-secret
  namespace: openshift-adp
type: Opaque
data:
  RESTIC_PASSWORD: <PRIVATE KEY ENCODED IN BASE64>
---
apiVersion: v1
kind: Secret
metadata:
  name: cloud-credentials
  namespace: openshift-adp
type: Opaque
data: '{{hub copySecretData "policies" "migration-datastore" hub}}'
---
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: velero-dpa
  namespace: openshift-adp
spec:
  features:
    dataMover:
      credentialName: dm-restic-secret
      enable: true
  configuration:
    velero:
      defaultPlugins:
      - openshift
      - aws
      - csi
      - vsm
    restic:
      enable: true
  backupLocations:
  - velero:
      config:
        profile: default
        region: noobaa
        s3Url: 'https://{{hub fromConfigMap "policies" "migration-datastore" "s3Url" hub}}'
        s3ForcePathStyle: "true"
        insecureSkipTLSVerify: "true"
      provider: aws
      default: true
      credential:
        key: cloud
        name: cloud-credentials
      objectStorage:
        bucket: '{{hub fromConfigMap "policies" "migration-datastore" "bucketName" hub}}'
        prefix: velero

This set of Policies transfers information about the hybrid object bucket previously staged on the hub cluster to each of the managed clusters using {{hub .. hub}} delimiters, which perform a secure data transfer. For more details about this mechanism please refer to the documentation.

Once the DataProtectionApplication has been processed by its owning controller, it generates a BackupStorageLocation resource, which in our case points to the Multicloud Object Gateway and whose availability can be validated via Policy.

apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: velero-dpa-1
  namespace: openshift-adp
status:
  phase: Available
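The same readiness check can be scripted for ad hoc troubleshooting. Below is a minimal sketch: `wait_for_bsl` and `get_bsl_phase` are helper names of our own invention, with `get_bsl_phase` stubbed out here; on a live cluster it would run the oc command shown in the comment.

```shell
# Poll until the BackupStorageLocation reports Available, up to N attempts.
# On a live cluster get_bsl_phase would be:
#   oc -n openshift-adp get backupstoragelocation velero-dpa-1 -o jsonpath='{.status.phase}'
wait_for_bsl() {
  attempts=$1
  while [ "$attempts" -gt 0 ]; do
    phase=$(get_bsl_phase)
    [ "$phase" = "Available" ] && return 0
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}

# Stub for illustration only; replace with the oc command above.
get_bsl_phase() { echo "Available"; }

wait_for_bsl 5 && echo "backup storage location is ready"
```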

Another PolicyGenerator configuration file brings together all of the above and controls the execution workflow using Policy dependencies and remediation actions. Note that directories are used to segregate the four sets of YAML manifests so that they can be managed as separate Policies.

apiVersion: policy.open-cluster-management.io/v1
kind: PolicyGenerator
metadata:
  name: dataprotectionapplication
placementBindingDefaults:
  name: dataprotectionapplication
policyDefaults:
  namespace: policies
  complianceType: musthave
  remediationAction: enforce
  policySets:
  - dataprotectionapplication
policies:
- name: storagecluster-config
  manifests:
  - path: <DIRECTORY TO MANIFEST FILES>
- name: storagecluster-status
  remediationAction: inform
  manifests:
  - path: <DIRECTORY TO MANIFEST FILES>
- name: dataprotectionapplication-config
  dependencies:
  - name: storagecluster-status
  manifests:
  - path: <DIRECTORY TO MANIFEST FILES>
- name: dataprotectionapplication-status
  remediationAction: inform
  manifests:
  - path: <DIRECTORY TO MANIFEST FILES>
policySets:
- name: dataprotectionapplication
  placement:
    placementName: dataprotectionapplication

The following Placement resource referenced in the PolicyGenerator file ensures that this set of Policies will be evaluated on the managed clusters only.

apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: dataprotectionapplication
  namespace: policies
spec:
  predicates:
  - requiredClusterSelector:
      labelSelector:
        matchExpressions:
        - {key: name, operator: NotIn, values: ["local-cluster"]}

Finally, our stateful application (Hello OpenShift!) needs to be deployed to the OpenShift cluster running in AWS. It will write a datestamp record into a filesystem mounted on the data volume generated by the Ceph RBD Storage Class.

apiVersion: v1
kind: Namespace
metadata:
  name: hello-openshift
spec: {}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
  namespace: hello-openshift
  labels:
    app: hello-openshift
spec:
  storageClassName: ocs-storagecluster-ceph-rbd
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-openshift
  namespace: hello-openshift
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello-openshift
  template:
    metadata:
      labels:
        app: hello-openshift
    spec:
      containers:
      - name: hello-openshift
        image: registry.access.redhat.com/ubi8/ubi
        command: ["sh", "-c"]
        args: ["echo $(date) Hello OpenShift! >> /data/hello-openshift.txt && sleep inf"]
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: data-pvc

Another PolicyGenerator configuration file is used to deploy the application (alternatively consider using OpenShift GitOps ApplicationSets).

apiVersion: policy.open-cluster-management.io/v1
kind: PolicyGenerator
metadata:
  name: hello-openshift
placementBindingDefaults:
  name: hello-openshift
policyDefaults:
  namespace: policies
  complianceType: musthave
  remediationAction: enforce
  policySets:
  - hello-openshift
policies:
- name: hello-openshift-deploy
  manifests:
  - path: <DIRECTORY TO MANIFEST FILES>
policySets:
- name: hello-openshift
  placement:
    placementName: hello-openshift

The following Placement resource referenced in the PolicyGenerator file ensures that this set of Policies will be evaluated on a managed cluster on AWS only.

apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: hello-openshift
  namespace: policies
spec:
  predicates:
  - requiredClusterSelector:
      labelSelector:
        matchExpressions:
        - {key: name, operator: NotIn, values: ["local-cluster"]}
      claimSelector:
        matchExpressions:
        - {key: platform.open-cluster-management.io, operator: In, values: ["AWS"]}
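The claimSelector above matches on the platform.open-cluster-management.io ClusterClaim that RHACM populates on each ManagedCluster. A quick way to see which value a cluster reports is to filter its claims; the sketch below runs the filter over a canned sample (the claim values shown are illustrative), with the live listing coming from the oc command in the comment:

```shell
# Sample "name value" pairs in the shape produced by:
#   oc get managedcluster <CLUSTER NAME> \
#     -o jsonpath='{range .status.clusterClaims[*]}{.name} {.value}{"\n"}{end}'
claims='id.k8s.io 1234-abcd
platform.open-cluster-management.io AWS
region.open-cluster-management.io us-east-1'

# Extract the platform claim the Placement will match against.
platform=$(printf '%s\n' "$claims" | awk '$1 == "platform.open-cluster-management.io" {print $2}')
echo "$platform"
```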

Log in to the managed cluster and confirm the data written to the cloud-agnostic volume created by the Ceph RBD Storage Class.

$ oc -n hello-openshift get pod,pvc
NAME                                  READY   STATUS    RESTARTS   AGE
pod/hello-openshift-c86f7f48b-bb2sj   1/1     Running   0          2m3s

NAME                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
persistentvolumeclaim/data-pvc   Bound    pvc-5a127dbf-06db-402a-ac70-cf196b54799d   1Gi        RWO            ocs-storagecluster-ceph-rbd   2m3s

$ oc -n hello-openshift rsh hello-openshift-c86f7f48b-bb2sj cat /data/hello-openshift.txt
Mon Sep 4 03:36:05 UTC 2023 Hello OpenShift!

The following Policy will trigger a backup of the namespace in which the application has been deployed. Given the configuration now in place, this will result in the OADP data movers uploading the data to the hybrid object bucket, which in turn mirrors the data across both AWS and GCP.

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: hello-openshift
  labels:
    velero.io/storage-location: default
  namespace: openshift-adp
spec:
  hooks: {}
  includedNamespaces:
  - hello-openshift
  storageLocation: velero-dpa-1
  ttl: 720h0m0s

The following YAML validates that the backup succeeded; the backup itself will take a few moments to complete.

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: hello-openshift
  namespace: openshift-adp
status:
  phase: Completed
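Beyond the Policy check, the backup artifacts can also be inspected directly in the hybrid bucket. The sketch below builds the endpoint URL from the values staged in the migration-datastore ConfigMap (the host and bucket shown here are hypothetical placeholders), with the aws CLI command that would list what Velero wrote shown in a comment:

```shell
# Hypothetical values; on a live hub these come from the migration-datastore ConfigMap:
#   oc -n policies get configmap migration-datastore -o jsonpath='{.data.s3Url}'
#   oc -n policies get configmap migration-datastore -o jsonpath='{.data.bucketName}'
S3_HOST="s3-openshift-storage.apps.hub.example.com"
BUCKET="migration-datastore-bucket-abc123"

ENDPOINT="https://${S3_HOST}"
# To list the backup artifacts Velero uploaded under the configured "velero" prefix:
#   aws s3 ls "s3://${BUCKET}/velero/backups/" --endpoint-url "$ENDPOINT" --no-verify-ssl
echo "$ENDPOINT"
```

Because the bucket class mirrors writes, the same objects should also be visible in the underlying AWS and GCP buckets directly.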

The following PolicyGenerator configuration file is used to manage the backup workflow.

apiVersion: policy.open-cluster-management.io/v1
kind: PolicyGenerator
metadata:
  name: backup
placementBindingDefaults:
  name: backup
policyDefaults:
  namespace: policies
  complianceType: musthave
  remediationAction: enforce
  policySets:
  - backup
policies:
- name: backup-config
  manifests:
  - path: <DIRECTORY TO MANIFEST FILES>
- name: backup-status
  remediationAction: inform
  manifests:
  - path: <DIRECTORY TO MANIFEST FILES>
policySets:
- name: backup
  placement:
    placementName: backup

The following Placement resource referenced in the PolicyGenerator file ensures that this set of Policies will be evaluated on a managed cluster in AWS only.

apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: backup
  namespace: policies
spec:
  predicates:
  - requiredClusterSelector:
      labelSelector:
        matchExpressions:
        - {key: name, operator: NotIn, values: ["local-cluster"]}
      claimSelector:
        matchExpressions:
        - {key: platform.open-cluster-management.io, operator: In, values: ["AWS"]}

The following Policy is used to restore the data on the managed cluster running in GCP.

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: hello-openshift
  namespace: openshift-adp
spec:
  backupName: hello-openshift
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
  restorePVs: true

Similar to validation of the backup, the outcome of the restore operation can be validated via Policy too.

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: hello-openshift
  namespace: openshift-adp
status:
  phase: Completed

The following PolicyGenerator configuration file is used to manage the restore workflow.

apiVersion: policy.open-cluster-management.io/v1
kind: PolicyGenerator
metadata:
  name: restore
placementBindingDefaults:
  name: restore
policyDefaults:
  namespace: policies
  complianceType: musthave
  remediationAction: enforce
  policySets:
  - restore
policies:
- name: restore-config
  manifests:
  - path: <DIRECTORY TO MANIFEST FILES>
- name: restore-status
  remediationAction: inform
  manifests:
  - path: <DIRECTORY TO MANIFEST FILES>
policySets:
- name: restore
  placement:
    placementName: restore

The following Placement resource referenced in the PolicyGenerator file ensures that this set of Policies will be evaluated on a managed cluster in GCP only.

apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: restore
  namespace: policies
spec:
  predicates:
  - requiredClusterSelector:
      labelSelector:
        matchExpressions:
        - {key: name, operator: NotIn, values: ["local-cluster"]}
      claimSelector:
        matchExpressions:
        - {key: platform.open-cluster-management.io, operator: In, values: ["GCP"]}

Log in to the managed cluster and confirm that the restored data has been written to the cloud-agnostic volume created by the Ceph RBD Storage Class. There will be an additional datestamp entry due to the container being restarted.

$ oc -n hello-openshift get pod,pvc
NAME                                  READY   STATUS    RESTARTS   AGE
pod/hello-openshift-c86f7f48b-8fr8k   1/1     Running   0          31s

NAME                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
persistentvolumeclaim/data-pvc   Bound    pvc-14d6edb4-e097-4078-a2fc-e2010e9516e6   1Gi        RWO            ocs-storagecluster-ceph-rbd   32s

$ oc -n hello-openshift rsh hello-openshift-c86f7f48b-8fr8k cat /data/hello-openshift.txt
Mon Sep 4 03:36:05 UTC 2023 Hello OpenShift!
Mon Sep 4 03:49:55 UTC 2023 Hello OpenShift!

To productionize the above, consider replacing the Backup with a Schedule resource, which periodically (once every 5 minutes in this example) creates a backup that is written to the hybrid cloud bucket.

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: hello-openshift
  namespace: openshift-adp
spec:
  schedule: '*/5 * * * *'
  template:
    hooks: {}
    includedNamespaces:
    - hello-openshift
    storageLocation: velero-dpa-1
    ttl: 720h0m0s

The corresponding resource status validation can also be performed via Policy.

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: hello-openshift
  namespace: openshift-adp
status:
  phase: Enabled

Note that this only confirms that Schedules are in effect; it says nothing about the outcome of individual backups. Checking those outcomes requires raw object template processing to iterate over an ever-growing list of backups and filter for failures (indicated by a backup whose status is not "Completed"). Such occurrences should trigger policy violation alerts that Observability tools, including Alertmanager, can action, so we raise the severity for this Policy to critical. For more details on raw object template processing please refer to the documentation.

apiVersion: policy.open-cluster-management.io/v1
kind: ConfigurationPolicy
metadata:
  name: failed-scheduled-backups
spec:
  remediationAction: inform
  severity: critical
  object-templates-raw: |
    {{- range $backup := (lookup "velero.io/v1" "Backup" "openshift-adp" "").items }}
    {{- if not (eq $backup.status.phase "Completed") }}
    - complianceType: mustnothave
      objectDefinition:
        apiVersion: velero.io/v1
        kind: Backup
        metadata:
          name: {{ $backup.metadata.name }}
          namespace: {{ $backup.metadata.namespace }}
    {{- end }}
    {{- end }}
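The same filtering can be reproduced ad hoc from the command line when investigating a violation. The sketch below runs the filter over a canned sample listing (the backup names are illustrative); on a live cluster the listing would come from the oc command shown in the comment:

```shell
# Sample listing in the shape produced by:
#   oc -n openshift-adp get backups.velero.io \
#     -o custom-columns=NAME:.metadata.name,PHASE:.status.phase --no-headers
sample='hello-openshift-20230904 Completed
hello-openshift-20230905 PartiallyFailed
hello-openshift-20230906 Completed'

# Print the names of any backups that did not complete.
failed=$(printf '%s\n' "$sample" | awk '$2 != "Completed" {print $1}')
echo "$failed"
```

This prints hello-openshift-20230905, mirroring the mustnothave match the ConfigurationPolicy performs.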

As a final note, it is recommended to periodically test the integrity of a random backup by performing a point-in-time restore, so that in the event of a real business continuity scenario, confidence in the backup process is already well-established.

Summary

For organizations operating in industries deemed to be of systemic importance to society, it is imperative to adopt a multi-cloud architecture that protects their IT systems against the catastrophic failure of a single cloud platform. Organizations can deliver on this by building their applications with the tools included in OpenShift Platform Plus, so that those applications can readily fail over from one cloud platform to another without needing to be rearchitected.