Background

Backup is the process of creating copies of data and storing them in separate locations or media, while restore is the process of retrieving the backed-up data and returning it to its original location or system, or to a new one. In other words, backup is data preservation, and restore is data retrieval.

In this article, I will discuss several considerations in the backup and restore processes for both Red Hat OpenShift Service on AWS (ROSA) and Azure Red Hat OpenShift (ARO) clusters. I will also discuss what to back up, and lastly, what tools can be used to support backup and restore processes. The objective of this article is to share best practices for backup and restore for your ROSA and ARO clusters at a high level.

Considerations

First and foremost, I recommend that you perform backup and restore at the application level instead of at the cluster level. Note that this applies to all the following considerations.

  • Disaster Recovery (DR) plan
    • High Availability (HA) vs Fault Tolerant (FT)

      Both HA and FT are important considerations in any DR plan. Designing applications with HA in mind will result in minimal downtime, such as during failover or maintenance; thus the focus is to reduce this downtime to an acceptable level.

      On the other hand, FT systems are designed for no downtime: the overall system continues operating correctly even when some of its parts fail. As such, these systems are typically able to recover in real time without user intervention.

      In short, the former design is commonly used for applications that can tolerate downtime to some extent, while the latter is for critical applications where downtime is unacceptable.

    • Recovery Point Objective (RPO) vs Recovery Time Objective (RTO)

      Other critical considerations in the DR plan are the organization’s RPO and RTO. RPO refers to the acceptable amount of data loss in the event of a disaster, measured as a period of time, while RTO is the acceptable downtime before the systems recover.

      Let's say your RPO is one hour and your RTO is also one hour. This means that if a disaster happens at 12 noon, you can tolerate losing only the data written since 11am, and your systems need to resume normal operations by 1pm.

Figure 1. Illustration of DR scenario
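
The arithmetic in this example can be written down as a short sketch. The function name and the calendar date are hypothetical and chosen only for illustration; the one-hour RPO and RTO and the 12 noon disaster time come from the example above.

```python
from datetime import datetime, timedelta


def recovery_window(disaster_at, rpo, rto):
    """Return (oldest acceptable recovery point, latest acceptable recovery time)."""
    return disaster_at - rpo, disaster_at + rto


# Hypothetical date; only the time of day (12 noon) comes from the example.
disaster = datetime(2024, 1, 1, 12, 0)
oldest_point, deadline = recovery_window(
    disaster, rpo=timedelta(hours=1), rto=timedelta(hours=1)
)

print(oldest_point.strftime("%H:%M"), deadline.strftime("%H:%M"))  # 11:00 13:00
```

Any backup taken at or after the first timestamp satisfies the RPO, and the restore has to complete by the second timestamp to satisfy the RTO.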

  • Frequency
    • As discussed above, your RPO and RTO determine how frequently you need to back up. Using the previous example, since your RPO is one hour, you will need to back up at least every hour to meet the RPO.
  • Location
    • To ensure that your systems remain available during an outage, you might consider distributing your clusters across multiple availability zones (AZs) and/or regions. In other words, if you currently have a single cluster in a single AZ, you might consider spreading it across multiple AZs instead. Similarly, you might want to ensure that your cluster can fail over to another region in the event of a catastrophe affecting the whole region.
  • Security
    • Security is another important consideration, since you don't want anyone without proper authentication or authorization to access your backup data. Restrict access to this data, store the backup in a secure location, and encrypt it both at rest and in transit.
  • Automation
    • You might want to automate the backup and restore processes. I will discuss which tools can help you with the process below.
  • Test and validation
    • Last but not least, you want to ensure that all plans with the considerations above are tested, validated, documented, and maintained. This will ensure that the backup and restore processes are functioning properly in disaster recovery scenarios.
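
As one way to validate the frequency consideration, the sketch below checks a list of backup timestamps for gaps that are longer than the RPO. The function name and the timestamps are hypothetical; in practice the timestamps would come from whatever backup tool you use.

```python
from datetime import datetime, timedelta


def rpo_violations(backup_times, rpo):
    """Return (earlier, later) pairs of consecutive backups spaced further apart than the RPO."""
    ordered = sorted(backup_times)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > rpo]


# Hypothetical backup history: the 11:00 backup is missing.
backups = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 1, 12, 30),
]

gaps = rpo_violations(backups, rpo=timedelta(hours=1))
for earlier, later in gaps:
    print(f"No backup between {earlier:%H:%M} and {later:%H:%M}")
```

A check like this only confirms that backups were taken on time. It does not replace an actual restore test, which is the only way to confirm that the RTO can be met.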

What to Back Up

When it comes to what you need to back up, consider this at the application level. The first rule: It is not advisable to back up etcd, since it is highly unlikely that your new cluster will be an exact copy of the one you are backing up. So do not back up etcd!

  • Namespaces
    • Determine which namespaces are critical to your applications.
  • Custom Resources (CRs)/Custom Resource Definitions (CRDs)
    • Along with your namespaces, select which CRs and CRDs are relevant to your applications.
  • YAML Manifests
    • Select which YAML manifests you want to prioritize and back up.
  • Persistent Volumes (PVs)
    • If your cluster is using PVs, see where those volumes reside (i.e., inside or outside the cluster) and consider how to back up these volumes.
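
The selection above can be captured in a manifest. The sketch below builds a Velero `Backup` resource as a Python dictionary and prints it as JSON. It assumes a Velero-based setup (OADP builds on Velero) installed in a namespace called `openshift-adp`; the backup name, application namespace, and the custom resource `widgets.example.com` are hypothetical placeholders.

```python
import json


def build_backup_manifest(name, namespaces, resources, snapshot_volumes=True):
    """Build a Velero Backup manifest limited to the given namespaces and resource types."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Backup",
        # Assumption: Velero/OADP runs in the openshift-adp namespace.
        "metadata": {"name": name, "namespace": "openshift-adp"},
        "spec": {
            "includedNamespaces": namespaces,
            "includedResources": resources,
            # Whether to take snapshots of the PVs used by the selected workloads.
            "snapshotVolumes": snapshot_volumes,
        },
    }


manifest = build_backup_manifest(
    name="my-app-backup",                 # hypothetical backup name
    namespaces=["my-app"],                # hypothetical application namespace
    resources=[
        "deployments",
        "configmaps",
        "secrets",
        "persistentvolumeclaims",
        "widgets.example.com",            # hypothetical custom resource
    ],
)

print(json.dumps(manifest, indent=2))
```

Kubernetes accepts JSON manifests as well as YAML, so the printed output could be saved to a file and applied with your usual tooling once the placeholder names are replaced.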

Tools

Finally, let's discuss several tools that can help you with backup and restore processes.

  • GitOps and CI/CD
    • Automation is an important consideration in the backup and restore processes. GitOps helps in tracking changes and can be beneficial in identifying the cause of data loss during backup and restore. CI/CD pipelines can also be extended to include deployment of the backup and restore processes themselves, allowing you to perform and test the processes regularly.
  • Red Hat OpenShift API for Data Protection (OADP)
    • OADP is an API that allows external backup and recovery tools to interact with OpenShift clusters, so you can use your preferred data protection solution and ensure the availability and recoverability of application data in the event of a catastrophe. Refer to the OADP documentation for more details.
  • Migration Toolkit for Containers (MTC)
    • MTC, on the other hand, is a toolkit that provides tools and resources for migrating applications within the same Red Hat OpenShift cluster or between clusters. Refer to the MTC documentation for more details.
  • Third-party tools
    • Other third-party tools we recommend include Trilio, Konveyor, Velero, Kasten K10, and Portworx.
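
To tie automation back to the RPO, here is a minimal sketch that turns a backup interval into a cron expression and wraps it in a Velero `Schedule` manifest. It assumes a Velero-based tool; the helper names, the schedule name, the application namespace, and the 72-hour retention are hypothetical values chosen for illustration.

```python
import json


def cron_every_hours(every_hours):
    """Return a cron expression that fires on the hour, every `every_hours` hours."""
    # Only intervals that divide 24 evenly produce a regular daily pattern.
    if not 1 <= every_hours <= 23 or 24 % every_hours != 0:
        raise ValueError("interval must be a divisor of 24 between 1 and 12")
    return "0 * * * *" if every_hours == 1 else f"0 */{every_hours} * * *"


def build_schedule_manifest(name, every_hours, namespaces):
    """Build a Velero Schedule manifest that backs up the given namespaces."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        # Assumption: Velero/OADP runs in the openshift-adp namespace.
        "metadata": {"name": name, "namespace": "openshift-adp"},
        "spec": {
            "schedule": cron_every_hours(every_hours),
            "template": {
                "includedNamespaces": namespaces,
                "ttl": "72h0m0s",  # hypothetical retention period
            },
        },
    }


# A one-hour RPO means backing up at least every hour.
schedule = build_schedule_manifest("my-app-hourly", 1, ["my-app"])
print(json.dumps(schedule, indent=2))
```

Storing a manifest like this in Git fits the GitOps approach described above, since changes to the backup schedule are then tracked and reviewed like any other change.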

