What are your Recovery Time Objective (RTO) and Recovery Point Objective (RPO)? Not sure what they are or how they're related? Do you have a disaster recovery plan for both OpenShift and your applications? Let's talk about what RTO and RPO are, their similarities and differences, and why you need to prioritize your applications to balance resources against application availability.

In our last episode, we discussed high availability for OpenShift. Now we get to the big question: what happens if things go wrong? This week we discuss disaster recovery scenarios and strategies, including some suggestions on what is, or isn’t, important to protect and recover.

As always, please see the list below for additional links to specific topics, questions, and supporting materials for the episode!

If you’re interested in more streaming content, please subscribe to the Red Hat livestreaming calendar to see the upcoming episode topics and to receive any schedule changes. If you have questions or topic suggestions for the Ask an OpenShift Admin Office Hour, please contact us via Discord, Twitter, or come join us live, Wednesdays at 11am EDT / 1500 UTC, on YouTube and Twitch.

Episode 39 recorded stream:


This week’s top of mind topics:

Questions answered and topics discussed during the stream:

  • If you missed last week's stream, we give a brief summary here. The important thing to remember is that high availability usually refers to keeping an application running during a partial cluster failure, whereas disaster recovery means bringing the application back after a full cluster failure.
  • Applications deployed to Kubernetes, and of course OpenShift, are often responsible for their own high availability. We often lump this under the blanket term "cloud native". But that doesn't mean we don't want some cluster-level capabilities to help them recover.
  • When beginning to assess your disaster recovery requirements, it's extremely important to understand several things: your RTO (how long the application can be unavailable before it must be restored), your RPO (how much data loss, measured in time, is acceptable), and the requirements of the application. Maybe the app doesn't need anything other than a new cluster to deploy to. Maybe it needs replicated storage. You need to know those requirements in order to design and deploy appropriately.
  • What do you need at a disaster recovery site? Well, it depends on your RTO, your RPO, and the application team's plan. Perhaps you need only a smaller set of hardware than at the primary site because some parts will be moved to a hyperscaler (a large public cloud provider).
  • Make sure the hardware at the DR site is appropriately configured before an event happens. This includes not just installing an operating system, but things like making sure network connections are configured, BIOS/EFI settings are correct, and so on. Finding and troubleshooting these can take a long time, which is the last thing you want during a disaster recovery scenario.
  • Additionally, dependent services, such as DHCP, DNS, and Active Directory, also need to be available at the recovery site. Plan the order of your recovery so that those services come up first.
  • Does the destination cluster need to be exactly the same? Maybe. There are a lot of factors that go into the decision, and, very importantly, you need to work with the application team to understand their scaling and other configuration requirements.
  • Once your destination cluster is up and running, you'll want to make sure that the Operators and other functionality the application depends on are available. For example, if you're relying on a specific Operator to deploy your database, is it available on the DR cluster? Does the application team plan to rebuild components at the DR site? Do they need access to the artifact repository? If that repository isn't replicated, will rebuilds take longer because dependencies have to be pulled or built from elsewhere?
  • Don't forget, and this is certainly still important, the client-side changes that need to happen. For example, if you deployed to a new cluster, did the DNS name change? Do you need to update any other applications to use a different name?
  • During the last segment, we also talked about how the OpenShift Subscription Guide defines hot, warm, and cold disaster recovery. These are important to understand because they affect the entitlements you need; specifically, infrastructure used for warm and cold DR does not need entitlements until it's used.
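The point about bringing dependent services like DHCP, DNS, and Active Directory up first is, at its core, a dependency-ordering problem, so a topological sort is a natural way to sketch a recovery sequence. The following is a minimal Python sketch, not an official Red Hat tool: the service names and the dependency map are hypothetical stand-ins for your own runbook, and the code relies only on the standard library `graphlib` module (Python 3.9+). `recovery_order` returns one valid sequential order, while `recovery_waves` groups services whose dependencies are already satisfied, so each wave could be recovered in parallel, which may help when working against a tight RTO.

```python
from graphlib import TopologicalSorter

# HYPOTHETICAL dependency map for a DR runbook: each key lists the services
# that must already be running before that service can be recovered.
# Replace these illustrative names with the services at your own DR site.
RECOVERY_DEPENDENCIES: dict[str, set[str]] = {
    "dhcp": set(),
    "dns": {"dhcp"},
    "active-directory": {"dns"},
    "openshift-cluster": {"dhcp", "dns", "active-directory"},
    "artifact-repository": {"dns"},
    "database-operator": {"openshift-cluster"},
    "application": {"database-operator", "artifact-repository"},
}


def recovery_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which to bring services back up.

    Every service appears after all of its dependencies. Raises
    graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


def recovery_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group services into waves that could be recovered in parallel.

    A service lands in a wave only once every one of its dependencies
    has been marked done in an earlier wave.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # also detects cycles, raising graphlib.CycleError
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for stable, readable output
        waves.append(ready)
        sorter.done(*ready)  # mark this wave as recovered
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(recovery_waves(RECOVERY_DEPENDENCIES), start=1):
        print(f"Wave {number}: {', '.join(wave)}")
```

With the sample map above, DHCP comes first, then DNS, then Active Directory alongside the artifact repository, and only then the OpenShift cluster, the database Operator, and finally the application. A circular dependency raises `graphlib.CycleError`, which is worth discovering while writing the plan rather than during an actual disaster.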
