Most people think of Kubernetes and OpenShift as hosting “cloud native” applications, where cloud native refers to a system that expects failure and can compensate automatically, for example, using horizontal scaling. But does that apply to the infrastructure hosting OpenShift? What about OpenShift itself?
In this episode we are joined by Christian Hernandez, Technical Marketing Manager for Red Hat, to look at high availability for OpenShift, including what it expects from the infrastructure, what it’s capable of providing for applications, and some example scenarios.
As always, please see the list below for additional links to specific topics, questions, and supporting materials for the episode!
If you’re interested in more streaming content, please subscribe to the Red Hat livestreaming calendar to see the upcoming episode topics and to receive any schedule changes. If you have questions or topic suggestions for the Ask an OpenShift Admin Office Hour, please contact us via Discord, Twitter, or come join us live, Wednesdays at 11am EDT / 1500 UTC, on YouTube and Twitch.
Episode 38 recorded stream:
Use this link to jump directly to where we start talking about today’s topic.
This week’s top of mind topics:
- Late last week Red Hat and Nutanix jointly announced a strategic partnership and support for OpenShift on Nutanix AOS, including a certified CSI provisioner! This is just the first step toward much more, be sure to watch for more information in the future!
- We talked a bit about the hardening guide for Kubernetes that the National Security Agency (NSA) recently released, including how OpenShift already meets many of those guidelines out of the box. You can find more information about Red Hat’s perspective on the NSA hardening guide in the blog post here.
- The last top of mind topic this week was around namespaces, in particular, when should you create additional namespaces and what purpose do they serve? You can listen to answers from all of us here in the stream.
Questions answered and topics discussed during the stream:
- Can I run dev and prod in the same cloud? Yes, just be cognizant of what you’re protecting against and your failure domains. For example, is it ok for the applications to go down if the entire cloud provider is down? Or, do you need a multi-cloud strategy?
- One of the first steps in putting an effective high availability - and disaster recovery - strategy in place is identifying which risks you’re mitigating. Without a clear understanding of the goals, you can accidentally omit an important scenario that you wanted to protect against.
- What is HA and how is it different from disaster recovery (DR)? HA is usually targeted at keeping an application running, whereas DR is about recovering it after it has gone down.
- Do you consider a failed pod to be an HA event? Yes, it is, because we want to ensure that the application continues to run when some of its capacity is lost.
- As hyperscaler deployments have become more widespread, applications have evolved to provide their own high availability. The important aspect is to strike a balance between what’s provided by the infrastructure and the application.
- How can I restore a cluster from just an etcd backup? It’s possible, but complicated because the restored etcd won’t be aware that it’s actually a new cluster.
- We talk about two KCS articles, one providing recommended practices for high availability and another with guidance for multi-site clusters.
- With two sites, how can I achieve HA with OpenShift? OpenShift 4 requires three control plane nodes, which means that with only two sites, one of them will always have a majority of nodes. If that site fails, then you’re in a DR scenario, so HA with two sites is, at best, complex.
- One option is to use two OpenShift clusters with a global load balancer. We talked about this in depth during the stream, including how this is the recommended method from other vendors as well.
- Scheduling hints, such as (anti)affinity, node selectors, and scheduler profiles can all affect high availability.
- “Kubernetes doesn’t fix a broken application design”. Simply using Kubernetes to deploy and manage a containerized application doesn’t make it magical. Operations teams need to work closely with applications teams to provide the right resources - and capabilities - at the right layer.
- We reinforce this during the stream by talking about how containers can still be utilized to make the application deployment and configuration process easier, even without Kubernetes. It makes sense to deploy the application to the infrastructure that provides the right services, like HA, for its needs, rather than blindly prioritizing Kubernetes over everything.
- Did you know that it takes 5 minutes, by default, for Kubernetes to reschedule workloads from a failed node? Node and pod health checks, including liveness probes, are a critical component of ensuring that the application remains available, even when non-obvious failures happen.
- It’s important to work with the application team. The infrastructure cannot be the only way that the application achieves HA with Kubernetes. Both teams have to cooperate!
- When using two clusters deployed in two locations, are there any recommendations or guidelines for keeping the application data in sync? You can use either infrastructure level replication, i.e. asynchronous storage replication, or application level replication, e.g. CockroachDB, which is a cloud-native, distributed SQL database.
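To make the scheduling hints discussed above concrete, here is a minimal sketch of a Deployment that uses pod anti-affinity to spread replicas across nodes, so a single node failure never takes out every copy of the application. The names, labels, and image are hypothetical placeholders, not something from the stream:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                        # hypothetical application name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: never co-locate two replicas on the same node.
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: web
            topologyKey: kubernetes.io/hostname
      containers:
      - name: web
        image: registry.example.com/web:latest   # placeholder image
```

Using `preferredDuringSchedulingIgnoredDuringExecution` instead would make the spread a soft preference, which can be the better trade-off when the cluster has fewer nodes than replicas.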
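The 5-minute rescheduling delay mentioned above comes from the default `tolerationSeconds` of 300 that Kubernetes adds for the `not-ready` and `unreachable` node taints. A sketch of a pod spec that both adds a liveness probe and shortens that eviction window is below; the `/healthz` endpoint, port, and 60-second value are illustrative assumptions, not recommendations from the episode:

```yaml
    spec:
      containers:
      - name: web
        image: registry.example.com/web:latest   # placeholder image
        livenessProbe:
          httpGet:
            path: /healthz       # assumes the app exposes a health endpoint
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 15
      tolerations:
      # Override the default 300s so pods are evicted from a failed
      # node after 60s instead of 5 minutes.
      - key: node.kubernetes.io/not-ready
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 60
      - key: node.kubernetes.io/unreachable
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 60
```

Shortening `tolerationSeconds` trades faster failover for more churn during transient network blips, so tune it to the failure domains you identified earlier.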