In my previous blog post, I shared the vision of Disaster Recovery as a Service (DRaaS) for OpenStack, an umbrella topic that describes what needs to be done to protect workloads running in an OpenStack cloud from a large-scale disaster.
Last week we shared this vision in several sessions at the OpenStack Summit. While attendees in Hong Kong were discussing infrastructure Disaster Recovery, Typhoon Haiyan (also known as Typhoon Yolanda), one of the strongest tropical cyclones ever recorded, devastated multiple coastal cities in the Philippines, took the lives of thousands of people and forced millions to evacuate. The storm destroyed entire cities and villages, along with airports, roads, and power and communications infrastructure.
If there is one thing that history has taught us, and keeps teaching us every year, it is that catastrophic events do happen, and that if we don’t invest in preventative measures now, we will pay a hefty price later.
What would happen to your organization if this type of calamity hit?
It is hard enough to protect hosted workloads from something as ordinary as an overheated datacenter, which can knock out production servers and deeply impact your operations and revenue-generating activities. For service providers, downtime is not an option: every hour that your production service is down, you can lose not only business but also your reputation.
It is one thing to put your application workload in the cloud, but how can you guarantee that when the hosting service goes down, you can provide the right safety net and business continuity for your customers?
Although an entire datacenter can in fact go down in a disaster, what service providers should care about, from a user’s point of view, is how to protect their own data and make sure their services keep running after such events.
Elastic clouds are all about adapting to workload changes by dynamically provisioning and decommissioning resources. The more dynamic and elastic the cloud platform is, the more challenges you face in keeping your data and services highly available in the event of a disaster. Data recovery is usually not the end goal; the goal is the ability to restore the services that use this data. For service providers, this is the hosted workload.
The Replication Targets
Disaster Recovery between a primary cloud and a target cloud requires the data to be available in at least two geographically dispersed, independent sites in a shared-nothing model.
OpenStack replication targets can include:
- Private cloud to Private cloud
- Private cloud to Public cloud
- Public cloud to Public cloud
- Bare-metal environments to Public cloud
As our recovery target is the hosted workload, we should look at ways to achieve DR at the workload level. Imagine selecting a DR service-level flavor for a workload, such as applying a “Gold” profile to an application service that requires the highest protection level, with the shortest recovery point objective (RPO) and the shortest recovery time objective (RTO). Such a DR policy can be based on synchronous replication and a hot backup site. Or what if you could select other policies, such as “Silver” based on periodic replication, or “Bronze” based on asynchronous replication with a low-capacity standby site, for application services that require lower protection levels and can tolerate longer RPOs and RTOs?
The first step for Disaster Recovery enablement in OpenStack is the ability to support data and state (metadata) replication. Several different approaches may be applicable, such as application-based replication, host-based replication (at the hypervisor/VM level) and, of course, array-based replication.
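To make the idea of DR service-level flavors concrete, here is a minimal Python sketch of how such profiles might be modeled and matched to a workload's requirements. Everything in it is an assumption for illustration: the `DRProfile` class, the `select_profile` helper and especially the RPO/RTO numbers are placeholders, not values from any OpenStack project. The standby site for “Silver” is left unset because it is not specified above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DRProfile:
    """Hypothetical DR service-level flavor (not an OpenStack API)."""
    name: str
    replication: str             # "synchronous", "periodic" or "asynchronous"
    standby_site: Optional[str]  # None where the policy does not specify one
    rpo_seconds: int             # placeholder: maximum tolerated data loss
    rto_seconds: int             # placeholder: maximum tolerated downtime


# Ordered from least to most protective. All numbers are illustrative.
PROFILES = [
    DRProfile("Bronze", "asynchronous", "low-capacity", 24 * 3600, 48 * 3600),
    DRProfile("Silver", "periodic", None, 3600, 4 * 3600),
    DRProfile("Gold", "synchronous", "hot", 0, 300),
]


def select_profile(max_rpo_seconds: int, max_rto_seconds: int) -> DRProfile:
    """Return the least protective profile that still meets both objectives."""
    for profile in PROFILES:
        if (profile.rpo_seconds <= max_rpo_seconds
                and profile.rto_seconds <= max_rto_seconds):
            return profile
    raise ValueError("no DR profile satisfies the requested RPO/RTO")
```

Choosing the least protective profile that still meets the objectives reflects the usual trade-off: tighter RPOs and RTOs require more expensive replication and standby capacity, so a workload should not be over-protected by default.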
Replicating Data
OpenStack Swift’s globally distributed cluster support for object storage can be used to replicate Glance virtual machine images. Swift is currently designed to work in a single region, where a region is defined by low-latency links between Swift zones. As long as sites are nearby, zones can be distributed over multiple sites.
Another option for replicating virtual machine images is to use Glance’s multiple image locations feature. Starting with the OpenStack Havana release, Image Service images can be stored in multiple locations. This enables efficient consumption of image data and the use of backup images in the event of a primary image failure.
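The fallback behavior that multiple image locations make possible can be sketched as follows. This is not Glance code: the `fetch_image` helper and the `fetch` callable are hypothetical, and the sketch simply assumes that a failed download surfaces as an `OSError`.

```python
from typing import Callable, Iterable


def fetch_image(locations: Iterable[str],
                fetch: Callable[[str], bytes]) -> bytes:
    """Try each image location in order and return the first successful copy.

    `fetch` is a caller-supplied (hypothetical) download function that
    raises OSError when a location is unreachable.
    """
    errors = []
    for url in locations:
        try:
            return fetch(url)
        except OSError as exc:
            # Remember the failure and fall through to the next location.
            errors.append(f"{url}: {exc}")
    raise OSError("all image locations failed: " + "; ".join(errors))
```

With the primary location listed first and a copy at the recovery site listed second, a consumer keeps working from the backup copy when the primary site is unavailable.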
Cinder can be extended to support storage array based replication in the following ways:
- Utilize the scheduler to create “protected” volumes on storage arrays that are continuously replicating
- Use volume types to create replicated volumes where drivers support volume level granularity for replication
- Replicate data in two independent volumes (across different storage backends and possibly sites) using hypervisor-based replication
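The first two options above boil down to matching what a volume type asks for against what a storage backend reports. A minimal sketch of that matching is shown below, assuming a simplified model: the `Backend` class, the `pick_backend` function and the `"replication"` capability key are all hypothetical and do not correspond to actual Cinder interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Simplified, hypothetical view of a storage backend."""
    name: str
    free_gb: int
    capabilities: dict = field(default_factory=dict)


def pick_backend(backends, size_gb, extra_specs):
    """Pick a backend that fits the volume and matches every requested spec.

    `extra_specs` plays the role of volume-type settings, for example
    {"replication": True} for a "protected" volume type (assumed key).
    """
    candidates = [
        b for b in backends
        if b.free_gb >= size_gb
        and all(b.capabilities.get(k) == v for k, v in extra_specs.items())
    ]
    if not candidates:
        raise LookupError("no backend satisfies the requested volume type")
    # Among the matching backends, prefer the one with the most free space.
    return max(candidates, key=lambda b: b.free_gb)
```

For example, a request for a 100 GB volume with `{"replication": True}` would only ever land on arrays that advertise replication, regardless of how much free space the non-replicating arrays have.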
Replicating OpenStack Services State
Disaster Recovery in OpenStack should include support for:
- Capturing the metadata relevant to the protected workloads and resources, either as point-in-time snapshots of the metadata or as continuous replication of the metadata. Without capturing the state of the different OpenStack services, we will not be able to achieve a complete failover of the hosted workloads to the recovery site.
- Keeping the replicated data and metadata consistent by means of checkpoints
Examples of OpenStack metadata that requires replication include:
- Nova: VM flavors and SSH keys
- Keystone: Identities of tenants and users
- Neutron: Virtual networks between VMs
- Cinder: Volume types and pairing
- Glance: Registry and image metadata
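The metadata capture and checkpoint requirements described above can be illustrated with a small Python sketch. It assumes the service metadata has already been collected into a plain dictionary; the `make_checkpoint` and `verify_checkpoint` functions and the layout of the checkpoint are hypothetical and only show one way to tie a metadata snapshot to the data snapshots it belongs with.

```python
import copy
import hashlib
import json
import time


def _digest(payload):
    """Stable SHA-256 digest of a JSON-serializable payload."""
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def make_checkpoint(metadata, data_snapshot_ids):
    """Bundle a point-in-time copy of the metadata with its data snapshots."""
    payload = {
        # Deep copy so later changes at the primary site do not leak in.
        "metadata": copy.deepcopy(metadata),
        "data_snapshots": sorted(data_snapshot_ids),
    }
    return {
        "created_at": time.time(),
        "payload": payload,
        "digest": _digest(payload),
    }


def verify_checkpoint(checkpoint):
    """Return True if the checkpoint payload still matches its digest."""
    return _digest(checkpoint["payload"]) == checkpoint["digest"]
```

At the recovery site, a failover procedure would restore only from a checkpoint for which `verify_checkpoint` returns `True`, so that the metadata and the data snapshots it references are known to belong together.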
We note that metadata changes are less frequent than application data changes, and different mechanisms can handle replication of different portions of the metadata and data (volumes, images, etc.).
Disaster Recovery is a complex task in which different applications and use cases have different requirements: some use cases can be supported easily, while others are more complex. We therefore see this as a long-term effort with incremental steps.
Some APIs and features are expected to be integrated into existing projects such as Nova (DR features for compute). Some functionality, like DR orchestration, may be part of Heat, or a new project, or even outside the scope of OpenStack.
Enabling Cinder storage replication in the OpenStack Icehouse release is just the first step in protecting workloads running in OpenStack clouds to ensure business continuity while preparing for the worst-case scenario.