Preparing for the Worst Case Scenario: A Vision for OpenStack Disaster Recovery

October 31, 2013 | Sean Cohen | 3-minute read

At a time when the rules of enterprise IT are constantly changing and a new app seems to be born in the cloud every day, we must not forget to ask what challenges these changes and rapid app development bring. What do we need to do to secure the horizon? What technology bridges are still waiting to be built to get us where we want to be in terms of service levels and the availability of cloud workloads?

The Hybrid Cloud Reality Check

While cloud adoption continues to grow rapidly, a balance between on-premises and cloud infrastructure has become the reality.

Organizations want to keep a significant share of their IT facilities on site, but when they build enterprise private clouds that combine on-site resources with hosted private or public cloud resources, they also need new capabilities such as automated provisioning and hybrid management. One of these capabilities is business continuity and service availability in the event of disasters, such as floods, tornadoes, hurricanes, or fires, that can lead to the complete loss of a data center.

Building the Disaster Recovery Bridges

Preparing for the worst-case scenario means providing the ability to recover the technology infrastructure when a disaster occurs. This requires geographic distribution, where data written by an application is replicated to the data center that will be used for recovery. When we look at OpenStack, we find that it is still immature in this respect. Because OpenStack is an emerging IaaS platform that is seeing more and more enterprise use, it needs to evolve to support disaster recovery (DR) more easily.

We believe OpenStack should provide a consistent mechanism that abstracts the DR support built into many enterprise systems, together with higher-level automation that can easily configure these mechanisms into a DR solution appropriate for a workload.

Disaster Recovery (DR) for OpenStack is an umbrella topic that describes what needs to be done for applications and services (generally referred to as a workload) running in an OpenStack cloud to survive a large-scale disaster.

Providing DR for a workload is a complex task involving infrastructure, software, and an understanding of the workload. To enable recovery following a disaster, the administrator needs to execute a complex set of provisioning operations that mimic the day-to-day setup in a different environment.

Enabling DR for OpenStack-hosted workloads requires enablement (APIs) in OpenStack components (e.g., Cinder) and tools that may live outside of OpenStack (e.g., scripts) to invoke, orchestrate, and leverage the component-specific APIs.
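To make the split between component enablement and outside orchestration more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the hook functions, the `COMPONENT_HOOKS` registry, and the component names are hypothetical stand-ins, and no real OpenStack API is called.

```python
from typing import Callable, Dict, List


# Hypothetical stand-ins for component-specific DR hooks.
# A real tool would call the component's own API here instead.
def protect_volumes(workload: str) -> str:
    return f"volumes of {workload} marked for replication"


def protect_metadata(workload: str) -> str:
    return f"metadata of {workload} captured"


# Registry mapping an (assumed) component name to its DR hook.
COMPONENT_HOOKS: Dict[str, Callable[[str], str]] = {
    "block-storage": protect_volumes,
    "metadata": protect_metadata,
}


def protect_workload(workload: str, components: List[str]) -> List[str]:
    """Invoke each component hook in the given order and collect the results."""
    results = []
    for name in components:
        hook = COMPONENT_HOOKS.get(name)
        if hook is None:
            raise KeyError(f"no DR hook registered for component {name!r}")
        results.append(hook(workload))
    return results
```

The orchestration script only knows the registry, so supporting another component would mean registering one more hook rather than changing the orchestration loop.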

Goal

The goal is to provide a mechanism to mark applications and services (a set of OpenStack entities, also referred to as a hosted workload) and protect them from disaster. In this context, the cloud is the equivalent of the physical hardware: the target of disaster recovery is not the hardware, but the applications, services, and their data.

A separate recovery mechanism should address making the primary cloud available to run workloads following a disaster. The disaster recovery mechanism for applications and services can handle the fail-back to the primary cloud.

Examples

Application service running on customer cloud and protected by recovery on hosted cloud.
Application service running on customer cloud in data center #1 and protected by recovery on customer data center #2.

The plan is to provide a solution both for born-in-the-cloud applications that were designed for failure and are stateless in nature, and for traditional applications that require storage and statefulness.

Disaster Recovery should include support for:

Capturing the metadata of the cloud management stack, relevant for the protected workloads/resources: either as point-in-time snapshots of the metadata, or as continuous replication of the metadata.
Making available the VM images needed to run the hosted workload on the target cloud.
Replication of the workload data using storage replication, application level replication, or backup/restore.
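One way to picture the three requirements above is as a per-workload protection policy. The Python sketch below is illustrative only: `ProtectionPolicy`, its fields, and the enum values are hypothetical names chosen for this example, not part of any OpenStack API or of the DRaaS design.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class MetadataCapture(Enum):
    """How cloud-management metadata for the workload is captured."""
    POINT_IN_TIME_SNAPSHOT = "snapshot"
    CONTINUOUS_REPLICATION = "continuous"


class DataReplication(Enum):
    """How workload data reaches the recovery site."""
    STORAGE_REPLICATION = "storage"
    APPLICATION_LEVEL = "application"
    BACKUP_RESTORE = "backup_restore"


@dataclass
class ProtectionPolicy:
    """Hypothetical description of DR protection for one hosted workload."""
    workload_name: str
    metadata_capture: MetadataCapture
    data_replication: DataReplication
    # Identifiers of the VM images that must be available on the target cloud.
    image_ids: List[str] = field(default_factory=list)

    def missing_requirements(self) -> List[str]:
        """Return human-readable gaps that would block recovery."""
        gaps = []
        if not self.image_ids:
            gaps.append("no VM images listed for the target cloud")
        return gaps


policy = ProtectionPolicy(
    workload_name="web-shop",
    metadata_capture=MetadataCapture.POINT_IN_TIME_SNAPSHOT,
    data_replication=DataReplication.STORAGE_REPLICATION,
)
```

Calling `policy.missing_requirements()` on this example reports the missing image list, which is the kind of check an administrator might want before relying on the policy.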

The Road to Icehouse Goes Through Hong Kong

DRaaS is a new initiative led by Red Hat and IBM Research to design Disaster Recovery as a Service for OpenStack.

The plan calls for an open architecture that allows vendor products, or product enablement, to be integrated with DRaaS through plug-ins and APIs.

Join us at the related sessions at the Icehouse summit:

Surviving the worst: A vision for OpenStack disaster recovery - November 7, 9:50am
Cinder Design Summit session on Storage Replication: http://summit.openstack.org/cfp/details/69
The DRaaS wiki page can be found at: https://wiki.openstack.org/wiki/DisasterRecovery

By: Sean Cohen, Sr. Technical Product Manager, Red Hat
October 31, 2013

About the author

Sean Cohen

Director of Product Management

Sean Cohen is the Director of Product Management in the Hybrid Platforms organization at Red Hat, a leading provider of open-source technologies and hybrid cloud solutions. He oversees the infrastructure and observability business, including strategy and delivery for Red Hat OpenShift and OpenStack Platforms. With over 15 years of experience, Sean has a robust background in senior product management and delivery across enterprise and telco markets, driving cloud infrastructure strategy, product lifecycle management, and leading high-performance cross-functional teams.
