Keep critical applications running
As organizations move their systems to the hybrid cloud, resilience is often a critical concern. The ability to withstand errors and failures without data loss is key to providing reliable application services that contribute to business continuity.
Critical applications must also continue to perform well, even under component failure. Applications alone can go only so far in providing resilience; ultimately they depend on the underlying data services infrastructure to remain resilient and performant under failure conditions.
Build a highly available platform
High availability is the practice of protecting infrastructure or applications within a single site to ensure continuous operations. The aim is to reduce single points of failure in the computing stack, generally through redundant access paths and component resiliency. Building high availability concepts into an environment gives services built-in resiliency so they can recover on their own: a failed service can restart, a faulted node can be rebooted, a workload running on failed hardware can be redeployed elsewhere in the environment, and transactions can be resent to the service, or to a different instance of it, if a network path fails.
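The last of those recovery behaviors, resending a transaction to a different instance when a network path fails, can be sketched in a few lines. This is an illustrative client-side failover pattern only, not a specific product feature; the exception type, function names, and endpoint labels are placeholders.

```python
# Hypothetical sketch of client-side failover across redundant service
# endpoints: if one path fails, resend the transaction to the next replica.
# All names here are illustrative placeholders.

class EndpointDown(Exception):
    """Raised when an endpoint cannot be reached over the network."""

def send_transaction(txn, endpoints, send):
    """Try each redundant endpoint in turn; return the first success.

    `send` stands in for whatever transport call your stack provides;
    it is expected to raise EndpointDown on a network-path failure.
    """
    last_error = None
    for endpoint in endpoints:
        try:
            return send(endpoint, txn)
        except EndpointDown as err:
            last_error = err  # this path failed; fall through to the next replica
    raise RuntimeError(f"all endpoints failed: {last_error}")

# Example: the first site is unreachable, so the transaction lands on the second.
calls = []
def fake_send(endpoint, txn):
    calls.append(endpoint)
    if endpoint == "site-a":
        raise EndpointDown("link down")
    return f"committed on {endpoint}"

result = send_transaction({"id": 1}, ["site-a", "site-b"], fake_send)
print(result)  # committed on site-b
```

In a real deployment this loop typically lives in a load balancer, service mesh, or client library rather than in application code, but the retry-elsewhere logic is the same.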
High availability is key to ensuring your applications operate without downtime and can handle unforeseen failures. Technologies such as containers, Kubernetes, and serverless present new opportunities in application development but still need a recovery plan in the event of a failure.
Exceed your recovery objectives
Disaster recovery (DR) is the ability to recover and continue business-critical applications after natural or human-caused disasters. It protects infrastructure or applications in a geographically distributed manner to reduce business impact as much as possible, and it forms the core of any major organization's business continuance strategy, designed to preserve the continuity of business operations during major adverse events. The aim is to enable automated or automatic recovery over longer distances than traditional high availability and to extend recovery to a different cluster. In environments where an application is restricted to one site at a time, migration between sites may be automated yet still require a person with the authority to decide when to move computing services between sites; this is typical when failing over between sites carries a cost to resynchronize applications. Reducing the time it takes to recover from incidents is critical to your organization's success.
Regional DR capability provides replication of persistent volume data and metadata across geographically dispersed sites. In the public cloud, this is akin to protecting against the failure of an entire region. Regional DR ensures business continuity when a geographical region becomes unavailable, accepting a predictable amount of data loss. That tolerance is usually expressed as recovery point objectives (RPO) and recovery time objectives (RTO).
RPO is a measure of how frequently you take backups or snapshots of persistent data. In practice, the RPO indicates the amount of data that will be lost or will need to be reentered after an outage.
RTO is the amount of downtime a business can tolerate. The RTO answers the question, "How long can it take for our system to recover after we are notified of a business disruption?"
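The relationship between these objectives and day-to-day operational numbers is simple arithmetic. As a minimal sketch with hypothetical figures: a snapshot schedule bounds the worst-case RPO, and an availability target implies a yearly downtime budget that recovery time must fit within.

```python
# Illustrative arithmetic only; the snapshot interval and availability
# target below are hypothetical numbers, not product defaults.

snapshot_interval_min = 5                    # snapshot persistent data every 5 minutes
worst_case_rpo_min = snapshot_interval_min   # up to one interval of writes can be lost

availability = 0.999                         # a "three nines" uptime target
minutes_per_year = 365 * 24 * 60
downtime_budget_min = (1 - availability) * minutes_per_year

print(worst_case_rpo_min)           # 5 minutes of data at risk per outage
print(round(downtime_budget_min))   # 526 minutes of downtime per year
```

Read together: every outage may cost up to one snapshot interval of data (the RPO), and the sum of all recovery times over the year must stay within the downtime budget (which constrains the RTO).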
Check out documentation on configuring Red Hat OpenShift Data Foundation for regional disaster recovery with advanced cluster management.