High availability is the ability of an IT system to remain accessible and reliable nearly 100% of the time, eliminating or minimizing downtime. It combines two concepts to determine whether an IT system is meeting its operational performance level: that a given service or server is accessible, or available, almost 100% of the time without downtime, and that it performs to reasonable expectations for an established time period. High availability is more than hitting an uptime service level agreement (SLA), the expectations set between a service provider and a client. It is about truly resilient, reliable, and well-functioning systems.
With the adoption of online services and hybrid workloads, there is greater demand for infrastructures to handle increased system loads while still maintaining operational standards. To achieve high availability, these infrastructures, often referred to as high availability systems, must hit defined, quantifiable outcomes beyond just "running better."
One of the targets of high-availability solutions, or high availability services, is called five-nines availability, meaning that a system is running and performing well 99.999% of the time. Usually only mission-critical systems in industries like healthcare, government, and financial services require this level of availability for compliance or competitive reasons. However, many organizations and industries still require their high availability systems to maintain 99.9% or even 99.99% uptime to provide constant digital access for their customers or allow their employees to work from home.
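Each additional "nine" shrinks the annual downtime budget by a factor of ten. A quick sketch in Python makes the difference concrete (the figures assume a 365-day year):

```python
def downtime_per_year(availability_pct):
    """Return the annual downtime budget, in minutes, for a given
    availability percentage (assuming a 365-day year)."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes
    return minutes_per_year * (1 - availability_pct / 100)

for nines in (99.9, 99.99, 99.999):
    print(f"{nines}% uptime allows ~{downtime_per_year(nines):.1f} min of downtime/year")
```

At 99.9%, a system can be down for roughly 8.8 hours a year; at five nines, the budget shrinks to about five minutes.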
High-availability infrastructure is dependent on detecting and eliminating single points of failure that could contribute to increased system downtime and prevent organizations from reaching their performance goals. A single point of failure is an aspect of the infrastructure that could take the entire system offline, and in complex systems, multiple single points of failure can exist.
Organizations also have to take into account the different types of failures that can occur in a modern, complex IT infrastructure. These include hardware failures, software failures (both for the operating system and for the running applications), service failures (such as inaccessible networking, latency, or cloud service performance degradation), and external failures, such as power outages.
The first step each organization can take toward high availability is determining the specific, most important outcomes it wants to see based on its core services, workload and regulatory or compliance requirements, performance benchmarks, critical applications, and operational priorities:
- What are the uptime requirements either for regulatory compliance or for user experience?
- How distributed is the environment? What are the key points of failure?
- What is the required performance for the application? What are the risks to that app's performance (e.g., high user traffic or heavy write loads)?
- What kind of storage is in use?
- What requirements are there around data loss or data access?
- Given current IT resources, what are achievable SLAs in case of an outage? What are the current planned maintenance schedules, and what is the impact on uptime?
- Are there plans around different disaster recovery scenarios or changes in business operations?
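The questions about maintenance schedules and achievable SLAs can be made concrete with simple arithmetic: planned maintenance alone puts a ceiling on the uptime an organization can promise. A rough sketch, with an illustrative window length, assuming a 365-day year:

```python
def max_availability(maintenance_hours_per_month):
    """Upper bound on availability given only planned maintenance,
    ignoring unplanned outages (365-day year assumed)."""
    hours_per_year = 365 * 24
    planned_downtime = maintenance_hours_per_month * 12
    return 100 * (1 - planned_downtime / hours_per_year)

# A 2-hour patch window each month already caps uptime below "three nines":
print(f"{max_availability(2):.3f}%")  # → 99.726%
```

This is why HA designs aim to perform maintenance without taking the service offline, for example by rolling updates through a cluster one node at a time.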
With high-availability environments, there are also several common metrics that IT teams use to determine whether the high availability architecture is meeting its objectives. Some may be more relevant to your architecture than others, but it is worthwhile to evaluate all of them to set baseline performance expectations:
- Mean time between failures (MTBF): How long the environment operates between system failures.
- Mean downtime: How long the system is down (minutes of downtime) before it is recovered or replaced in the topology; this is closely related to mean time to repair (MTTR).
- Recovery time objective (RTO): The maximum acceptable time to complete a repair and bring a system back online.
- Recovery point objective (RPO): The maximum acceptable window of data loss, measured backward from the point of failure. For example, if recovery relies on restoring another system from backups and the backups are taken daily, then there could be up to 24 hours of lost data in the recovered system. However, if there is replicated or shared storage, then the data loss may only be minutes or even less.
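The first two metrics combine into a standard estimate of steady-state availability: MTBF divided by the sum of MTBF and mean downtime. A minimal sketch, with illustrative numbers:

```python
def availability(mtbf_hours, mean_downtime_hours):
    """Steady-state availability: the fraction of time the system is up,
    estimated as MTBF / (MTBF + mean downtime)."""
    return mtbf_hours / (mtbf_hours + mean_downtime_hours)

# Illustrative: a failure every 1,000 hours, 30 minutes of downtime each time
print(f"{availability(1000, 0.5):.5%}")  # → 99.95002%
```

The formula shows two levers for improving availability: make failures rarer (raise MTBF) or recover faster (lower mean downtime). Automated failover attacks the second lever.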
A high-availability architecture incorporates principles from each layer of continuity planning, such as monitoring and automation. This allows the overall system to be resilient to all types of failures, from specific local failures to a complete outage, and to remain operational even during planned maintenance windows and other service interruptions.
A disaster recovery or continuity plan would incorporate approaches for each potential failure:
- Anticipate specific failures: For each of those areas, IT architects first make sure that systems are redundant, and that backup systems are available in case of a failure. The next step is to automate failover and failure-detection processes so that down systems are automatically detected and services are switched to the backup system.
- Manage performance proactively: Fault tolerance will address an outage, but it won’t necessarily deal with performance degradation. This is where load balancing and scalability become useful tools. In this case, IT architects monitor system performance and use multiple systems to manage user requests and operations. Load balancers and traffic management can intelligently route traffic in real time based on bandwidth, system performance, user, or request type.
- Deal with catastrophe: Widespread infrastructure failures, like a cloud provider going down or a natural disaster at a data center site, are rare, but they require a more comprehensive approach than hardware and software failures alone. Along with bringing the infrastructure back online, it is necessary to have up-to-date data available. This can be done synchronously through replication (though there are performance risks) or asynchronously through data backups (with some risk of data loss).
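The failure-detection and failover step described above can be sketched as a simple priority walk over a health-checked node list. This is a toy illustration, not a production pattern; real deployments delegate this to a cluster manager such as Pacemaker. The node names and health-check function are hypothetical:

```python
def pick_active_node(nodes, is_healthy, required_checks=3):
    """Walk the priority-ordered node list and return the first node that
    passes the required number of consecutive health checks, modeling the
    'detect a down primary, promote the next backup' step.
    Returns None if every node is down."""
    for node in nodes:
        if all(is_healthy(node) for _ in range(required_checks)):
            return node
    return None

# Illustrative: the primary is down, so service fails over to the first backup
health = {"primary": False, "backup-1": True, "backup-2": True}
active = pick_active_node(["primary", "backup-1", "backup-2"], health.get)
print(active)  # → backup-1
```

Requiring several consecutive successful checks is a common way to avoid "flapping," where a briefly unreachable node triggers unnecessary failovers.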
High-availability architectures run active failover clusters, providing built-in redundancy, automatic failover, and, ideally, zero downtime. Within the cluster, nodes are monitored not just for availability, but for overall performance of applications, services, and network. With shared storage, a node failure causes no data loss, since all cluster nodes work from the same data source. Load balancing can be used to manage traffic for best performance.
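At its simplest, load balancing across such a cluster can be round-robin rotation over the node pool; real load balancers layer health checks, weighting, and session affinity on top. A minimal sketch, with hypothetical node names:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Distribute incoming requests evenly across a fixed set of nodes."""

    def __init__(self, nodes):
        self._nodes = cycle(nodes)  # endless rotation through the pool

    def next_node(self):
        """Return the node that should handle the next request."""
        return next(self._nodes)

lb = RoundRobinBalancer(["node-a", "node-b", "node-c"])
print([lb.next_node() for _ in range(4)])  # → ['node-a', 'node-b', 'node-c', 'node-a']
```

More sophisticated strategies, such as routing by bandwidth, system load, or request type as described above, replace the simple rotation with a scoring or policy decision per request.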
Outside those broad characteristics, high-availability clusters can be designed for more specialized activities, depending on the priorities and activities within the IT infrastructure. The Red Hat Enterprise Linux High Availability Add-on, for example, has four default configurations:
- High availability: focuses on uptime and availability
- High performance: for high speed, concurrent operations
- Load balancing: for cost-effective scalability
- Storage: for resilient data management
In real-life environments, high-availability systems typically incorporate aspects of several of these focus areas.
High availability spans the entire infrastructure, accounting for data and storage management in separate environments, both cloud and physical, and different locations for services and applications. This is why a common platform and standard operating environment can be so powerful: it creates consistency regardless of the deployment environment.
Red Hat Enterprise Linux has additional capabilities and services that can be included through add-on packages. The Red Hat Enterprise Linux High Availability Add-on addresses the networking, clustering, and storage aspects of the topology.
Because high availability is so entwined with data management, Red Hat Enterprise Linux deployments for Microsoft SQL Server and SAP also include the Red Hat Enterprise Linux High Availability Add-on.