Issue #7 May 2005

New availability features in
Red Hat® Enterprise Linux® 4

Introduction

New features in Red Hat Enterprise Linux 4 increase the availability and ease of management of servers running Red Hat's most recent enterprise OS. Most of the new availability features result from the release of LVM2, the latest version of the Linux Logical Volume Manager. Not all of these features are available in Enterprise Linux 4 yet, but the remaining ones are scheduled for release in mid-summer 2005. These features include:

  • Online volume and file system resizing
  • Volume mirroring and I/O multi-pathing
  • Volume snapshots for online file system backup, application testing and restart
  • Data migration from an ailing storage device
  • Clustering software for application failover and data sharing

This article explores these new availability features and how they can increase the availability of Red Hat Enterprise Linux servers.

Server availability

Server availability is often considered only in the narrow context of migrating applications from one failed server to another. Although application failover technologies are an important component in increasing availability (and are discussed later in the article), Red Hat Enterprise Linux comes with other important features that increase overall system availability by avoiding scheduled as well as unscheduled downtime. This is accomplished via techniques such as volume mirroring and I/O multi-pathing that avoid single points of failure. In addition, techniques such as volume snapshots and data migration allow for additional redundancy and protection against potential faults; more importantly, they can be performed while the Red Hat Enterprise Linux server remains available to run application workloads. In the next section, we discuss these specific volume management features.

Linux Logical Volume Manager (LVM)

The Linux Logical Volume Manager (LVM) is a software tool for managing disks. LVM allows multiple physical disks or partitions to be treated as a single logical disk. It uses three layers: Physical Volumes (PVs) are physical storage devices or partitions that have been labeled for use by LVM; Volume Groups (VGs) are formed by aggregating PVs; and Logical Volumes (LVs) are created by partitioning a VG into separate volumes. This three-layer abstraction is shown in Figure 1, LVM three layer abstraction.

Figure 1. LVM three layer abstraction
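
As a rough illustration of the three layers, the following Python sketch assembles the LVM command lines that would label devices as PVs, aggregate them into a VG, and carve out an LV. It only builds the commands and does not run them. The device paths, the volume group name vg_data, the logical volume name lv_db, and the size are hypothetical values chosen for the example.

```python
def build_lvm_stack(devices, vg_name, lv_name, lv_size):
    """Return, in order, the commands that would build PV -> VG -> LV.

    Nothing is executed; the caller decides whether to run the commands.
    """
    if not devices:
        raise ValueError("at least one device is required")
    # Layer 1: label each device as a Physical Volume.
    commands = [["pvcreate", dev] for dev in devices]
    # Layer 2: aggregate the PVs into one Volume Group.
    commands.append(["vgcreate", vg_name, *devices])
    # Layer 3: partition the VG by creating a Logical Volume of a given size.
    commands.append(["lvcreate", "-L", lv_size, "-n", lv_name, vg_name])
    return commands


if __name__ == "__main__":
    # Hypothetical devices and names, for illustration only.
    for cmd in build_lvm_stack(["/dev/sdb1", "/dev/sdc1"], "vg_data", "lv_db", "20G"):
        print(" ".join(cmd))
```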

When using LVM, logical volumes replace raw physical partitions. File systems and database tables can be mapped to logical volumes instead of raw partitions. A key advantage of logical volumes over raw partitions is that they can easily be resized on-the-fly without stopping server operations. With this capability, it is possible to add a new storage device, integrate its storage into a logical volume, and then make that storage available to a database or file system that is approaching its maximum size.
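
The grow-on-demand sequence described above might look roughly like the following sketch, which again only assembles command lines. The device, volume names, and size are hypothetical. The file system grow tool is an assumption as well: resize2fs is used as the default here, but the appropriate tool depends on the file system and the release in use.

```python
def build_grow_commands(new_device, vg_name, lv_name, extra_size,
                        fs_grow_tool="resize2fs"):
    """Return the commands that would add a device and grow an LV and its file system.

    fs_grow_tool is an assumption; substitute the tool for your file system.
    """
    lv_path = f"/dev/{vg_name}/{lv_name}"
    return [
        ["pvcreate", new_device],                       # label the new device as a PV
        ["vgextend", vg_name, new_device],              # add its storage to the VG
        ["lvextend", "-L", f"+{extra_size}", lv_path],  # grow the LV by extra_size
        [fs_grow_tool, lv_path],                        # grow the file system to fill the LV
    ]


if __name__ == "__main__":
    # Hypothetical device and names, for illustration only.
    for cmd in build_grow_commands("/dev/sdd1", "vg_data", "lv_db", "10G"):
        print(" ".join(cmd))
```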

A snapshot of a logical volume can be taken at any time. A snapshot preserves an exact copy of the logical volume at the point-in-time when the snapshot is taken. The snapshot volume copy can be used for several purposes. As shown in Figure 2, Logical volume snapshot, a consistent view of the file system or database residing on the snapshot volume can be backed up without any later changes to the file system disrupting the backup. This approach allows a consistent file system or database backup to be taken without disrupting server operations, which increases server availability.

LVM snapshots can be read from, written to, and resized on-the-fly just like standard LVM volumes, allowing great flexibility in their usage. In addition to allowing consistent, non-disruptive backups, snapshot volumes can be used for application testing without disrupting current operations. They can also be used to roll back to a known, consistent state of the file system or database in case of system errors. Snapshots are also useful in development environments, where it is often important to access earlier versions of test data or programs from before recent modifications that may have introduced errors. LVM snapshots use copy-on-write techniques to reduce the number of block copies between the original volume and its snapshots, so that a snapshot of a volume that has been only slightly modified since the snapshot was taken requires only an additional 3-5% of disk storage.

Figure 2. LVM snapshot
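
The copy-on-write idea can be illustrated with a small in-memory model. This is a conceptual sketch only, not how LVM2 is implemented: the Volume and Snapshot classes are hypothetical, and blocks are represented as list entries. The point it shows is that a snapshot stores a block only when the origin (or the snapshot itself) changes it, so its storage cost tracks the amount of change rather than the size of the volume.

```python
class Snapshot:
    """Point-in-time view of a Volume that stores only blocks that have diverged."""

    def __init__(self, origin):
        self.origin = origin
        self.saved = {}  # block index -> block contents as seen by the snapshot

    def preserve(self, index):
        # Called before the origin overwrites a block: keep the old contents once.
        if index not in self.saved:
            self.saved[index] = self.origin.blocks[index]

    def read(self, index):
        # Unchanged blocks are read straight from the origin.
        return self.saved.get(index, self.origin.blocks[index])

    def write(self, index, data):
        # Snapshots are writable; the origin is not touched.
        self.saved[index] = data

    def space_used(self):
        """Fraction of the origin's size that the snapshot actually stores."""
        return len(self.saved) / len(self.origin.blocks)


class Volume:
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        for snap in self.snapshots:
            snap.preserve(index)  # copy the old block before it is overwritten
        self.blocks[index] = data


if __name__ == "__main__":
    vol = Volume(["old"] * 100)
    snap = vol.snapshot()
    for i in range(4):
        vol.write(i, "new")   # modify 4 of 100 blocks after the snapshot
    print(snap.read(0), vol.blocks[0], snap.space_used())
```

In this model, modifying 4 of 100 blocks after the snapshot leaves the snapshot holding 4 saved blocks, which is the kind of small overhead the paragraph above describes for lightly modified volumes.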

Logical volumes are created by aggregating multiple physical disks into a single logical disk, so the health of each underlying disk matters. Red Hat Enterprise Linux has mechanisms to monitor the health and status of its physical storage devices. If a disk is beginning to malfunction, it is prudent to migrate its data to another, healthier disk in the volume group. This can be accomplished using the LVM pvmove command, which migrates data from one physical volume to another while both remain in use. As shown in Figure 3, PVMOVE, this data movement can take place while the original physical disk is in use by the server, so server operations do not have to be shut down during the move, which increases server availability.

Figure 3. PVMOVE
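
A minimal sketch of evacuating an ailing disk follows. It assembles a pvmove command and then a vgreduce command to drop the emptied device from the volume group. The function name, device paths, and volume group name are hypothetical, and the commands are only built, not executed.

```python
def build_evacuate_commands(failing_pv, vg_name, target_pv=None):
    """Return the commands that would move data off failing_pv and retire it.

    If target_pv is None, pvmove is left to place the data on other
    physical volumes in the same volume group that have free space.
    """
    move = ["pvmove", failing_pv]
    if target_pv is not None:
        move.append(target_pv)
    # Once the PV holds no data, it can be removed from the volume group.
    return [move, ["vgreduce", vg_name, failing_pv]]


if __name__ == "__main__":
    # Hypothetical devices and names, for illustration only.
    for cmd in build_evacuate_commands("/dev/sdb1", "vg_data", "/dev/sdc1"):
        print(" ".join(cmd))
```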

Red Hat Enterprise Linux 4 provides additional fault tolerance for storage subsystem failures via LVM volume mirroring and multi-pathing. Together, these two capabilities significantly increase the availability of Red Hat Enterprise Linux servers.

LVM volumes can be mirrored so that the data on a single logical volume is copied onto up to 32 separate physical volumes. Each physical volume gets a copy of each disk block written to the LV, as shown in Figure 4, Volume mirroring. These mirrors can be used to recover from disk failures and to create point-in-time mirror copies that can be removed from a mirror set and mounted as a separate volume. Note also that combining snapshots with mirroring allows recovery from both disk errors and file system or database errors.

Figure 4. Volume mirroring
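
The behavior of a mirrored volume can be sketched with a toy model in which every write goes to each surviving copy and reads are served from any surviving copy. The MirroredVolume class is hypothetical and models only the idea, not the LVM2 implementation.

```python
class MirroredVolume:
    """Toy model: each write is duplicated onto every healthy mirror copy."""

    def __init__(self, size, copies=2):
        if copies < 2:
            raise ValueError("a mirror needs at least two copies")
        self.legs = [[None] * size for _ in range(copies)]
        self.failed = set()  # indexes of copies whose disk has failed

    def write(self, index, data):
        for n, leg in enumerate(self.legs):
            if n not in self.failed:
                leg[index] = data

    def fail_leg(self, n):
        self.failed.add(n)

    def read(self, index):
        # Serve the read from the first copy that is still healthy.
        for n, leg in enumerate(self.legs):
            if n not in self.failed:
                return leg[index]
        raise OSError("all mirror copies have failed")


if __name__ == "__main__":
    vol = MirroredVolume(size=8, copies=2)
    vol.write(3, "payload")
    vol.fail_leg(0)        # simulate a disk failure
    print(vol.read(3))     # data is still readable from the second copy
```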

Red Hat Enterprise Linux supports I/O multi-pathing, which allows servers to tolerate failures in storage host bus adapters, storage area network switch paths, and storage array ports. By exploiting storage network path, host bus adapter, and storage port redundancy and routing around failures in any of these system components, I/O multi-pathing increases server uptime. It is also possible to utilize the redundant storage paths to increase the rate of data transfer between the Enterprise Linux server and its storage.
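
The two benefits described above, routing around failed paths and spreading I/O across redundant paths, can be modeled in a few lines. This is a conceptual sketch with a hypothetical MultipathDevice class and made-up path names. It uses simple round-robin selection, which is one possible policy and not necessarily the one used by Enterprise Linux.

```python
class MultipathDevice:
    """Toy model: rotate I/O across healthy paths and skip failed ones."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.healthy = set(self.paths)
        self._next = 0  # position of the next path to try

    def fail_path(self, path):
        self.healthy.discard(path)

    def restore_path(self, path):
        if path in self.paths:
            self.healthy.add(path)

    def pick_path(self):
        # Try each path at most once, starting where the last pick left off.
        for _ in range(len(self.paths)):
            path = self.paths[self._next % len(self.paths)]
            self._next += 1
            if path in self.healthy:
                return path
        raise OSError("no healthy path to storage")


if __name__ == "__main__":
    dev = MultipathDevice(["hba0:port0", "hba1:port1"])  # hypothetical path names
    print([dev.pick_path() for _ in range(4)])  # alternates between both paths
    dev.fail_path("hba0:port0")
    print([dev.pick_path() for _ in range(4)])  # only the surviving path is used
```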

In addition to these new availability features, the latest version of LVM (known as LVM2, and supported only in Red Hat Enterprise Linux 4 and later releases) supports more physical devices (thousands, versus only 256 in prior versions), larger volumes (up to 8 exabytes on 64-bit systems and 16 terabytes on 32-bit systems), and transactional metadata updates to simplify recovery after server crashes. Readable, writable, and resizable snapshots, volume mirroring, multi-pathing, and data migration via pvmove are available only in LVM2 and Red Hat Enterprise Linux 4, not in LVM1.

Increased availability via Red Hat Cluster Suite and GFS

Red Hat Cluster Suite increases application availability by providing an automated way to migrate applications from one server to another in a Red Hat Enterprise Linux cluster, either in the event of a hardware or software failure or for server maintenance as directed by the system administrator. Cluster Suite monitors server availability via heartbeats and application availability via service monitoring. If a server fails, the applications that depend on it are restarted on another server; if a monitored application fails, that application is restarted. Scripts define the steps necessary to start and stop each application.
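
Heartbeat-driven failover can be sketched as follows. This is a simplified model, not Cluster Suite's actual design: the Cluster class, node names, service name, and timeout are hypothetical, and time is passed in explicitly so the behavior is easy to follow. A real cluster manager would also run the application's stop and start scripts and guard against a failed node still writing to shared data.

```python
class Cluster:
    """Toy model: relocate services away from nodes whose heartbeat has timed out."""

    def __init__(self, nodes, timeout):
        self.timeout = timeout
        self.last_seen = {node: 0.0 for node in nodes}  # node -> last heartbeat time
        self.services = {}                              # service name -> owning node

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def alive(self, now):
        return [n for n, seen in self.last_seen.items() if now - seen <= self.timeout]

    def check(self, now):
        """Move services off dead nodes; return (service, old_node, new_node) tuples."""
        live = self.alive(now)
        moved = []
        for service, owner in list(self.services.items()):
            if owner not in live and live:
                new_owner = live[0]  # simplest possible placement policy
                self.services[service] = new_owner
                moved.append((service, owner, new_owner))
        return moved


if __name__ == "__main__":
    cluster = Cluster(["node1", "node2"], timeout=5.0)
    cluster.services["nfs"] = "node1"
    cluster.heartbeat("node1", 1.0)
    cluster.heartbeat("node2", 1.0)
    cluster.heartbeat("node2", 8.0)   # node1 has gone silent
    print(cluster.check(10.0))        # the nfs service is relocated to node2
```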

Cluster Suite's primary benefit is an automated, scripted, controlled sequence of steps for migrating applications from one server to another for continuation of application operation. This results in increased uptime at low cost using industry-standard hardware components (servers, networks, and storage), building on the availability and ease-of-management capabilities in Red Hat Enterprise Linux and Red Hat Network. The lowest cost hardware configuration can be achieved in Enterprise Linux 4 with Cluster Suite without a Quorum Disk or a storage area network. It is also possible to continue to share data via an NFS server in this configuration.

Cluster Suite is commonly used to provide application failover for databases such as Oracle and MySQL, file services via NFS or Samba, and web services via Apache and Red Hat Application Server. Scripts are provided for starting and stopping most of these applications.

Figure 5. A GFS cluster

It is also possible to share data directly on a storage area network with Red Hat Global File System (GFS), a cluster file system for Linux, as shown in Figure 5, A GFS cluster. GFS and Cluster Suite are integrated, and in Red Hat Enterprise Linux 4 they use the same cluster infrastructure software components. System architects can use Cluster Suite without shared storage for applications that do not require high performance or data sharing, and add GFS where higher performance and data sharing are required.

Summary

Table 1 summarizes Red Hat Enterprise Linux 4 availability features and the failure conditions that are handled by each feature. The new capabilities in Red Hat Enterprise Linux, if used properly by system implementers, can significantly increase server uptime and availability.

                    What the RHEL Feature protects against
RHEL Feature        Disk     Filesystem/  HBA/SAN   Filesystem  Server  Virus/
                    errors   database     failure   overflow    crash   application
                             corruption                                 failure
------------------  -------  -----------  --------  ----------  ------  -----------
LVM2 Mirroring      X
LVM Snapshots                X                                          X
Multi-pathing                             X
LVM/ext3 resize                                     X
Cluster Suite                                                   X       X
Table 1. Red Hat Enterprise Linux 4 availability features

Further reading

To learn more about availability features in Red Hat Enterprise Linux 4 check the following websites for additional information:

About the author

From 1990 to May 2000, Matthew O'Keefe taught and performed research in storage systems and parallel simulation software as a professor of electrical and computer engineering at the University of Minnesota. He founded Sistina Software in May 2000 to develop storage infrastructure software for Linux, including the Global File System (GFS) and the Linux Logical Volume Manager (LVM). Red Hat acquired Sistina in December 2003, and Matthew now directs storage software strategy there.