Issue #9 July 2005

Enterprise data sharing with Red Hat Global File System

Introduction

Red Hat® Global File System (GFS) provides scalable performance and capacity for Red Hat Enterprise Linux® clusters. By using storage area networking and sophisticated clustering techniques provided by GFS and Red Hat Cluster Manager, system administrators can create large Red Hat Enterprise Linux clusters for the most demanding enterprise applications. In this article, we review Red Hat GFS basics, describe the most important GFS commands, and review developments in the latest GFS release (GFS 6.1) from Red Hat.

The file system is a critical operating system component, and its performance and scalability strongly influence overall system behavior. Both executable files and data files must be accessed at high speed to ensure system responsiveness and throughput, yet file systems must also be robust and recoverable under the most common server and storage hardware failure conditions. Today's Linux file systems, such as ext3fs, meet those performance and robustness requirements for a single Linux server but are not designed to meet the file sharing needs of multiple Linux servers. Instead, the Network File System (NFS) is used to export a "local" file system like ext3fs from a file server to a network of Linux NFS clients.

The NFS approach alone works well for simple file sharing between machines but has limited scalability in terms of the number of clients (perhaps a few dozen machines for high-bandwidth applications). In addition, the NFS protocol requires significantly more CPU processing and memory bandwidth per file system operation than a local file system like ext3fs because each operation must be packaged and sent back and forth across the network. NFS also keeps only limited state on the server to simplify recovery, but that approach requires more operations to accomplish the same work as a local file system. Figures 1 and 2 illustrate the differences between an NFS protocol stack on client and server and a local file system protocol "stack." For additional information on GFS versus NFS, refer to the article Red Hat GFS vs. NFS: Improving performance and scalability.

Figure 1. NFS protocol stack on client and server
Figure 2. Local file system protocol "stack"

Combining local file system performance with file and data sharing among machines is very powerful. Cluster file systems like Red Hat GFS are designed to achieve this using block networking protocols such as Fibre Channel and iSCSI, cluster membership and locking mechanisms, and well-designed file system metadata structures. Red Hat GFS has the same efficient protocol stack profile as a local file system (as shown in Figure 2), yet it allows multiple machines on a storage area network to share files and data. Figure 3 shows storage devices being shared among Red Hat Enterprise Linux servers in a GFS cluster. In the next section we discuss how Red Hat GFS achieves this.

Figure 3. A GFS cluster

GFS architecture and operation

Red Hat GFS, like the ext3fs file system, has its own metadata formats, designed to support scalable access by multiple servers to shared storage. Metadata structures are spread across multiple disks rather than concentrated in one region, increasing opportunities for parallel access and reducing contention among multiple servers for the same file system metadata, which improves scalability and performance. One aspect of the distributed metadata design is that each node mounting the GFS file system has its own journal, so that journaling operations can proceed in parallel. GFS scales up to 300 servers and yet attains good single-node performance.

With Red Hat GFS, the NFS server is effectively replaced by a storage area network and shared access to disk storage. Because all servers can read and write simultaneously to the file system metadata and data on the shared disks, GFS uses the cluster membership, locking, and fencing infrastructure that it shares with the Red Hat Cluster Manager to coordinate shared access to those structures.

For example, before a node (node 1) writes to a GFS file for the first time, GFS must obtain a write lock for the file. If another node (node 2) is reading from or writing to the file, the lock manager must tell node 2 to release its lock for the file. Once node 2 has done so, the lock manager grants the write lock to node 1 so that it can operate on the file.

Like any good file system, GFS uses journaling and file caching to improve robustness and performance. The clustering infrastructure monitors node operation to ensure that a node is functioning. If a node or its network connection fails in a way that disconnects the node from the cluster, the membership layer detects that and initiates a fence operation. A fence operation isolates (or fences off) the node from shared storage and resets the node. After its journal is replayed and its lock state is recovered, the fenced node is allowed back into the cluster. The fence and recovery operations can take tens of seconds.

GFS configuration information is specified in the cluster configuration file /etc/cluster/cluster.conf. This file is created in the process of configuring Cluster Suite using the Cluster Configuration Tool, a process described in more detail in this month's article Open source high availability clustering.
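
The fragment below is a minimal sketch of the general shape of cluster.conf for a two-node cluster named alpha. The node names, fence device, and addresses are hypothetical, and the exact elements and attributes are generated for you by the Cluster Configuration Tool, so treat this only as an illustration of where the cluster name used in the GFS examples later in this article is defined:

<?xml version="1.0"?>
<!-- hypothetical, abridged example; the Cluster Configuration Tool generates the real file -->
<cluster name="alpha" config_version="1">
  <clusternodes>
    <clusternode name="node-01" votes="1">
      <fence>
        <method name="1">
          <device name="apc-switch" port="1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node-02" votes="1">
      <fence>
        <method name="1">
          <device name="apc-switch" port="2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="apc-switch" agent="fence_apc" ipaddr="10.0.0.5" login="apc" passwd="apc"/>
  </fencedevices>
</cluster>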

Important GFS capabilities

A complete description of the capabilities and features of Red Hat GFS can be found in the Red Hat GFS 6.1 Administrator's Guide. This section describes several key features and the command line sequences necessary to invoke them.

Making a file system

Creating a Red Hat GFS file system involves invoking the gfs_mkfs command to initialize a shared disk volume with GFS metadata. In the following example, the gfs_mkfs command is used to create a file system named gfs1 within the cluster named alpha that uses the distributed lock manager (DLM) lock protocol. The file system has eight journals, allowing it to initially support up to eight nodes, and resides on the LVM2 volume named /dev/vg01/lvol0.

gfs_mkfs -p lock_dlm -t alpha:gfs1 -j 8 /dev/vg01/lvol0
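
If more nodes later need to mount the file system than there are journals, additional journals can be added to a mounted file system with the gfs_jadd command. The invocation below is a sketch that assumes the file system is mounted at /gfs1 and, as with growing a file system, that unused space is available at the end of the underlying volume; it adds two journals:

gfs_jadd -j 2 /gfs1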

Mounting a file system

Once a GFS file system is created, its volume activated, and its clustering and locking system started, it can be mounted and accessed. In the following example, the GFS file system (the -t gfs option indicates the type of file system to be mounted) on /dev/vg01/lvol0 is mounted on the /gfs1 directory:

mount -t gfs /dev/vg01/lvol0 /gfs1
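
To mount the file system automatically at boot (after the cluster and locking services have started), an entry can also be placed in /etc/fstab. The line below is a sketch using the same device and mount point as the example above:

/dev/vg01/lvol0    /gfs1    gfs    defaults    0 0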

Growing a file system

Red Hat GFS can be extended while a file system is mounted and in use, increasing system availability by allowing management operations to be performed online. The gfs_grow command is used to expand a GFS file system after the device where the file system resides has been expanded. Running a gfs_grow command on an existing GFS file system fills all remaining space between the current end of the file system and the end of the device with a newly initialized GFS file system extension. When the fill operation is completed, the metadata for the file system is updated. All nodes in the cluster can then use the extra storage space that has been added. The gfs_grow command must be run on a mounted file system, but only needs to be run on one node in a cluster. All the other nodes sense that the expansion has occurred and automatically start using the new space.

In the following example, the gfs_grow command is used to expand mount point /gfs1:

gfs_grow /gfs1
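
Because gfs_grow only claims space that already exists between the end of the file system and the end of the device, the underlying logical volume typically must be extended first. The following sequence is a sketch, with a hypothetical size of 50 GB, that extends the volume from the earlier examples and then grows the file system:

lvextend -L +50G /dev/vg01/lvol0
gfs_grow /gfs1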

Direct I/O

Certain applications, such as databases, perform their own internal caching operations to improve performance, obviating the need for file system caching operations to hide disk latency. Therefore, it's useful to allow file system caching to be bypassed; this technique is known as direct I/O and is available in Red Hat GFS.

An application invokes GFS direct I/O support by opening a file with the O_DIRECT flag on the open() system call. Alternatively, GFS can attach a direct I/O attribute to a file, in which case direct I/O is used regardless of how the file is opened. In either case, all I/O operations must be performed in multiples of the 512-byte block size, and the memory buffers being read from or written to must also be 512-byte aligned.

The gfs_tool command can be used to assign (set) the direct I/O attribute flag, directio, for a GFS file, or to clear the directio flag.

In the following example, the gfs_tool command is used to set directio for the file named datafile in directory /gfs1/:

gfs_tool setflag directio /gfs1/datafile

In the following example, the gfs_tool command is used to clear directio for the file named datafile in directory /gfs1/:

gfs_tool clearflag directio /gfs1/datafile
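
GFS also supports an inherit_directio flag on directories, so that files subsequently created in a flagged directory use direct I/O automatically. The following sketch assumes a directory named /gfs1/dbdata already exists:

gfs_tool setflag inherit_directio /gfs1/dbdata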

Suspending activity on a file system

It is useful to suspend I/O activity to a file system before a point-in-time snapshot is taken of the state of the underlying volume so that the file system is in a consistent state when the snapshot is taken. You can suspend write activity to a file system by using the gfs_tool freeze command. The gfs_tool unfreeze command ends the suspension.

In the following example, the gfs_tool command suspends writes to file system /gfs1:

gfs_tool freeze /gfs1

In the following example, the gfs_tool command allows writes to file system /gfs1 to resume:

gfs_tool unfreeze /gfs1
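
Putting the two commands together, a typical snapshot procedure looks like the following sketch. The middle step is a placeholder for whatever point-in-time snapshot mechanism the underlying storage provides, which is outside the scope of GFS itself:

gfs_tool freeze /gfs1
# take the point-in-time snapshot of the underlying volume here (placeholder)
gfs_tool unfreeze /gfs1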

Evolution of GFS and comparison of GFS 6.1 to GFS 6.0

Red Hat GFS 6.1, the latest release of GFS, is the culmination of nearly a decade of work in the development of a cluster file system. GFS 1.0 was released in 1996 and worked only on the SGI® IRIX® platform [1]. Earlier versions of GFS experimented with various locking mechanisms, including a lock protocol embedded in the SCSI command set [2]. GFS 3.0 was the first version of GFS to run on Linux. Four major releases later, GFS 6.1 is a mature, scalable, high-performance cluster file system with support for distributed locking, sophisticated volume management provided by the Linux Logical Volume Manager 2 (LVM2), and tight integration with Red Hat Enterprise Linux.

Table 1 shows the primary differences between GFS 6.0 and GFS 6.1. GFS 6.1 adds a variety of improvements, including faster fsck times, an option to withdraw a GFS mount point on certain error conditions instead of forcing a kernel panic, and much tighter integration with Red Hat Cluster Suite and its clustering infrastructure. That tighter integration with Red Hat Cluster Suite includes a distributed lock manager that has been submitted upstream to the Linux kernel community.

Feature                               Red Hat GFS 6.0      Red Hat GFS 6.1
Red Hat Enterprise Linux 3 support    Yes                  No
Red Hat Enterprise Linux 4 support    No                   Yes
LVM2 support                          No                   Yes
Pool support                          Yes                  No
Improved fsck                         No                   Yes
DLM support                           No                   Yes
GULM support                          Yes                  Yes
Cluster Suite infrastructure          No                   Yes
Mount point withdraw                  No (panic instead)   Yes
Table 1. Comparing GFS 6.0 to GFS 6.1

Additional resources

For more information on Red Hat GFS, refer to the references and resources listed below.

References

[1] S. Soltis, T. Ruwart, and M. O'Keefe, The Global File System, Fifth NASA Goddard Conference on Mass Storage Systems and Technologies, College Park, MD, September 1996. This paper describes GFS version 1.0.

[2] Kenneth W. Preslan, et al., 64-bit, Shared Disk File System for Linux, Proceedings of the Seventh NASA Goddard Conference on Mass Storage Systems and Technologies in cooperation with the Sixteenth IEEE Symposium on Mass Storage Systems, San Diego, CA, March 1999. This paper describes GFS version 3.0, the first version to run on Linux.

Additional information can be found in the Red Hat Cluster Suite, Red Hat GFS 6.0, and Red Hat GFS 6.1 Administrator's Guides.

You can learn more about storage area networking from the book entitled Fibre Channel: Gigabit Communications and I/O for Computer Networks by Alan F. Benner and by visiting the Storage Networking Industry Association website.

About the authors

From 1990 to May 2000, Matthew O'Keefe taught and performed research in storage systems and parallel simulation software as a professor of electrical and computer engineering at the University of Minnesota. He founded Sistina Software in May of 2000 to develop storage infrastructure software for Linux, including the Global File System (GFS) and the Linux Logical Volume Manager (LVM). Sistina was acquired by Red Hat in December 2003, where Matthew now directs storage software strategy.

Paul Kennedy is a technical writer at Red Hat. He is the primary writer and maintainer of Red Hat GFS technical documentation and a contributing writer to the Red Hat Cluster Suite technical documentation. Paul joined Red Hat in December 2003 as part of the Sistina Software acquisition.