Issue #9 July 2005
Enterprise data sharing with Red Hat Global File System
by Matthew O'Keefe and Paul Kennedy
- GFS architecture and operation
- Important GFS capabilities
- Evolution of GFS and comparison of GFS 6.1 to GFS 6.0
- Additional resources
- About the authors
Red Hat® Global File System (GFS) provides scalable performance and capacity for Red Hat Enterprise Linux® clusters. By using storage area networking and sophisticated clustering techniques provided by GFS and Red Hat Cluster Manager, system administrators can create large Red Hat Enterprise Linux clusters for the most demanding enterprise applications. In this article, we review Red Hat GFS basics, describe the most important GFS commands, and review developments in the latest GFS release (GFS 6.1) from Red Hat.
File system performance and scalability are critical to any operating system. Both executable files and data files must be accessed at high speed to ensure system responsiveness and throughput, and yet file systems must also be robust and recoverable under the most common server and storage hardware failure conditions. Today's local Linux file systems, such as ext3fs, meet those performance and robustness requirements for a single Linux server but are not designed to meet the file sharing needs of multiple Linux servers. Instead, the Network File System (NFS) is used to export a "local" file system like ext3fs from a file server to a network of Linux NFS clients.
The NFS approach alone works well for simple file sharing between machines but has limited scalability in the number of clients (perhaps a few dozen machines for high-bandwidth applications). In addition, the NFS protocol requires significantly more CPU processing and memory bandwidth per file system operation than a local file system like ext3fs, because those operations must be packaged and sent back and forth across the network. NFS also keeps server state to a minimum to simplify recovery, but that approach requires more operations to get the same work done compared to a local file system. Figures 1 and 2 illustrate the differences between an NFS protocol stack on client and server and a local file system protocol "stack." For additional information on GFS versus NFS, refer to the article Red Hat GFS vs. NFS: Improving performance and scalability.
Combining local file system performance with file and data sharing among machines is very powerful. Cluster file systems like Red Hat Global File System (GFS) are designed to achieve this using block networking protocols like Fibre Channel and iSCSI, cluster membership and locking mechanisms, and well-designed file system metadata structures. Red Hat GFS has the same efficient protocol stack profile as a local file system (as shown in Figure 2. Local file system protocol "stack"), yet allows multiple machines on a storage area network to share files and data. Figure 3. A GFS cluster shows storage devices being shared among Red Hat Enterprise Linux servers in a GFS cluster. In the next section we discuss how Red Hat GFS achieves this.
GFS architecture and operation
Red Hat GFS, like the ext3fs file system, has its own metadata formats that support scalable access by multiple servers to shared storage. Metadata structures are spread across multiple disks to reduce spatial locality and increase parallel access. That approach also reduces contention among multiple servers for the same file system metadata, increasing scalability and performance. One aspect of the distributed metadata is that each node mounting the GFS file system has its own journal, so that journaling operations can proceed in parallel. GFS scales up to 300 servers and yet attains good single node performance.
With Red Hat GFS, the NFS server is effectively replaced by a storage area network and shared access to disk storage. Because all servers can read and write simultaneously to the file system metadata and data on the shared disks, GFS uses the cluster membership, locking, and fencing infrastructure that it shares with the Red Hat Cluster Manager to coordinate shared access to those structures.
For example, before a node (call it node 1) writes to a GFS file for the first time, GFS must obtain a write lock for the file. If another node (node 2) is reading from or writing to the file, the lock manager must inform node 2 to release its lock for the file. Once node 2 has done so, the lock manager gives the write lock to node 1 so that it may operate on the file.
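The handoff described above can be sketched as a toy simulation. This is hypothetical Python for illustration only; the real GFS lock manager is a kernel subsystem, and the lock and node names here are invented:

```python
# Toy model of the write-lock handoff described above. The real GFS
# distributed lock manager runs in the kernel; names are illustrative.

class LockManager:
    def __init__(self):
        self.holders = {}  # lock name -> (node, mode)

    def acquire_write(self, lock, node):
        holder = self.holders.get(lock)
        if holder and holder[0] != node:
            # Ask the current holder to release its lock; in real GFS
            # the holder first flushes any cached data for the file.
            self.release(lock, holder[0])
        self.holders[lock] = (node, "write")

    def release(self, lock, node):
        if self.holders.get(lock, (None,))[0] == node:
            del self.holders[lock]

mgr = LockManager()
mgr.acquire_write("file-lock", "node2")  # node 2 is using the file
mgr.acquire_write("file-lock", "node1")  # node 1 requests the write lock
print(mgr.holders["file-lock"])          # ('node1', 'write')
```

The key point the sketch captures is that a write lock is exclusive: node 2 must give up its lock (after flushing its cache) before node 1 may proceed.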
Like any good file system, GFS uses journaling and file caching to improve robustness and performance. The clustering infrastructure monitors node operation to ensure that a node is functioning. If a node or its network connection fails in a way that disconnects the node from the cluster, the membership layer detects that and initiates a fence operation. A fence operation isolates (or fences off) the node from shared storage and resets the node. After its journal is replayed and its lock state is recovered, the fenced node is allowed back into the cluster. The fence and recovery operations can take tens of seconds.
GFS configuration information is specified in the cluster configuration file /etc/cluster/cluster.conf. This file is created when you configure Cluster Suite with the Cluster Configuration Tool, a process described in more detail in this month's article Open source high-availability clustering.
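As a rough illustration, a cluster.conf for a two-node cluster named alpha might look something like the following. The node names, fence device, and attribute values are hypothetical; the Cluster Configuration Tool generates the real file, and the exact schema is documented in the Cluster Suite manuals:

```
<cluster name="alpha" config_version="1">
  <clusternodes>
    <clusternode name="node1" votes="1">
      <fence>
        <method name="single">
          <device name="apc1" port="1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" votes="1">
      <fence>
        <method name="single">
          <device name="apc1" port="2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="apc1" agent="fence_apc" ipaddr="10.0.0.5" login="apc" passwd="apc"/>
  </fencedevices>
</cluster>
```

Note that the cluster name given here (alpha) is the same name passed to gfs_mkfs when a file system is created, which is how GFS ties a file system to its cluster.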
Important GFS capabilities
A complete description of the capabilities and features of Red Hat GFS can be found in the Red Hat GFS 6.1 Administrator's Guide. This section describes several key features and the command line sequences necessary to invoke them.
Making a file system
Creating a Red Hat GFS file system involves invoking the gfs_mkfs command to initialize a shared disk volume with GFS metadata. In the following example, the gfs_mkfs command is used to create a file system named gfs1 within the cluster named alpha that uses the distributed lock manager (DLM) lock protocol. The file system has eight journals, allowing it to initially support up to eight nodes, and resides on the LVM2 volume /dev/vg01/lvol0:
gfs_mkfs -p lock_dlm -t alpha:gfs1 -j 8 /dev/vg01/lvol0
Mounting a file system
Once a GFS file system is created, its volume activated, and its clustering and locking systems started, it can be mounted and accessed. In the following example, the GFS file system (the -t gfs option indicates the type of file system to be mounted) on /dev/vg01/lvol0 is mounted at the mount point /gfs1:
mount -t gfs /dev/vg01/lvol0 /gfs1
Growing a file system
Red Hat GFS can be extended while a file system is mounted and in use, increasing system availability by allowing management operations to be performed online. The gfs_grow command is used to expand a GFS file system after the device where the file system resides has been expanded (for example, with the LVM2 lvextend command). Running gfs_grow on an existing GFS file system fills all remaining space between the current end of the file system and the end of the device with a newly initialized GFS file system extension. When the fill operation completes, the file system metadata is updated, and all nodes in the cluster can then use the extra storage space. The gfs_grow command must be run on a mounted file system, but only needs to be run on one node in a cluster; all the other nodes sense that the expansion has occurred and automatically start using the new space.
In the following example, the gfs_grow command is used to expand the file system mounted at /gfs1:
gfs_grow /gfs1
Direct I/O
Certain applications, such as databases, perform their own internal caching to improve performance, obviating the need for file system caching to hide disk latency. It is therefore useful to allow file system caching to be bypassed; this technique is known as direct I/O and is available in Red Hat GFS.
An application invokes GFS direct I/O support by opening a file with the O_DIRECT flag. Alternatively, GFS can attach a direct I/O attribute to a file, in which case direct I/O is used regardless of how the file is opened. In either case, all I/O operations must be done in multiples of 512 bytes, and the memory being read from or written to must also be 512-byte aligned.
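The alignment rules can be illustrated with a short sketch. This is hypothetical Python for demonstration; real applications typically request O_DIRECT from C, allocating aligned buffers with posix_memalign():

```python
import os
import tempfile

BS = 512  # GFS direct I/O requires 512-byte multiples and alignment

def direct_io_ok(offset, length, buf_addr, bs=BS):
    # The file offset, transfer size, and buffer address must all be
    # multiples of the 512-byte block size.
    return offset % bs == 0 and length % bs == 0 and buf_addr % bs == 0

print(direct_io_ok(0, 4096, 0))   # True: everything is a 512 multiple
print(direct_io_ok(512, 100, 0))  # False: 100-byte transfer is not

# Opening a file with O_DIRECT bypasses the page cache. The flag is
# Linux-specific and some file systems reject it, hence the guards.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "directio.test")
flag = getattr(os, "O_DIRECT", 0)
try:
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | flag)
    os.close(fd)
except OSError:
    pass  # file system without direct I/O support
finally:
    if os.path.exists(path):
        os.unlink(path)
    os.rmdir(tmpdir)
```

A request that violates any of the three alignment rules will fail at the write or read, so applications usually round buffer sizes up to the next 512-byte boundary.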
The gfs_tool command can be used to assign (set) the direct I/O attribute flag, directio, for a GFS file, or to clear the flag.
In the following example, the gfs_tool command is used to set directio for the file named datafile in directory /gfs1:
gfs_tool setflag directio /gfs1/datafile
In the following example, the gfs_tool command is used to clear directio for the file named datafile in directory /gfs1:
gfs_tool clearflag directio /gfs1/datafile
Suspending activity on a file system
It is useful to suspend I/O activity to a file system before a point-in-time snapshot is taken of the underlying volume, so that the file system is in a consistent state when the snapshot is taken. You can suspend write activity to a file system with the gfs_tool freeze command; the gfs_tool unfreeze command ends the suspension.
In the following example, the gfs_tool command suspends writes to the file system /gfs1:
gfs_tool freeze /gfs1
In the following example, the gfs_tool command allows writes to file system /gfs1 to resume:
gfs_tool unfreeze /gfs1
Evolution of GFS and comparison of GFS 6.1 to GFS 6.0
Red Hat GFS 6.1, the latest release of GFS, is the culmination of nearly a decade of work on a cluster file system. GFS 1.0 was released in 1996 and ran only on the SGI® IRIX® platform. Earlier versions of GFS experimented with various locking mechanisms, including a lock protocol embedded in the SCSI command set. GFS 3.0 was the first version of GFS to run on Linux. Four major releases later, GFS 6.1 is a mature, scalable, high-performance cluster file system with support for distributed locking, sophisticated volume management provided by the Linux Logical Volume Manager 2, and tight integration with Red Hat Enterprise Linux.
Table 1. Comparing GFS 6.0 and GFS 6.1 shows the primary differences between GFS 6.1 and 6.0. GFS 6.1 adds a variety of improvements, including faster fsck times, an option to withdraw a GFS mount point on certain error conditions instead of forcing a kernel panic, and much tighter integration with Red Hat Cluster Suite and its clustering infrastructure. That tighter integration includes a distributed lock manager that has been submitted upstream to the Linux kernel community.
|Feature|Red Hat GFS 6.0|Red Hat GFS 6.1|
|Red Hat Enterprise Linux 3 support|Yes|No|
|Red Hat Enterprise Linux 4 support|No|Yes|
|Cluster Suite infrastructure|No|Yes|
|Mount point withdraw|No (panic instead)|Yes|
Additional resources
For more information on Red Hat GFS, refer to the following articles:
- S. Soltis, T. Ruwart, and M. O'Keefe, "The Global File System," Fifth NASA Goddard Conference on Mass Storage Systems and Technologies, College Park, MD, September 1996. This paper describes GFS version 1.0.
- Kenneth W. Preslan, et al., "64-bit, Shared Disk File System for Linux," Proceedings of the Seventh NASA Goddard Conference on Mass Storage Systems and Technologies, in cooperation with the Sixteenth IEEE Symposium on Mass Storage Systems, San Diego, CA, March 1999. This paper describes GFS version 3.0, the first version to run on Linux.
Additional information can be found in the Red Hat Cluster Suite, Red Hat GFS 6.0, and Red Hat GFS 6.1 Administrator's Guides.
You can learn more about storage area networking from the book entitled Fibre Channel: Gigabit Communications and I/O for Computer Networks by Alan F. Benner and by visiting the Storage Networking Industry Association website.