White Paper: Piranha - Load-balanced Web and FTP Clusters

January 1, 2010

Mike Wangsmo, Red Hat Software
wanger@redhat.com

Clustering is a commonly used phrase these days in computing technology circles. Unfortunately, if you ask 10 people what clustering is, you will likely get nine unique answers. Based on that premise, the first part of this document gives a brief overview of general clustering principles. This quick review is not meant to be a replacement for deep discussions of clustering. For a much more thorough understanding of clusters, please see In Search of Clusters by Gregory Pfister.


Clustering Technology

Clustering, by definition, implies two or more computers working in tight conjunction with each other to perform the work that a single machine would normally handle. On the surface this might seem wasteful; however, the tradeoff can be increased performance for some problem types, increased overall system reliability, or better load distribution. There are essentially five primary types of clustering.

  • HA (High Availability)
  • Fault Tolerance
  • MPP (Massively Parallel Processing)
  • SMP Clusters
  • NUMA (Non-Uniform Memory Access)

These categories should not be considered mutually exclusive classes; rather, they should be thought of as distinct properties that define a type of clustering. A cluster can be built that pulls features from any, and possibly all, of these types. As a general rule of thumb, though, a cluster will have its primary feature set drawn from a single type of clustering.

HA clusters are most readily defined by the use of redundancy throughout the system. Generally, HA clusters rely on software to achieve the goal of uninterrupted service. A portion of this software work is done in the kernel itself, with a larger portion done in userland libraries. This level of availability can also be provided at the filesystem layer.
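The pattern behind much of this user-space HA software can be sketched in a few lines of C. The sketch below simply polls a primary node and runs a takeover action once it stops responding; the host name primary.example.com and the /usr/local/sbin/takeover script are hypothetical placeholders, and real HA packages do considerably more (resource fencing, quorum handling, IP takeover).

    /*
     * Minimal sketch of the poll-and-failover idea behind an HA pair.
     * The monitored host and the takeover script are hypothetical.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        int misses = 0;

        for (;;) {
            /* Probe the primary node; ping exits non-zero on failure. */
            if (system("ping -c 1 -W 1 primary.example.com > /dev/null") != 0)
                misses++;
            else
                misses = 0;

            /* After three consecutive misses, assume the primary is down
             * and run the (hypothetical) takeover script on the backup. */
            if (misses >= 3) {
                fprintf(stderr, "primary unreachable, taking over services\n");
                system("/usr/local/sbin/takeover");
                break;
            }
            sleep(2);
        }
        return 0;
    }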

Fault tolerance works toward the similar goal of continuous availability. However, fault tolerance usually means the redundancy is built into the system's hardware rather than into software. It could be argued that this type of clustering is not even a "cluster," since it is generally a single machine whose internal components are themselves redundant. When combined with an operating system that has clustering extensions, such an environment can prove to be virtually continuously available.

MPP is a class of computing that links many individual computing nodes together into a single computing machine. These clusters are designed to provide a highly scalable environment for computational problems whose components can be worked on in parallel. To be effective, the software run on these clusters requires very careful work to ensure the highest possible degree of parallelism while still remaining coherent as a single problem.
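As a rough, single-machine illustration of the decompose-and-recombine pattern that MPP software follows, the sketch below splits a summation across worker threads and merges the partial results. On an actual MPP cluster the workers would be separate nodes communicating through a message-passing library; the thread-based version here is only meant to show the structure of the decomposition.

    /* Decompose a problem into independent slices, compute them in
     * parallel, and recombine the partial results. */
    #include <pthread.h>
    #include <stdio.h>

    #define N       1000000
    #define WORKERS 4

    static double data[N];

    struct slice { int start, end; double partial; };

    static void *sum_slice(void *arg)
    {
        struct slice *s = arg;
        s->partial = 0.0;
        for (int i = s->start; i < s->end; i++)
            s->partial += data[i];
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[WORKERS];
        struct slice slices[WORKERS];
        double total = 0.0;

        for (int i = 0; i < N; i++)
            data[i] = 1.0;

        /* Hand each worker its own slice of the problem. */
        for (int w = 0; w < WORKERS; w++) {
            slices[w].start = w * (N / WORKERS);
            slices[w].end   = (w + 1) * (N / WORKERS);
            pthread_create(&tid[w], NULL, sum_slice, &slices[w]);
        }

        /* Recombine the partial results into a single coherent answer. */
        for (int w = 0; w < WORKERS; w++) {
            pthread_join(tid[w], NULL);
            total += slices[w].partial;
        }
        printf("sum = %.0f\n", total);
        return 0;
    }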

SMP clusters are a relatively new scheme in clustering. These types of clusters attempt to combine features of HA and fault tolerance into a single architecture. There is a limit to how far a single SMP machine can usefully scale to provide increasing computing power. However, by running multiple instances of the kernel on groups of CPUs and I/O subsystems within the same physical machine, a clustered environment is created. This design allows the OS to run on smaller SMP configurations (usually 4-8 CPUs), which reduces the overhead associated with SMP systems of 64 or more CPUs.

NUMA (and its variations) has been around for a long time. The general design is to combine computing nodes into a larger cluster via memory-bus interconnects. The effect of this type of connection is very fast data access between nodes (all nodes can directly address all memory segments). This model is generally considered highly scalable; however, it tends to have very complex caching issues and is subject to performance hits when locality of reference for memory accesses is not observed.
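Locality of reference is easiest to demonstrate on a single machine, where the penalty for ignoring it is a cache miss rather than a remote-node memory access; the same principle carries over to NUMA. The sketch below walks the same array twice, once in row-major order (good locality) and once in column-major order (poor locality), and times both.

    /* Single-machine illustration of locality of reference. Walking the
     * array in row-major order reuses cached data; walking it column by
     * column does not. On a NUMA machine the same principle applies,
     * except the penalty for poor locality is a remote memory access. */
    #include <stdio.h>
    #include <time.h>

    #define ROWS 4096
    #define COLS 4096

    static int grid[ROWS][COLS];

    static double time_walk(int by_rows)
    {
        clock_t start = clock();
        long sum = 0;

        if (by_rows) {                      /* good locality */
            for (int r = 0; r < ROWS; r++)
                for (int c = 0; c < COLS; c++)
                    sum += grid[r][c];
        } else {                            /* poor locality */
            for (int c = 0; c < COLS; c++)
                for (int r = 0; r < ROWS; r++)
                    sum += grid[r][c];
        }
        (void)sum;
        return (double)(clock() - start) / CLOCKS_PER_SEC;
    }

    int main(void)
    {
        printf("row-major walk:    %.3f s\n", time_walk(1));
        printf("column-major walk: %.3f s\n", time_walk(0));
        return 0;
    }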