White Paper: Piranha - Load-balanced Web and FTP Clusters

Piranha's Design


Piranha is strictly a software high-availability (HA) implementation. The name refers both to the cluster package and to the GUI administrative tool that serves as an interface to the entire cluster system. The clustering system is modular and can also be completely configured and run from a text-mode, command-line interface.

Requests for service from a Piranha cluster are sent to a virtual server address: a unique triplet consisting of a protocol, a floating IP address, and a port number. Depending on the role it performs, a computer in a Piranha cluster is either a router or a real server. A router receives job requests and redirects them to the real servers, which actually process the requests.
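As a rough illustration of the virtual server address, the sketch below models the triplet as a lookup key that maps to a list of real servers. The class name, the function name, and all addresses are hypothetical and chosen only for this example; they are not part of Piranha or IPVS.

```python
from typing import NamedTuple


class VirtualServer(NamedTuple):
    """The triplet that identifies a virtual server (illustrative only)."""
    protocol: str
    address: str  # floating IP address
    port: int


# Hypothetical table: one floating IP offering both web and FTP service.
routing_table = {
    VirtualServer("tcp", "192.0.2.10", 80): ["10.0.0.2", "10.0.0.3"],
    VirtualServer("tcp", "192.0.2.10", 21): ["10.0.0.4"],
}


def real_servers_for(protocol, address, port):
    """Return the real servers behind a virtual server, or an empty list."""
    return routing_table.get(VirtualServer(protocol, address, port), [])


web_servers = real_servers_for("tcp", "192.0.2.10", 80)
unknown = real_servers_for("udp", "192.0.2.10", 80)
```

Because all three fields form the key, the same floating IP address can front several services, each with its own set of real servers.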

The Piranha clustering system includes the following components:

  • IPVS kernel code (written by Wensong Zhang).

  • lvs daemon to manage the IPVS routing table via the ipvsadm tool.

  • nanny daemon to monitor servers and services on real servers in a cluster.

  • pulse daemon to control the other daemons and handle failovers between IPVS routing boxes.

  • piranha GUI tool to administer and manage the cluster environment.

All of these daemons use the same configuration file, named /etc/lvs.cf by default. The primary function of the piranha GUI tool is to start and stop pulse interactively and to edit the contents of the /etc/lvs.cf configuration file.

The IPVS code provides the controlling intelligence. It matches incoming network traffic against the defined virtual servers and redirects each request to a real server, based on an adaptive scheduling algorithm. The scheduler supports two classes of scheduling, each with a weighted and a non-weighted version. The basic Round Robin scheduler simply rotates among all active real servers. The more complex scheduler, Least Connections, keeps track of the open connections to all real servers and sends each new request to the real server with the fewest open connections.
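The two non-weighted schedulers can be sketched in a few lines. This is a minimal user-space illustration of the selection rules described above, not the IPVS kernel code; the class names, server names, and starting connection counts are assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class RealServer:
    name: str
    open_connections: int = 0  # count the router keeps for each real server


class RoundRobin:
    """Rotate through the active real servers in a fixed order."""

    def __init__(self, servers):
        self._rotation = cycle(list(servers))

    def pick(self):
        return next(self._rotation)


class LeastConnections:
    """Send each new request to the server with the fewest open connections."""

    def __init__(self, servers):
        self._servers = list(servers)

    def pick(self):
        # min() returns the first minimum, so list order breaks ties.
        return min(self._servers, key=lambda s: s.open_connections)


servers = [
    RealServer("rs1", open_connections=2),
    RealServer("rs2", open_connections=0),
    RealServer("rs3", open_connections=1),
]

rr = RoundRobin(servers)
rr_order = [rr.pick().name for _ in range(4)]

lc = LeastConnections(servers)
lc_order = []
for _ in range(4):
    chosen = lc.pick()
    chosen.open_connections += 1  # the new request opens a connection
    lc_order.append(chosen.name)
```

Round Robin ignores the current load entirely, while Least Connections first fills the idle server until all three are level.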

The weighted versions of those two schedulers allow the administrator to mix heterogeneous hardware and OSes as the real servers and have the load redistribution reflect physical differences between the machines. As an additional benefit, the Piranha clustering package will adaptively modify this weight based on the load averages of the real servers.
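A weighted scheduler can be sketched as follows. The selection rule shown, picking the lowest ratio of open connections to weight, is one common way to weight Least Connections. The `adjust_weight` function is purely hypothetical: the text does not say how Piranha derives a weight from a load average, so the scaling rule and the `target_load` parameter are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class WeightedServer:
    name: str
    weight: int  # administrator-assigned relative capacity
    open_connections: int = 0


def pick_weighted_least_connections(servers):
    """Pick the server with the lowest connections-to-weight ratio."""
    candidates = [s for s in servers if s.weight > 0]
    return min(candidates, key=lambda s: s.open_connections / s.weight)


def adjust_weight(base_weight, load_average, target_load=1.0):
    """Hypothetical rule: shrink the weight as load exceeds target_load."""
    if load_average <= target_load:
        return base_weight
    # Never drop below 1, so a busy server still receives some traffic.
    return max(1, round(base_weight * target_load / load_average))


big = WeightedServer("big", weight=3)
small = WeightedServer("small", weight=1)
counts = {"big": 0, "small": 0}
for _ in range(8):
    chosen = pick_weighted_least_connections([big, small])
    chosen.open_connections += 1
    counts[chosen.name] += 1
```

With weights of 3 and 1, the eight requests split six to two, matching the ratio of the weights. Under the assumed rule, a server with a base weight of 4 and a load average of 2.0 would drop to a weight of 2.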

IPVS supports three types of network configuration: Network Address Translation (NAT), Tunneling, and Direct Routing. NAT requires a public address for the virtual server(s) and a private subnet for the real servers; it then uses IP Masquerading for the real servers. Tunneling uses IP encapsulation to reroute packets to the real servers. This method requires that the real servers support a tunnel device to decapsulate the packets. Direct Routing rewrites only the link-layer (MAC) destination address of the frame and then resends the packet, otherwise unchanged, directly to the real server.

Each service on a real server that is routed to as part of a virtual server is monitored by a nanny process running on the active IPVS router. These service monitors follow a two-step process. First, hardware and network connectivity is checked to ensure that the real server is responding on the network. Second, nanny connects to the port on the real server where the monitored service is running. Once connected, nanny sends a short header request string and checks that it receives a banner string back. This process is repeated every two seconds. If a configurable amount of time elapses with no successful connects, the real server is assumed dead and is removed from the IPVS routing table. nanny continues to monitor the real server, and once the service has returned and has remained alive for a specified amount of time, the server's place in the IPVS routing table is restored.
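The monitoring logic above can be sketched as a small state machine. This is not nanny's actual implementation: the class, its parameter names (`dead_after`, `alive_after`), and the `tcp_banner_check` helper are assumptions for illustration. The two checks are passed in as callables so the removal and restore timing can be exercised without a network.

```python
import socket


def tcp_banner_check(host, port, request, timeout=2.0):
    """Hypothetical step-two probe: connect, send a request, expect a reply."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(request)
            return len(conn.recv(256)) > 0
    except OSError:
        return False


class ServiceMonitor:
    """Track whether a real server belongs in the routing table."""

    def __init__(self, host_reachable, service_responds,
                 dead_after, alive_after, started_at=0.0):
        self.host_reachable = host_reachable      # step one
        self.service_responds = service_responds  # step two
        self.dead_after = dead_after    # seconds with no success before removal
        self.alive_after = alive_after  # seconds of success before restore
        self.in_table = True
        self._last_success = started_at
        self._alive_since = None

    def check(self, now):
        """Run both steps once and update the routing-table membership."""
        if self.host_reachable() and self.service_responds():
            if self._alive_since is None:
                self._alive_since = now
            self._last_success = now
            if not self.in_table and now - self._alive_since >= self.alive_after:
                self.in_table = True
        else:
            self._alive_since = None
            if self.in_table and now - self._last_success >= self.dead_after:
                self.in_table = False
        return self.in_table


# Simulated service: up, down for four checks, then up again.
results = iter([True, False, False, False, False, True, True, True])
monitor = ServiceMonitor(lambda: True, lambda: next(results),
                         dead_after=6, alive_after=4)
history = [monitor.check(now) for now in range(0, 16, 2)]  # every two seconds
```

In the simulated run the server is removed at the six-second mark and restored only after the service has stayed up for four consecutive seconds, which avoids flapping when a service briefly answers and then fails again.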

The IPVS router is a single point of failure (SPOF), so a hot standby node is supported. When configured with a standby, the inactive machine maintains a current copy of the cluster's configuration file (/etc/lvs.cf) and exchanges heartbeats with the active IPVS router node across the public network. If, after a specified amount of time, the active router fails to respond to heartbeats, the inactive node executes a failover. The failover process consists of recreating the last known IPVS routing table and taking over the virtual IP(s) that the cluster is responsible for, as well as the private gateway IP for the real servers. The standby brings up those IPs on its public and private networks, then sends out gratuitous ARPs to announce the new MAC addresses for the relocated IPs. It also starts the appropriate nanny processes to monitor services on the real servers. Should the failed node return to life, it announces its return in the form of heartbeats and becomes the new inactive hot standby IPVS router.
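The standby node's decision can be sketched as a heartbeat timer. This is an illustrative model rather than pulse's actual code: the class name, the `dead_time` parameter, and the step descriptions are assumptions, and the steps are returned as plain strings instead of being executed.

```python
# Descriptive placeholders for the failover actions listed in the text.
FAILOVER_STEPS = (
    "recreate last known IPVS routing table",
    "bring up virtual IP(s) on the public network",
    "bring up gateway IP on the private network",
    "send gratuitous ARPs for the relocated IPs",
    "start nanny processes for the real servers",
)


class StandbyRouter:
    """Inactive router that takes over when heartbeats stop arriving."""

    def __init__(self, dead_time, now=0.0):
        self.dead_time = dead_time  # seconds of silence tolerated
        self.active = False
        self._last_heartbeat = now

    def heartbeat_received(self, now):
        self._last_heartbeat = now

    def tick(self, now):
        """Return the failover steps to run now, or an empty tuple."""
        if not self.active and now - self._last_heartbeat > self.dead_time:
            self.active = True
            return FAILOVER_STEPS
        return ()


router = StandbyRouter(dead_time=6)
router.heartbeat_received(now=2)
before = router.tick(now=8)   # exactly 6 seconds of silence: still waiting
during = router.tick(now=9)   # more than 6 seconds: fail over
router.heartbeat_received(now=10)  # the old active node comes back
after = router.tick(now=20)   # this node stays active
```

Once this node has taken over, later heartbeats from the recovered peer do not trigger a switch back, which matches the text: the returning node becomes the new standby.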

