|
Piranha is strictly a software HA implementation. Piranha is the name of both the
cluster package and the GUI administrative tool that serves as an
interface to the entire cluster system. The clustering system is quite modular
in nature and can be completely configured and run from a text-mode
command-line interface.
Requests for service from a Piranha cluster are sent to a
virtual server address: a unique triplet consisting of
a protocol, a floating IP address, and a port number. Depending on the role it
performs, a computer in a Piranha cluster is either a router or a real server. A router
receives job requests and redirects those requests to the real servers that
actually process them.
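As a rough illustration of the addressing scheme described above, the following Python sketch models a virtual server address as a (protocol, IP address, port) triplet and looks up the real servers behind it. The names VirtualServer, POOLS and real_servers_for, and the sample addresses, are hypothetical and are not part of Piranha.

```python
from typing import Dict, List, NamedTuple, Optional


class VirtualServer(NamedTuple):
    """Hypothetical model of a virtual server address."""
    protocol: str  # for example "tcp" or "udp"
    address: str   # floating IP address
    port: int


# Hypothetical table mapping each virtual server to its real servers.
POOLS: Dict[VirtualServer, List[str]] = {
    VirtualServer("tcp", "192.0.2.10", 80): ["10.0.0.2", "10.0.0.3"],
}


def real_servers_for(protocol: str, address: str, port: int) -> Optional[List[str]]:
    """Return the real servers behind a virtual server, or None if none matches."""
    return POOLS.get(VirtualServer(protocol, address, port))
```

Because all three fields form the key, the same floating IP address can front different services on different ports or protocols.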
The Piranha clustering system includes the following components:
- IPVS kernel code (written by Wensong Zhang).
- lvs daemon to manage the IPVS routing table via the ipvsadm tool.
- nanny daemon to monitor servers and services on real servers in a cluster.
- pulse daemon to control the other daemons and handle failovers between IPVS routing boxes.
- piranha GUI tool to administer and manage the cluster environment.
All of these components use the same configuration file, named
/etc/lvs.cf by default. The primary function of the piranha GUI tool is to start
and stop pulse interactively and to edit the contents of the /etc/lvs.cf
config file.
The IPVS code provides the controlling intelligence. It
matches incoming network traffic for defined virtual servers and redirects
each request to a real server, based on an adaptive scheduling algorithm.
The scheduler supports two classes of scheduling, each with a weighted and
non-weighted version. There is a basic Round Robin scheduler that simply
rotates among all active real servers. The more complex scheduler,
Least Connections, tracks the open connections to each real server
and sends each new request to the real server with the fewest open
connections.
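The two non-weighted policies can be sketched in a few lines of Python. This is only an illustration of the selection rules as described above, not the IPVS kernel code; the function names and the tie-breaking rule are assumptions.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """Yield the servers in a fixed rotation, ignoring their load."""
    return cycle(servers)


def least_connections(open_connections: Dict[str, int]) -> str:
    """Pick the server with the fewest open connections.

    Ties go to the server listed first (an assumption of this sketch).
    """
    return min(open_connections, key=open_connections.get)
```

For example, round_robin(["a", "b", "c"]) hands out a, b, c, a, and so on, while least_connections({"a": 3, "b": 1}) selects b.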
The weighted versions of those two schedulers allow the administrator to mix
heterogeneous hardware and OSes as the real servers and have the load
redistribution reflect physical differences between the machines. As an
additional benefit, the Piranha clustering package will adaptively modify this
weight based on the load averages of the real servers.
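One simplified way to picture the weighted Least Connections policy is to compare open connections per unit of weight, so that a machine with weight 3 is expected to carry three times the connections of a machine with weight 1. The Python sketch below makes that assumption explicit; it is not the IPVS implementation, it does not model Piranha's adaptive weight adjustment, and the function name is hypothetical.

```python
from typing import Dict, Tuple


def weighted_least_connections(servers: Dict[str, Tuple[int, int]]) -> str:
    """Pick a server by lowest connections-per-weight ratio.

    `servers` maps a name to (open_connections, weight). Servers with a
    weight of 0 are skipped, which this sketch treats as "send no new work".
    """
    ratios = {
        name: conns / weight
        for name, (conns, weight) in servers.items()
        if weight > 0
    }
    if not ratios:
        raise ValueError("no server with a positive weight")
    return min(ratios, key=ratios.get)
```

With {"big": (6, 3), "small": (3, 1)}, the ratios are 2.0 and 3.0, so the next request goes to big even though it already holds more connections.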
IPVS supports three types of network configuration: Network Address
Translation (NAT), Tunneling, and Direct Routing. NAT requires that there be a
public address for the virtual server(s) and a private subnet for the real
servers. It then uses IP Masquerading for the real servers. Tunneling uses
IP encapsulation and reroutes packets to the real servers. This method
requires that the real servers support a tunnel device to decapsulate the
packets. Direct routing leaves the IP packet unchanged, rewrites the
link-layer (MAC) destination address of the frame, and then resends
it directly to the real server.
Each service on a real server that is part of a virtual
server is monitored by a nanny process running on the
active IPVS router. These service monitors follow a two-step process. First,
the hardware/network connectivity is checked to ensure that the real server is
responding to the network. Second, a connection is opened to the port on the real
server where the monitored service is running. Once connected, nanny
sends a short header request string and checks that it receives a banner string back.
This process is repeated every two seconds. If a configurable amount of time elapses with no successful
connection, the real server is assumed dead and is removed from the IPVS routing
table. Nanny continues to monitor the real server, and when
the service has returned and has remained alive for a specified amount of time,
the server's place in the IPVS routing table is restored.
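The removal and restoration rules above amount to a small state machine. The Python sketch below tracks only the timing logic and leaves the actual network checks to the caller; it is not nanny's code, and the class name, the parameter names timeout and reentry, and the convention that times are seconds since the monitor started are all assumptions.

```python
from typing import Optional


class ServiceMonitor:
    """Hypothetical per-service health tracker."""

    def __init__(self, timeout: float, reentry: float) -> None:
        self.timeout = timeout    # seconds without success before removal
        self.reentry = reentry    # seconds of continuous success before restore
        self.in_table = True      # is the server in the routing table?
        self.last_success = 0.0   # time of the most recent successful check
        self.alive_since: Optional[float] = 0.0  # start of current success streak

    def record(self, now: float, check_ok: bool) -> bool:
        """Record one check result at time `now`; return table membership."""
        if check_ok:
            self.last_success = now
            if self.alive_since is None:
                self.alive_since = now
            if not self.in_table and now - self.alive_since >= self.reentry:
                self.in_table = True
        else:
            self.alive_since = None  # any failure resets the success streak
            if self.in_table and now - self.last_success >= self.timeout:
                self.in_table = False
        return self.in_table
```

With timeout=6 and reentry=4 and a check every two seconds, three consecutive failures after a success remove the server, and it is restored once checks have succeeded continuously for four seconds.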
The IPVS router is a single point of failure (SPOF), so a hot
standby node is supported. When configured with a standby, the inactive
machine maintains a current copy of the cluster's configuration file
(/etc/lvs.cf) and exchanges heartbeats with the active IPVS router node
across the public network. If, after a specified amount of
time, the active router fails to respond to heartbeats, the inactive node will
execute a failover. The failover process consists of recreating the last known
IPVS routing table and stealing the virtual IP(s) that the cluster is
responsible for. It also steals the private gateway IP for the real servers.
It brings up those IPs on its public and private networks, then sends out
gratuitous ARPs to announce the new MAC addresses for the relocated IPs. It
also starts up the appropriate nanny process to monitor
services on the real servers. Should the failed node return to life, it will
announce its return in the form of heartbeats and will become the new inactive
hot standby IPVS router.
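The standby's decision to take over can be sketched as a timeout on the last heartbeat seen. The Python below models only that decision and returns the failover steps from the description above as plain strings; it is not pulse's implementation, and the class name, the deadtime parameter, and the use of seconds as the time unit are assumptions.

```python
from typing import List

# The failover steps described above, as labels only.
FAILOVER_STEPS: List[str] = [
    "recreate last known IPVS routing table",
    "bring up virtual IP(s) and private gateway IP",
    "send gratuitous ARPs",
    "start nanny processes",
]


class StandbyRouter:
    """Hypothetical model of the inactive router's failover decision."""

    def __init__(self, deadtime: float) -> None:
        self.deadtime = deadtime     # seconds of silence before failover
        self.last_heartbeat = 0.0    # time the last heartbeat was seen
        self.active = False          # True once this node has taken over

    def heartbeat_received(self, now: float) -> None:
        self.last_heartbeat = now

    def tick(self, now: float) -> List[str]:
        """Return the failover steps to run at `now`, or an empty list."""
        if not self.active and now - self.last_heartbeat > self.deadtime:
            self.active = True
            return list(FAILOVER_STEPS)
        return []
```

Calling tick periodically is enough to drive the model: it returns the steps exactly once, at the first tick after the active router has been silent for longer than deadtime.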
|