Keepalived and high availability: Advanced topics
If you read my first article on using Keepalived for managing simple failover in clusters, then you will recall that VRRP uses the concept of a priority when determining which server will be the active master. The server with the highest priority “wins” and will act as the master, holding onto the VIP and servicing requests. Keepalived provides several useful methods to adjust priority based on the state of your system. In this article, you will explore several of these mechanisms, along with Keepalived’s ability to run scripts when a server’s state changes.
I will only be showing the configuration on server1 for these examples. At this point, you are probably comfortable with the configuration needed on server2 if you have been reading the entire series. If not, take a moment to review the first and second articles of this series before continuing.
- Using Keepalived for managing simple failover in clusters
- Setting up a Linux cluster with Keepalived: Basic configuration
Network symbols in the diagrams available via VRT Network Equipment Extension, CC BY-SA 3.0.
Keepalived does a great job of triggering a failover when advertisements aren’t received, such as when the active master dies completely or is unreachable for some other reason. However, you will often find that more fine-grained trigger mechanisms are necessary. For example, your application may run its own health checks to determine the ability of the app to service client requests. You wouldn’t want an unhealthy app server to remain the active master just because it was alive and sending VRRP advertisements.
Note: I found that the version of Keepalived available via the standard package repositories contained bugs that prevented some of the examples below from working properly. If you run into issues, you may want to install Keepalived from source, as described in the previous article.
Tracking processes
One of the most common Keepalived setups involves tracking a process on the server to determine the health of the host. For example, you might set up a pair of highly available webservers and trigger a failover if Apache stops running on one of them.
Keepalived makes this easy through its track_process configuration directives. In the example below, I’ve set up Keepalived to watch the httpd process with a weight of 10. As long as httpd is running, the advertised priority will be 254 (244 + 10 = 254). If httpd stops running, then the priority will drop to 244 and trigger a failover (assuming that a similar configuration exists on server2).
server1# cat keepalived.conf
vrrp_track_process track_apache {
process httpd
weight 10
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 51
priority 244
advert_int 1
authentication {
auth_type PASS
auth_pass 12345
}
virtual_ipaddress {
192.168.122.200/24
}
track_process {
track_apache
}
}
With this configuration in place (and Apache installed and running on both servers), you can test out a failover scenario by stopping Apache and watching the VIP move from server1 to server2:
server1# ip -br a
lo UNKNOWN 127.0.0.1/8 ::1/128
eth0 UP 192.168.122.101/24 192.168.122.200/24 fe80::5054:ff:fe82:d66e/64
server1# systemctl stop httpd
server1# ip -br a
lo UNKNOWN 127.0.0.1/8 ::1/128
eth0 UP 192.168.122.101/24 fe80::5054:ff:fe82:d66e/64
server2# ip -br a
lo UNKNOWN 127.0.0.1/8 ::1/128
eth0 UP 192.168.122.102/24 192.168.122.200/24 fe80::5054:ff:fe04:2c5d/64
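If you want to script a failover test like the one above instead of reading the output by eye, a small helper can check whether the VIP is currently assigned. The following is a minimal sketch: the function name has_vip is hypothetical (it is not part of Keepalived), and it assumes input in the same format that ip -br a prints, read from standard input so that it can be exercised without touching a live interface.

```shell
#!/bin/bash
# has_vip IFACE VIP
# Reads `ip -br a`-style output on stdin and succeeds (exit 0) if VIP is
# listed on IFACE, regardless of the /prefix length shown.
# Hypothetical helper for test scripts; not part of Keepalived.
has_vip() {
    local iface=$1 vip=$2
    awk -v iface="$iface" -v vip="$vip" '
        $1 == iface {
            # Fields 3..NF are the addresses; strip the /prefix before comparing.
            for (i = 3; i <= NF; i++) {
                split($i, parts, "/")
                if (parts[1] == vip) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    '
}
```

On a live host, this could be used as: ip -br a | has_vip eth0 192.168.122.200 && echo "VIP is on this host"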
Tracking files
Keepalived also has the ability to make priority decisions based on the contents of a file, which can be useful if you’re running an application that can write values to this file. For example, you might have a background process in your app that periodically performs a health check and writes a value to a file based on the overall health of the application.
The Keepalived man page explains that file tracking is based on the configured weight for the file:
“value will be read as a number in text from the file. If the weight configured against the track_file is 0, a non-zero value in the file will be treated as a failure status, and a zero value will be treated as an OK status, otherwise the value will be multiplied by the weight configured in the track_file statement. If the result is less than -253 any VRRP instance or sync group monitoring the script will transition to the fault state (the weight can be 254 to allow for a negative value being read from the file).”
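The rule quoted above can be hard to hold in your head, so here is a rough sketch of it as a shell function. This is only a model of the man page text, not Keepalived's own code: the function name track_file_result is hypothetical, and the final clamp of the priority to the range 1 to 254 is my assumption about how out-of-range results are handled.

```shell
#!/bin/bash
# track_file_result BASE WEIGHT VALUE
# Prints FAULT, or the priority that results from a tracked file containing
# VALUE, following the man page excerpt. Hypothetical model only.
track_file_result() {
    local base=$1 weight=$2 value=$3
    if (( weight == 0 )); then
        # Weight 0: the file is a plain flag. Non-zero means failure.
        if (( value != 0 )); then
            echo FAULT
        else
            echo "$base"
        fi
        return 0
    fi
    local adjustment=$(( value * weight ))
    if (( adjustment < -253 )); then
        # A result below -253 puts the instance into the fault state.
        echo FAULT
        return 0
    fi
    local prio=$(( base + adjustment ))
    # Assumption: the advertised priority is kept within 1..254.
    if (( prio < 1 )); then prio=1; fi
    if (( prio > 254 )); then prio=254; fi
    echo "$prio"
}
```

For example, track_file_result 244 1 5 models a base priority of 244, a weight of 1, and a file containing 5.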
I will keep things simple and use a weight of 1 for the track file in this example. This configuration will take the numerical value in the file at /var/run/my_app/vrrp_track_file and multiply it by 1.
server1# cat keepalived.conf
vrrp_track_file track_app_file {
file /var/run/my_app/vrrp_track_file
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 51
priority 244
advert_int 1
authentication {
auth_type PASS
auth_pass 12345
}
virtual_ipaddress {
192.168.122.200/24
}
track_file {
track_app_file weight 1
}
}
You can now create the file with a starting value and restart Keepalived. The priority can be seen in tcpdump output, as discussed in the second article of this series.
server1# mkdir /var/run/my_app
server1# echo 5 > /var/run/my_app/vrrp_track_file
server1# systemctl restart keepalived
server1# tcpdump proto 112
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:19:32.191562 IP server1 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 249, authtype simple, intvl 1s, length 20
You can see that the advertised priority is 249, which is the value in the file (5) multiplied by the weight (1) and added to the base priority (244). Similarly, changing the value in the file to 6 will raise the priority to 250:
server1# echo 6 > /var/run/my_app/vrrp_track_file
server1# tcpdump proto 112
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:20:43.214940 IP server1 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 250, authtype simple, intvl 1s, length 20
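In a real application, the process that updates the track file would probably not use a bare echo, since a reader could in principle see the file while it is empty or half-written. One common approach is to write to a temporary file and rename it into place. The sketch below assumes this pattern; the function name write_track_value is hypothetical, and you should verify that your version of Keepalived notices a file that is replaced by rename rather than rewritten in place.

```shell
#!/bin/bash
# write_track_value FILE VALUE
# Replaces the contents of FILE with VALUE using a temp file plus rename,
# so a reader never sees a partially written file.
# Hypothetical helper; not part of Keepalived.
write_track_value() {
    local file=$1 value=$2 tmp
    # Create the temp file next to the target so that mv is a rename on the
    # same filesystem rather than a copy.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if printf '%s\n' "$value" > "$tmp" && mv -f "$tmp" "$file"; then
        return 0
    fi
    rm -f "$tmp"
    return 1
}
```

An application health check could then call write_track_value /var/run/my_app/vrrp_track_file 6 whenever its computed health value changes.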
Track interface
For servers with multiple interfaces, it can be useful to adjust the priority of the Keepalived instance based on the status of an interface. For example, a load balancer with a frontend VIP and a backend connection to an internal network might want to trigger a Keepalived failover if the connection to the backend network goes down. This can be accomplished with the track_interface configuration:
server1# cat keepalived.conf
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 51
priority 244
advert_int 1
authentication {
auth_type PASS
auth_pass 12345
}
virtual_ipaddress {
192.168.122.200/24
}
track_interface {
ens9 weight 5
}
}
The configuration above assigns a weight of 5 to the status of interface ens9. This will cause server1 to assume a priority of 249 (244 + 5 = 249) as long as ens9 is up. If ens9 goes down, then the priority will drop down to 244 (and trigger a failover, assuming that server2 is configured in the same way). You can test this on a multi-interface server by turning down an interface and watching the VIP move between hosts:
server1# ip -br a
lo UNKNOWN 127.0.0.1/8 ::1/128
eth0 UP 192.168.122.101/24 192.168.122.200/24 fe80::5054:ff:fe82:d66e/64
ens9 UP 192.168.122.15/24 fe80::7444:5ec4:8015:722f/64
server1# ip link set ens9 down
server1# ip -br a
lo UNKNOWN 127.0.0.1/8 ::1/128
eth0 UP 192.168.122.101/24 fe80::5054:ff:fe82:d66e/64
ens9 DOWN
server2# ip -br a
lo UNKNOWN 127.0.0.1/8 ::1/128
ens9 UP 192.168.122.119/24 fe80::fc9f:8999:b93e:d491/64
eth0 UP 192.168.122.102/24 192.168.122.200/24 fe80::5054:ff:fe04:2c5d/64
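When debugging a track_interface setup, it can be handy to check the link state of the tracked interface from a script. The sketch below reads the operstate file that Linux exposes under /sys/class/net. It is only a way to inspect the interface by hand, not a description of how Keepalived itself monitors it, and the function name iface_is_up and its optional second argument are hypothetical.

```shell
#!/bin/bash
# iface_is_up IFACE [SYSFS_ROOT]
# Succeeds if the kernel reports the operational state of IFACE as "up".
# SYSFS_ROOT defaults to /sys/class/net and can be overridden for testing.
# Hypothetical helper; not part of Keepalived.
iface_is_up() {
    local iface=$1 root=${2:-/sys/class/net} state
    # A missing interface (no operstate file) counts as "not up".
    state=$(cat "$root/$iface/operstate" 2>/dev/null) || return 1
    [ "$state" = "up" ]
}
```

For example: iface_is_up ens9 || echo "ens9 is down, expect a lower priority"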
Track script
You’ve seen that Keepalived offers plenty of useful built-in check methods for determining the health and subsequent VRRP priority of a host. However, sometimes more complex environments require the use of custom tooling, such as health check scripts, to meet their needs. Thankfully, Keepalived also has the ability to run an arbitrary script to determine the health of a host. You can adjust the weight of the script, but I’m going to keep things simple for this example: a script that returns 0 will indicate success, while a script that returns anything else will indicate that the Keepalived instance should enter the fault state.
The script is a simple ping to everyone’s favorite Google DNS server, 8.8.8.8, as seen below. In your environment, you will likely use a more complex script to perform whatever health checks you need.
server1# cat /usr/local/bin/keepalived_check.sh
#!/bin/bash
/usr/bin/ping -c 1 -W 1 8.8.8.8 > /dev/null 2>&1
You will notice that I used a timeout of 1 second for ping (-W 1). When writing Keepalived check scripts, it’s a good idea to keep them lightweight and fast. You don’t want a broken server staying the master for a long time because your script is slow.
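Not every command has its own timeout flag the way ping does. One way to keep a check script fast regardless is to wrap each probe in the coreutils timeout command. The following is a minimal sketch under that assumption; the function name run_check is hypothetical.

```shell
#!/bin/bash
# run_check SECONDS CMD [ARGS...]
# Runs one health probe with a hard time limit, discarding its output.
# Returns 0 only if CMD exits successfully within SECONDS.
# Relies on the coreutils `timeout` command. Hypothetical helper.
run_check() {
    local limit=$1
    shift
    timeout "$limit" "$@" > /dev/null 2>&1
}
```

A check script could then end with a line such as run_check 1 /usr/bin/ping -c 1 -W 1 8.8.8.8, so that its exit status is the result of the bounded probe.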
The Keepalived configuration for a check script is shown below:
server1# cat keepalived.conf
vrrp_script keepalived_check {
script "/usr/local/bin/keepalived_check.sh"
interval 1
timeout 5
rise 3
fall 3
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 51
priority 244
advert_int 1
authentication {
auth_type PASS
auth_pass 12345
}
virtual_ipaddress {
192.168.122.200/24
}
track_script {
keepalived_check
}
}
This looks a lot like the configuration that you’ve been working with, but the vrrp_script block has a few unique directives:
- interval: How often the script should be run (1 second).
- timeout: How long to wait for the script to return (5 seconds).
- rise: How many times the script must return successfully in order for the host to be considered “healthy.” In this example, the script must return successfully 3 times. This helps to prevent a “flapping” condition where a single failure (or success) causes the Keepalived state to quickly flip back and forth.
- fall: How many times the script must return unsuccessfully (or time out) in order for the host to be considered “unhealthy.” This functions as the reverse of the rise directive.
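The effect of rise and fall is easier to see with a small simulation. The sketch below is a simplified model of the behavior described in the list above, not Keepalived's actual implementation: it assumes that the host starts out healthy and that only consecutive results count. The function name simulate_rise_fall is hypothetical.

```shell
#!/bin/bash
# simulate_rise_fall RISE FALL RESULT...
# Each RESULT is a check script exit code (0 = success). Prints the modeled
# health state after every run, assuming the host starts out healthy.
# Simplified, hypothetical model of the rise/fall directives.
simulate_rise_fall() {
    local rise=$1 fall=$2
    shift 2
    local state=healthy streak=0 rc
    for rc in "$@"; do
        if [ "$state" = healthy ]; then
            # Count consecutive failures; any success resets the streak.
            if [ "$rc" -ne 0 ]; then streak=$((streak + 1)); else streak=0; fi
            if [ "$streak" -ge "$fall" ]; then state=unhealthy; streak=0; fi
        else
            # Count consecutive successes; any failure resets the streak.
            if [ "$rc" -eq 0 ]; then streak=$((streak + 1)); else streak=0; fi
            if [ "$streak" -ge "$rise" ]; then state=healthy; streak=0; fi
        fi
        echo "$state"
    done
}
```

Running simulate_rise_fall 3 3 1 0 1 1 1 0 0 0 shows that the isolated failure at the start does not change the state; the host only becomes unhealthy after the third consecutive failure and only recovers after the third consecutive success.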
You can test this configuration by forcing the script to fail. In the example below, I added an iptables rule that prevents communication with 8.8.8.8. This caused the health check to fail and the VIP to disappear after a few seconds. I can then remove the rule and watch the VIP reappear.
server1# iptables -I OUTPUT -d 8.8.8.8 -j DROP
server1# ip -br a
lo UNKNOWN 127.0.0.1/8 ::1/128
eth0 UP 192.168.122.101/24 fe80::5054:ff:fe82:d66e/64
server1# iptables -D OUTPUT -d 8.8.8.8 -j DROP
server1# ip -br a
lo UNKNOWN 127.0.0.1/8 ::1/128
eth0 UP 192.168.122.101/24 192.168.122.200/24 fe80::5054:ff:fe82:d66e/64
A quick tip about scripts in Keepalived: They can be run as a different user besides root. While I didn’t demonstrate that in these examples, take a look at the man page and ensure that you’re using the least privileged user possible to avoid any negative security implications from your check script.
Notify scripts
I’ve been discussing ways to trigger Keepalived responses based on external conditions. However, you probably also want to trigger actions when Keepalived transitions from one state to another. For example, you might want to stop a service when Keepalived enters the backup state, or you might want to kick off an email to an administrator. Keepalived allows you to do this with notify scripts.
Keepalived provides several notify directives for only calling scripts on particular states (notify_master, notify_backup, etc.), but I’m going to focus on the bare notify directive as it is the most flexible. When a script in the notify directive is called, it receives four additional arguments (after any arguments that are passed to the script itself).
Listed in order, these are:
- Group or instance: Indication of whether the notify is triggered by a VRRP group (not discussed in this series) or a particular VRRP instance.
- Name of the group or instance
- State that the group or instance is transitioning into
- The priority
Taking a look at an example makes this clearer. The script and Keepalived configuration look like this:
server1# cat /usr/local/bin/keepalived_notify.sh
#!/bin/bash
echo "$1 $2 has transitioned to the $3 state with a priority of $4" > /var/run/keepalived_status
server1# cat keepalived.conf
vrrp_script keepalived_check {
script "/usr/local/bin/keepalived_check.sh"
interval 1
timeout 5
rise 3
fall 3
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 51
priority 244
advert_int 1
authentication {
auth_type PASS
auth_pass 12345
}
virtual_ipaddress {
192.168.122.200/24
}
track_script {
keepalived_check
}
notify "/usr/local/bin/keepalived_notify.sh"
}
The above configuration will call the /usr/local/bin/keepalived_notify.sh script each time a Keepalived state transition occurs. Since the same check script is in place, you can easily inspect the initial state and then trigger a transition:
server1# cat /var/run/keepalived_status
INSTANCE VI_1 has transitioned to the MASTER state with a priority of 244
server1# iptables -A OUTPUT -d 8.8.8.8 -j DROP
server1# cat /var/run/keepalived_status
INSTANCE VI_1 has transitioned to the FAULT state with a priority of 244
server1# iptables -D OUTPUT -d 8.8.8.8 -j DROP
server1# cat /var/run/keepalived_status
INSTANCE VI_1 has transitioned to the MASTER state with a priority of 244
You can see that the command line arguments correspond to the ones that I described at the beginning of this section. Obviously this is a simple example, but notify scripts can perform plenty of complex actions, such as adjusting routing rules or triggering other scripts. They’re a useful way to take external actions based on Keepalived state changes.
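A notify script that takes different actions per state usually branches on the third argument. The sketch below shows one possible shape for that. The function name handle_notify and the action descriptions it prints are placeholders of my own; in a real script, each branch would run the commands that make sense in your environment.

```shell
#!/bin/bash
# handle_notify TYPE NAME STATE PRIORITY
# Chooses an action based on the state passed by Keepalived's notify
# directive. The printed actions are placeholders for real commands.
# Hypothetical sketch; adapt the branches to your environment.
handle_notify() {
    local type=$1 name=$2 state=$3 priority=$4
    case "$state" in
        MASTER) echo "$type $name (priority $priority): start services" ;;
        BACKUP) echo "$type $name (priority $priority): stop services" ;;
        FAULT)  echo "$type $name (priority $priority): alert an administrator" ;;
        *)      echo "$type $name (priority $priority): no action for state $state" ;;
    esac
}
```

In a script referenced by the notify directive, the function would be invoked as handle_notify "$@" so that it receives the arguments in the order listed earlier in this section.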
Wrapping up
This article closed out a foundational Keepalived series with some advanced concepts. You learned how to trigger Keepalived priority and state changes based on external events, such as process status, interface changes, and even the results of external scripts. You also learned how to trigger notify scripts in response to Keepalived state changes. You can combine two or more of these approaches to build a highly available pair of Linux servers that respond to multiple external stimuli and ensure that traffic always reaches a healthy IP address that can serve client requests.
Anthony Critelli
Anthony Critelli is a Linux systems engineer with interests in automation, containerization, tracing, and performance. He started his professional career as a network engineer and eventually made the switch to the Linux systems side of IT. He holds a B.S. and an M.S.