[Linux-cluster] IP-based tie-breaker on a 2-node cluster?

gordan at bobich.net gordan at bobich.net
Thu Apr 17 17:06:13 UTC 2008



On Thu, 17 Apr 2008, Andrew Lacey wrote:

>> If you have a spare NIC, and the nodes are next to each other, you could
>> make them use a cross-over cable for their cluster communication, so they
>> would notice that they are both still up even when the switch dies. That's
>> what I do.
>
> I had considered this option but I haven't tried it. One thing I was
> wondering is how the cluster knows which network interface should get the
> cluster service IP address in that situation.

Whichever interface already has an IP on the matching subnet. Your public
interface carries the public/fail-over IPs. The private cluster interface
has a pair of private IPs on a network of their own. No resource groups
should be assigned to that interface. It's there just for intra-cluster
communication (e.g. dlm, san/drbd).

> Right now, I don't have
> anything in my cluster.conf that specifies this, but it just seems to
> work. I figured that if I tried to use a crossover cable, what I would
> need to do is use /etc/hosts to create hostnames on this little private
> network (consisting of just the 2 nodes connected by a cable) and use
> those hostnames as the node hostnames in cluster.conf.

That works.

> If I did that,
> would the cluster services try to assign the cluster service IP to the
> interface with the crossover cable (when obviously what I want is to
> assign it to the outward-facing interface)?

It will assign the IPs to whichever interface already has an IP on that
subnet. E.g. if your private cluster interface (the crossover one) is on
192.168.0.0/16 and your public interface is on 10.0.0.0/8, your resource
group will have IPs on the 10.0.0.0/8 subnet, not on the 192.168.0.0/16
subnet, so they end up on the public interface.
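
To make the subnet matching concrete, here is a minimal Python sketch of
the selection rule described above. It is only an illustration of the idea,
not the cluster software's actual code. The interface names (eth0, eth1),
the addresses and the pick_interface helper are all made up for the example.

```python
import ipaddress

def pick_interface(service_ip, interfaces):
    """Return the name of the first interface whose subnet contains service_ip.

    interfaces maps a (hypothetical) interface name to the address it
    already holds, in CIDR form, e.g. "10.0.0.5/8".
    Returns None if no interface is on a matching subnet.
    """
    ip = ipaddress.ip_address(service_ip)
    for name, cidr in interfaces.items():
        # ip_interface() keeps the host address; .network gives its subnet.
        network = ipaddress.ip_interface(cidr).network
        if ip in network:
            return name
    return None

# Example addresses only: eth0 is public, eth1 is the crossover link.
interfaces = {
    "eth0": "10.0.0.5/8",
    "eth1": "192.168.0.1/16",
}

print(pick_interface("10.0.0.100", interfaces))    # eth0
print(pick_interface("192.168.0.50", interfaces))  # eth1
print(pick_interface("172.16.0.1", interfaces))    # None
```

A fail-over IP on 10.0.0.0/8 can only ever match the public interface here,
which is why the crossover interface is left alone as long as no resource
uses an address on its subnet.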

You will probably want to add additional monitoring against switch port
failures here. Otherwise, if the switch port of the master node fails (it
does happen, I've seen many a switch with just 1-2 dead ports), the backup
will not notice: it can still verify over the crossover link that the
primary is up and responding, so it will not fence it and fail over to
itself. You'd end up with a working cluster but an unavailable service.
IIRC there is a monitor_link option in the resource spec for this kind of
thing.
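
As a rough illustration of what such link monitoring could check, the
Python sketch below reads the carrier flag that Linux exposes under
/sys/class/net. This is an assumption about one possible check, not a
description of how monitor_link is implemented. The link_is_up helper and
the eth0 interface name are hypothetical.

```python
from pathlib import Path

def link_is_up(iface, sysfs_root="/sys/class/net"):
    """Return True if the kernel reports carrier on iface.

    Reads <sysfs_root>/<iface>/carrier, which contains "1" when the link
    is up and "0" when it is down. Any read error (e.g. missing interface
    or interface administratively down) is treated as "link down".
    """
    try:
        state = (Path(sysfs_root) / iface / "carrier").read_text().strip()
    except OSError:
        return False
    return state == "1"

if __name__ == "__main__":
    # "eth0" stands in for the public interface; adjust to your setup.
    print(link_is_up("eth0"))
```

A check like this only catches failures that drop the link on the node's
own port. It says nothing about problems further upstream.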

Gordan



