
Re: [Linux-cluster] IP-based tie-breaker on a 2-node cluster?





On Thu, 17 Apr 2008, Andrew Lacey wrote:

If you have a spare NIC, and the nodes are next to each other, you could
make them use a cross-over cable for their cluster communication, so they
would notice that they are both still up even when the switch dies. That's
what I do.

I had considered this option but I haven't tried it. One thing I was
wondering is how the cluster knows which network interface should get the
cluster service IP address in that situation.

Whichever interface already has an IP on the right subnet. Your public interface holds the public/failover IPs; the private cluster interface has a pair of private IPs on a network of its own. No resource groups should be assigned to that interface. It's there purely for intra-cluster communication (e.g. DLM, SAN/DRBD traffic).

Right now, I don't have
anything in my cluster.conf that specifies this, but it just seems to
work. I figured that if I tried to use a crossover cable, what I would
need to do is use /etc/hosts to create hostnames on this little private
network (consisting of just the 2 nodes connected by a cable) and use
those hostnames as the node hostnames in cluster.conf.

That works.
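A minimal sketch of that setup, using hypothetical names and addresses (node1-priv/node2-priv and 192.168.0.x are just placeholders): give each node a private hostname in /etc/hosts on both machines, then use those names in cluster.conf.

```
# /etc/hosts (identical on both nodes) -- example addresses
192.168.0.1   node1-priv
192.168.0.2   node2-priv
```

```xml
<!-- cluster.conf fragment: node names resolve to the crossover link -->
<clusternodes>
  <clusternode name="node1-priv" nodeid="1" votes="1"/>
  <clusternode name="node2-priv" nodeid="2" votes="1"/>
</clusternodes>
```

All cluster heartbeat/membership traffic then goes over the crossover cable, independent of the public switch.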

If I did that,
would the cluster services try to assign the cluster service IP to the
interface with the crossover cable (when obviously what I want is to
assign it to the outward-facing interface)?

It will assign the IPs to whatever interface already has an address on that subnet. That is, if your private cluster interface (the crossover one) is on 192.168.0.0/16 and your public interface is on 10.0.0.0/8, your resource group will carry IPs on the 10.0.0.0/8 subnet, not on the 192.168.0.0/16 one, so they will come up on the outward-facing interface.
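So a service definition along these lines (the service name and address are illustrative, not from the original post) would place the floating IP on the public interface simply because that interface already has a 10.x address:

```xml
<!-- cluster.conf fragment: the ip resource agent binds the address
     to whichever interface already has an IP on the same subnet -->
<rm>
  <service name="webservice" autostart="1">
    <ip address="10.0.0.100"/>
  </service>
</rm>
```

Nothing in cluster.conf names the interface explicitly; the subnet match is the whole selection mechanism, which is why it "just seems to work".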

You will probably want to add additional monitoring against switch port failures here. Otherwise, if the switch port of the master node fails (it does happen; I've seen many a switch with just 1-2 dead ports), the backup will not notice: it can still verify over the crossover link that the primary is up and responding, so it will not fence it and fail over to itself. You'd end up with a working cluster but an unavailable service. IIRC there is a monitor_link option in the resource spec for this kind of thing.
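If I recall the syntax right, that option is an attribute on the ip resource itself, e.g. (address again hypothetical):

```xml
<!-- cluster.conf fragment: monitor_link="1" makes rgmanager watch
     ethernet link state on the interface carrying this IP, so a dead
     switch port fails the resource and triggers failover -->
<ip address="10.0.0.100" monitor_link="1"/>
```

Note this catches a dead link, not an upstream routing problem; for that you'd need a separate ping-style check.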

Gordan

