[Linux-cluster] IP-based tie-breaker on a 2-node cluster?

Andrew Lacey alacey at brynmawr.edu
Thu Apr 17 18:51:55 UTC 2008


>> Very informative post...thanks! The scenario you mentioned with a dead
>> switch port (or a single unplugged network cable, or whatever) is
>> something I had thought about, and I considered it to be a strike
>> against using a crossover cable.
>
> How does that follow? With a switch in the middle your points of failure
> are:
> cable, switch, cable

The potential problem with the crossover cable design is: Although the
cluster communication goes over the crossover cable, the path to the
switch is used for user connections to the cluster service. Suppose node 1
is active and node 2 is standby. Node 1 loses its connection to the switch
for whatever reason, but node 2 doesn't. Since the heartbeat goes across
the crossover cable, the nodes think nothing is wrong, so no failover
occurs and the service is not reachable to users. If the service had
failed over to node 2 (which can still talk to the switch), it would be
reachable to users.
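
To make that failure mode concrete, here is a toy model in Python. It is only a sketch, not cluster software: the Node class and the helper functions are names made up for illustration, and it assumes failover is triggered by heartbeat loss alone.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    switch_link_up: bool     # path users take to reach the service
    crossover_link_up: bool  # dedicated heartbeat path


def heartbeat_ok(a: Node, b: Node) -> bool:
    # Heartbeat travels over the crossover cable only (assumption of this model).
    return a.crossover_link_up and b.crossover_link_up


def failover_triggered(active: Node, standby: Node) -> bool:
    # Assumption: the cluster fails over only when the heartbeat is lost.
    return not heartbeat_ok(active, standby)


def service_reachable(active: Node) -> bool:
    # Users reach the service through the switch, not the crossover cable.
    return active.switch_link_up


# Node 1 (active) loses its switch link; the crossover cable is still fine.
node1 = Node("node1", switch_link_up=False, crossover_link_up=True)
node2 = Node("node2", switch_link_up=True, crossover_link_up=True)

print(failover_triggered(node1, node2))  # False: heartbeat still looks healthy
print(service_reachable(node1))          # False: users cannot reach node 1
print(service_reachable(node2))          # True: node 2 could have served them
```

The model shows the gap: the signal that drives failover (heartbeat) and the path users depend on (switch link) fail independently.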

Eliminating the crossover cable and sending the cluster traffic through
the switch eliminates this problem nicely -- both nodes try to fence, but
node 1 can't reach anything, so node 2 kills node 1 and the service comes
up on node 2. But then, of course, you have the pathological case where
the switch itself goes down: neither node can talk to the other until the
switch comes back up, and then, boom, they both fence each other.
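
The outcomes for the switch-only design can be tabulated with another small Python sketch. Again this is a toy model under stated assumptions: fence_outcome is a hypothetical helper, and it assumes a node can fence its peer only while its own switch link is up.

```python
def fence_outcome(node1_switch_up: bool, node2_switch_up: bool) -> str:
    """Toy model: heartbeat and fencing both travel through the switch."""
    if node1_switch_up and node2_switch_up:
        # Heartbeat is intact, so nobody tries to fence.
        return "no fencing"
    if node2_switch_up:
        # Node 1 is isolated and cannot reach anything; node 2 wins.
        return "node2 fences node1"
    if node1_switch_up:
        return "node1 fences node2"
    # Neither node can reach the switch (e.g. the switch is down). Both
    # have lost the heartbeat, so both fence attempts fire when it returns.
    return "both fence each other once the switch returns"


for links in [(True, True), (False, True), (True, False), (False, False)]:
    print(links, "->", fence_outcome(*links))
```

Only the last row is the pathological case; the single-link failures resolve the way you would want.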

Maybe the monitor_link option in conjunction with the crossover-cable
heartbeat will fix this. I'm in the process of setting that up right now,
so I'll post back when I have a result.

-Andrew L



