[Linux-cluster] Network switch problem

Hi!

We have a cluster of 8 nodes that are split across two 24-port gigabit network switches. Port 1 on each server is used for services, and port 2 for the "totem ring", i.e. cluster communications.

The servers are split 4 per switch, with each port configured for the proper VLAN. We have a VLAN trunk between the switches.

I need to reboot one or both switches without interrupting the cluster services. In the past (i.e. before there were critical services), I rebooted a switch and the cluster lost quorum; all services stopped, then restarted once quorum was regained. I can live with a minute or so without services while the switch reboots, but not 5 or 10 minutes while the services stop and start.

Now, to reboot a switch, I plan to add a third, temporary switch just for the cluster VLAN, and connect the cluster network interfaces to that switch one by one.

So, if I disconnect the cluster network interface on a node, will that node be fenced immediately, or do I have some time, let's say 10 seconds, to complete the reconnect?

I also see that each node has a TCP connection to the other nodes. So, will the disconnect/reconnect sever that connection completely, or will it be retried?

Thanks for any insights.
