
[Linux-cluster] rhcs upgrade from rh el 5.4 to rh el 5.5 and alb bonding on production lan



Hello,

The goal is to upgrade a two-node cluster with a quorum disk from RHEL 5.4 to 5.5.
The upgrade also involves replacing the nodes themselves.

The current nodes use alb bonding for the production lan, and the underlying eth adapters use the tg3 driver.
The target nodes I'm testing also use alb bonding for the production lan, but the underlying eth adapters use the bnx2 driver.

During relocation of a service, I can see that the node that has released the virtual ip continues to send arp replies to some particular clients on the same lan, telling them that the virtual ip "is-at" its own mac address.
(observed with tcpdump on the client itself)
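
A capture of this kind can be sketched as follows. The interface name and the vip address are placeholders (assumptions, not the real values from this setup); the script only prints the tcpdump command line, which then has to be run as root on the client.

```shell
#!/bin/sh
# Placeholder values (assumptions): replace with the client's interface and the real vip.
IFACE="${IFACE:-eth0}"
VIP="${VIP:-192.0.2.10}"

# Build the capture command.
#   -n  do not resolve names
#   -e  print link-level headers, so the mac that sends each arp reply is visible
# The filter "arp and host <vip>" keeps only arp packets that mention the vip.
arp_capture_cmd() {
    echo "tcpdump -n -e -i $1 arp and host $2"
}

arp_capture_cmd "$IFACE" "$VIP"
```

With `-e`, each reply line shows the source mac, which makes it possible to tell which of the two nodes is still answering for the vip.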

As a result, these clients are unable to ping or reach the cluster ip.
Clients on other lans, by contrast, do not seem to be affected (that is, the routers are fine, as I also verified in the arp table of the router on the nodes' lan).

Even if I run
arp -d vip

on the client, its arp cache is immediately filled again with the vip associated with the mac of the node that no longer holds the vip
(on that node, "arp -an" correctly shows the mac of the other node for the vip...)
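
To compare what each host believes, the mac recorded for the vip can be pulled out of the "arp -an" output. This is a minimal sketch that assumes the usual net-tools output layout, `? (ip) at mac [ether] on iface`; the helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "arp -an" output on stdin and print the mac
# cached for the given ip. Assumes lines of the form
#   ? (192.0.2.10) at 00:11:22:33:44:55 [ether] on eth0
mac_for_ip() {
    awk -v ip="($1)" '$2 == ip { print $4 }'
}

# Usage on a client or node (192.0.2.10 stands in for the real vip):
#   arp -an | mac_for_ip 192.0.2.10
```

Running the same check on the affected client and on each node shows whose mac is cached where.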
 
If I shut down the (now) passive node, the arp replies stop and, after an "arp -d" on the client, the client can reach the vip again...

The current workaround is to set the production lan bonding to active-backup mode.
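
For reference, an active-backup bond on RHEL 5 could be configured roughly as below. This is a sketch under assumptions: the bond is taken to be named bond0, the options are taken to be set through BONDING_OPTS in its ifcfg file (an `options bond0 ...` line in /etc/modprobe.conf is the other common place), and the miimon value is illustrative rather than copied from these nodes.

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0 (sketch; device name and values are assumed)
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
# mode=active-backup: only one slave carries traffic at a time.
# miimon=100: check link state every 100 ms (assumed value).
# The alb setup described above would use mode=balance-alb here instead.
BONDING_OPTS="mode=active-backup miimon=100"
```

The address settings of the bond are left out on purpose, since they are not part of the change.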

I have not experienced the problem on the production cluster, but many changes are involved, such as:
hardware
eth driver from tg3 to bnx2
kernel version updates
rhcs sw updates

Is there a recommended bonding mode for the production lan of a cluster that has to manage a vip?
Has anyone had problems with bnx2, in case it is the cause?

What program does the cluster stack call when it has to start/stop a vip?
In general I can see the vip with the command
ip addr list
but not with
ifconfig -a

I don't know if this is due to ifconfig being deprecated or to the way the cluster enables the vip on the bonding interface...
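
One relevant detail: ifconfig reports only an interface's primary address plus any addresses that carry an alias label such as bond0:0, so an address added without a label appears only in the ip output. Whether the cluster adds the vip that way is an assumption here, not something verified. A minimal sketch for listing every IPv4 address on the bond, with bond0 as an assumed name and a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: read "ip -o -4 addr show" output on stdin and print
# one address per line, without the prefix length. In the one-line (-o)
# format the fourth field is the address, e.g. 192.0.2.10/24.
addrs_from_ip_output() {
    awk '{ print $4 }' | cut -d/ -f1
}

# Usage on a node (bond0 is an assumed interface name):
#   ip -o -4 addr show dev bond0 | addrs_from_ip_output
```

If the vip shows up in this list but "ifconfig -a" has no bond0:N entry for it, the address was added without a label.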

Thanks in advance,
Gianluca
 


