I have a direct-routed Piranha/LVS cluster that I would like to load-balance between real servers svr1 and svr2. As a test I’m using the telnet service. My setup: WRR scheduling, no load monitoring for now, direct routing, weight 1 on both svr1 and svr2, and a virtual IP address of test_vip. I also have ip_forward turned on on my router machine (call it rtr1). Kernel: “2.6.9-42.0.10.ELsmp #1 SMP Fri Feb 16 17:17:21 EST 2007”. On both real servers there is an iptables rule in the nat table, PREROUTING chain, that REDIRECTs traffic for test_vip on the telnet port.
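For concreteness, here is a minimal sketch of the setup described above (test_vip, svr1, and svr2 stand in for the actual addresses; port 23 is the telnet service):

```shell
# On the director (as root): virtual telnet service, weighted round-robin
ipvsadm -A -t test_vip:23 -s wrr
ipvsadm -a -t test_vip:23 -r svr1:23 -g -w 1   # real server svr1, direct routing (-g), weight 1
ipvsadm -a -t test_vip:23 -r svr2:23 -g -w 1   # real server svr2, direct routing (-g), weight 1
echo 1 > /proc/sys/net/ipv4/ip_forward         # forwarding on, as noted above

# On each real server: accept packets that arrive addressed to the VIP
iptables -t nat -A PREROUTING -d test_vip -p tcp --dport 23 -j REDIRECT
```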
If I get on another machine, say client1, and telnet to test_vip, everything works fine: each subsequent connection goes to the next server in the cluster (svr1, svr2, svr1, svr2, …).
However, if I run my telnet client from one of the real servers, say svr2, connecting to test_vip, I connect to the other real server (svr1) fine when the round-robin schedule sends me there; but when the schedule points back to my own host (svr2), the connection hangs. ipvsadm -l reports that the connection went through the IPVS tables, but it never completes.
The same thing happens starting from svr1: connections to svr2 work, but they hang when scheduled back to svr1, the source machine.
Bottom line: it works when the client runs on a non-real-server machine, and half the time when the client runs on a real server; the other half of the time (when the connection is balanced back to the originating server itself), it hangs. (I recognize “half” is only because I have 2 real servers.)
Any clues as to why it won’t work consistently when balanced back to itself?
(Extra note: saying it works from non-real-server machines is not completely true — it doesn’t work from the router machine, rtr1, either. A connection from rtr1 always goes back to rtr1 itself, apparently bypassing the IPVS tables, since ipvsadm -l never shows the connection in its statistics. It seems that because the virtual IP address is local to rtr1, the IPVS table lookup is short-circuited.)
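One way to check the “local short-circuit” suspicion (a sketch; test_vip again stands for the actual address) is to ask the routing stack on rtr1 where packets to the VIP would go, and whether the VIP is configured on a local interface:

```shell
# On rtr1: if the first command reports a "local" route, the kernel delivers
# packets to test_vip to itself rather than forwarding them out for scheduling.
ip route get test_vip
# Confirm whether test_vip is bound to a local interface:
ip addr show | grep test_vip
```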