Foundry ServerIron and Compaq DS10/Linux - SYN Flood Troubles
- From: Michael Rommel <rommel erlm siemens de>
- To: axp-list redhat com
- Cc: fuh003 erlm siemens de, sven erlm siemens de
- Subject: Foundry ServerIron and Compaq DS10/Linux - SYN Flood Troubles
- Date: Thu, 27 Jan 2000 14:34:12 +0100
Hi there,
We are seeing an interesting problem here. It is still under
heavy investigation, but I thought I'd share it with you in case
it rings a bell...
Abstract
A load-balancing device suddenly stops forwarding incoming client
requests to a particular server. Testing the server from a third
workstation works fine. netstat -an on the server shows an
unusually high number of TCP connections in SYN_RECV state.
Setup
+-------------------+          +--------------------+
| Catalyst    + +   +----------+ Router             +-----> Internet
+-------------|-|---+          +--------------------+
              | +----------------+
+-------------|-----+     +------+-------------+
| Foundry   + +     |     | Client             |
+-----------|-------+     +--------------------+
            |
+-----------|-------+
| C2948   + + + + + |
+---------|---|-|-|-+
          |   | | +--------------+
+---------+---|-|---+     +------+-------------+
| sikinos     | |   |     | santorini          |
+-------------|-|---+     +--------------------+
              | +----------------+
+-------------+-----+     +------+-------------+
| syros             |     | emporio            |
+-------------------+     +--------------------+
Normal operation
The normal operation is as follows:
The client sends an HTTP request to the virtual IP address of
the Foundry (e.g. GET http://www.foundrynetworks.com/ HTTP/1.1).
The Foundry rewrites the destination IP field of the incoming
TCP packet to the address of one of the real caches (sikinos,
santorini, emporio or syros), forwards the packet there and
keeps track of the connection.
To keep things simple, let's assume that the object is in the
cache. The server then fetches the object from disk storage and
sends it back to the client.
The response from the cache passes through the Foundry, where
the source address is translated back to the virtual IP address,
and that's about it.
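The translation steps above can be sketched roughly as follows.
This is only an illustration of destination NAT with a connection
table, not the ServerIron's actual implementation. The class name
Balancer, the virtual IP and the client address are hypothetical
(taken from the documentation address ranges); only the cache
names and the round-robin selection come from the description.

```python
from itertools import cycle

VIRTUAL_IP = "192.0.2.10"  # hypothetical virtual IP of the load balancer
REAL_CACHES = ["sikinos", "santorini", "emporio", "syros"]


class Balancer:
    """Toy model: rewrite destinations inbound, sources outbound."""

    def __init__(self, caches):
        self._next_cache = cycle(caches)  # round-robin over the real caches
        self.table = {}                   # (client_ip, client_port) -> cache

    def inbound(self, src_ip, src_port):
        """Pick (or recall) the real cache for a client packet."""
        key = (src_ip, src_port)
        if key not in self.table:
            self.table[key] = next(self._next_cache)
        # The destination is rewritten from VIRTUAL_IP to the real cache.
        return (src_ip, src_port, self.table[key])

    def outbound(self, cache, dst_ip, dst_port):
        """Rewrite the source of a reply back to the virtual IP."""
        if self.table.get((dst_ip, dst_port)) != cache:
            raise KeyError("no matching connection in the table")
        return (VIRTUAL_IP, dst_ip, dst_port)


if __name__ == "__main__":
    lb = Balancer(REAL_CACHES)
    print(lb.inbound("198.51.100.7", 40000))
    print(lb.outbound("sikinos", "198.51.100.7", 40000))
```

The point of the table is that every later packet of the same
client connection goes to the same real cache, and that replies
can be matched to a known connection before the source address
is rewritten.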
To know when a real cache is down for maintenance or for any
other reason, the Foundry periodically checks that each real
cache is working. It opens an HTTP connection and retrieves an
object from the cache. If the response code of the real cache is
200-400, everything is okay. Otherwise the real cache is marked
dead and excluded from the round-robin algorithm.
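The health-check rule can be sketched like this. It is a minimal
illustration, assuming that "200-400" means status codes from 200
to 400 inclusive and that a failed connection (no status at all)
also counts as dead. The function names and the sample probe
results are made up for the example.

```python
def is_alive(status_code):
    """Healthy if a status was received and lies in 200..400 (assumed inclusive)."""
    return status_code is not None and 200 <= status_code <= 400


def live_caches(probe_results):
    """probe_results maps cache name -> HTTP status, or None if the probe failed."""
    return [name for name, status in probe_results.items() if is_alive(status)]


def round_robin(caches, n):
    """Return the caches chosen for the next n requests, in order."""
    if not caches:
        return []  # nothing left to balance over
    return [caches[i % len(caches)] for i in range(n)]


if __name__ == "__main__":
    # Hypothetical probe results; None models a handshake that never completes.
    probes = {"sikinos": 200, "santorini": 304, "emporio": None, "syros": 503}
    alive = live_caches(probes)
    print(alive)
    print(round_robin(alive, 3))
```

With these sample values, emporio and syros drop out of the
rotation, so all requests are spread over the remaining two
caches.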
Faulty operation
At the moment we are seeing the following:
The health check of a real server fails: the Foundry cannot
retrieve an object from a real server. Further debugging showed
that the TCP handshake (SYN, SYN_ACK, ACK) is never completed
when the connection is opened.
Sender's (Foundry) view: After sending the SYN packet, the SYN_ACK
is not received in time, so the Foundry sends an RST packet and
assumes the connection has been torn down at the other end as
well. (Which is not true :-| btw.) It then tries to open the next
connection USING THE SAME SOURCE PORT 1024 !!! and fails again
and again and again and ... you get the picture.
A dump of the TCP statistics of the Foundry looks like this, but
I cannot yet say whether the counters are within the normal range
or not :-(
# sh ip traf
TCP Statistics
1 current active tcbs, 67378 tcbs allocated, 67374 tcbs freed
66155 active opens, 0 passive opens, 46 failed attempts
181366 active resets, 65045 passive resets, 620 input errors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
maybe these counters are too high
276911 in segment, 146132 out segment, 836 retransmission
keepalive: close connection 0, failure callback 0
tcp connect: connection exist 0, out of tcb 0
keepalive: close connection 0, failure callback 0
tcp connect: connection exist 0, out of tcb 0
The system uptime is 20 hours 37 minutes 36 seconds
Receiver's (emporio) view: After receiving the SYN packet, the
system sends out a SYN_ACK packet and then receives an RST packet.
What it does next is not yet clearly understood (I'll debug that
tomorrow), but the results are:
netstat -an shows a maximum of 128 TCP connections in SYN_RECV
state, all coming from the same source address AND port, like
this:
# netstat -an |grep SYN
tcp 0 0 193.98.99.158:81 193.98.99.47:1024 SYN_RECV
tcp 0 0 193.98.99.158:81 193.98.99.47:1024 SYN_RECV
tcp 0 0 193.98.99.158:81 193.98.99.47:1024 SYN_RECV
tcp 0 0 193.98.99.158:81 193.98.99.47:1024 SYN_RECV
tcp 0 0 193.98.99.158:81 193.98.99.47:1024 SYN_RECV
tcp 0 0 193.98.99.158:81 193.98.99.47:1024 SYN_RECV
tcp 0 0 193.98.99.158:81 193.98.99.47:1024 SYN_RECV
tcp 0 0 193.98.99.158:81 193.98.99.47:1024 SYN_RECV
tcp 0 0 193.98.99.158:81 193.98.99.47:1024 SYN_RECV
tcp 0 0 193.98.99.158:81 193.98.99.47:1024 SYN_RECV
tcp 0 0 193.98.99.158:81 193.98.99.47:1024 SYN_RECV
tcp 0 0 193.98.99.158:81 193.98.99.47:1024 SYN_RECV
tcp 0 0 193.98.99.158:81 193.98.99.47:1024 SYN_RECV
...
# cat /proc/sys/net/ipv4/tcp_max_syn_backlog
128
While these backlogs were building up I saw SYN_RECV connections
from random other ports, but they disappeared, and in the final
stage, just before we have to restart the process that listens on
port 81, there are only connections from port 1024.
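To watch how the backlog fills up, the netstat output can be
grouped by remote endpoint. The following is a small sketch that
assumes the six-column layout shown above (protocol, two queue
counters, local address, remote address, state). The function
name is made up for the example.

```python
from collections import Counter


def syn_recv_by_peer(netstat_output):
    """Count SYN_RECV entries per (local address, remote address) pair."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected: proto recv-q send-q local-address remote-address state
        if (len(fields) >= 6 and fields[0].startswith("tcp")
                and fields[5] == "SYN_RECV"):
            counts[(fields[3], fields[4])] += 1
    return counts


if __name__ == "__main__":
    sample = """\
tcp 0 0 193.98.99.158:81 193.98.99.47:1024 SYN_RECV
tcp 0 0 193.98.99.158:81 193.98.99.47:1024 SYN_RECV
tcp 0 0 193.98.99.158:81 193.98.99.47:1024 SYN_RECV
"""
    for (local, remote), n in syn_recv_by_peer(sample).most_common():
        print(n, local, remote)
```

Fed with the real output of netstat -an, a single remote endpoint
whose count reaches the value in
/proc/sys/net/ipv4/tcp_max_syn_backlog would match the picture
described here.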
So my question: Has anybody experienced something like this
without having a routing error? I am aware that things like this
can happen when the SYN_ACK packets go to New Zealand but the
client sits in Paris. I suspect a bug in the kernel's TCP stack
in Linux on Alpha systems.
The system deployed is Red Hat Linux 6.0 with kernel 2.2.14 on a
Compaq DS10.
If you need more information, please let me know.
Thanks in advance,
Michael.
--
Michael Rommel, ATD IT PS, Siemens AG, Erlangen, Germany