
Foundry ServerIron and Compaq DS10/Linux - SYN Flood Troubles



Hi there,

We are experiencing an interesting problem here. It is currently under
heavy investigation, but I thought I'd share it with you in case
it rings a bell...

Abstract

A load-balancing device suddenly stops forwarding incoming client
requests to a particular server. Testing the server from a third
workstation works fine. Running netstat -an on the server shows an
unusually high number of TCP connections in SYN_RECV state.

Setup

  +-------------------+     +--------------------+
  | Catalyst    + + +----------+       Router    +-----> Internet
  +-------------|-|---+     +--------------------+
                | +----------------+
  +-------------|-----+     +------+-------------+
  | Foundry   + +     |     |          Client    |
  +-----------|-------+     +--------------------+
              |
  +-----------|-------+
  | C2948   + + + + + |
  +---------|---|-|-|-+
            |   | | +--------------+
  +---------+---|-|---+     +------+-------------+
  | sikinos     | |   |     |          santorini |
  +-------------|-|---+     +--------------------+
                | +----------------+
  +-------------+-----+     +------+-------------+
  | syros             |     |          emporio   |
  +-------------------+     +--------------------+

Normal operation

The normal operation is as follows:

  The client sends an HTTP request to the virtual IP address of
  the Foundry (e.g. GET http://www.foundrynetworks.com/ HTTP/1.1).
  The Foundry rewrites the destination IP field of the incoming
  TCP packet to the address of one of the real caches (sikinos,
  santorini, emporio or syros), forwards the packet there and
  keeps track of the connection.

  For simplicity, let's assume that the object is in the cache.
  The cache server then fetches the object from disk storage and
  sends it back to the client.

  The response from the cache passes through the Foundry, where
  the source address gets translated back, and that's about it.
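
  The translation steps above can be sketched roughly as follows. This
  is only an illustration in Python: the class VirtualServer, the
  packet dictionaries and the 192.0.2.x addresses are made up, host
  names stand in for the real cache addresses, and nothing here claims
  to reflect how the ServerIron is actually implemented.

```python
class VirtualServer:
    """Toy model of the destination/source rewriting described above."""

    def __init__(self, vip, real_servers):
        self.vip = vip
        self.real_servers = list(real_servers)
        self.next_index = 0
        # Remembered connections: (client address, client port) -> real cache
        self.sessions = {}

    def inbound(self, packet):
        """Client -> cache: rewrite the destination to a real cache."""
        key = (packet["src"], packet["sport"])
        if key not in self.sessions:
            # Plain round-robin choice for a connection seen for the first time
            chosen = self.real_servers[self.next_index % len(self.real_servers)]
            self.sessions[key] = chosen
            self.next_index += 1
        return {**packet, "dst": self.sessions[key]}

    def outbound(self, packet):
        """Cache -> client: translate the source back to the virtual IP."""
        return {**packet, "src": self.vip}


# Hypothetical addresses, for illustration only
lb = VirtualServer("192.0.2.10", ["sikinos", "santorini", "emporio", "syros"])
request = {"src": "192.0.2.99", "sport": 40000, "dst": "192.0.2.10", "dport": 80}
forwarded = lb.inbound(request)
reply = lb.outbound(
    {"src": forwarded["dst"], "sport": 80, "dst": "192.0.2.99", "dport": 40000}
)
```

  The client only ever sees the virtual IP address; the choice of real
  cache stays hidden behind the remembered connection.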

To detect when a real cache is down for maintenance or any other
reason, the Foundry periodically checks that each real cache is
working. It opens an HTTP connection and retrieves an object from
the cache. If the response code of the real cache is in the range
200-400, everything is okay. Otherwise the real cache is marked
dead and excluded from the round-robin algorithm.
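
As a rough sketch of that health-check rule (in Python, purely for
illustration): the function names, the probe results and the choice to
treat both ends of the 200-400 range as inclusive are assumptions, not
taken from the Foundry documentation. A probe that gets no answer at
all, as in the fault described later, is modelled as None.

```python
from itertools import cycle


def is_alive(status_code):
    """Healthy if a response arrived and its code lies in 200-400.

    Inclusive bounds are an assumption; None means no response at all.
    """
    return status_code is not None and 200 <= status_code <= 400


def healthy_caches(probe_results):
    """Return the names of the caches that passed their last probe."""
    return [name for name, status in probe_results.items() if is_alive(status)]


# Hypothetical probe results: emporio did not answer, syros returned 503
probe_results = {"sikinos": 200, "santorini": 304, "emporio": None, "syros": 503}

alive = healthy_caches(probe_results)

# Round-robin over the surviving caches only
rotation = cycle(alive)
picks = [next(rotation) for _ in range(4)]
```

With these made-up results, only sikinos and santorini remain in the
rotation, and new requests alternate between the two.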

Faulty operation

At the moment we are experiencing the following:

The health check of a real server fails: the Foundry cannot retrieve
an object from the real server. Further debugging showed that
when the TCP connection is opened, the TCP handshake
(SYN, SYN_ACK, ACK) is not completed.

Sender's (Foundry) view: After sending the SYN packet, the SYN_ACK
is not received in time, so the Foundry sends a RST packet and
assumes the connection has been torn down at the other end as well.
(Which is not true :-| btw.) It then tries to open the next
connection USING THE SAME SOURCE PORT 1024 !!! and fails again
and again and again and ... you get the picture.

A dump of the TCP statistics of the Foundry looks like this, but
I cannot yet comment on whether the counters are within the normal
range or not :-(

  # sh ip traf
  TCP Statistics
  1 current active tcbs, 67378 tcbs allocated, 67374 tcbs freed
  66155 active opens, 0 passive opens, 46 failed attempts
  181366 active resets, 65045 passive resets, 620 input errors
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  maybe these counters are too high

  276911 in segment, 146132 out segment, 836 retransmission
  keepalive: close connection 0, failure callback 0
  tcp connect: connection exist 0, out of tcb 0
  keepalive: close connection 0, failure callback 0
  tcp connect: connection exist 0, out of tcb 0

  The system uptime is 20 hours 37 minutes 36 seconds


Receiver's (emporio) view: After receiving the SYN packet, the
system sends out a SYN_ACK packet and then receives a RST packet.
What it does next is not yet clearly understood; I'll debug that
tomorrow, but the results are:

  netstat -an shows a maximum of 128 TCP connections in SYN_RECV
  state, all coming from the same source address AND port, like
  this:

  # netstat -an |grep SYN
  tcp        0      0 193.98.99.158:81        193.98.99.47:1024 SYN_RECV
  tcp        0      0 193.98.99.158:81        193.98.99.47:1024 SYN_RECV
  tcp        0      0 193.98.99.158:81        193.98.99.47:1024 SYN_RECV
  tcp        0      0 193.98.99.158:81        193.98.99.47:1024 SYN_RECV
  tcp        0      0 193.98.99.158:81        193.98.99.47:1024 SYN_RECV
  tcp        0      0 193.98.99.158:81        193.98.99.47:1024 SYN_RECV
  tcp        0      0 193.98.99.158:81        193.98.99.47:1024 SYN_RECV
  tcp        0      0 193.98.99.158:81        193.98.99.47:1024 SYN_RECV
  tcp        0      0 193.98.99.158:81        193.98.99.47:1024 SYN_RECV
  tcp        0      0 193.98.99.158:81        193.98.99.47:1024 SYN_RECV
  tcp        0      0 193.98.99.158:81        193.98.99.47:1024 SYN_RECV
  tcp        0      0 193.98.99.158:81        193.98.99.47:1024 SYN_RECV
  tcp        0      0 193.98.99.158:81        193.98.99.47:1024 SYN_RECV
  ...
  # cat /proc/sys/net/ipv4/tcp_max_syn_backlog
  128
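
To see at a glance how the half-open connections are distributed, the
netstat output can be tallied per local/remote address pair. Below is a
minimal Python sketch; the function name count_syn_recv is made up, it
assumes the six-column Linux netstat -an layout shown above, and the
LISTEN line in the sample is invented to show that other states are
ignored.

```python
from collections import Counter


def count_syn_recv(netstat_output):
    """Count SYN_RECV entries per (local address, remote address) pair."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, remote, state
        if (
            len(fields) >= 6
            and fields[0].startswith("tcp")
            and fields[5] == "SYN_RECV"
        ):
            counts[(fields[3], fields[4])] += 1
    return counts


# Three lines copied from the dump above plus one invented LISTEN line
sample = """\
tcp        0      0 193.98.99.158:81        193.98.99.47:1024 SYN_RECV
tcp        0      0 193.98.99.158:81        193.98.99.47:1024 SYN_RECV
tcp        0      0 193.98.99.158:81        193.98.99.47:1024 SYN_RECV
tcp        0      0 0.0.0.0:81              0.0.0.0:*         LISTEN
"""

counts = count_syn_recv(sample)
for (local, remote), n in counts.most_common():
    print(f"{n:4d}  {local} <- {remote}")
```

On the live system the input would come from netstat -an itself; a
single pair holding all 128 slots would match the picture described
here.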

During the buildup of this backlog I saw SYN_RECV connections
from random other ports, but they disappeared, and in the final
stage, just before we have to restart the process that listens on
port 81, there are only connections from port 1024.

So my question: Has anybody experienced something like that in a
setup without a routing error? I am aware that things like that
can happen when SYN_ACK packets go to New Zealand while the client
sits in Paris. I suspect a kernel/TCP stack error in the Linux
kernel on Alpha systems.

The system deployed is Red Hat 6.0 with kernel 2.2.14 on a Compaq
DS10.

If you need more information, please let me know.

Thanks in advance,

  Michael.

-- 
Michael Rommel, ATD IT PS, Siemens AG, Erlangen, Germany


