
/proc/sys/net/ipv4 parameters (see sysctl) (LONG, can be ignored)

Apologies in advance if this post bothers anyone.

/proc/sys/net/ipv4 parameters

Just gave an answer to someone on this stuff and realized that this info, 
although readily available in the lartc manual, seems to be difficult for
many to find. (Including me :-) )

Hopefully having this info in one more place will help.

The lartc manual is :
"Linux Advanced Routing & Traffic Control HOWTO"

available here:

And here : (copy at TLDP.org )

This is a great document; I recommend reading it cover to cover.

Chapter 13. Kernel network parameters  ( /proc/sys/net/ipv4 )
13.2. Obscure settings

Ok, there are a lot of parameters which can be modified. We try to list
them all. Also documented (partly) in Documentation/ip-sysctl.txt.

Some of these settings have different defaults based on whether you
answered 'Yes' to 'Configure as router and not host' while compiling
your kernel.

Oskar Andreasson also has a page on all these flags and it appears to be
better than ours, so also check http://ipsysctl-tutorial.frozentux.net/.

13.2.1. Generic ipv4

As a generic note, most rate limiting features don't work on loopback,
so don't test them locally. The limits are supplied in 'jiffies', and
are enforced using the earlier mentioned token bucket filter.

The kernel has an internal clock which runs at 'HZ' ticks (or 'jiffies')
per second. On Intel, 'HZ' is mostly 100. So setting a *_rate file to,
say 50, would allow for 2 packets per second. The token bucket filter is
also configured to allow for a burst of at most 6 packets, if enough
tokens have been earned.
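
The arithmetic above can be sketched in a few lines of Python. This is
only a toy model for illustration, not the kernel's implementation: the
class name ToyTokenBucket, the assumption that the bucket starts full,
and HZ = 100 are all mine.

```python
HZ = 100  # assumed ticks ("jiffies") per second, as in the text above


def packets_per_second(rate_jiffies, hz=HZ):
    """One packet is allowed every `rate_jiffies` ticks."""
    return hz / rate_jiffies


class ToyTokenBucket:
    """Toy limiter: one token earned per `rate_jiffies`, capped at `burst`."""

    def __init__(self, rate_jiffies, burst=6):
        self.rate = rate_jiffies
        self.burst = burst
        self.tokens = burst  # assumption: the bucket starts full
        self.last = 0        # jiffy at which tokens were last credited

    def allow(self, now):
        """Return True if a packet may be sent at jiffy `now`."""
        earned = (now - self.last) // self.rate
        if earned:
            self.tokens = min(self.burst, self.tokens + earned)
            self.last += earned * self.rate
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    print(packets_per_second(50))  # rate of 50 jiffies at HZ=100
    bucket = ToyTokenBucket(50)
    print(sum(bucket.allow(0) for _ in range(10)), "of 10 sent in the burst")
```

With a rate of 50 the model lets the initial burst through and then
settles to one packet every half second.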

Several entries in the following list have been copied from
/usr/src/linux/Documentation/networking/ip-sysctl.txt, written by Alexey
Kuznetsov <kuznet ms2 inr ac ru> and Andi Kleen <ak muc de>


    If the kernel decides that it can't deliver a packet, it will drop
it, and send the source of the packet an ICMP notice to this effect.

    Don't act on echo packets at all. Please don't set this by default,
but if you are used as a relay in a DoS attack, it may be useful.
/proc/sys/net/ipv4/icmp_echo_ignore_broadcasts [Useful]

    If you ping the broadcast address of a network, all hosts are
supposed to respond. This makes for a dandy denial-of-service tool. Set
this to 1 to ignore these broadcast messages.
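
As a rough sketch of how such a file can be inspected from a script:
dotted sysctl names map onto paths under /proc/sys by replacing the dots
with slashes. The helper names sysctl_path and read_sysctl below are
made up for this example, and reading returns None when the file is not
there (for instance on a non-Linux machine).

```python
from pathlib import Path

PROC_SYS = Path("/proc/sys")


def sysctl_path(name):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return PROC_SYS.joinpath(*name.split("."))


def read_sysctl(name):
    """Return the file's contents, or None if it cannot be read."""
    try:
        return sysctl_path(name).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    name = "net.ipv4.icmp_echo_ignore_broadcasts"
    print(sysctl_path(name), "=", read_sysctl(name))
```

Changing the value means writing to the same file, which normally
requires root.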

    The rate at which echo replies are sent to any one destination.

    Set this to ignore ICMP errors caused by hosts in the network
reacting badly to frames sent to what they perceive to be the broadcast
address.
    A relatively unknown ICMP message, which is sent in response to
incorrect packets with broken IP or TCP headers. With this file you can
control the rate at which it is sent.

    This is the famous cause of the 'Solaris middle star' in
traceroutes. Limits the rate of ICMP Time Exceeded messages sent. 

    Maximum number of listening igmp (multicast) sockets on the host.
FIXME: Is this true?

    FIXME: Add a little explanation about the inet peer storage? Maximum
interval between garbage collection passes. This interval is in effect
under low (or absent) memory pressure on the pool. Measured in jiffies.

    Minimum interval between garbage collection passes. This interval is
in effect under high memory pressure on the pool. Measured in jiffies.

    Maximum time-to-live of entries. Unused entries will expire after
this period of time if there is no memory pressure on the pool (i.e.
when the number of entries in the pool is very small). Measured in
jiffies.
    Minimum time-to-live of entries. Should be enough to cover fragment
time-to-live on the reassembling side. This minimum time-to-live is
guaranteed if the pool size is less than inet_peer_threshold. Measured
in jiffies.

    The approximate size of the INET peer storage. Starting from this
threshold, entries will be thrown away aggressively. This threshold also
determines entries' time-to-live and the time intervals between garbage
collection passes. The more entries, the shorter the time-to-live and
the shorter the GC interval.

    This file contains the number one if the host received its IP
configuration by RARP, BOOTP, DHCP or a similar mechanism. Otherwise it
is zero.

    Time To Live of packets. Set to a safe 64. Raise it if you have a
huge network. Don't do so for fun - routing loops cause much more damage
that way. You might even consider lowering it in some circumstances.

    You need to set this if you use dial-on-demand with a dynamic
interface address. Once your demand interface comes up, any local TCP
sockets which haven't seen replies will be rebound to have the right
address. This solves the problem that the connection that brings up your
interface itself does not work, but the second try does.

    Whether the kernel should attempt to forward packets. Off by default.

    Range of local ports for outgoing connections. Actually quite small
by default, 1024 to 4999.
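
To see just how small that default is, here is a minimal sketch that
parses a port range and counts the ports in it. It assumes the range is
given as two whitespace-separated integers (low, then high); the
function names are mine.

```python
def parse_port_range(text):
    """Parse 'LOW HIGH' (any whitespace between) into a pair of ints."""
    low, high = (int(field) for field in text.split())
    if low > high:
        raise ValueError("low port exceeds high port")
    return low, high


def port_count(low, high):
    """Number of ports in the inclusive range low..high."""
    return high - low + 1


if __name__ == "__main__":
    low, high = parse_port_range("1024\t4999")
    print(port_count(low, high), "ports available for outgoing connections")
```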

    Set this if you want to disable Path MTU discovery - a technique to
determine the largest Maximum Transfer Unit possible on your path. See
also the section on Path MTU discovery in the Cookbook chapter.

    Maximum memory used to reassemble IP fragments. When
ipfrag_high_thresh bytes of memory is allocated for this purpose, the
fragment handler will toss packets until ipfrag_low_thresh is reached.

    Set this if you want your applications to be able to bind to an
address which doesn't belong to a device on your system. This can be
useful when your machine is on a non-permanent (or even dynamic) link,
so your services are able to start up and bind to a specific address
when your link is down.

    Minimum memory used to reassemble IP fragments.

    Time in seconds to keep an IP fragment in memory.

    A boolean flag controlling the behaviour under lots of incoming
connections. When enabled, this causes the kernel to actively send RST
packets when a service is overloaded.

    Time to hold a socket in state FIN-WAIT-2 if it was closed by our
side. The peer can be broken and never close its side, or even die
unexpectedly. The default value is 60 seconds. The usual value in 2.2
was 180 seconds; you may restore it, but remember that even if your
machine is a lightly loaded web server, you risk overflowing memory with
kilotons of dead sockets. FIN-WAIT-2 sockets are less dangerous than
FIN-WAIT-1 because they eat at most 1.5K of memory, but they tend to
live longer. Cf. tcp_max_orphans.

    How often TCP sends out keepalive messages when keepalive is
enabled. Default: 2 hours.

    How frequently probes are retransmitted when a probe isn't
acknowledged. Default: 75 seconds.

    How many keepalive probes TCP will send before it decides that the
connection is broken. Default value: 9. Multiplied by
tcp_keepalive_intvl, this gives the time a link can be non-responsive
after a keepalive has been sent.
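
The multiplication described above works out as follows with the
defaults quoted in these three entries. The constant and function names
are only for illustration.

```python
KEEPALIVE_TIME = 2 * 60 * 60  # seconds of idle time before the first probe
KEEPALIVE_INTVL = 75          # seconds between unacknowledged probes
KEEPALIVE_PROBES = 9          # probes sent before giving up


def unresponsive_window(probes=KEEPALIVE_PROBES, intvl=KEEPALIVE_INTVL):
    """Seconds a link may stay silent once keepalive probing has started."""
    return probes * intvl


def time_to_detect_dead_peer(idle=KEEPALIVE_TIME,
                             probes=KEEPALIVE_PROBES,
                             intvl=KEEPALIVE_INTVL):
    """Seconds from the last traffic until the connection is declared dead."""
    return idle + probes * intvl


if __name__ == "__main__":
    print(unresponsive_window(), "seconds of probing")
    print(time_to_detect_dead_peer(), "seconds in total")
```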

    Maximum number of TCP sockets not attached to any user file handle
that the system will hold. If this number is exceeded, orphaned
connections are reset immediately and a warning is printed. This limit
exists only to prevent simple DoS attacks; you _must_ not rely on this
or lower the limit artificially. Rather, increase it (probably after
increasing installed memory) if network conditions require more than the
default value, and tune network services to linger and kill such states
more aggressively. Let me remind you again: each orphan eats up to 64K
of unswappable memory.
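
A quick back-of-the-envelope check of what a given limit could cost in
the worst case, using the 64K-per-orphan figure from the text. The limit
of 8192 used in the example is an arbitrary illustrative value, not a
documented default.

```python
ORPHAN_BYTES = 64 * 1024  # worst case per orphan, per the text above


def worst_case_orphan_bytes(max_orphans, per_orphan=ORPHAN_BYTES):
    """Upper bound on unswappable memory pinned by orphaned sockets."""
    return max_orphans * per_orphan


if __name__ == "__main__":
    limit = 8192  # hypothetical limit, for illustration only
    mib = worst_case_orphan_bytes(limit) / (1024 * 1024)
    print(f"{limit} orphans could pin up to {mib:.0f} MiB")
```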

    How many times to retry before killing a TCP connection closed by
our side. The default value of 7 corresponds to 50sec-16min depending on
RTO. If your machine is a loaded web server, you should think about
lowering this value; such sockets may consume significant resources. Cf.

    Maximum number of remembered connection requests which have not yet
received an acknowledgment from the connecting client. The default value
is 1024 for systems with more than 128MB of memory, and 128 for
low-memory machines. If the server suffers from overload, try increasing
this number. Warning! If you make it greater than 1024, it would be
better to change TCP_SYNQ_HSIZE in include/net/tcp.h to keep
TCP_SYNQ_HSIZE*16<=tcp_max_syn_backlog and to recompile the kernel.

    Maximum number of time-wait sockets held by the system
simultaneously. If this number is exceeded, the time-wait socket is
immediately destroyed and a warning is printed. This limit exists only
to prevent simple DoS attacks; you _must_ not lower the limit
artificially, but rather increase it (probably after increasing
installed memory) if network conditions require more than the default
value.

    Bug-to-bug compatibility with some broken printers. On retransmit
try to send bigger packets to work around bugs in certain TCP stacks.

    How many times to retry before deciding that something is wrong and
that it is necessary to report this suspicion to the network layer. The
minimal RFC value is 3, which is the default and corresponds to
3sec-8min depending on RTO.
    How many times to retry before killing a live TCP connection. RFC
1122 says that the limit should be longer than 100 sec, which is too
small a number. The default value of 15 corresponds to 13-30min
depending on RTO.

    This boolean enables a fix for 'time-wait assassination hazards in
tcp', described in RFC 1337. If enabled, this causes the kernel to drop
RST packets for sockets in the time-wait state. Default: 0

    Use Selective ACK which can be used to signify that specific packets
are missing - therefore helping fast recovery.

    Use the Host requirements interpretation of the TCP urg pointer
field. Most hosts use the older BSD interpretation, so if you turn this
on Linux might not communicate correctly with them. Default: FALSE 

    Number of SYN packets the kernel will send before giving up on the
new connection.

    To open the other side of the connection, the kernel sends a SYN
with a piggybacked ACK on it, to acknowledge the earlier received SYN.
This is part 2 of the three-way handshake. This setting determines the
number of SYN+ACK packets sent before the kernel gives up on the
connection.
    Timestamps are used, amongst other things, to protect against
wrapping sequence numbers. A 1 gigabit link might conceivably
re-encounter a previous sequence number with an out-of-line value,
because it was of a previous generation. The timestamp will let it
recognize this 'ancient packet'.
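
To put a number on the wrapping problem: TCP sequence numbers are 32
bits and count bytes, so the sequence space is 2^32 bytes. The sketch
below estimates how long a link running flat out takes to go through it
once. It ignores protocol overhead, and the function name is mine.

```python
SEQ_SPACE_BYTES = 2 ** 32  # 32-bit sequence numbers, one per byte


def seconds_to_wrap(link_bits_per_second):
    """Time for a saturated link to consume the whole sequence space."""
    bytes_per_second = link_bits_per_second / 8
    return SEQ_SPACE_BYTES / bytes_per_second


if __name__ == "__main__":
    for name, bps in (("10 Mbit", 10e6), ("100 Mbit", 100e6), ("1 Gbit", 1e9)):
        print(f"{name}: wraps after about {seconds_to_wrap(bps):.0f} s")
```

At 1 gigabit the wrap time comes out at roughly half a minute, short
enough for a delayed packet from the previous pass to still be in
flight.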

    Enable fast recycling of TIME-WAIT sockets. Default value is 1. It
should not be changed without the advice or request of technical
experts.

    TCP/IP normally allows windows up to 65535 bytes big. For really
fast networks, this may not be enough. The window scaling option allows
for almost gigabyte windows, which is good for high bandwidth*delay
products.
13.2.2. Per device settings

DEV can either stand for a real interface, or for 'all' or 'default'.
Default also changes settings for interfaces yet to be created.
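
As a sketch of how DEV fits into a path: the helper below builds the
file name for one per-device setting. It assumes the per-device files
live under /proc/sys/net/ipv4/conf/DEV/ (the entry names are not shown
in this post), the helper name conf_path is mine, and 'eth0' is just an
example interface.

```python
from pathlib import PurePosixPath

# Assumed location of the per-device settings.
CONF_ROOT = PurePosixPath("/proc/sys/net/ipv4/conf")


def conf_path(dev, setting):
    """Path of `setting` for a real interface, 'all' or 'default'."""
    if "/" in dev or dev in ("", ".", ".."):
        raise ValueError(f"not a usable device name: {dev!r}")
    return CONF_ROOT / dev / setting


if __name__ == "__main__":
    for dev in ("eth0", "all", "default"):
        print(conf_path(dev, "rp_filter"))
```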


    If a router decides that you are using it for a wrong purpose (i.e.
it needs to resend your packet on the same interface), it will send us
an ICMP Redirect. This is a slight security risk, however, so you may
want to turn it off, or use secure redirects.

    Not used very much anymore. You used to be able to give a packet a
list of IP addresses it should visit on its way. Linux can be made to
honor this IP option.

    Accept packets with source address 0.b.c.d with destinations not to
this host as local ones. It is supposed that a BOOTP relay daemon will
catch and forward such packets.

    The default is 0, since this feature is not implemented yet (kernel
version 2.2.12).

    Enable or disable IP forwarding on this interface.

    See the section on Reverse Path Filtering.

    Whether we do multicast forwarding on this interface.

    If you set this to 1, this interface will respond to ARP requests
for addresses the kernel has routes to. Can be very useful when building
'ip pseudo bridges'. Do take care that your netmasks are very correct
before enabling this! Also be aware that the rp_filter, mentioned
elsewhere, also operates on ARP queries!

    See the section on Reverse Path Filtering.

    Accept ICMP redirect messages only for gateways, listed in default
gateway list. Enabled by default.

    Whether we send the above-mentioned redirects.

    If it is not set, the kernel does not assume that different subnets
on this device can communicate directly. Default setting is 'yes'.

    FIXME: fill this in

13.2.3. Neighbor policy

DEV can either stand for a real interface, or for 'all' or 'default'.
Default also changes settings for interfaces yet to be created.


    Maximum for random delay of answers to neighbor solicitation
messages in jiffies (1/100 sec). Not yet implemented (Linux does not
have anycast support yet).

    Determines the number of requests to send to the user level ARP
daemon. Use 0 to turn off.

    A base value used for computing the random reachable time value as
specified in RFC2461.

    Delay for the first time probe if the neighbor is reachable. (see

    Determines how often to check for stale ARP entries. After an ARP
entry is stale it will be resolved again (which is useful when an IP
address migrates to another machine). When ucast_solicit is greater than
0, it first tries to send an ARP packet directly to the known host. When
that fails and mcast_solicit is greater than 0, an ARP request is
broadcast.
    An ARP/neighbor entry is only replaced with a new one if the old is
at least locktime old. This prevents ARP cache thrashing.

    Maximum number of retries for multicast solicitation.

    Maximum time (real time is random [0..proxytime]) before answering
an ARP request for which we have a proxy ARP entry. In some cases,
this is used to prevent network flooding.

    Maximum queue length of the delayed proxy arp timer. (see

    The time, expressed in jiffies (1/100 sec), between retransmitted
Neighbor Solicitation messages. Used for address resolution and to
determine if a neighbor is unreachable.

    Maximum number of retries for unicast solicitation.

    Maximum queue length for a pending ARP request - the number of
packets which are accepted from other layers while the ARP address is
still being resolved.

13.2.4. Routing settings

/proc/sys/net/ipv4/route/error_burst and

    These parameters are used to limit the warning messages written to
the kernel log from the routing code. The higher the error_cost factor
is, the fewer messages will be written. Error_burst controls when
messages will be dropped. The default settings limit warning messages to
one every five seconds.

    Writing to this file results in a flush of the routing cache.

    Values to control the frequency and behavior of the garbage
collection algorithm for the routing cache. This can be important when
doing failover. At least gc_timeout seconds will elapse before Linux
will skip to another route because the previous one has died. By
default it is set to 300; you may want to lower it if you want a speedy
failover.

    Also see this post by Ard van Breemen.

    See /proc/sys/net/ipv4/route/gc_elasticity.

    See /proc/sys/net/ipv4/route/gc_elasticity.

    See /proc/sys/net/ipv4/route/gc_elasticity.

    See /proc/sys/net/ipv4/route/gc_elasticity.

    Maximum delay for flushing the routing cache.

    Maximum size of the routing cache. Old entries will be purged once
the cache has reached this size.

    FIXME: fill this in

    Minimum delay for flushing the routing cache.

    FIXME: fill this in

    FIXME: fill this in

    Factors which determine if more ICMP redirects should be sent to a
specific host. No redirects will be sent once the load limit or the
maximum number of redirects has been reached.

    See /proc/sys/net/ipv4/route/redirect_load.

    Timeout for redirects. After this period redirects will be sent
again, even if this has been stopped, because the load or number limit
has been reached.

speech recognition software was not used in the composition of this e-mail
Jeff Kinz, Emergent Research, Hudson, MA.
¡Ya no mas!
