
[rhelv5-list] Wrong bandwidth values reported by the kernel



Hi,

I already reported this problem here a while back: on many RHEL5
servers, we are seeing bogus bandwidth values being graphed through
SNMP. Basically, we sometimes see 2Gbps peaks on Gigabit interfaces,
which is more than the link can carry. The shape of the curves looks
right, but the values are way too high.
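For reference, graphing tools generally turn two successive polls of an interface octet counter into a rate by taking the byte delta modulo the counter's width and converting it to bits per second. A minimal Python 3 sketch, assuming at most one wrap between polls; `counter_rate_bps` is a hypothetical name, not part of any SNMP tool:

```python
def counter_rate_bps(prev, curr, interval_s, counter_bits=32):
    """Convert two octet-counter samples into a rate in bits per second.

    Assumes the counter wrapped at most once between the two samples
    (Counter32 behaviour when counter_bits=32).
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    modulus = 2 ** counter_bits
    delta = (curr - prev) % modulus  # also absorbs a single wraparound
    return delta * 8 / interval_s


# 37.5 MB sent during a 300 s polling interval
print(counter_rate_bps(1000000, 38500000, 300))
```

Because the delta is taken modulo the counter width, a counter that goes backwards for any other reason (a reset, for instance) is indistinguishable from a wrap and produces a huge bogus sample.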

The same bandwidth usage graphed directly from the switches to which the
servers are attached looks fine (confirmed on various Dell, Cisco and
Force10 switches).

What puzzled me at first is that watching the bandwidth usage in real
time with iptraf shows what seem to be correct values. So it looks like
the /proc counters are what is wrong (ifconfig and SNMP both read
those).
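/proc/net/dev has two header lines, then one line per interface: the name, a colon, eight receive counters and eight transmit counters, with bytes first in each group. A minimal Python 3 sketch of extracting the two byte counters; it splits on the colon before splitting on whitespace, because a large first counter can be printed flush against the name (`eth0:123456789`), which shifts whitespace-based column indexes. `parse_net_dev` and the sample text are illustrative, not taken from the servers above:

```python
def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines():
        if ':' not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(':', 1)
        fields = data.split()
        # fields[0:8] are receive counters, fields[8:16] transmit counters
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


# Illustrative content: eth0's RX bytes are printed flush against "eth0:"
SAMPLE = (
    "Inter-|   Receive                                                |  Transmit\n"
    " face |bytes    packets errs drop fifo frame compressed multicast"
    "|bytes    packets errs drop fifo colls carrier compressed\n"
    "    lo:    4360      60    0    0    0     0          0         0"
    "     4360      60    0    0    0     0       0          0\n"
    "  eth0:123456789  200000    0    0    0     0          0         0"
    " 987654321  300000    0    0    0     0       0          0\n"
)

print(parse_net_dev(SAMPLE)["eth0"])
```

On a live system the input would come from `open('/proc/net/dev').read()`.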

Attached is a simple Python script that displays, in real time, the
bytes transmitted on an interface as measured by tcpdump (the iptraf
way) and as read from /proc/net/dev (the SNMP way).
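With `-e`, tcpdump prints the link-level header, including the frame length, before the network-layer details, so the first `length N` on a packet line is the frame size. A minimal Python 3 sketch of that extraction; the sample line is illustrative (recent tcpdump formatting, which may differ from the version shipped with RHEL5), `frame_length` is a hypothetical helper, and treating indented lines as continuation lines rather than packets is an assumption about tcpdump's `-v` output:

```python
import re

# First "length N" field on a tcpdump -e output line.
LENGTH_RE = re.compile(r'\blength (\d+)')


def frame_length(line):
    """Return the link-level length from a tcpdump -e packet line, or None."""
    if line[:1].isspace():
        return None  # assumed continuation line (e.g. with -v), not a new packet
    match = LENGTH_RE.search(line)
    return int(match.group(1)) if match else None


sample = ("12:00:00.000000 00:16:3e:00:00:01 > 00:16:3e:00:00:02, "
          "ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 64, id 1, "
          "offset 0, flags [DF], proto TCP (6), length 1500)")
print(frame_length(sample))
```

Returning None instead of guessing a size leaves it to the caller to decide whether a non-matching line is really a packet.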

On my (not at all busy) workstation (Fedora 10 x86_64), I currently see
this:
TCPDUMP: 2533105 (23445 packets)
PROC:    2507066

But on a busy web server (a RHEL 5.3 x86_64 Xen domU), I see:
TCPDUMP: 1959818423 (1563119 packets)
PROC:    5320731075

This is quite a big difference: /proc reports about 2.7 times as many
bytes as tcpdump. From my tests, there isn't a fixed ratio between the
two results, and I see differences on every server I've tried it on
(Xen domU, Xen dom0, no Xen, with bonding, etc.).
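A quick Python 3 comparison of the two runs quoted above, using only the totals the script printed:

```python
# Totals printed by the script: (tcpdump bytes, /proc bytes)
samples = {
    "workstation": (2533105, 2507066),
    "busy domU": (1959818423, 5320731075),
}

for host, (tcpdump_bytes, proc_bytes) in samples.items():
    ratio = proc_bytes / tcpdump_bytes
    print("%s: /proc / tcpdump = %.2f, difference = %d bytes"
          % (host, ratio, proc_bytes - tcpdump_bytes))
```

This gives a ratio of about 0.99 on the workstation versus about 2.71 on the busy domU.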

With all this, the first question would be: is there anything wrong
with the script I'm using? If any network and/or Python experts want to
have a look...

If the answer to the above is "nothing", then something is really
wrong...

Matthias

-- 
Clean custom Red Hat Linux rpm packages : http://freshrpms.net/
Fedora release 10 (Cambridge) - Linux kernel
2.6.27.12-170.2.5.fc10.x86_64 Load : 0.40 0.34 0.36
#!/usr/bin/python
import re
import time
import thread
import getopt
import signal
import sys
from subprocess import Popen, PIPE, STDOUT

# TODO print not refreshing correctly

def get_bytes_from_tcpdump(interface, src, byte_values):
    # Count the bytes and packets tcpdump sees on `interface` from `src`.
    # -n: no name resolution, -e: print the link-level header (and frame
    # length), -p: no promiscuous mode, -l: line-buffered output, -v: verbose
    command = Popen(['tcpdump', '-n', '-e', '-p', '-l', '-v', '-i',
                     interface, 'src', src], stdout=PIPE, stderr=PIPE,
                     bufsize=0)
    length_re = re.compile(r'length (\d+)')
    while 1:
        line = command.stdout.readline()
        if not line:
            # EOF: tcpdump has exited; stop instead of busy-looping
            break

        match = length_re.search(line)
        if match:
            # with -e, the first "length N" is the link-level frame length
            nbytes = int(match.group(1)) + 5
        else:
            # assumed to be an ARP packet: 28-byte ARP + 14-byte Ethernet header
            nbytes = 28 + 14

        byte_values[0] += nbytes
        byte_values[1] += 1

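def get_bytes_from_proc(interface, byte_values):
    # Accumulate the growth of the /proc/net/dev TX bytes counter.
    # Assumes a 32-bit counter; on x86_64 kernels the counter is an
    # unsigned long (64 bits), so a decrease there is not a real wrap.
    wrap = 2**32
    offset = read_proc(interface)
    while 1:
        current_bytes = read_proc(interface)
        increase = current_bytes - offset
        if increase < 0:
            # counter wrapped: bytes up to the wrap point plus bytes since
            # (a counter reset would also land here and be misread as a wrap)
            increase = (wrap - offset) + current_bytes
        byte_values[0] += increase
        offset = current_bytes
        time.sleep(1)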

def get_bytes_from_ifconfig(interface, byte_values):
    offset = read_ifconfig(interface)
    while(1):
        bytes = read_ifconfig(interface)
        byte_values[0] += (bytes - offset)
        offset = bytes
        time.sleep(1)

def read_ifconfig(interface):
    # Return the transmitted-bytes counter from net-tools ifconfig output,
    # which contains "RX bytes:N (...)  TX bytes:N (...)".
    command = Popen(['/sbin/ifconfig', interface], stdout=PIPE, stderr=PIPE)
    output = command.communicate()[0]
    # received bytes would be r'RX bytes:(\d+)'
    match = re.search(r'TX bytes:(\d+)', output)
    if not match:
        raise ValueError("no TX bytes counter in ifconfig output for %s"
                         % interface)
    return int(match.group(1))

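def read_proc(interface):
    # Return the transmitted-bytes counter for `interface` from /proc/net/dev.
    # The name and the first counter may be printed without a space between
    # them ("eth0:123456789"), so split on ':' before splitting on whitespace.
    f = open('/proc/net/dev')
    try:
        for line in f:
            if ':' not in line:
                continue  # header lines
            name, data = line.split(':', 1)
            if name.strip() == interface:
                fields = data.split()
                # fields[0:8] are RX counters (received bytes: fields[0]),
                # fields[8] is transmitted bytes
                return int(fields[8])
    finally:
        f.close()
    raise ValueError("interface %s not found in /proc/net/dev" % interface)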

def signal_handler(signum, frame):
#    print "bye"
    sys.exit(0)

def main(interface, host):
    signal.signal(signal.SIGINT, signal_handler)

    byte_value_tcpdump = [0, 0]
    byte_value_proc = [0]
    byte_value_ifconfig = [0]

    thread.start_new_thread(get_bytes_from_tcpdump,
                            (interface, host, byte_value_tcpdump))
    thread.start_new_thread(get_bytes_from_proc, (interface, byte_value_proc))
    # thread.start_new_thread(get_bytes_from_ifconfig, (interface,
    #                                                   byte_value_ifconfig))

    while 1:
        s = "TCPDUMP: %d (%d packets)\nPROC:    %d" % (byte_value_tcpdump[0],
                                                       byte_value_tcpdump[1],
                                                       byte_value_proc[0])
        print s
        time.sleep(1)

def usage():
    print "Usage: monitor -i interface (e.g. eth0) -m host_ip"

if __name__ == "__main__":
    interface = None
    ip = None
    opts, args = getopt.getopt(sys.argv[1:], "hi:m:", ["help"])
    for o, a in opts:
        if o == '-i':
            interface = a
        elif o == '-m':
            ip = a
        elif o in ['-h', '--help']:
            usage()
            sys.exit()
    if not interface or not ip:
        usage()
        sys.exit()

    main(interface, ip)

