While working on my post announcing the full support of bcc-tools in Red Hat Enterprise Linux 8.1, I noticed that NetworkManager was constantly talking to static.redhat.com. I noticed this while testing "gethostlatency", which is in bcc-tools and allows you to see which hosts the kernel is resolving and what the latency on this action is. The output from this tool showed NetworkManager talking to static.redhat.com and I thought that seemed curious and so I dug in further.
Identifying the Issue
[root@rhel-81 tools]# ./gethostlatency
TIME      PID    COMM            LATms  HOST
14:16:53  7931   dnf              0.22  localhost
14:20:58  7959   NetworkManager  99.48  static.redhat.com
14:20:58  7957   NetworkManager 100.19  static.redhat.com
14:20:58  7958   NetworkManager  99.03  static.redhat.com
14:20:58  7960   NetworkManager  98.94  static.redhat.com
14:21:00  7966   NetworkManager  82.54  static.redhat.com
Looking at the above output, we can see that several NetworkManager PIDs resolved static.redhat.com, with each resolution taking anywhere from 82 to 100 ms. Wanting a little more information, and not wanting to go through the fun of tcpdump, I launched another bcc-tool, tcpconnect:
[root@rhel-81 tools]# ./tcpconnect
PID   COMM         IP SADDR          DADDR           DPORT
7947  NetworkManag 4  192.168.122.2  22.214.171.124  80
8410  NetworkManag 4  192.168.122.2  126.96.36.199   80
8410  NetworkManag 4  192.168.122.2  188.8.131.52    80
8410  NetworkManag 4  192.168.122.2  184.108.40.206  80
I decided to do a quick ping check of static.redhat.com to confirm that this traffic matched the hostname resolved earlier.
[root@rhel-81 ~]# ping static.redhat.com
PING e1890.x.akamaiedge.net (220.127.116.11) 56(84) bytes of data.
64 bytes from a23-15-89-133.deploy.static.akamaitechnologies.com (18.104.22.168): icmp_seq=1 ttl=49 time=97.9 ms
64 bytes from a23-15-89-133.deploy.static.akamaitechnologies.com (22.214.171.124): icmp_seq=2 ttl=49 time=89.9 ms
Clearly we are accessing static.redhat.com on port 80, so I went ahead and ran a quick curl, with some extra verbosity, against static.redhat.com:
[root@rhel-81 ~]# curl -vvv static.redhat.com
* Rebuilt URL to: static.redhat.com/
*   Trying 126.96.36.199...
* TCP_NODELAY set
* Connected to static.redhat.com (188.8.131.52) port 80 (#0)
> GET / HTTP/1.1
> Host: static.redhat.com
> User-Agent: curl/7.61.1
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Content-Length: 0
< Location: https://static.redhat.com/
< Cache-Control: max-age=86400
< Expires: Wed, 06 Nov 2019 13:32:18 GMT
< Date: Tue, 05 Nov 2019 13:32:18 GMT
< Connection: keep-alive
< X-Robots-Tag: none
<
* Connection #0 to host static.redhat.com left intact
What’s interesting in this output is the 301 Moved Permanently and the Location of https://static.redhat.com/, which runs on port 443, yet our host is communicating on port 80. This suggests that NetworkManager must be requesting a file other than the base index provided by static.redhat.com. How do we explore this further?
I ran strace on the NetworkManager process, and that yielded the answer:
[root@rhel-81 ~]# strace -p 890 -s 4096 -vvf -o strace.out
[root@rhel-81 ~]# cat strace.out | grep "static.redhat.com"
...
890   sendto(24, "GET /test/rhel-networkmanager.txt HTTP/1.1\r\nHost: static.redhat.com\r\nAccept: */*\r\nConnection: close\r\n\r\n", 103, MSG_NOSIGNAL, NULL, 0) = 103
...
But then I wanted to know if the bcc-tools shipped by Red Hat that utilize BPF could help me get to this answer as well.
Considering that BPF shows us what goes through the kernel, I wanted to find a way to track the system calls (syscalls) being made for the port 80 request to static.redhat.com. Not knowing which syscall to look for, I decided to monitor the top 100 syscalls made by NetworkManager. To do that, I first needed the PID of NetworkManager, which I got by running:
[root@rhel-81 ~]# ps -eLF | grep NetworkManager
UID    PID    PPID   LWP    C NLWP SZ     RSS   PSR STIME TTY   TIME     CMD
root   8410   1      8410   0 3    222565 18272 1   14:28 ?     00:00:00 /usr/sbin/NetworkManager --no-daemon
root   8410   1      8411   0 3    222565 18272 1   14:28 ?     00:00:00 /usr/sbin/NetworkManager --no-daemon
root   8410   1      8412   0 3    222565 18272 0   14:28 ?     00:00:00 /usr/sbin/NetworkManager --no-daemon
root   15855  7558   15855  0 1    55475  1040  1   14:35 pts/0 00:00:00 grep --color=auto NetworkManager
The PID is 8410, and I’ll use that with the bcc-tools command syscount to get the top 100 calls. I will then monitor gethostlatency in another terminal to determine when the request has been made; once it has, I’ll press Ctrl+C to terminate syscount.
[root@rhel-81 tools]# ./syscount -p 8410 -T 100
Tracing syscalls, printing top 100... Ctrl+C to quit.
^C[14:42:34]
SYSCALL                   COUNT
rt_sigaction                960
poll                        414
clock_gettime               332
epoll_wait                  328
inotify_add_watch            68
write                        17
read                         13
close                         4
recvmsg                       4
fcntl                        3
setsockopt                   3
lseek                        2
rt_sigprocmask               2
recvfrom                     2
clone                        2
connect                      2
socket                       2
fstat                        2
sendmsg                      2
...
Detaching...
sendmsg looks interesting, since I know the host is sending a message over TCP to a particular destination, as indicated in the output of tcpconnect. It also helps that I know the host would not have made that call many times during the window in which syscount ran. Let’s see if it’s possible to get a kernel stack around this call with stackcount:
[root@rhel-81 tools]# ./stackcount -p 8410 '*sendmsg*' > /tmp/test-out
...
  b'tcp_sendmsg'
  b'sock_sendmsg'
  b'__sys_sendto'
  b'__x64_sys_sendto'
  b'do_syscall_64'
  b'entry_SYSCALL_64_after_hwframe'
  b'[unknown]'
    4
...
From this output, I can see that the tool caught a tcp_sendmsg stack with the __sys_sendto call embedded.
Next, I want to see what message tcp_sendmsg is sending via __sys_sendto. For that, I’ll run a bcc-tools trace on __sys_sendto. The sendto syscall has the following definition (which I found by running man 2 sendto):
ssize_t sendto(int sockfd, const void *buf, size_t len, int flags,
               const struct sockaddr *dest_addr, socklen_t addrlen);
Given this, what I am most interested in is the buffer, referenced by *buf, which is the second argument to the call and is represented in BPF by arg2. The first argument, sockfd, is the socket file descriptor; I would like to see that as well, and in BPF it is represented by arg1. As such, I will run:
[root@rhel-81 tools]# ./trace '__sys_sendto "%d %s", arg1, arg2'
PID     TID     COMM            FUNC
Then I waited for NetworkManager to trigger the request again. Once it did, I saw the following output:
8410 8410 NetworkManager __sys_sendto 21 b'GET /test/rhel-networkmanager.txt HTTP/1.1\r\nHost: static.redhat.com\r\nAccept: */*'
Combining this with the hostname, I was able to determine the address that NetworkManager is calling: http://static.redhat.com/test/rhel-networkmanager.txt. Let’s run a curl on it:
[root@rhel-81 tools]# curl http://static.redhat.com/test/rhel-networkmanager.txt
OK
At this point, it is clear that the data going back to NetworkManager is the text "OK". This looks like a captive portal check.
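To make the captive-portal idea concrete, here is a minimal sketch of the decision being made (my own simplification, not NetworkManager's actual code): fetch the check URI over plain HTTP and compare the body against an expected string. If an intermediary (such as a hotel Wi-Fi login portal) intercepts port 80 traffic, the body will not match. The response body is passed in as an argument here rather than fetched over the network:

```shell
#!/bin/sh
# Hypothetical sketch of a connectivity check. A captive portal hijacks
# plain-HTTP requests, so the body of the check URI only matches the
# expected string when we truly reach the real server.
check_connectivity() {
    body="$1"        # what the HTTP GET of the check URI returned
    expected="OK"    # the expected response, as in the config we found
    if [ "$body" = "$expected" ]; then
        echo "full connectivity"
    else
        echo "portal or limited connectivity"
    fi
}

check_connectivity "OK"
check_connectivity "<html>Hotel WiFi login page</html>"
```

Running this prints "full connectivity" for the first call and "portal or limited connectivity" for the second, which is essentially the signal a desktop uses to decide whether to pop up a captive-portal login page.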
How do I find out where this is coming from? I’ll start by checking whether NetworkManager opens any files when this activity happens.
[root@rhel-81 tools]# ./opensnoop -n NetworkManager
PID    COMM             FD ERR PATH
8410   NetworkManager   21   0 /etc/hosts
After letting this run for a few minutes, I found that NetworkManager opens no new files as part of this activity, so perhaps it’s worth seeing what is loaded when NetworkManager starts. In another terminal, let’s issue:
[root@rhel-81 tools]# systemctl reload NetworkManager

Now, in the opensnoop terminal, I see:

...
16861  NetworkManager    5   0 /usr/lib/NetworkManager/conf.d
16861  NetworkManager    4   0 /etc/NetworkManager/conf.d
16861  NetworkManager    4   0 /run/NetworkManager/conf.d
16861  NetworkManager   -1   2 /usr/lib/NetworkManager/conf.d/20-connectivity-redhat.conf
16861  NetworkManager    4   0 /etc/NetworkManager/NetworkManager.conf
16861  NetworkManager    4   0 /var/lib/NetworkManager/no-auto-default.state
16861  NetworkManager   -1   2 /var/lib/NetworkManager/NetworkManager-intern.conf
16861  NetworkManager    4   0 /var/lib/NetworkManager/NetworkManager.state
...
Let’s take a look at the configuration files that are opened, starting with /usr/lib/NetworkManager/conf.d/20-connectivity-redhat.conf:
# Enable connectivity checking for NetworkManager.
# See `man NetworkManager.conf`.
#
# Note that connectivity checking works badly with rp_filter set to
# strict. Check "/proc/sys/net/ipv4/conf/*/rp_filter".
[connectivity]
uri=http://static.redhat.com/test/rhel-networkmanager.txt
response=OK
interval=300
And there it is: a configuration that reaches out to a particular URI every 300 seconds, expecting the response "OK"!
How did this get on the system?
[root@rhel-81 tools]# rpm -qf /usr/lib/NetworkManager/conf.d/20-connectivity-redhat.conf
NetworkManager-config-connectivity-redhat-1.20.0-3.el8.noarch
There is actually a connectivity checking package that was installed to do this.
[root@rhel-81 tools]# rpm -qi NetworkManager-config-connectivity-redhat
Name        : NetworkManager-config-connectivity-redhat
Epoch       : 1
Version     : 1.20.0
Release     : 3.el8
Architecture: noarch
Install Date: Tue 05 Nov 2019 02:08:50 PM CET
<...output truncated...>
Vendor      : Red Hat, Inc.
URL         : http://www.gnome.org/projects/NetworkManager/
Summary     : NetworkManager config file for connectivity checking via Red Hat servers
Description :
This adds a NetworkManager configuration file to enable connectivity
checking via Red Hat infrastructure.
This analysis has shown us that the NetworkManager-config-connectivity-redhat package causes NetworkManager to do the connectivity checking. For systems with wireless interfaces, this connectivity checking can be very valuable, as it allows NetworkManager to determine whether it’s directly connected to the internet or behind a captive portal. For systems without wireless interfaces, or where you wish to reduce outbound network traffic, you may want to disable this checking. To do that, we can either remove the aforementioned package or edit /usr/lib/NetworkManager/conf.d/20-connectivity-redhat.conf and change:
interval=300

to:

interval=0

and then run:

systemctl reload NetworkManager
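Note that a package update can overwrite an edit made to the file under /usr/lib. NetworkManager also reads drop-in files from /etc/NetworkManager/conf.d, and per man NetworkManager.conf a file there with the same name shadows the packaged one, so an override in /etc survives updates. Here is a sketch of that approach; it writes to a temporary directory standing in for /etc/NetworkManager/conf.d so the example is side-effect free:

```shell
#!/bin/sh
# Sketch: create an override file with the same name as the packaged
# config. A temporary directory stands in for /etc/NetworkManager/conf.d
# here so this example does not touch a real system.
confdir=$(mktemp -d)
cat > "$confdir/20-connectivity-redhat.conf" <<'EOF'
[connectivity]
interval=0
EOF
cat "$confdir/20-connectivity-redhat.conf"
rm -rf "$confdir"
```

On a real system you would write the same file to /etc/NetworkManager/conf.d/20-connectivity-redhat.conf and then run systemctl reload NetworkManager.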
While it was very efficient to use strace to determine what was happening, I have shown that it’s also possible to get to the root of the issue using the dynamic kernel tracing provided by bcc-tools. In doing so, I’ve further demonstrated the power of the tools we provide in Red Hat Enterprise Linux.
As you can see, these tools give you great power of observability over what is going on under the covers of a Red Hat Enterprise Linux system.