Rechercher

Français

Connexion Account

Connexion / Inscription Account

Sites web

While working on my post announcing the full support of bcc-tools in Red Hat Enterprise Linux 8.1, I noticed that NetworkManager was constantly talking to static.redhat.com. I noticed this while testing "gethostlatency", which is in bcc-tools and allows you to see which hosts the kernel is resolving and what the latency on this action is. The output from this tool showed NetworkManager talking to static.redhat.com and I thought that seemed curious and so I dug in further.

Identifying the Issue

[root@rhel-81 tools]# ./gethostlatency
TIME   PID COMM              LATms HOST
14:16:53  7931 dnf                 0.22 localhost
14:20:58  7959 NetworkManager     99.48 static.redhat.com
14:20:58  7957 NetworkManager    100.19 static.redhat.com
14:20:58  7958 NetworkManager     99.03 static.redhat.com
14:20:58  7960 NetworkManager     98.94 static.redhat.com
14:21:00  7966 NetworkManager     82.54 static.redhat.com

Looking at the above output, it shows us that several NetworkManager PIDs are resolving static.redhat.com and it took anywhere from 82 to 100 ms for the resolution. Wanting to get a little bit more information about this and not wanting to go through the fun of tcpdump, I launched another bcc-tool, tcpconnect:

[root@rhel-81 tools]# ./tcpconnect
PID COMM     IP SADDR        DADDR        DPORT
7947   NetworkManag 4  192.168.122.2 23.15.89.133 80  
8410   NetworkManag 4  192.168.122.2 23.15.89.133 80  
8410   NetworkManag 4  192.168.122.2 23.15.89.133 80  
8410   NetworkManag 4  192.168.122.2 23.15.89.133 80  

I decided to do a quick ping check of static.redhat.com to ensure that this traffic matches the hostnames requested before.

[root@rhel-81 ~]# ping static.redhat.com
PING e1890.x.akamaiedge.net (23.15.89.133) 56(84) bytes of data.
64 bytes from a23-15-89-133.deploy.static.akamaitechnologies.com (23.15.89.133): icmp_seq=1 ttl=49 time=97.9 ms
64 bytes from a23-15-89-133.deploy.static.akamaitechnologies.com (23.15.89.133): icmp_seq=2 ttl=49 time=89.9 ms

Clearly we are accessing static.redhat.com on port 80, so I went ahead and did a quick curl, with some extra verbosity, on static.redhat.com:

[root@rhel-81 ~]# curl -vvv static.redhat.com
* Rebuilt URL to: static.redhat.com/
*   Trying 23.15.89.133...
* TCP_NODELAY set
* Connected to static.redhat.com (23.15.89.133) port 80 (#0)
> GET / HTTP/1.1
> Host: static.redhat.com
> User-Agent: curl/7.61.1
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Content-Length: 0
< Location: https://static.redhat.com/
< Cache-Control: max-age=86400
< Expires: Wed, 06 Nov 2019 13:32:18 GMT
< Date: Tue, 05 Nov 2019 13:32:18 GMT
< Connection: keep-alive
< X-Robots-Tag: none
<
* Connection #0 to host static.redhat.com left intact

What’s interesting in this output is the 301 Moved Permanently and the location of https://static.redhat.com, which runs on port 443. Our host is communicating on port 80. This suggests that our port 80 connection must be getting a file other than the base index provided by static.redhat.com. How do we explore this further?

I used strace on the NetworkManager process and that yielded the answer:

[root@rhel-81 ~]# strace -p 890 -s 4096 -vvf -o strace.out
[root@rhel-81 ~]# cat strace.out | grep "static.redhat.com"
...
890   sendto(24, "GET /test/rhel-networkmanager.txt HTTP/1.1\r\nHost: static.redhat.com\r\nAccept: */*\r\nConnection: close\r\n\r\n", 103, MSG_NOSIGNAL, NULL, 0) = 103
...

But then I wanted to know if the bcc-tools shipped by Red Hat that utilize BPF could help me get to this answer as well.

Considering that BPF shows us what goes through the kernel, I wanted to find a way to track the system calls (syscalls) that are being made for the port 80 request to static.redhat.com. Not knowing which syscall we need to be looking for, I decided to monitor the top 100 syscalls for NetworkManager. To do that, I needed to get the PID of NetworkManager:

I got there by running:

[root@rhel-81 ~]# ps -eLF | grep NetworkManager
UID      PID PPID LWP C     NLWS SZ RSS PSR STIME TTY    TIME CMD 
root   8410 1  8410  0 3 222565 18272  1 14:28 ? 00:00:00 /usr/sbin/NetworkManager --no-daemon
root   8410 1  8411  0 3 222565 18272  1 14:28 ? 00:00:00 /usr/sbin/NetworkManager --no-daemon
root   8410 1  8412  0 3 222565 18272  0 14:28 ? 00:00:00 /usr/sbin/NetworkManager --no-daemon
root  15855  7558 15855  0 1 55475  1040 1 14:35 pts/0 00:00:00 grep --color=auto NetworkManager

The PID is 8410, and I’ll use that with the bcc-tools command syscount to get the top 100 calls. I will then monitor gethostlatency in another terminal to determine when the request has been made. After this has happened, I’ll issue Ctrl+C to terminate syscount.

[root@rhel-81 tools]# ./syscount -p 8410 -T 100
Tracing syscalls, printing top 100... Ctrl+C to quit.
^C[14:42:34]
SYSCALL                COUNT
rt_sigaction             960
poll                     414
clock_gettime            332
epoll_wait               328
inotify_add_watch         68
write                     17
read                      13
close                      4
recvmsg                    4
fcntl                      3
setsockopt                 3
lseek                      2
rt_sigprocmask             2
recvfrom                   2
clone                      2
connect                    2
socket                     2
fstat                      2
sendmsg                    2
...

Detaching...

Here sendmsg looks interesting since I determined the host is sending a message over TCP to a particular host, as indicated in the output of tcpconnect. It also helps that I know the host would not have made the call that many times during the time the syscount tool ran. Let’s see if it’s possible to get a kernel stack around sendmsg calls:

./stackcount -p 8410 '*sendmsg*' > /tmp/test-out
...
  b'tcp_sendmsg'
  b'sock_sendmsg'
  b'__sys_sendto'
  b'__x64_sys_sendto'
  b'do_syscall_64'
  b'entry_SYSCALL_64_after_hwframe'
  b'[unknown]'
4

From this output, I can see that the tool caught a tcp_sendmsg stack with the sys_sendto call embedded.

Next I want to see what message tcp_sendmsg is sending via __sys_sendto. For that, I’ll run a bcc-tools trace on __sys_sendto. The sendto syscall has the following definition (which I found by running man 2 sendto):

ssize_t sendto(int sockfd, const void *buf, size_t len, int flags, const struct sockaddr *dest_addr, socklen_t addrlen);

Given this, what I am most interested in is the buffer, referenced by *buf, which is the second argument to the call and is represented in BPF by arg2. The first argument is sockfd, which stands for the socket file descriptor; I would like to see that as well. In BPF it is represented by arg1. As such, I will do:

[root@rhel-81 tools]# ./trace '__sys_sendto "%d %s", arg1, arg2'
PID  TID COMM        FUNC

Then wait for NetworkManager to trigger the request again. Once this request triggered again, I saw the following output:

8410 8410 NetworkManager  __sys_sendto 21 b'GET /test/rhel-networkmanager.txt HTTP/1.1\r\nHost: static.redhat.com\r\nAccept: */*'

Combining this with the hostname, I have been able to determine that this is the address that NetworkManager is calling: http://static.redhat.com/test/rhel-networkmanager.txt . Let’s run a curl on this:

[root@rhel-81 tools]# curl http://static.redhat.com/test/rhel-networkmanager.txt
OK

At this point, it is clear that the data going back to Network Manager is the text "OK". This looks like a captive portal check.

How do I find out where this is coming from? I’ll start by checking if Network Manager is opening any files when this activity happens.

[root@rhel-81 tools]# ./opensnoop -n NetworkManager
PID COMM           FD ERR PATH
8410   NetworkManager  21   0 /etc/hosts

After a few minutes of letting this run, I have figured out that no new files are opened by NetworkManager as part of this issue, so maybe it’s worth seeing what is loaded when NetworkManager is started. In another terminal, let’s issue:

[root@rhel8-11 tools]# systemctl reload NetworkManager  
[root@rhel8-11 tools]#
Now in the opensnoop terminal, I see:
...
16861  NetworkManager   5   0 /usr/lib/NetworkManager/conf.d
16861  NetworkManager   4   0 /etc/NetworkManager/conf.d
16861  NetworkManager   4   0 /run/NetworkManager/conf.d
16861  NetworkManager  -1   2 /usr/lib/NetworkManager/conf.d/20-connectivity-redhat.conf
16861  NetworkManager   4   0 /etc/NetworkManager/NetworkManager.conf
16861  NetworkManager   4   0 /var/lib/NetworkManager/no-auto-default.state
16861  NetworkManager  -1   2 /var/lib/NetworkManager/NetworkManager-intern.conf
16861  NetworkManager   4   0 /var/lib/NetworkManager/NetworkManager.state
...

Let’s take a look at the configuration files that are opened, starting with /usr/lib/NetworkManager/conf.d/20-connectivity-redhat.conf:

# Enable connectivity checking for NetworkManager.
# See `man NetworkManager.conf`.
#
# Note that connectivity checking works badly with rp_filter set to
# strict. Check "/proc/sys/net/ipv4/conf/*/rp_filter".
[connectivity]
uri=http://static.redhat.com/test/rhel-networkmanager.txt
response=OK
interval=300

And there it isa configuration to reach out to a particular uri every 300 seconds expecting the response of OK !

How did this get on the system?

[root@rhel-81 tools]# rpm -qf /usr/lib/NetworkManager/conf.d/20-connectivity-redhat.conf
NetworkManager-config-connectivity-redhat-1.20.0-3.el8.noarch

There is actually a connectivity checking package that was installed to do this.

[root@rhel-81 tools]# rpm -qi NetworkManager-config-connectivity-redhat
Name     : NetworkManager-config-connectivity-redhat
Epoch    : 1
Version  : 1.20.0
Release  : 3.el8
Architecture: noarch
Install Date: Tue 05 Nov 2019 02:08:50 PM CET
<...output truncated…>
Vendor   : Red Hat, Inc.
URL      : http://www.gnome.org/projects/NetworkManager/
Summary  : NetworkManager config file for connectivity checking via Red Hat servers
Description :
This adds a NetworkManager configuration file to enable connectivity checking
via Red Hat infrastructure.

This analysis has shown us that the NetworkManager-config-connectivity-redhat package causes NetworkManager to do the connectivity checking. For systems with wireless interfaces, this connectivity checking can be very valuable as it allows NetworkManager to determine if it’s directly connected to the internet or behind a captive portal. For systems without wireless interfaces or where you wish to reduce outbound network traffic, you may wish to disable this checking. To do that, we can either remove the aforementioned package or edit /usr/lib/NetworkManager/conf.d/20-connectivity-redhat.conf and change:

interval = 300
To:
interval = 0
And then run:
systemctl reload NetworkManager

As such, while it was very efficient to use strace to determine what was happening, I have shown that it’s possible to get to the root of the issue using the dynamic kernel tracing provided by bcc-tools as well. In doing this, I’ve further demonstrated the power of these tools which we provide in Red Hat Enterprise Linux. 

As you can see, these tools give you great power of observability over what is going on under the covers of a Red Hat Enterprise Linux system.