In Red Hat Enterprise Linux 8.1, Red Hat ships a set of dynamic kernel tracing tools called bcc-tools, fully supported on x86_64, that make use of a kernel technology called extended Berkeley Packet Filter (eBPF). With these tools, you can quickly gain insight into certain aspects of system performance that would previously have required more time and effort from both the system and its operator.

The eBPF technology allows dynamic kernel tracing without requiring kernel modules (as with systemtap) or a reboot into a special kernel (as with debug kernels). eBPF accomplishes this while maintaining minimal overhead for each trace point, making these tools an ideal way to instrument running kernels in production.

To ensure that an eBPF program will not harm the running kernel, tools built on eBPF go through the following process when instantiated by root on the command line:

  • The tool’s program is compiled into eBPF bytecode.

  • The bytecode is loaded into the kernel.

  • The bytecode is run through a technology called the eBPF verifier, which checks that the program will not harm the running kernel.

  • If the program passes the verifier, it begins execution. If it does not pass, the bytecode is unloaded and never executes.

That said, bear in mind that you are still inserting tracing, and some system calls are invoked far more often than others, so depending on what you are tracing there may be increased overhead.
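
If you would like to see the load step for yourself, the bpftool utility (shipped in the separate bpftool package on RHEL 8, not part of bcc-tools) can list the eBPF programs currently loaded in the kernel. A minimal check, run while one of the bcc tools is tracing, might look like this:

[karl1@rhel-81 ~]$ sudo bpftool prog show    # lists the eBPF programs currently loaded in the kernel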

If you are interested in more information on eBPF in general, please see Stanislav Kozina’s blog: Introduction to eBPF in Red Hat Enterprise Linux 7.

Installing bcc-tools

We ship bcc-tools in RHEL 7 and RHEL 8. With RHEL 8.1, bcc-tools became fully supported on x86_64. To install bcc-tools on RHEL 7 (7.6+) and RHEL 8, run yum install as root:

yum install bcc-tools 
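
To confirm that the package is installed (and see which version you received), a quick query works:

[karl1@rhel-81 ~]$ rpm -q bcc-tools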

bcc-tools Framework

Before we dive into the different types of tools that are included in bcc-tools, it’s important to note a few things:

  • All of these tools live in /usr/share/bcc/tools.

  • These tools must be run as the root user, since any eBPF program can read kernel data. As such, injecting eBPF bytecode as a regular user is not allowed in RHEL 8.1.

  • Each tool has a man page. To view the man page, run man <tool name>. These man pages include descriptions of the tools, provide the options that can be called, and have information on the expected overhead of the specific tool.
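
For example, to see which tools are available and read about one of them:

[karl1@rhel-81 ~]$ ls /usr/share/bcc/tools
[karl1@rhel-81 ~]$ man biolatency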

Since there are a lot of tools in bcc-tools, I’m going to divide the tools into the following classes and then we’ll dive into each class:

  • The Snoops

  • Latency Detectors

  • Slower

  • Top up with bcc-tools

  • Java/Perl/Python/Ruby

The Snoops

The first class of programs that I’d like to talk about is the snoops. These programs give real-time data from the kernel on the events summarized in the table below:

 

Table 1: bcc-tools snooping tools

Most of the snoop programs have negligible overhead, unless your application is making a large number of the calls that you are tracing. One exception is the failed dcache lookup tool, dcsnoop, which can exhibit noticeable to significant overhead.

All of the snoops run in the same manner: run the tool as root and watch in real time what is happening on your system!

Let’s look at what this looks like with killsnoop:

In one terminal, I’ll run:

[karl1@rhel-81 ~]$ cd /usr/share/bcc/tools
[karl1@rhel-81 tools]$ sudo ./killsnoop 

In another terminal, I’ll log in with a Bash shell, manually invoke a new Bash shell, determine the PID of my new shell, and then issue a kill -0 (signal 0, which only checks whether the process can be signaled) and a kill -9 to that PID:

bluenote:~ karl1$ ssh karl1@rhel8-vm
karl1@rhel8-vm's password:
Activate the web console with: systemctl enable --now cockpit.socket

Last login: Wed Jul 17 00:50:14 2019 from 172.16.96.13
[karl1@rhel-81 ~]$ bash
[karl1@rhel-81 ~]$ echo $$
3693
[karl1@rhel-81 ~]$ kill -0 3693
[karl1@rhel-81 ~]$ kill -9 3693
Killed
[karl1@rhel-81 ~]$ echo $$
3653
[karl1@rhel-81 ~]$

The corresponding output from killsnoop shows:

[karl1@rhel-81 tools]$ sudo ./killsnoop
TIME   PID COMM         SIG  TPID   RESULT
00:54:20  3693 bash          0 3693   0
00:54:24  3693 bash          9 3693   0

As you can see, killsnoop captured the bash process issuing a kill -0 and a kill -9 to PID 3693, and both succeeded. The PID and COMM columns identify the process doing the killing; the TPID column is the process ID on the receiving end of the kill.

The snoops are a very effective way to monitor the system in real time!
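
The other snoops follow the same pattern, and most accept a couple of filtering options; check each tool’s man page for its exact flags, but invocations along these lines are typical:

[karl1@rhel-81 tools]$ sudo ./opensnoop -p 3693    # trace open() calls made by PID 3693 only
[karl1@rhel-81 tools]$ sudo ./killsnoop -x         # show only failed kill() calls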

Latency Detectors

The next class of programs that we have made available in bcc-tools is the latency detectors. They do the following:

 

Table 2: bcc-tools latency detectors

Let’s look at biolatency. In one terminal:

 

Figure 1: biolatency

Now, in another terminal, let's generate some I/O:

 

Figure 2: generating I/O

At this point, we will hit Ctrl-C on the biolatency process and it prints the following:

 

Figure 3: showing distribution of I/O

This shows the distribution of I/O over the traced period: each row gives the number of I/Os that completed within a given latency range, in microseconds. As we can see, most of the I/O took between 262,144 and 524,287 microseconds.

These tools can be really helpful when trying to determine sources of latency on a running system.
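
biolatency also takes a few options that can make the histogram easier to read; for instance, -m reports latency in milliseconds, and an interval and count can be supplied to print the histogram periodically (see the man page for the full list). A run like the following is typical:

[karl1@rhel-81 tools]$ sudo ./biolatency -m 5 3    # millisecond buckets, printed every 5 seconds, 3 times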

Slower

The tools that make up the "slower" family show us calls that take longer than a defined threshold, across a number of different areas. They are:

 

Table 3: eBPF slower family

Each of these tools follows a similar pattern, so let’s take a look at xfsslower, since I have XFS filesystems on my virtual machine and XFS is the default filesystem in RHEL 8.

First, let’s start xfsslower with no arguments:

[karl1@rhel-81 tools]$ sudo ./xfsslower
Tracing XFS operations slower than 10 ms
TIME COMM       PID T BYTES   OFF_KB LAT(ms) FILENAME

In another terminal, generate some load by running:

for i in $(seq 1 80); do seq 3 | xargs -P0 -n1 md5sum /dev/zero & done

This starts 240 processes running md5sum on /dev/zero; since /dev/zero never ends, this generates significant CPU load.
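
A note for later: when you are done generating load, a quick way to stop all of these md5sum processes at once (assuming no other md5sum jobs you care about are running) is:

[karl1@rhel-81 ~]$ pkill -f 'md5sum /dev/zero'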

Next, start a dd command:

[karl1@rhel-81 ~]$ dd if=/dev/urandom of=bigfile bs=1024 count=1000000

We start to get some outliers in the latency data as some of our writes are taking a good deal longer than 10ms:

13:30:26 b'dd'            7294  W  1024   715    2629.01 b'bigfile'
13:30:31 b'dd'            7294  W  1024   3944   2647.03 b'bigfile'
13:31:05 b'dd'            7294  W  1024   22836  2638.03 b'bigfile'
13:31:08 b'dd'            7294  W  1024   24393  2642.01 b'bigfile'
13:31:24 b'dd'            7294  W  1024   34187  2634.02 b'bigfile'
13:31:26 b'(tmpfiles)'    7375  R  256    0       406.96 b'systemd-tmpfiles'
13:31:29 b'systemd-tmpf   7375  R  35     0      2316.90 b'cryptsetup.conf'
13:31:29 b'systemd-tmpf   7375  R  365    0        54.90 b'dbus.conf'
13:31:29 b'systemd-tmpf   7375  R  164    0        32.96 b'dnf.conf'
13:31:29 b'systemd-tmpf   7375  R  515    0        10.95 b'etc.conf'
13:31:29 b'systemd-tmpf   7375  R  362    0        10.94 b'home.conf'
13:31:29 b'systemd-tmpf   7375  R  1096   0        10.96 b'journal-nocow.conf'
13:31:29 b'systemd-tmpf   7375  R  841    0        21.81 b'legacy.conf'
13:31:29 b'systemd-tmpf   7375  R  30     0        10.93 b'libselinux.conf'
13:31:29 b'systemd-tmpf   7375  R  61     0        10.96 b'lvm2.conf'
13:31:29 b'systemd-tmpf   7375  R  35     0        21.96 b'man-db.conf'
13:31:29 b'systemd-tmpf   7375  R  34     0        10.95 b'openssh.conf'
13:31:29 b'systemd-tmpf   7375  R  98     0        10.96 b'pam.conf'
13:31:29 b'systemd-tmpf   7375  R  96     0        10.96 b'portables.conf'
13:31:29 b'systemd-tmpf   7375  R  22     0        21.96 b'rpm.conf'
13:31:29 b'systemd-tmpf   7375  R  137    0        21.96 b'selinux-policy.conf'
13:31:29 b'systemd-tmpf   7375  R  33     0        10.95 b'subscription-manager.conf'
13:31:29 b'systemd-tmpf   7375  R  305    0        22.94 b'sudo.conf'
13:31:30 b'systemd-tmpf   7375  R  596    0        22.95 b'systemd-nologin.conf'
13:31:30 b'systemd-tmpf   7375  R  1676   0        11.03 b'systemd.conf'
13:31:30 b'systemd-tmpf   7375  R  757    0        10.86 b'tmp.conf'
13:31:30 b'systemd-tmpf   7375  R  56     0        54.87 b'tuned.conf'
13:31:30 b'systemd-tmpf   7375  R  568    0        21.96 b'var.conf'
13:31:30 b'systemd-tmpf   7375  R  655    0        10.45 b'x11.conf'
13:31:32 b'dd'            7294  W  1024   38913  2657.04 b'bigfile'
13:31:40 b'dd'            7294  W  1024   43233  2647.02 b'bigfile'

You will notice some systemd-tmpfiles reads in addition to the dd writes; their latencies are also outliers against the 10 ms threshold.

Being able to get this data in real time from the kernel, without having to write any code, makes it much easier to start looking at where the problems are on a system.

One thing to keep in mind with the slower tools is that overhead can become an issue depending on how loaded the system is. You will want to try profiling a system before putting it under load to make sure that these tools will not be problematic in your situation.
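
Raising the latency threshold is one simple way to cut down both the noise and the overhead: xfsslower (like the other slower tools) takes the threshold in milliseconds as its first argument; check the tool’s man page for the remaining options. For example:

[karl1@rhel-81 tools]$ sudo ./xfsslower 100    # only report XFS operations slower than 100 ms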

Top up with bcc-tools

Everybody knows about top and maybe you even know about powertop, but get ready to top-up! We have:

 

Table 4: bcc-tools top tools

Let’s take a look at cachetop. To get started, I will launch it:

[karl1@rhel-81 tools]$ sudo ./cachetop

And then in another terminal, I will start our dd worker:

[karl1@rhel-81 ~]$ dd if=/dev/urandom of=bigfile bs=1024 count=1000000

Here is what shows up in cachetop:

04:51:56 Buffers MB: 0 / Cached MB: 712 / Sort: HITS / Order: ascending

PID    UID    CMD              HITS   MISSES   DIRTIES  READ_HIT%  WRITE_HIT%
12637  root   cachetop            1        0         0     100.0%       0.0%
 1888  gdm    gsd-color           2        0         0     100.0%       0.0%
 7443  karl1  gsd-color           2        0         0     100.0%       0.0%
12639  karl1  pool                6        0         0     100.0%       0.0%
12638  karl1  dd                 31   187077         0       0.0%      50.0%
 7546  karl1  pool               48      217         0      18.1%       0.4%
12640  karl1  wal-checkpoint     50       25         0      66.7%       0.0%
12642  karl1  wal-checkpoint     50       25         0      66.7%       0.0%
12644  karl1  wal-checkpoint     50       25         0      66.7%       0.0%
12648  karl1  wal-checkpoint     50       25         0      66.7%       0.0%
12650  karl1  wal-checkpoint     52       26         0      66.7%       0.0%
12646  karl1  wal-checkpoint     58       29         0      66.7%       0.0%
12638  karl1  bash              128        0         0     100.0%       0.0%
 7539  karl1  tracker-miner-f   246        0         0     100.0%       0.0%

It makes sense that dd has a pretty high miss rate given that it’s reading from /dev/urandom.
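
cachetop also accepts a refresh interval, in seconds, as an argument (check its man page for the details), for example:

[karl1@rhel-81 tools]$ sudo ./cachetop 5    # refresh the statistics every 5 seconds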

Java/Perl/Python/Ruby

The language tracers that have been implemented in bcc-tools will add some overhead to the programs being traced in the language you are working with. That said, we ship the following tools in this space:

 

Table 5: bcc-tools for profiling programs

Since I am a Python programmer and subscription-manager is written in Python, I will demonstrate the use of pythoncalls to show which calls Python makes most frequently during a run of subscription-manager list --available.

To trace the Python calls, you need the PID of the process that you wish to attach to. As subscription-manager executes quickly, I run watch on ps in one terminal so that I can grab the PID as soon as it appears. After starting:

[root@rhel-81 ~]# subscription-manager list --available

I quickly switch and run in another terminal:

[karl1@rhel-81 tools]$ sudo ./pythoncalls 13746
Tracing calls in process 13746 (language: python)... Ctrl-C to quit.

Once subscription-manager completes, I hit Ctrl-C in the pythoncalls terminal. There is a lot of output, so I’m only showing the most frequently called items here:

/usr/lib64/python3.6/site-packages/subscription_manager/printing_utils.py.ljust_wide   574
/usr/lib/python3.6/site-packages/dateutil/tz/tz.py._naive_is_dst   858
/usr/lib/python3.6/site-packages/dateutil/tz/tz.py._datetime_to_timestamp   858
/usr/lib/python3.6/site-packages/dateutil/parser.py.get_token  1165
/usr/lib/python3.6/site-packages/dateutil/parser.py.__next__  1165
/usr/lib/python3.6/site-packages/dateutil/parser.py.isword  1172
/usr/lib64/python3.6/http/client.py._get_chunk_left  1627
/usr/lib64/python3.6/http/client.py._read_next_chunk_size  1627
/usr/lib64/python3.6/site-packages/subscription_manager/printing_utils.py.highlight_by_filter_string_columnize_cb  1696
/usr/lib64/python3.6/site-packages/subscription_manager/printing_utils.py.format_name  2267
/usr/lib/python3.6/site-packages/dateutil/parser.py.isnum  2403
/usr/lib64/python3.6/site-packages/subscription_manager/printing_utils.py.add_line  2420
/usr/lib64/python3.6/http/client.py._safe_read      3252
/usr/lib/python3.6/site-packages/dateutil/parser.py.<genexpr>  9861
/usr/lib64/python3.6/site-packages/subscription_manager/unicode_width.py.textual_width 21235
/usr/lib64/python3.6/site-packages/subscription_manager/unicode_width.py._interval_bisearch   124358
/usr/lib64/python3.6/site-packages/subscription_manager/unicode_width.py._ucp_width   124358
/usr/lib64/python3.6/site-packages/subscription_manager/unicode_width.py.<genexpr>   145593

I only just managed to attach to the process in time, so a large part of what it was doing during the trace was printing to the screen; accordingly, a significant number of the captured calls relate to printing text to the console, which is exactly what subscription-manager’s printing_utils.py library does.
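
If racing to grab the PID by hand feels fragile, one alternative (a small sketch, assuming pgrep is available and subscription-manager has already started) is to let the shell look up the newest matching process for you:

[karl1@rhel-81 tools]$ sudo ./pythoncalls $(pgrep -n -f 'subscription-manager list')    # -n picks the newest match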

Conclusion

bcc-tools provides a number of easy-to-use tools that allow you to quickly gain insight into the performance of your system with minimal effort. As these tools maintain minimal overhead, their design fits well with the demands of modern production workloads. We encourage you to install the package and familiarize yourself with bcc-tools, as it is now fully supported on x86_64!


About the author

Karl Abbott is a Senior Product Manager for Red Hat Enterprise Linux focused on the kernel and performance. Abbott has been at Red Hat for more than 15 years, previously working with customers in the financial services industry as a Technical Account Manager.
