In Red Hat Enterprise Linux 8.1, Red Hat ships a set of fully supported on x86_64 dynamic kernel tracing tools, called bcc-tools, that make use of a kernel technology called extended Berkeley Packet Filter (eBPF). With these tools, you can quickly gain insight into certain aspects of system performance that would have previously required more time and effort from the system and operator.
The eBPF technology allows dynamic kernel tracing without requiring kernel modules (like systemtap) or rebooting of the kernel (as with debug kernels). eBPF accomplishes this while maintaining minimal overhead for each trace point, making these tools an ideal way to instrument running kernels in production.
To ensure that an eBPF program will not harm the running kernel, tools built on eBPF go through the following process when instantiated by root on the command line:
The program is compiled into eBPF bytecode.
Loaded into the kernel.
Run through a technology called the eBPF verifier to ensure that the program will not harm the running kernel.
Upon passing the verifier, it begins execution. If it does not pass the verifier, the code is unloaded and does not execute.
That said, bear in mind that you are still inserting tracing and some system calls are called significantly more than others, so depending on what you are tracing, there may be increased overhead.
If you are interested in more information on eBPF in general, please see Stanislav Kozina’s blog: Introduction to eBPF in Red Hat Enterprise Linux 7.
bcc-tools in RHEL 7 and RHEL 8. With RHEL 8.1,
bcc-tools has gone fully supported on x86. To install
bcc-tools on RHEL 7 (7.6+) and RHEL 8, run
yum install as root:
yum install bcc-tools
Before we dive into the different types of tools that are included in
bcc-tools, it’s important to note a few things:
All of these tools live in
These tools must run as the root user as any eBPF program can read kernel data. As such, injecting eBPF bytecode as a regular user is not allowed in RHEL 8.1.
Each tool has a man page. To view the man page, run
man <tool name>. These man pages include descriptions of the tools, provide the options that can be called, and have information on the expected overhead of the specific tool.
Since there are a lot of tools in
bcc-tools, I’m going to divide the tools into the following classes and then we’ll dive into each class:
Top Up with bcc-tools
The first class of programs that I’d like to talk about are the snoops. These programs give real time data from the kernel on:
Most of the snoop programs have negligible overhead, unless your application is hitting a large amount of the calls that you are tracing. An exception to this though is the failed dcache lookup tool,
dcsnoop, which will exhibit noticeable to significant overhead.
All of the Snoops run in the same manner. Run the snoop as root and watch in real time what is happening on your system!
Let’s look at what this looks like with
In one terminal, I’ll run:
[karl1@rhel-81 ~]$ cd /usr/share/bcc/tools [karl1@rhel-81 tools]$ sudo ./killsnoop
In another terminal, I’ll log in with a Bash shell, manually invoke a new Bash shell, determine the PID of my new shell, and then issue a
kill -0 and a
kill -9 to that PID:
bluenote:~ karl1$ ssh karl1@rhel8-vm karl1@rhel8-vm's password: Activate the web console with: systemctl enable --now cockpit.socket Last login: Wed Jul 17 00:50:14 2019 from 172.16.96.13 [karl1@rhel-81 ~]$ bash [karl1@rhel-81 ~]$ echo $$ 3693 [karl1@rhel-81 ~]$ kill -0 3693 [karl1@rhel-81 ~]$ kill -9 3693 Killed [karl1@rhel-81 ~]$ echo $$ 3653 [karl1@rhel-81 ~]$
The corresponding output from
[karl1@rhel-81 tools]$ sudo ./killsnoop TIME PID COMM SIG TPID RESULT 00:54:20 3693 bash 0 3693 0 00:54:24 3693 bash 9 3693 0
As you can see there, it has captured that the Bash command issued a
kill -0 and a
kill -9 to the
pid 3693 and both times was successful. The PID and COMM columns represent the process doing the killing. The TPID column represents the process id on the receiving end of the
The snoops are very effective at being able to monitor the system in real-time!
The next class of programs that we have made available in bcc-tools is the latency detectors. They do the following things:
Let’s look at biolatency. In one terminal:
Now, in another terminal, let's generate some I/O:
At this point, we will hit Ctrl-C on the biolatency process and it prints the following:
This shows the distribution of I/O over the recorded time by number of I/Os and how long they took in microseconds. As we can see, most of the I/O took between 262,144 and 524,287 microseconds.
These tools can be really helpful when trying to determine sources of latency on a running system.
The tools that make up the "slower" family show us calls that execute longer than a defined threshold for a number of different areas. They are:
Each of these tools follows a similar pattern, so let’s take a look at
xfsslower, since I have XFS filesystems on my virtual machine and they are the default filesystem in RHEL 8.
First, let’s start
xfsslower with no arguments:
[karl1@rhel-81 tools]$ sudo ./xfsslower Tracing XFS operations slower than 10 ms TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME
In another terminal, generate some load by running:
for i in $(seq 1 80); do seq 3 | xargs -P0 -n1 md5sum /dev/zero & done
This starts 240 processes that are doing
/dev/zero, which is a never ending file, so this is going to generate significant CPU load.
Next, start a
[karl1@rhel-81 ~]$ dd if=/dev/urandom of=bigfile bs=1024 count=1000000
We start to get some outliers in the latency data as some of our writes are taking a good deal longer than 10ms:
13:30:26 b'dd' 7294 W 1024 715 2629.01 b'bigfile' 13:30:31 b'dd' 7294 W 1024 3944 2647.03 b'bigfile' 13:31:05 b'dd' 7294 W 1024 22836 2638.03 b'bigfile' 13:31:08 b'dd' 7294 W 1024 24393 2642.01 b'bigfile' 13:31:24 b'dd' 7294 W 1024 34187 2634.02 b'bigfile' 13:31:26 b'(tmpfiles)' 7375 R 256 0 406.96 b'systemd-tmpfiles' 13:31:29 b'systemd-tmpf 7375 R 35 0 2316.90 b'cryptsetup.conf' 13:31:29 b'systemd-tmpf 7375 R 365 0 54.90 b'dbus.conf' 13:31:29 b'systemd-tmpf 7375 R 164 0 32.96 b'dnf.conf' 13:31:29 b'systemd-tmpf 7375 R 515 0 10.95 b'etc.conf' 13:31:29 b'systemd-tmpf 7375 R 362 0 10.94 b'home.conf' 13:31:29 b'systemd-tmpf 7375 R 1096 0 10.96 b'journal-nocow.conf' 13:31:29 b'systemd-tmpf 7375 R 841 0 21.81 b'legacy.conf' 13:31:29 b'systemd-tmpf 7375 R 30 0 10.93 b'libselinux.conf' 13:31:29 b'systemd-tmpf 7375 R 61 0 10.96 b'lvm2.conf' 13:31:29 b'systemd-tmpf 7375 R 35 0 21.96 b'man-db.conf' 13:31:29 b'systemd-tmpf 7375 R 34 0 10.95 b'openssh.conf' 13:31:29 b'systemd-tmpf 7375 R 98 0 10.96 b'pam.conf' 13:31:29 b'systemd-tmpf 7375 R 96 0 10.96 b'portables.conf' 13:31:29 b'systemd-tmpf 7375 R 22 0 21.96 b'rpm.conf' 13:31:29 b'systemd-tmpf 7375 R 137 0 21.96 b'selinux-policy.conf' 13:31:29 b'systemd-tmpf 7375 R 33 0 10.95 b'subscription-manager.conf' 13:31:29 b'systemd-tmpf 7375 R 305 0 22.94 b'sudo.conf' 13:31:30 b'systemd-tmpf 7375 R 596 0 22.95 b'systemd-nologin.conf' 13:31:30 b'systemd-tmpf 7375 R 1676 0 11.03 b'systemd.conf' 13:31:30 b'systemd-tmpf 7375 R 757 0 10.86 b'tmp.conf' 13:31:30 b'systemd-tmpf 7375 R 56 0 54.87 b'tuned.conf' 13:31:30 b'systemd-tmpf 7375 R 568 0 21.96 b'var.conf' 13:31:30 b'systemd-tmpf 7375 R 655 0 10.45 b'x11.conf' 13:31:32 b'dd' 7294 W 1024 38913 2657.04 b'bigfile' 13:31:40 b'dd' 7294 W 1024 43233 2647.02 b'bigfile'
You will notice some
systemd file reads in addition to the
dd command, their latencies are also outliers based on a 10ms metric.
Being able to get this data in realtime from the kernel without having to code anything is very nice for being able to start looking at where the problems are on a system.
One thing to keep in mind with the Slower tools is that overhead can become an issue depending on how loaded the system is. You will want to try profiling a system prior to putting load on it to make sure that these tools will not be problematic in your situation.
Top up with bcc-tools
Everybody knows about
top and maybe you even know about
powertop, but get ready to top-up! We have:
Let’s take a look at cachetop. To get started, I will start
[karl1@rhel-81 tools]$ sudo ./cachetop
And then in another terminal, I will start our
[karl1@rhel-81 ~]$ dd if=/dev/urandom of=bigfile bs=1024 count=1000000
Here is what shows up in
04:51:56 Buffers MB: 0 / Cached MB: 712 / Sort: HITS / Order: ascending PID UID CMD HITS MISSES DIRTIES READ_HIT% WRITE_HIT% 12637 root cachetop 1 0 0 100.0% 0.0% 1888 gdm gsd-color 2 0 0 100.0% 0.0% 7443 karl1 gsd-color 2 0 0 100.0% 0.0% 12639 karl1 pool 6 0 0 100.0% 0.0% 12638 karl1 dd 31 187077 0 0.0% 50.0% 7546 karl1 pool 48 217 0 18.1% 0.4% 12640 karl1 wal-checkpoint 50 25 0 66.7% 0.0% 12642 karl1 wal-checkpoint 50 25 0 66.7% 0.0% 12644 karl1 wal-checkpoint 50 25 0 66.7% 0.0% 12648 karl1 wal-checkpoint 50 25 0 66.7% 0.0% 12650 karl1 wal-checkpoint 52 26 0 66.7% 0.0% 12646 karl1 wal-checkpoint 58 29 0 66.7% 0.0% 12638 karl1 bash 128 0 0 100.0% 0.0% 7539 karl1 tracker-miner-f 246 0 0 100.0% 0.0%
It makes sense that dd has a pretty high miss rate given that it’s reading from
The language tracers that have been implemented in bcc-tools will incur some overhead on the actual programs being run in the language you are working with. That said, we ship with the following tools in this space:
Since I am a Python programmer and
subscription-manager is written in Python, I will demonstrate the use of
pythoncalls to show which calls are most frequently made by Python during a run of
subscription-manager list --available.
To trace the Python calls, you have to have the PID that you wish to attach to. As subscription-manager is pretty quick at execution, I use
ps in one terminal to quickly grab the PID and then after starting:
[root@rhel-81 ~]# subscription-manager list --available
I quickly switch and run in another terminal:
[karl1@rhel-81 tools]$ sudo ./pythoncalls 13746 Tracing calls in process 13746 (language: python)... Ctrl-C to quit.
Once subscription-manager completes, I Ctrl-C pythoncalls and then here is the type of output that I get. I’m only showing the most called items here as there is a lot of output:
/usr/lib64/python3.6/site-packages/subscription_manager/printing_utils.py.ljust_wide 574 /usr/lib/python3.6/site-packages/dateutil/tz/tz.py._naive_is_dst 858 /usr/lib/python3.6/site-packages/dateutil/tz/tz.py._datetime_to_timestamp 858 /usr/lib/python3.6/site-packages/dateutil/parser.py.get_token 1165 /usr/lib/python3.6/site-packages/dateutil/parser.py.__next__ 1165 /usr/lib/python3.6/site-packages/dateutil/parser.py.isword 1172 /usr/lib64/python3.6/http/client.py._get_chunk_left 1627 /usr/lib64/python3.6/http/client.py._read_next_chunk_size 1627 /usr/lib64/python3.6/site-packages/subscription_manager/printing_utils.py.highlight_by_filter_string_columnize_cb 1696 /usr/lib64/python3.6/site-packages/subscription_manager/printing_utils.py.format_name 2267 /usr/lib/python3.6/site-packages/dateutil/parser.py.isnum 2403 /usr/lib64/python3.6/site-packages/subscription_manager/printing_utils.py.add_line 2420 /usr/lib64/python3.6/http/client.py._safe_read 3252 /usr/lib/python3.6/site-packages/dateutil/parser.py.<genexpr> 9861 /usr/lib64/python3.6/site-packages/subscription_manager/unicode_width.py.textual_width 21235 /usr/lib64/python3.6/site-packages/subscription_manager/unicode_width.py._interval_bisearch 124358 /usr/lib64/python3.6/site-packages/subscription_manager/unicode_width.py._ucp_width 124358 /usr/lib64/python3.6/site-packages/subscription_manager/unicode_width.py.<genexpr> 145593
I did just barely get this attached to the process in time, so a large part of what the program was doing was printing to the screen; we can see that there are a significant number of calls in this stack around printing text to the console. We know this as this is what subscription manager’s
printing_utils.py library does.
bcc-tools provides a number of easy to use tools that allow you to quickly gain insight into the performance of your system with minimal efforts. As these tools maintain minimal overhead, their design fits well with the demands of modern production workloads. We encourage you to install the package and familiarize yourself with
bcc-tools as it is now fully supported on x86_64!