
When installing software on a Linux system, your package manager keeps track of what's installed, what it's dependent upon, what it provides, and much more.
The usual way to look at that metadata is through your package manager. In the case of Fedora or Red Hat Enterprise Linux, it is the RPM database.
The RPM database can be queried from the command line with the rpm command, which supports some very nice formatting options. For example, to get a list of all packages sorted by size, I can use a little bit of Bash glue to do the following:
$ rpm -qa --queryformat "%{NAME}-%{VERSION} %{SIZE}\n"|sort -r -k 2 -n
...
linux-firmware-20200421 256914779
conda-4.8.4 228202733
glibc-all-langpacks-2.29 217752640
docker-ce-cli-19.03.12 168609825
clang-libs-8.0.0 117688777
wireshark-cli-3.2.3 108810552
llvm-libs-8.0.0 108310728
docker-ce-19.03.12 106517368
ansible-2.9.9 102477070
The rpm command has many options, but what if I want to format the numbers in the output? Or be able to display the results with scrolling? Or integrate RPM output into other applications?
Once you've written a script to manage the query, what about unit testing the code? Bash is not exactly good at that.
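To make the first of those questions concrete, here is a minimal sketch of formatting the sizes in Python instead of Bash. It assumes input in the "name size" layout printed by the rpm command above; format_sizes is a hypothetical helper name, not part of the tutorial code.

```python
from typing import Iterable, List


def format_sizes(lines: Iterable[str]) -> List[str]:
    """Turn 'name size' lines into 'name: 1,234' strings, largest first."""
    packages = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split from the right so only the last field is treated as the size.
        name, size = line.rsplit(maxsplit=1)
        packages.append((name, int(size)))
    packages.sort(key=lambda item: item[1], reverse=True)
    # The ',' format specifier adds thousands separators.
    return [f"{name}: {size:,}" for name, size in packages]


if __name__ == "__main__":
    sample = [
        "conda-4.8.4 228202733",
        "linux-firmware-20200421 256914779",
    ]
    for row in format_sizes(sample):
        print(row)
```

Because the logic lives in a plain function, it can be tested without running rpm at all, which is hard to do with a shell pipeline.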
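This is where other languages like Python shine. In this tutorial, I demonstrate how to query the RPM database from Python, test that code with unittest, and wrap it in a small command-line interface (CLI).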
I'll cover a lot here, so basic knowledge of Python (for example, object-oriented, or OO, programming) is necessary. Still, you should be able to grasp how to create the Python class even if you don't know much about OO programming. (I will leave out advanced features such as virtual classes and data classes.)
Also, you should know what the RPM database is. Even if you don't know much, the code is simple to follow, and the boilerplate code is small.
There are several chapters dedicated to interacting with the RPM database in the Fedora documentation. However, in this article, I'll write a simple program that prints a list of RPM packages sorted by size.
Python is a great default language for system administrators, and the RPM database comes with bindings that make it easy to query with Python.
For this tutorial to work, you need to install the python3-rpm package.
RPM has deep ties with the system. That's one of the reasons it's not offered through the pip command and the PyPI module repository.
Install the python3-rpm package with the dnf command instead:
$ sudo dnf install -y python3-rpm
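After installing, a quick way to confirm that the bindings are visible to your interpreter is to ask the import machinery whether the module can be found. This is only a sketch; has_module is a hypothetical helper name, and it checks that the module is discoverable, not that it works.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("rpm"):
        print("The rpm bindings are available.")
    else:
        print("The rpm bindings are missing; try: sudo dnf install -y python3-rpm")
```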
Finally, clone the code for this tutorial:
$ git clone git@github.com:josevnz/tutorials.git
$ cd tutorials/rpm_query
Python has a feature called virtual environments. A virtual environment provides a sandbox where you can install and run Python packages in isolation from the system-wide installation.
For this tutorial, use a virtual environment with the following features: it is called rpm_query, and it has access to the system site packages, because the package you need is python3-rpm, and it's not available through pip. (You can provide access to site packages using the --system-site-packages option.)
Create and activate it like this:
$ python3 -m venv --system-site-packages ~/virtualenv/rpm_query
$ . ~/virtualenv/rpm_query/bin/activate
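If you want to double-check the environment from Python, the sketch below tests whether the interpreter is running inside a virtual environment and whether that environment was created with access to system site packages. The helper names are hypothetical; the second function only parses the text of the pyvenv.cfg file that venv writes into the environment directory.

```python
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != sys.base_prefix


def sees_system_packages(pyvenv_cfg_text: str) -> bool:
    """Read the include-system-site-packages flag from pyvenv.cfg contents."""
    for line in pyvenv_cfg_text.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "include-system-site-packages":
            return value.strip().lower() == "true"
    return False


if __name__ == "__main__":
    print(f"Inside a virtual environment: {in_virtualenv()}")
    cfg = Path(sys.prefix) / "pyvenv.cfg"
    if cfg.exists():
        print(f"System site packages visible: {sees_system_packages(cfg.read_text())}")
```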
For this example application, I'll wrap the RPM instance in a context manager. This makes it easier to use without worrying about manually closing the RPM database. Also, I'll take a few shortcuts to return the result of the database query. Specifically, I'm limiting the number of results and managing sorting.
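If context managers are new to you, here is a minimal sketch of the protocol using a made-up resource. FakeDatabase is hypothetical and has nothing to do with the RPM API; it only shows that __enter__ runs when the with block starts and __exit__ runs when it ends, even if an exception is raised inside the block.

```python
class FakeDatabase:
    """Stand-in for a resource that must be closed (hypothetical example)."""

    def __init__(self):
        self.is_open = False

    def __enter__(self):
        # Called at the start of the 'with' block.
        self.is_open = True
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Called at the end of the 'with' block, even after an exception.
        self.is_open = False
        return False  # False means exceptions are not swallowed

    def rows(self):
        return ["first", "second"]


if __name__ == "__main__":
    with FakeDatabase() as db:
        for row in db.rows():
            print(row)
    print(f"Still open after the block? {db.is_open}")
```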
Putting data and the methods that work on it together in a class is what makes object orientation so useful. In this example, the RPM functionality is in a class called QueryHelper. Its purpose is to open the RPM database, optionally filter packages by name, sort and limit the results, and close the database when you are done.
First, use the QueryHelper class to get a list of a maximum of five packages, sorted by size:
from reporter.rpm_query import QueryHelper

with QueryHelper(limit=5, sorted_val=True) as rpm_query:
    for package in rpm_query:
        print(f"{package['name']}-{package['version']}: {package['size']:,.0f}")
You can use a Python feature called named arguments, which makes using the class much easier.
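As a small illustration of named (keyword) arguments, the sketch below uses a hypothetical query function rather than the real class. The bare * in the signature, which the QueryHelper constructor in this article also uses, forces callers to spell out every argument by name.

```python
def query(*, limit: int = 10, name: str = "", sorted_val: bool = True) -> dict:
    """Everything after the bare * must be passed by name."""
    return {"limit": limit, "name": name, "sorted_val": sorted_val}


if __name__ == "__main__":
    # Arguments can be given in any order, and omitted ones keep their defaults.
    print(query(sorted_val=False, limit=5))
    try:
        query(5)  # positional use is rejected
    except TypeError as error:
        print(f"Rejected: {error}")
```

The call site documents itself: limit=5 is easier to read and harder to get wrong than a bare 5.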
What if you are happy with the default arguments? Not a problem:
from reporter.rpm_query import QueryHelper

with QueryHelper() as rpm_query:
    for package in rpm_query:
        print(f"{package['name']}-{package['version']}: {package['size']:,.0f}")
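Here is how to implement it. The import is wrapped in a try/except block because the rpm module needs to be installed. The __get__ function takes care of returning the results sorted or as-is; the code that queries the database calls it with the query results. Customization happens in the constructor of the QueryHelper class, with named parameters. The query itself runs in the QueryHelper.__enter__ method.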
"""
Wrapper around RPM database
"""
import sys
from typing import Any

try:
    import rpm
except ModuleNotFoundError:
    print((
        "You must install the following package:\n"
        "sudo dnf install -y python3-rpm\n"
        "'rpm' doesn't come as a pip but as a system dependency.\n"
    ), file=sys.stderr)
    raise


def __get__(is_sorted: bool, dbMatch: Any) -> Any:
    """
    If is_sorted is true then sort the results by item size in bytes, otherwise
    return 'as-is'
    :param is_sorted:
    :param dbMatch:
    :return:
    """
    if is_sorted:
        return sorted(
            dbMatch,
            key=lambda item: item['size'], reverse=True)
    return dbMatch


class QueryHelper:
    MAX_NUMBER_OF_RESULTS = 10_000

    def __init__(self, *, limit: int = MAX_NUMBER_OF_RESULTS, name: str = None, sorted_val: bool = True):
        """
        :param limit: How many results to return
        :param name: Filter by package name, if any
        :param sorted_val: Sort results
        """
        self.ts = rpm.TransactionSet()
        self.name = name
        self.limit = limit
        self.sorted = sorted_val

    def __enter__(self):
        """
        Returns list of items on the RPM database
        :return:
        """
        if self.name:
            db = self.db = self.ts.dbMatch("name", self.name)
        else:
            db = self.db = self.ts.dbMatch()
        count = 0
        for package in __get__(self.sorted, db):
            if count >= self.limit:
                break
            yield package
            count += 1

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.ts.closeDB()
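The sort-then-limit logic in __enter__ can also be expressed without a manual counter. The following is an alternative sketch, not the author's implementation: limited is a hypothetical helper, and plain dictionaries stand in for the package headers that the RPM bindings return.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator


def limited(
    packages: Iterable[Dict[str, Any]],
    limit: int,
    is_sorted: bool = True,
) -> Iterator[Dict[str, Any]]:
    """Yield at most 'limit' packages, largest first when is_sorted is true."""
    if is_sorted:
        packages = sorted(packages, key=lambda item: item["size"], reverse=True)
    # islice stops after 'limit' items, replacing the count/break bookkeeping.
    return islice(packages, limit)


if __name__ == "__main__":
    fake_packages = [
        {"name": "small", "size": 10},
        {"name": "large", "size": 300},
        {"name": "medium", "size": 42},
    ]
    for package in limited(fake_packages, limit=2):
        print(f"{package['name']}: {package['size']:,}")
```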
A few things to note. Did you notice the type hints (for example, limit: int says that limit is an integer)? They help an integrated development environment (IDE) like PyCharm provide auto-completion for you and your users. Python doesn't require them, but they are a good practice.
Here's an important question: How good is this code without testing?
It's a good idea to automate testing code. Unit testing differs from other types of testing in that it exercises the smallest pieces of the code on their own. Overall, it makes an application more robust because it ensures even minor components behave correctly (and it works better if it's run after every change).
In this case, a Python unittest test case is nothing more than a class that automates checking whether a function behaves correctly.
I wrote a unit test for the reporter.rpm_query.QueryHelper class:
"""
Unit tests for the QueryHelper class
How to write unit tests: https://docs.python.org/3/library/unittest.html
"""
import os
import unittest

from reporter.rpm_query import QueryHelper

# Set DEBUG_RPM_QUERY in the environment to print each package while testing
DEBUG = bool(os.getenv("DEBUG_RPM_QUERY"))


class QueryHelperTestCase(unittest.TestCase):

    def test_default(self):
        with QueryHelper() as rpm_query:
            for package in rpm_query:
                self.assertIn('name', package, "Could not get 'name' in package?")

    def test_get_unsorted_counted_packages(self):
        """
        Test retrieval of unsorted counted packages
        :return:
        """
        LIMIT = 10
        with QueryHelper(limit=LIMIT, sorted_val=False) as rpm_query:
            count = 0
            for package in rpm_query:
                count += 1
                self.assertIn('name', package, "Could not get 'name' in package?")
            self.assertEqual(LIMIT, count, f"Limit ({count}) did not work!")

    def test_get_all_packages(self):
        """
        Default query is all packages, sorted by size
        :return:
        """
        with QueryHelper() as rpm_query:
            previous_size = 0
            previous_package = None
            for package in rpm_query:
                size = package['size']
                if DEBUG:
                    print(f"name={package['name']} ({size}) bytes")
                self.assertIn('name', package, "Could not get 'name' in package?")
                if previous_size > 0:
                    self.assertGreaterEqual(
                        previous_size,
                        size,
                        f"Returned entries not sorted by size in bytes ({previous_package}, {package['name']})!")
                previous_size = size
                previous_package = package['name']

    def test_get_named_package(self):
        """
        Test named queries
        :return:
        """
        package_name = "glibc-common"
        with QueryHelper(name=package_name, limit=1) as rpm_query:
            found = 0
            for package in rpm_query:
                self.assertIn('name', package, "Could not get 'name' in package?")
                if DEBUG:
                    print(f"name={package['name']}, version={package['version']}")
                found += 1
        self.assertGreater(found, 0, f"Could not find a single package with name {package_name}")


if __name__ == '__main__':
    unittest.main()
I strongly recommend reading the official unittest documentation for more details, as there's so much more to unit testing than this simple code conveys. For example, there is mock testing, which is particularly useful for complex system-dependency scenarios.
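To give a taste of mock testing, here is a minimal sketch that exercises query logic without touching a real RPM database. The function largest_package_name is hypothetical and not part of the tutorial code; the only assumption about the bindings is the dbMatch() method already used above, and the mock returns plain dictionaries in place of real package headers.

```python
import unittest
from unittest import mock


def largest_package_name(ts) -> str:
    """Return the name of the largest package reported by ts.dbMatch()."""
    return max(ts.dbMatch(), key=lambda item: item["size"])["name"]


class LargestPackageTestCase(unittest.TestCase):

    def test_largest(self):
        # The mock stands in for rpm.TransactionSet(), so no RPM database is needed.
        fake_ts = mock.Mock()
        fake_ts.dbMatch.return_value = [
            {"name": "small", "size": 10},
            {"name": "big", "size": 500},
        ]
        self.assertEqual("big", largest_package_name(fake_ts))
        # Verify that the code under test queried the database exactly once.
        fake_ts.dbMatch.assert_called_once_with()
```

Saved in a test module, this runs with python -m unittest like any other test case, including on machines where python3-rpm is not installed.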
You can write a small CLI using the QueryHelper class created earlier, and to make it easier to customize, use argparse.
Argparse allows you to validate user input. For example, to check that the limit is a positive number, I wrote a validator in the reporter/__init__.py module. You could also do it in the class constructor (but I wanted to show this feature). You can use it to add extra logic not present in the original code:
def __is_valid_limit__(limit: str) -> int:
    try:
        int_limit = int(limit)
        if int_limit <= 0:
            raise ValueError(f"Invalid limit!: {limit}")
        return int_limit
    except ValueError:
        raise
Argparse also lets you document the program (through the help= parameter and the --help flag).
Using this class to query the RPM database becomes very easy by parsing the options and then calling QueryHelper:
#!/usr/bin/env python
"""
# rpmq_simple.py - A simple CLI to query RPM sizes on your system
Author: Jose Vicente Nunez
"""
import argparse
import textwrap

from reporter import __is_valid_limit__
from reporter.rpm_query import QueryHelper

if __name__ == "__main__":

    parser = argparse.ArgumentParser(description=textwrap.dedent(__doc__))
    parser.add_argument(
        "--limit",
        type=__is_valid_limit__,  # Custom limit validator
        action="store",
        default=QueryHelper.MAX_NUMBER_OF_RESULTS,
        help="By default results are unlimited but you can cap the results"
    )
    parser.add_argument(
        "--name",
        type=str,
        action="store",
        help="You can filter by a package name."
    )
    parser.add_argument(
        "--sort",
        action="store_false",  # args.sort is True unless --sort is passed
        help="Sorted results are enabled by default, but you can turn it off"
    )
    args = parser.parse_args()

    with QueryHelper(
        name=args.name,
        limit=args.limit,
        sorted_val=args.sort
    ) as rpm_query:
        current = 0
        for package in rpm_query:
            if current >= args.limit:
                break
            print(f"{package['name']}-{package['version']}: {package['size']:,.0f}")
            current += 1
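To see how the validator and the store_false action behave on their own, here is a self-contained sketch that needs neither the reporter package nor an RPM database. The names positive_int and build_parser are hypothetical stand-ins for the tutorial code, and the default of 10 is arbitrary. When a type callable raises ValueError, argparse reports a usage error and exits with status 2.

```python
import argparse


def positive_int(value: str) -> int:
    """argparse 'type' callable: accept only integers greater than zero."""
    number = int(value)  # raises ValueError for non-numeric input
    if number <= 0:
        raise ValueError(f"Invalid limit: {value}")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of the CLI options")
    parser.add_argument("--limit", type=positive_int, default=10)
    # store_false: the value is True unless the flag is present.
    parser.add_argument("--sort", action="store_false")
    return parser


if __name__ == "__main__":
    parser = build_parser()
    print(parser.parse_args([]))
    print(parser.parse_args(["--limit", "3", "--sort"]))
```

Note that passing --sort turns sorting off, which follows from store_false but may surprise users; the help text is the place to make that explicit.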
So how does the output look now? Ask for all installed RPMs, sorted and limited to the first 20 entries:
$ rpmqa_simple.py --limit 20
linux-firmware-20210818: 395,099,476
code-1.61.2: 303,882,220
brave-browser-1.31.87: 293,857,731
libreoffice-core-7.0.6.2: 287,370,064
thunderbird-91.1.0: 271,239,962
firefox-92.0: 266,349,777
glibc-all-langpacks-2.32: 227,552,812
mysql-workbench-community-8.0.23: 190,641,403
java-11-openjdk-headless-11.0.13.0.8: 179,469,639
iwl7260-firmware-25.30.13.0: 148,167,043
docker-ce-cli-20.10.10: 145,890,250
google-noto-sans-cjk-ttc-fonts-20190416: 136,611,853
containerd.io-1.4.11: 113,368,911
ansible-2.9.25: 101,106,247
docker-ce-20.10.10: 100,134,880
ibus-1.5.23: 90,840,441
llvm-libs-11.0.0: 87,593,600
gcc-10.3.1: 84,899,923
cldr-emoji-annotation-38: 80,832,870
kernel-core-5.14.12: 79,447,964
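There's a lot of information in this article. I covered querying the RPM database from Python, setting up a virtual environment that can see the system packages, testing the code with unittest, and building a small CLI with argparse.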
In my next article, I'll explore packaging an application to install on another machine with Python.
Proud dad and husband, software developer and sysadmin. Recreational runner and geek.