Issue #2 December 2004

Better Living Through RPM, Part 2

Introduction

In the first part of this series, the basic usage of RPM was explored, with particular emphasis on how to examine a system and modify the packages installed upon it. This article delves into slightly more advanced territory, into the very heart of RPM itself — the actual creation of RPMs.

To some, it might be immediately obvious why creating your own RPMs would be valuable, but it bears emphasis here because the highly deterministic deployment and maintenance of systems achievable with RPM is not immediately evident. This is a somewhat complicated way to say that RPM makes it very easy to know exactly what is installed on a system and reproduce other systems like it very quickly. Imagine you are the administrator of five servers, four of which are Web servers talking to the fifth server, a database server. Business soars, and you find you need to deploy a fifth Web server. If every piece of software running on those servers is in RPM format (either part of the OS itself, a third party, or created by yourself), it is very easy to set up a new box to match exactly what the other four Web servers look like. This is in stark contrast to other Linux distributions and other UNIX environments where packaging is not quite so rigorous. Of course, the key here is discipline — if you compile the software and then install it directly from source instead of using RPMs, then although the base OS is easily restored, the entire system won't be quite so easy to bring online.

The building of software, by its very nature, is more complicated than simply installing or removing software. Thus, this article is aimed at a slightly more advanced audience than the previous one; in particular, it is aimed at system administrators who want to modify others' packages or create their own from scratch. That is not to say that it isn't of benefit to all users, but some familiarity with compiling from source and other basic administrator tasks is assumed.

Though not emphasized in the previous article, there are actually two kinds of RPMs — source RPMs and binary RPMs. Source RPMs, or SRPMs for short, share many characteristics with binary RPMs; they have names, versions, and releases, they contain files inside them, and they can be queried with most of the familiar command line tools and options presented in the previous article. The main difference, though, is that (as the name implies) an SRPM contains the original source files used to create the binary RPMs instead of the actual compiled binaries.

Quite often, you find SRPMs available alongside RPMs when downloading software. Suffixed with .src.rpm instead of .i386.rpm (or .noarch.rpm in some cases), these SRPMs are more than just containers of source code. As mentioned, they can be queried like normal binary RPMs, but they can also be built directly with a single, consistent command. This is an important concept: regardless of whether the SRPM contains a package that uses configure and make install or some more esoteric compilation method, the SRPM abstracts that away. In fact, it is this abstraction that this article focuses on.

Every SRPM contains a spec file in addition to the actual source files inside of it. This spec file contains all of the information necessary to compile the source code into the binary RPMs, as well as other data about the resulting RPM such as the name, version information, and description. By far the most complicated parts of a typical spec file are those related to the compilation of the source files. Anyone who has compiled a number of open source projects is well aware of the diversity of compilation methods and the wide range of maturity in the compile and install phases of most builds. It is for this reason that the spec file can sometimes be complex; RPM has a fairly specific idea of how it wants sources to become binaries, and so it is up to the author of the spec file to shepherd the build process to fit this.

Before showing a sample of a spec file, it is worth walking through the major sections of a build. Although technically some of these sections are optional, and you can sometimes get away with performing something in one step that technically belongs in another, the vast majority of spec files have each section and flow in the normal way.

The first step of a build is called the prep step. It is the responsibility of the prep section to decompress and expand the source tarball (if present), change into the directory contained in the tarball, and apply any patches included in the RPM (if present). If building by hand, this is the equivalent of the command tar zxfv foo-1.2.3.tar.gz followed by cd foo-1.2.3.
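As a rough by-hand equivalent of the prep step, the sketch below creates a throwaway foo-1.2.3.tar.gz in a scratch directory, then expands it and changes into the resulting tree. The package name foo and the README file are placeholders invented for the illustration.

```shell
# Work in a scratch directory so nothing on the real system is touched.
work=$(mktemp -d)
cd "$work"

# Stand-in for an upstream tarball (foo-1.2.3 is a made-up package).
mkdir foo-1.2.3
echo "hello" > foo-1.2.3/README
tar zcf foo-1.2.3.tar.gz foo-1.2.3
rm -rf foo-1.2.3

# What the prep step amounts to by hand: expand the tarball, enter the tree.
tar zxf foo-1.2.3.tar.gz
cd foo-1.2.3

# Any patches would be applied at this point, from inside the source tree.
pwd
```

After this runs, the current directory is the expanded source tree, which is the state the build step expects to start from.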

The second step is the build step. In this step, the source package, already expanded and patched from the prep step, is now compiled. Typically this corresponds to ./configure and make steps familiar to most users. In addition, it is usually a good idea at this time to run any tests that the software comes with to ensure the build was successful.

The third step is the install step. As one might expect, this corresponds to the make install step when compiling a package from source, but with one very important exception. This step should install the software into a build root. Put simply, a build root is just a temporary subdirectory created while building the RPM and under which the software will be installed, as opposed to installing under the root partition. This is critical, so it bears repeating: the install step should not put files where they would go if you were installing a piece of software in the normal way; instead, it should place them in a separate directory. For example, if the make install would normally put a file in /etc/sysconfig/network/, inside a spec file it should put the file into $RPM_BUILD_ROOT/etc/sysconfig/network/ (more on $RPM_BUILD_ROOT later; for now, just think of it as the directory created to be the shadow tree).
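The idea of a shadow tree can be tried outside of RPM entirely. The following is a minimal sketch that simulates a build root with a scratch directory; in a real build rpmbuild sets RPM_BUILD_ROOT itself, and the file name example is made up for the illustration.

```shell
# Simulate the build root with a scratch directory. During a real build,
# rpmbuild sets RPM_BUILD_ROOT; assigning it here is only for the demo.
RPM_BUILD_ROOT=$(mktemp -d)

# A made-up configuration file standing in for what "make install" copies.
src=$(mktemp)
echo "EXAMPLE=yes" > "$src"

# Install under the build root, never under the real /etc.
mkdir -p "$RPM_BUILD_ROOT/etc/sysconfig/network"
install -m 644 "$src" "$RPM_BUILD_ROOT/etc/sysconfig/network/example"

# Show what was staged; every path starts with the build root.
find "$RPM_BUILD_ROOT" -type f
```

The real /etc/sysconfig/network/ is never touched, which is also why this can be done as an ordinary user.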

The install step is the last step that the author of the spec file has explicit control over. The rest of the build process is RPM acting upon data elsewhere in the spec file. The fourth step is when RPM does what is called dependency discovery. In effect, it walks over every file in $RPM_BUILD_ROOT and examines each one in different ways to determine whether that file needs something else to work properly. For example, if RPM finds a typical binary executable, it determines what shared libraries it needs. Likewise, if it finds a script executable, it figures out what scripting language to use by looking at the first line (so it might find a script needing /bin/bash or /usr/bin/perl, for instance). RPM also notices shared libraries when it walks the build root and flags the resulting RPM not as requiring those libraries but instead as providing them, thus perhaps satisfying the dependencies of some other package.
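To make the script case concrete, here is a small sketch of the first-line check. It is only an illustration of the idea, not RPM's actual implementation, and the function name script_interpreter is hypothetical.

```shell
# Print the interpreter named on a script's first line, or nothing if
# the file does not start with "#!". Illustration only; RPM's own
# dependency discovery is more thorough than this.
script_interpreter() {
    first=$(head -n 1 "$1")
    case "$first" in
        '#!'*)
            rest=${first#??}    # drop the leading "#!"
            set -- $rest        # split the remainder on whitespace
            echo "$1"           # the first word is the interpreter path
            ;;
    esac
}

# Demo on a throwaway script.
demo=$(mktemp)
printf '#!/usr/bin/perl -w\nprint 1;\n' > "$demo"
script_interpreter "$demo"
```

On a finished package, rpm -qp --requires and rpm -qp --provides show what RPM actually recorded.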

The fifth step is when RPM takes the build root and places all of the files inside the binary RPM it is building. It also constructs the header, placing all metadata (name, description, dependencies, etc.) in the resulting binary RPM as well. Also resulting from the build, if building from a spec file and sources (as opposed to rebuilding an already built SRPM) is an SRPM containing the spec file, sources, and patches required for the build.

Versions, Releases, and Epochs, Oh My

Every package has associated with it two visible fields whose purpose is to make it easier to tell, given another package of the same name, which is newer. This seems fairly straightforward, but the unambiguous ability to see whether a package needs updating is very important to administrators. These two fields are called the Version and Release fields. Often Version comes straight from the upstream software that you are bundling (such as 2.4.21 for the kernel). Release is best thought of as the revision of the packaging of the upstream software itself. So the first time you build kernel 2.4.21, you would probably use release 1; the next time, you would use release 2, etc. There is, however, a third, invisible field used for versioning that is compared before even Version when determining if a package is newer: the Epoch. Epoch is an integer used when upstream versions change in such a way that they no longer compare properly using RPM's version comparison algorithm. Epochs are one of those things that you should never use in your own packaging unless you have a specific reason and understand how they can be abused.
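The ordering rule (Epoch first, then Version) can be sketched in a few lines of shell. This is an approximation under stated assumptions: the function name newer is hypothetical, Release is left out for brevity, and GNU sort -V stands in for RPM's own version comparison algorithm, which it does not reproduce exactly.

```shell
# newer E1 V1 E2 V2: report which of two epoch/version pairs is newer.
# Assumption: "sort -V" (GNU coreutils) approximates version ordering;
# it is NOT RPM's comparison algorithm.
newer() {
    e1=$1 v1=$2 e2=$3 v2=$4
    # Epoch is compared first and wins outright if it differs.
    if [ "$e1" -gt "$e2" ]; then echo first; return; fi
    if [ "$e1" -lt "$e2" ]; then echo second; return; fi
    # Same epoch: fall back to the version string.
    if [ "$v1" = "$v2" ]; then echo same; return; fi
    top=$(printf '%s\n%s\n' "$v1" "$v2" | sort -V | tail -n 1)
    if [ "$top" = "$v1" ]; then echo first; else echo second; fi
}

newer 0 2.4.21 0 2.4.9    # same epoch, so the version decides
newer 1 1.0 0 2.4.21      # higher epoch wins despite the lower version
```

The second call shows why epochs deserve caution: once a package carries a higher epoch, no later version number can outrank it without an epoch of its own.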

The Details

The building of RPMs, be it just rebuilding an SRPM or building from a split-out spec file and sources, is accomplished through the rpmbuild command line utility. The first and simplest use of rpmbuild is rebuilding an SRPM. This is accomplished by the rpmbuild --rebuild foo-1.2.3-1.src.rpm command and is fairly straightforward in what it does. It begins by extracting the contents of the SRPM, examining the spec file, and ensuring all dependencies are met (these are build dependencies, meaning software needed to build, as opposed to runtime dependencies, which your package may subsequently need to actually run). After this, it begins the build process described above by following the rules set forth in the spec file.

The second way to use rpmbuild is directly on spec files. This gives more control than building from an SRPM and is the invocation one uses when perfecting a spec file (make a change, attempt a build, make a change, attempt a build, etc.). The first thing you can do with a spec file and the sources is the obvious: build the package. This is accomplished via the rpmbuild -ba foo.spec command, which produces both the binary RPM and the SRPM (rpmbuild -bb foo.spec produces only the binary RPM). Much like --rebuild, RPM verifies the spec file and dependencies and then begins the build process described therein. Another common use is to produce only the SRPM and not the binary RPM. This is accomplished via the rpmbuild -bs foo.spec command. In this case rpmbuild does not run the build steps in the spec file; it simply bundles the spec file, sources, and patches into an SRPM.
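The invocations can be collected into a small dry-run helper. The function name rpmbuild_cmd and its mode names are hypothetical; it only prints the command line rather than running it, so it is safe to try on a machine without rpmbuild installed. The -bp flag, which stops after the prep step, is included because it is handy when iterating on a spec file.

```shell
# Print (do not run) the rpmbuild command for a given task.
# rpmbuild_cmd and its mode names are invented for this illustration.
rpmbuild_cmd() {
    case "$1" in
        rebuild) echo "rpmbuild --rebuild $2" ;;  # rebuild binaries from an SRPM
        all)     echo "rpmbuild -ba $2" ;;        # binary RPM and SRPM from a spec
        binary)  echo "rpmbuild -bb $2" ;;        # binary RPM only
        source)  echo "rpmbuild -bs $2" ;;        # SRPM only, no compilation
        prep)    echo "rpmbuild -bp $2" ;;        # run the prep step and stop
        *)       echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

rpmbuild_cmd rebuild foo-1.2.3-1.src.rpm
rpmbuild_cmd all foo.spec
rpmbuild_cmd source foo.spec
```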

Up until now, one detail that has been purposefully ignored is exactly where files need to be located for all of this to work. As one might imagine, since an RPM build involves a spec file and, usually, one or more source files and patches, it isn't necessarily just a matter of tossing everything in one directory. In fact, RPM uses a configurable layout, based by default in /usr/src/redhat/:

/usr/src/redhat/
/usr/src/redhat/SOURCES
/usr/src/redhat/SRPMS
/usr/src/redhat/RPMS
/usr/src/redhat/RPMS/noarch
/usr/src/redhat/RPMS/x86_64
/usr/src/redhat/BUILD
/usr/src/redhat/SPECS

In this layout, the spec file goes in the SPECS/ directory, the source tarballs and patches go in SOURCES/, and (assuming a successful build) SRPMs and RPMs end up in the SRPMS/ and RPMS/<arch> directories, respectively.

A cardinal rule of package building is to never build as root. There are a number of dangers with building as root, not the least of which is that a bad spec file could completely destroy the system the build is running on. So, given that /usr/src/redhat/ is owned by root in a default installation, how does one actually build as someone other than root? The easiest way is to chown /usr/src/redhat to the user you will build as. This is the approach we will use here, though another approach is to configure RPM via a .rpmmacros file to use a tree anywhere on your system.

Simple Example

So now that the theory has been described, an example is in order. Example 1, “simplest.spec” shows pretty much the simplest possible spec file. The first seven lines describe data about the package (name, summary, and so forth). Next comes the definition of a BuildRoot (referred to earlier as $RPM_BUILD_ROOT); the value shown is a suitable default for any spec file. The next statement, BuildArch, says that this package is a noarch package since it contains no files that are architecture-specific; without this statement, rpmbuild defaults to the architecture you are building the package on. Next comes the description, which is free-form text.

Summary: A very simple package.
Name: simplest
Version: 1.0
Release: 1
License: GPL
Group: Development/Tools
URL: http://www.redhat.com/

BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root
BuildArch: noarch

%description
This is a very simple package to demonstrate an RPM build.

%prep

%build

%install
rm -rf $RPM_BUILD_ROOT

mkdir -p $RPM_BUILD_ROOT/etc
touch $RPM_BUILD_ROOT/etc/empty-file

%clean
rm -rf $RPM_BUILD_ROOT

%files
%defattr(-,root,root,-)
/etc/empty-file
Example 1. simplest.spec

Now come the sections previously described: prep, build, and install. In this case, note that prep and build are completely empty. After all, we have no source tarball to build from or patch to apply. The install section is also fairly small. First it clears out the $RPM_BUILD_ROOT (to ensure a clean build with no risk of previous builds leaving droppings behind). Second, it creates an etc/ subdirectory underneath the build root, then touches a file in that directory. That's it; nothing more. The end result is the build root containing a directory containing a single empty file. The clean section represents what RPM does to clean up after itself when a build is successful; this is almost always just deleting the build root.

Lastly, we have the files section, which tells RPM which files in the buildroot should become part of the RPM. The files section contains not only the list of files but things like the owner and modes they should have (after all, if you don't build as root, you can't make files in the buildroot owned by root, so you specify here that the owner should be root and not the user you are building as). Files also are sometimes flagged as configuration files (which affects how RPM treats them across package upgrades) in this section.
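When writing a files section, it can help to list what the install step actually staged. The sketch below recreates the layout of simplest.spec in a scratch directory (standing in for the real build root) and prints each staged file as the absolute path that would be listed in the spec file.

```shell
# Recreate what the install section of simplest.spec stages, using a
# scratch directory in place of the build root rpmbuild would supply.
RPM_BUILD_ROOT=$(mktemp -d)
mkdir -p "$RPM_BUILD_ROOT/etc"
touch "$RPM_BUILD_ROOT/etc/empty-file"

# List staged files relative to the build root; stripping the leading
# "." leaves the path as it will appear on the installed system.
files=$(cd "$RPM_BUILD_ROOT" && find . -type f | sed 's/^\.//' | sort)
echo "$files"
```

The output is a starting point only; ownership, modes, and configuration-file flags still have to be decided by the packager.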

To test this, copy the example into /usr/src/redhat/SPECS/, and run rpmbuild -bs simplest.spec to produce an SRPM or rpmbuild -ba simplest.spec to make both an SRPM and an RPM. You can even install the resulting package, if you wish, and watch as the thrilling /etc/empty-file is created on your system.

Real World Example: CVSps

The simplest.spec example illustrates the basic concepts of packaging, just as 'Hello, World!' illustrates the basic concepts of a programming language. However, in both cases, the first step is immediately met with a desire for a more complex example. We turn our attention to a real world example.

CVSps is a handy utility that analyzes a given CVS repository and splits the checkins into patchsets, much like other version control systems. CVS is limited in the way it works, though, and thus what is a native facility in version control systems like Subversion and Perforce requires an external “best guess” approach. CVSps does just that, and it does it pretty well. This example packages version 1.3.3 of CVSps.

Example 2, “CVSps spec File” shows the spec file. Note that it is barely any more complicated than simplest.spec. However, a few differences are immediately evident. First is the presence of two new headers early in the spec file, Source0 and Patch0. As the names imply, these are a source file and a patch file respectively, and as the numbering implies, you can have multiple sources and multiple patches in a given package.

Summary: A program to view patchsets of CVS checkins
Name: cvsps
Version: 1.3.3
Release: 1
URL: http://www.cobite.com/cvsps/
Source0: http://www.cobite.com/cvsps/%{name}-%{version}.tar.gz
Patch0: cvsps-1.3.1-fhs.patch
License: GPL
Group: Development/Tools
BuildRoot: %{_tmppath}/%{name}-root

%description
CVSps is a program for generating 'patchset' information from a CVS
repository. A patchset in this case is defined as a set of changes
made to a collection of files, and all committed at the same time
(using a single 'cvs commit' command). This information is valuable to
seeing the big picture of the evolution of a cvs project. While cvs
tracks revision information, it is often difficult to see what changes
were committed 'atomically' to the repository.

%prep
%setup
%patch0 -p1 -b .fhs

%build
make

%install
rm -Rf $RPM_BUILD_ROOT
make install prefix=$RPM_BUILD_ROOT/usr

%clean
rm -Rf $RPM_BUILD_ROOT

%files
%defattr(-,root,root)
/usr/bin/*
/usr/share/man/*/*


%changelog
* Sat May 22 2004 Chip Turner <cturner@redhat.com> - 1.3.3-1
- update to 1.3.3

* Sun Dec 16 2001 Chip Turner <cturner@redhat.com>
- Initial build.
Example 2. CVSps spec File

Also note that the prep section in this example is not empty. It contains two statements (setup and patch0), both of which at first glance would appear to be sections of their own, but in fact they are commands to rpmbuild. As discussed before, the prep section is responsible for untarring the sources and applying the patches. Here, it is easy enough to guess that the patch0 statement applies our patch, but it isn't obvious that setup untars Source0. In fact it does, as well as changing into the directory contained inside the tarball. setup is a convenience of rpmbuild. It is very often written as %setup -q; the -q parameter tells rpmbuild not to show the contents of the tarball as it is expanded. Other setup parameters are listed in Table 1, “Parameters to setup”, though -q and -n are by far the most commonly seen in the wild.

Parameter    Description
-n DIRNAME   Directory name to change into (default: %{name}-%{version})
-q           Expand source0 quietly
-c           Create the directory and change into it before expanding source0
-T           Skip default action (don't untar; useful with -c)
Table 1. Parameters to setup

For the moment, we will skip over the contents of the patch file in the package, but the patch0 statement is what applies the patch. The -p1 and -b parameters are the same as seen on the command line for the command line utility patch. The former strips one leading directory from the files listed in the patch, and the latter saves a backup of all changed files before updating the originals. In this case, the backups are suffixed with .fhs.

Next is the build section, which is quite simple and straightforward. Because CVSps uses a standard Makefile and no GNU Autoconf configure script, we simply run the make command. Also, as there are no tests included with CVSps, we don't run any (tests are often invoked through make test, but much like the actual compilation steps of any given program, this can vary widely).

The install phase is equally simple. Note that instead of simply make install, a prefix is specified. This tells the install routines where to deposit files. Although it is specified via prefix=PATH in this case, often it is PREFIX=PATH or some other mechanism completely. Again, consult the instructions for building any given piece of software to determine exactly how it expects such settings. Occasionally, simple or newer software will not have the capacity to install anywhere besides the root file system. This is the most complicated case and a time when patches are often necessary to teach the software how to install into build roots. If you find such an example and make the necessary changes, be sure to submit patches upstream to the original project, as it is a welcome addition to any piece of software and something that is of use to other users (even those not necessarily building RPMs).

The files section is mostly as one would expect, with the exception of the use of wildcards to locate files. This can be a considerable time saver when a package build results in dozens, hundreds, or even thousands of files. A changelog finishes the file, indicating the packaging history (as well as showing quite a bit of neglect between the original package and its update to the latest version of the software). Changelog formats are fairly self-explanatory, and it is of incredible importance to have changelogs, especially when sharing RPMs with others.
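Going by the entries in Example 2, a changelog entry starts with an asterisk, the date, the packager's name and address, and optionally the version and release. The sketch below prints such a header line for today's date; the function name changelog_header and the packager details are placeholders.

```shell
# Print a changelog header line in the style used in Example 2.
# $1 = packager name, $2 = email address, $3 = version-release.
# LC_ALL=C keeps day and month names in English regardless of locale.
changelog_header() {
    echo "* $(LC_ALL=C date '+%a %b %d %Y') $1 <$2> - $3"
}

changelog_header "Jane Packager" "jane@example.com" "1.3.3-2"
echo "- describe the packaging change here"
```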

We now return our attention to the previously ignored patch file. We begin first with the name of the patch: cvsps-1.3.1-fhs.patch. It may seem odd at first that the patch file has version 1.3.1 whereas the package has version 1.3.3, but this is actually a fairly common convention. When you make a new patch, including the version of the source file in the patch name is useful so as to know when the patch came into existence and against which source tree it was created. As time goes on and versions increase, there is no need to change the version on the patch unless you have to change the patch file to apply cleanly. So since the patch has not needed modification, it remains listed as 1.3.1. The fhs after the version is where a one or two word description of the patch resides. In this case, fhs means Filesystem Hierarchy Standard, which is a standard followed by many UNIX-like systems, notably many popular Linux distributions.

The patch itself is small, modifying only one file, the Makefile. Examining the patch, listed in Example 3, “CVSps Patch Example”, reveals only a few small changes. In effect, this patch tells the Makefile to install manpages not in /usr/man/ but in /usr/share/man/, which is FHS compliant and, since both Fedora Core and Red Hat Enterprise Linux are FHS-compliant distributions, necessary.

--- cvsps-1.3.1/Makefile.lfs	Thu Jun 27 11:02:46 2002
+++ cvsps-1.3.1/Makefile	Thu Jun 27 11:03:02 2002
@@ -15,9 +15,9 @@
 
 install:
 	[ -d $(prefix)/bin ] || mkdir -p $(prefix)/bin
-	[ -d $(prefix)/man/man1 ] || mkdir -p $(prefix)/man/man1
+	[ -d $(prefix)/share/man/man1 ] || mkdir -p $(prefix)/share/man/man1
 	install cvsps $(prefix)/bin
-	install -m 644 cvsps.1 $(prefix)/man/man1
+	install -m 644 cvsps.1 $(prefix)/share/man/man1
 
 clean:
 	rm -f cvsps *.o cbtcommon/*.o core
Example 3. CVSps Patch Example

It is easy enough to see how the patch is applied and what it does, but creating the patch can be a bit tricky, and the more patches in a package, the trickier things become. RPM includes a utility, however, called gendiff which makes this considerably easier. To use gendiff, extract the pristine tarball into a directory of your choosing. Go into this expanded tarball and copy each file you want to edit to a different name, appending a common suffix to each file (such as, in this case, .fhs). Now edit the original files until you are satisfied with the changes, and change back to the directory into which you extracted the tarball. Now run gendiff, passing it first the directory (such as cvsps-1.3.3) and second the common suffix (such as .fhs). gendiff then outputs a unified diff, suitable for use either directly by the patch program or by a spec file. Save this diff into the SOURCES/ directory of your RPM build tree and reference it in your spec file. There are many other ways of creating diffs; you could copy the entire tree before making changes, for instance, then use diff -Naur to diff the entire trees. Or you could diff each file individually. However, the advantage of gendiff is that it doesn't pick up files you don't specifically want it to catch, and it easily allows for modifying a single file or multiple files.
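The backup-then-edit workflow can be walked through on a toy tree. The following sketch uses plain diff -u for the final step so that it runs even where gendiff is not installed; the one-line Makefile is a made-up stand-in for the real CVSps Makefile, and the edit is scripted with sed instead of an editor.

```shell
# Scratch area with a toy source tree (not the real CVSps sources).
work=$(mktemp -d)
cd "$work"
mkdir cvsps-1.3.3
printf 'install:\n\tinstall -m 644 cvsps.1 $(prefix)/man/man1\n' \
    > cvsps-1.3.3/Makefile

# 1. Keep a pristine copy under a common suffix before editing.
cp cvsps-1.3.3/Makefile cvsps-1.3.3/Makefile.fhs

# 2. Edit the original (scripted here; normally done in an editor).
sed 's#/man/man1#/share/man/man1#' cvsps-1.3.3/Makefile.fhs \
    > cvsps-1.3.3/Makefile

# 3. From the parent directory, diff the backup against the edited file.
#    With gendiff available, this step would be: gendiff cvsps-1.3.3 .fhs
#    diff exits non-zero when files differ, hence the "|| true".
diff -u cvsps-1.3.3/Makefile.fhs cvsps-1.3.3/Makefile \
    > cvsps-1.3.3-fhs.patch || true

cat cvsps-1.3.3-fhs.patch
```

Because both paths in the diff begin with the cvsps-1.3.3/ directory, the result is suitable for applying with -p1, as the spec file does.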

Odds and Ends

Now that you can make packages, it is generally a good idea to sign them, especially if you plan to share them with others. The first step to signing packages is to create a GPG key. This can be somewhat involved, but basically you should run the gpg --gen-key command and follow the default options. Once you have a GPG key, you must tell RPM to use it and which key to use (you could have multiple keys, after all; RPM must handle the general case). To do this, create a .rpmmacros in your home directory and add the following two lines:

%_signature gpg
%_gpg_name email@example.com

where email@example.com is the email address you used when creating your GPG key (also discoverable via gpg --list-keys). Now run rpm --resign /path/to/rpms/*.rpm to sign one or more packages in the directory (both binary and source RPMs can be signed). To see what signatures have been used to sign a package, run rpm -Kv on the RPM in question. The lines referencing a DSA signature will have an eight-digit hex string that identifies the key used to sign the package.

Speaking of .rpmmacros, as mentioned earlier, you can build from /usr/src/redhat/, but this can be overridden. To change this, say to /home/username/rpm/, add the following to your .rpmmacros file:

%_topdir        /home/username/rpm

That tells RPM that the top of its build tree is /home/username/rpm/ instead of /usr/src/redhat/. Under this directory, go ahead and create the subdirectories seen under /usr/src/redhat/; RPM expects them in many cases and fails at odd times if they aren't present.
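Creating the subdirectories takes one short loop. The sketch below builds the tree under a scratch directory so it can be tried safely; in practice topdir would be something like /home/username/rpm, and the printed %_topdir line is what would go into .rpmmacros. The directory names mirror the /usr/src/redhat/ layout listed earlier.

```shell
# Assumption: a scratch location is used here for safety; substitute the
# real directory (for example /home/username/rpm) when doing this for real.
topdir=$(mktemp -d)/rpm

# Same subdirectories as the default /usr/src/redhat/ layout.
for d in BUILD RPMS/noarch RPMS/x86_64 SOURCES SPECS SRPMS; do
    mkdir -p "$topdir/$d"
done

# The matching .rpmmacros line (printed here rather than written to a file).
echo "%_topdir        $topdir"
ls "$topdir"
```

Running rpm --eval '%{_topdir}' afterwards is one way to confirm which directory RPM will actually use.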

Conclusion

No single article can teach everything about building RPM packages, and this article has not attempted that. Instead, the goal has been to provide sufficient information to understand what is going on when RPM is building packages and get a couple of simple, extensible, and (most importantly) understandable examples under your belt. Once the basics of how rpmbuild works are understood, the complexities of RPM are not quite so mysterious; this does leave the unfortunately open-ended problem of the diversity of software, though, and packaging a new piece of code almost always presents new challenges. Armed with the understanding from this article and with practice, over time those complexities and challenges will diminish and you will find yourself packaging everything you can find, be it a few simple scripts or the entire website for your company. And remember, as with most things open source, the Art of Theft will serve you well — read as many spec files as you can, and see what works and what doesn't. When it comes to building RPMs, once the basic science is learned, the rest is art, and one can always improve one's skills further.

About the Author

Chip Turner has been with Red Hat for three years and is the Lead Architect for Red Hat Network. He also maintains the perl package, all perl modules, and spamassassin for Fedora Core and Red Hat Enterprise Linux, and is the author of the RPM2 perl bindings for RPM and a handful of other CPAN perl modules. In his spare time he enjoys playing with his dog and arguing for no apparent point.