Issue #2 December 2004
Better Living Through RPM, Part 2
by Chip Turner
- Introduction
- Versions, Releases, and Epochs, Oh My
- The Details
- Simple Example
- Real World Example: CVSps
- Odds and Ends
- Conclusion
- About the Author
Introduction
In the first part of this series, the basic usage of RPM was explored, with particular emphasis on how to examine a system and modify the packages installed upon it. This article delves into slightly more advanced territory, into the very heart of RPM itself — the actual creation of RPMs.
To some, it might be immediately obvious why creating your own RPMs would be valuable, but it bears emphasis here because the highly deterministic deployment and maintenance of systems achievable with RPM is not immediately evident. This is a somewhat complicated way to say that RPM makes it very easy to know exactly what is installed on a system and reproduce other systems like it very quickly. Imagine you are the administrator of five servers, four of which are Web servers talking to the fifth server, a database server. Business soars, and you find you need to deploy a fifth Web server. If every piece of software running on those servers is in RPM format (either part of the OS itself, a third party, or created by yourself), it is very easy to set up a new box to match exactly what the other four Web servers look like. This is in stark contrast to other Linux distributions and other UNIX environments where packaging is not quite so rigorous. Of course, the key here is discipline — if you compile the software and then install it directly from source instead of using RPMs, then although the base OS is easily restored, the entire system won't be quite so easy to bring online.
The building of software, by its very nature, is more complicated than simply installing or removing software. Thus, this article is aimed at a slightly more advanced audience than the previous one; in particular, it is aimed at system administrators who want to modify others' packages or create their own from scratch. That is not to say that it isn't of benefit to all users, but it does assume familiarity with compiling from source and other basic administrator tasks.
Though not emphasized in the previous article, there are actually two kinds of RPMs — source RPMs and binary RPMs. Source RPMs, or SRPMs for short, share many characteristics with binary RPMs; they have names, versions, and releases, they contain files inside them, and they can be queried with most of the familiar command line tools and options presented in the previous article. The main difference, though, is that (as the name implies) an SRPM contains the original source files used to create the binary RPMs instead of the actual compiled binaries.
Quite often, you find SRPMs available alongside RPMs when downloading
software. Suffixed with .src.rpm instead of
.i386.rpm (or .noarch.rpm in
some cases), these SRPMs are more than just containers of source code. As
mentioned, they can be queried like normal binary RPMs, but they can also
be built directly with a single, consistent command. This is an important
concept — regardless of whether the SRPM contains a package that
uses configure and make install or some more esoteric compilation method,
the SRPM abstracts that away. In fact, it is this abstraction that this
article focuses on.
Every SRPM contains a spec file in addition to the actual source files inside of it. This spec file contains all of the information necessary to compile the source code into the binary RPMs, as well as other data about the resulting RPM such as the name, version information, and description. By far the most complicated part of the typical spec file is the part related to the compilation of the source files. Anyone who has compiled a number of open source projects is well aware of the diversity of compilation methods and the wide range of maturity in the compile and install phases of most builds. It is for this reason that the spec file can sometimes be complex; RPM has a fairly specific idea of how it wants sources to become binaries, and so it is up to the author of the spec file to shepherd the build process to fit this.
Before showing a sample of a spec file, it is worth walking through the major sections of a build. Although technically some of these sections are optional, and you can sometimes get away with performing something in one step that technically belongs in another, the vast majority of spec files have each section and flow in the normal way.
The first step of a build is called the prep step. It is the
responsibility of the prep section to decompress and expand the source
tarball (if present), change into the directory contained in the tarball,
and apply any patches included in the RPM (if present). If building by
hand, this is the equivalent of the command tar zxfv
foo-1.2.3.tar.gz followed by cd
foo-1.2.3.
The second step is the build step. In this step, the source package,
already expanded and patched from the prep step, is now compiled.
Typically this corresponds to ./configure and
make steps familiar to most users. In addition, it is
usually a good idea at this time to run any tests that the software comes
with to ensure the build was successful.
The third step is the install step. As one might expect, this corresponds
to the make install step when compiling a package by
source, but with one very important exception. This step should install
the software into a build root. Put simply, a build root is just a
temporary subdirectory created while building the RPM and under which the
software will be installed, as opposed to installing under the root
partition. This is critical, so it bears repeating; the install step
should not put files where they would go if you were installing a piece of
software in the normal way, instead it should place them in a separate
directory. For example, if the make install would
normally put a file in /etc/sysconfig/network/,
inside a spec file it should put the file into
$RPM_BUILD_ROOT/etc/sysconfig/network/ (more on
$RPM_BUILD_ROOT later; for now, just think of it as
the directory created to be the shadow tree).
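To make the idea concrete, here is a minimal sketch of staging a file under a build root by hand. It assumes a scratch directory created with mktemp stands in for $RPM_BUILD_ROOT, and it uses a hypothetical configuration file named foo.conf; neither comes from a real package.

```shell
# Scratch directories standing in for the build root and the source tree
# (both are assumptions made for this demonstration).
RPM_BUILD_ROOT=$(mktemp -d)
src=$(mktemp -d)

# A hypothetical file that the software wants to install.
printf 'EXAMPLE=yes\n' > "$src/foo.conf"

# A normal install would write to /etc/sysconfig/network/ directly.
# Inside a spec file, the same path is created beneath the build root instead.
mkdir -p "$RPM_BUILD_ROOT/etc/sysconfig/network"
cp "$src/foo.conf" "$RPM_BUILD_ROOT/etc/sysconfig/network/foo.conf"

# List the staged files with the build root prefix stripped: these are the
# paths the files would have once the package is installed.
staged=$(cd "$RPM_BUILD_ROOT" && find . -type f | sed 's|^\.||')
printf '%s\n' "$staged"
```

The live /etc directory is never touched; only the shadow tree under the scratch directory changes.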
The install step is the last step that the author of the spec file has
explicit control over. The rest of the build process is RPM acting upon
data elsewhere in the spec file. The fourth step is when RPM does what is
called dependency discovery. In effect it walks over every file in
$RPM_BUILD_ROOT and examines each one in different
ways to determine whether that file needs something else to work
properly. For example, if RPM finds a typical binary executable, it
determines what shared libraries it needs. Likewise, if it finds a script
executable, it figures out what scripting language to use by looking at
the first line (so it might find a script needing
/bin/bash or /usr/bin/perl, for
instance). RPM also notices shared libraries when it walks the build root
and flags the resulting RPM not as requiring those libraries but instead
providing them, thus perhaps satisfying the dependencies of some other
package.
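The script case can be illustrated with a short sketch. The function name script_interpreter is hypothetical, and this is only the general idea of reading the first line; RPM's own dependency discovery is more thorough than this.

```shell
# Print the interpreter path from a script's "#!" line, or nothing if the
# first line does not start with "#!".
script_interpreter() {
    sed -n '1s|^#![[:space:]]*\([^[:space:]]*\).*|\1|p' "$1"
}

# Demonstration on a throwaway script.
demo=$(mktemp)
printf '#!/bin/bash\necho hello\n' > "$demo"
script_interpreter "$demo"
```

A package containing this script would end up requiring /bin/bash.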
The fifth step is when RPM takes the build root and places all of the files inside the binary RPM it is building. It also constructs the header, placing all metadata (name, description, dependencies, etc.) in the resulting binary RPM as well. Also resulting from the build, if building from a spec file and sources (as opposed to rebuilding an already built SRPM) is an SRPM containing the spec file, sources, and patches required for the build.
Versions, Releases, and Epochs, Oh My
Every package carries two visible fields whose purpose is to make it
easy to tell, given another package of the same name, which is newer.
This seems fairly straightforward, but the unambiguous ability to see
whether a package needs updating is very important to administrators.
These two fields are called the Version and
Release fields. Often
Version comes straight from the upstream
software that you are bundling (such as 2.4.21 for the kernel).
Release is best thought of as the
revision of the packaging of the upstream software itself. So the first
time you build kernel 2.4.21, you would probably use release 1; the next
time, you would use release 2, etc. There is, however, a third, invisible
field used for versioning that is compared even before
Version when determining if a package is
newer: the Epoch.
Epoch is an integer used when upstream
versions change in such a way that they no longer compare
properly under RPM's version comparison algorithm.
Epochs are one of those things that you
should never use in your own packaging unless you have a specific reason
and understand the possible abuses of epochs.
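The ordering of the three fields can be sketched as follows. This is a deliberately simplified illustration that assumes purely numeric, dot-separated values; RPM's real comparison algorithm also copes with letters and other characters. The function names cmp_dotted and cmp_evr are hypothetical.

```shell
# Compare two dotted numeric strings (for example 2.4.21 and 2.4.9).
# Prints 1 if the first is newer, -1 if it is older, 0 if they are equal.
cmp_dotted() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        # A side that has run out of segments counts as older.
        [ -z "$a" ] && { echo -1; return; }
        [ -z "$b" ] && { echo 1; return; }
        x=${a%%.*}
        y=${b%%.*}
        [ "$x" -gt "$y" ] && { echo 1; return; }
        [ "$x" -lt "$y" ] && { echo -1; return; }
        # Drop the segment just compared.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    echo 0
}

# Compare epoch, then version, then release, stopping at the first difference.
# Arguments: epoch1 version1 release1 epoch2 version2 release2
cmp_evr() {
    r=$(cmp_dotted "$1" "$4")
    [ "$r" = 0 ] && r=$(cmp_dotted "$2" "$5")
    [ "$r" = 0 ] && r=$(cmp_dotted "$3" "$6")
    echo "$r"
}

cmp_evr 0 2.4.21 1  0 2.4.21 2   # same version, older release
cmp_evr 1 1.0 1     0 9.9 1      # higher epoch wins despite lower version
```

The second call shows why epochs deserve caution: once a package ships with a higher epoch, no version number alone can outrank it.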
The Details
The building of RPMs, be it just rebuilding an SRPM or building from split
out spec file and sources, is accomplished through the
rpmbuild command line utility. The first and simplest
use of rpmbuild is rebuilding an SRPM. This is
accomplished by the rpmbuild --rebuild
foo-1.2.3-1.src.rpm command and is fairly straightforward in
what it does. It begins by extracting the contents of the SRPM, examining
the spec file and ensuring all dependencies are met (build dependencies
— software needed to build, not runtime dependencies, which your
package may subsequently need to actually run). After this, it begins the
build process described above by following the rules set forth in the spec
file.
The second way to use rpmbuild is directly on spec
files. This gives more control than building from an SRPM and is the
invocation one uses when perfecting a spec file (make a change, attempt a
build, make a change, attempt a build, etc.). The first thing you can do
with a spec file and the sources is the obvious: build the package. This is
accomplished via the rpmbuild -ba foo.spec command,
which produces both the binary RPM and the SRPM (rpmbuild -bb
foo.spec produces only the binary RPM).
Much like --rebuild, RPM verifies the spec file and
dependencies and then begins the build process described therein. Another
common use is to produce only the SRPM and not the binary RPM. This is
accomplished via the rpmbuild -bs foo.spec
command. In this case rpmbuild does not run the build
steps in the spec file; it simply bundles the spec file, sources, and
patches into the SRPM.
Up until now, one detail that has been purposefully ignored is exactly
where files need to be located for all of this to work. As one might
imagine, since an RPM build involves a spec file, one or more source
files, and multiple patches, it isn't necessarily just a matter of tossing
everything in one directory. In fact, RPM uses a configurable layout,
based by default in /usr/src/redhat/:
/usr/src/redhat/
/usr/src/redhat/SOURCES
/usr/src/redhat/SRPMS
/usr/src/redhat/RPMS
/usr/src/redhat/RPMS/noarch
/usr/src/redhat/RPMS/x86_64
/usr/src/redhat/BUILD
/usr/src/redhat/SPECS
In this layout, the spec file goes in the SPECS/
directory, the source tarballs and patches go in
SOURCES/, and (assuming a successful build) SRPMs and
RPMs end up in the SRPMS/ and
RPMS/<arch> directories, respectively.
A cardinal rule of package building is to never build as root. There are
a number of dangers with building as root, not the least of which is that
a bad spec file could completely destroy the system the build is running on.
So, given that /usr/src/redhat/ is owned by root in a
default installation, how does one actually build as someone other than
root? The easiest way is to chown /usr/src/redhat to
the user you will build as. This is the approach we will use here, though
another approach is to configure RPM via a .rpmmacros
file to use a tree anywhere on your system.
Simple Example
So now that the theory has been described, an example is in order. Example 1, “simplest.spec” contains simplest.spec,
which is pretty much the simplest possible spec file. The first seven
lines describe data about the package (name, summary, and so forth).
Next comes the definition of a BuildRoot (referred to earlier as
$RPM_BUILD_ROOT); this is a suitable default for
any spec file. The next statement,
BuildArch, says that this package is a
noarch package since it contains no files that are
architecture-specific; without this statement,
rpmbuild defaults to the architecture you are
building the package on. Next comes the
Description, which is free form
text.
Summary: A very simple package.
Name: simplest
Version: 1.0
Release: 1
License: GPL
Group: Development/Tools
URL: http://www.redhat.com/
BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root
BuildArch: noarch
%description
This is a very simple package to demonstrate an RPM build.
%prep
%build
%install
rm -rf $RPM_BUILD_ROOT
mkdir -p $RPM_BUILD_ROOT/etc
touch $RPM_BUILD_ROOT/etc/empty-file
%clean
rm -rf $RPM_BUILD_ROOT
%files
%defattr(-,root,root,-)
/etc/empty-file
Example 1. simplest.spec
Now come the sections previously described:
prep,
build, and
install. In this case, note that
prep and
build are completely empty. After all,
we have no source tarball to build from or patch to apply. The
install section is also fairly small.
First it clears out the $RPM_BUILD_ROOT (to ensure a
clean build with no risk of previous builds leaving droppings behind).
Second, it creates an etc/ subdirectory underneath
the build root, then touches a file in that dir. That's it; nothing more.
The end result is the buildroot containing a directory containing a single
empty file. The clean section represents what RPM does to clean up after
itself when a build is successful; this is almost always just deleting the
buildroot.
Lastly, we have the files section, which tells RPM which files in the buildroot should become part of the RPM. The files section contains not only the list of files but things like the owner and modes they should have (after all, if you don't build as root, you can't make files in the buildroot owned by root, so you specify here that the owner should be root and not the user you are building as). Files also are sometimes flagged as configuration files (which affects how RPM treats them across package upgrades) in this section.
To test this, copy the example into
/usr/src/redhat/SPECS/, and run rpmbuild -bs
simplest.spec to produce an SRPM or rpmbuild -ba
simplest.spec to make both an SRPM and an RPM. You can even
install the resulting package, if you wish, and watch as the thrilling
/etc/empty-file is created on your system.
Real World Example: CVSps
The simplest.spec example illustrates the basic
concepts of packaging, just as 'Hello, World!' illustrates the basic
concepts of a programming language. However, in both cases, the first
step is immediately met with a desire for a more complex example. We turn
our attention to a real world example.
CVSps is a handy utility that analyzes a given CVS repository and splits the checkins into patchsets, much like other version control systems. CVS is limited in the way it works, though, and thus what is a native facility in version control systems like Subversion and Perforce requires an external “best guess” approach. CVSps does just that, and it does it pretty well. This example packages version 1.3.3 of CVSps.
Example 2, “CVSps spec File” shows the spec file. Note that it is barely any more complicated than simplest.spec. However, a few differences are immediately evident. First is the presence of two new headers early in the spec file, Source0 and Patch0. As the names imply, these are a source file and a patch file respectively, and as the numbering implies, you can have multiple sources and multiple patches in a given package.
Summary: A program to view patchsets of CVS checkins
Name: cvsps
Version: 1.3.3
Release: 1
URL: http://www.cobite.com/cvsps/
Source0: http://www.cobite.com/cvsps/%{name}-%{version}.tar.gz
Patch0: cvsps-1.3.1-fhs.patch
License: GPL
Group: Development/Tools
BuildRoot: %{_tmppath}/%{name}-root
%description
CVSps is a program for generating 'patchset' information from a CVS
repository. A patchset in this case is defined as a set of changes
made to a collection of files, and all committed at the same time
(using a single 'cvs commit' command). This information is valuable to
seeing the big picture of the evolution of a cvs project. While cvs
tracks revision information, it is often difficult to see what changes
were committed 'atomically' to the repository.
%prep
%setup
%patch0 -p1 -b .fhs
%build
make
%install
rm -Rf $RPM_BUILD_ROOT
make install prefix=$RPM_BUILD_ROOT/usr
%clean
rm -Rf $RPM_BUILD_ROOT
%files
%defattr(-,root,root)
/usr/bin/*
/usr/share/man/*/*
%changelog
* Sat May 22 2004 Chip Turner <cturner@redhat.com> - 1.3.3-1
- update to 1.3.3
* Sun Dec 16 2001 Chip Turner <cturner@redhat.com>
- Initial build.
Also note that the prep section in this
example is not empty. It contains two statements
(%setup and
%patch0), both of which at first glance
would appear to be sections of their own, but in fact they are commands to
rpmbuild. As discussed before, the
prep section is responsible for untarring
the sources and applying the patches. Here, it is easy enough to guess
that the %patch0 statement applies our
patch, but it isn't obvious that %setup
untars Source0. In fact it does, as well
as changing into the directory contained inside the tarball.
%setup is a convenience of
rpmbuild. Its -q parameter, which many spec files
pass although this one does not, tells rpmbuild to not
show the contents of the tarball as it is expanded. Other
%setup parameters are listed in Table 1, “Parameters to setup”, though by far
-q and
-n are the most commonly seen in the
wild.
| Parameter | Description |
|---|---|
| -n DIRNAME | Directory name to change into (default: %{name}-%{version}) |
| -q | Expand source0 quietly |
| -c | Create the directory and change into it before expanding source0 |
| -T | Skip default action (don't untar; useful with -c) |
Table 1. Parameters to setup
For the moment, we will skip over the contents of the patch file in the
package, but the %patch0 statement is what
applies the patch. The -p1 and
-b parameters are the same as those accepted by the
patch command line utility. The former strips one
leading directory from the file names listed in the patch, and the latter
saves a backup of each changed file before updating the original. In this
case, the backups are suffixed with .fhs.
Next is the build section, which is quite simple and straightforward.
Because CVSps uses a standard Makefile and no GNU Autoconf configure
script, we run the make command. Also, as there are no
tests included with CVSps, we don't run them (often invoked through make
test, but much like the actual compilation steps of any given program,
this can vary widely).
The install phase is equally simple. Note that instead of simply
make install, a prefix is specified. This tells the
install routines where to deposit files. Although it is specified via
prefix=PATH in this case, often it is PREFIX=PATH or some other mechanism
completely. Again, consult the instructions for building any given piece
of software to determine exactly how it expects such settings.
Occasionally, simple or newer software will not have the capacity to
install anywhere besides the root file system. This is the most
complicated case and a time when patches are often necessary to teach the
software how to install into build roots. If you find such an example and
make the necessary changes, be sure to submit patches upstream to the
original project, as it is a welcome addition to any piece of software and
something that is of use to other users (even those not necessarily
building RPMs).
The files section is mostly as one would expect, with the exception of the use of wildcards to locate files. This can be a considerable time saver when a package build results in dozens, hundreds, or even thousands of files. A changelog finishes the file, indicating the packaging history (as well as showing quite a bit of neglect between the original package and it being updated to the latest version of the software). Changelog formats are fairly self explanatory, and it is of incredible importance to have changelogs, especially when sharing RPMs with others.
We now return our attention to the previously ignored patch file. We
begin first with the name of the patch:
cvsps-1.3.1-fhs.patch. It may seem odd at first that
the patch file has version 1.3.1 whereas the package has version 1.3.3,
but this is actually a fairly common convention. When you make a new
patch, including the version of the source file in the patch name is
useful so as to know when the patch came into existence and against which
source tree it was created. As time goes on and versions increase, there
is no need to change the version on the patch unless you have to change
the patch file to apply cleanly. So since the patch has not needed
modification, it remains listed as 1.3.1. The fhs
after the version is where a one or two word description of the patch
resides. In this case, fhs means Filesystem
Hierarchy Standard, a standard adopted by UNIXes, notably many
popular Linux distributions.
The patch itself is small, modifying only one file, the
Makefile. Examining the patch, listed in Example 3, “CVSps Patch Example”, reveals only a few small changes. In effect,
this patch tells the Makefile to install manpages not
in /usr/man/ but in
/usr/share/man/, which is FHS compliant and, since
both Fedora Core and Red Hat Enterprise Linux are FHS-compliant
distributions, necessary.
--- cvsps-1.3.1/Makefile.lfs	Thu Jun 27 11:02:46 2002
+++ cvsps-1.3.1/Makefile	Thu Jun 27 11:03:02 2002
@@ -15,9 +15,9 @@
 install:
 	[ -d $(prefix)/bin ] || mkdir -p $(prefix)/bin
-	[ -d $(prefix)/man/man1 ] || mkdir -p $(prefix)/man/man1
+	[ -d $(prefix)/share/man/man1 ] || mkdir -p $(prefix)/share/man/man1
 	install cvsps $(prefix)/bin
-	install -m 644 cvsps.1 $(prefix)/man/man1
+	install -m 644 cvsps.1 $(prefix)/share/man/man1
 clean:
 	rm -f cvsps *.o cbtcommon/*.o core
It is easy enough to see how the patch is applied and what it does, but
creating the patch can be a bit tricky, and the more patches in a package,
the trickier things become. RPM includes a utility, however, called
gendiff which makes this considerably easier. To use
gendiff, extract the pristine tarball into a directory
of your choosing. Go into this expanded tarball and copy each file you
want to edit to a different name, appending a common suffix to each file
(such as, in this case, .fhs). Now edit the original
files until you are satisfied with the changes, and change to the previous
directory into which you extracted the tarball. Now run
gendiff, passing it first the directory (such as
cvsps-1.3.3) and second the common suffix (such as
.fhs). gendiff then outputs a
unified diff, suitable for use either directly by the
patch program or by a spec file. Save this diff into
the SOURCES/ directory of your build tree and
reference it in your spec file. There are many other ways of creating
diffs; you could copy the entire tree before making changes, for instance,
then use diff -Naur to diff the entire trees. Or you
could diff each file individually. However, the advantage of
gendiff is that it doesn't pick up files you don't
specifically want it to catch, and it easily allows for modifying a single
file or multiple files.
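The workflow above can be tried without gendiff itself. The following is a rough sketch of the idea, not gendiff's actual source: a hypothetical function, sketch_gendiff, diffs every backup copy against its edited original. The Makefile contents and the mandir variable are invented for the demonstration.

```shell
# Hypothetical stand-in for gendiff: for every "<file><suffix>" backup found
# beneath a directory, emit a unified diff against the edited "<file>".
sketch_gendiff() {
    dir=$1
    suffix=$2
    find "$dir" -type f -name "*$suffix" | sort | while read -r backup; do
        edited=${backup%"$suffix"}
        # diff exits with status 1 when the files differ, which is expected.
        diff -u "$backup" "$edited" || true
    done
}

# Demonstration in a scratch directory with an invented one-line Makefile.
work=$(mktemp -d)
mkdir -p "$work/cvsps-1.3.3"
printf 'mandir = /usr/man\n' > "$work/cvsps-1.3.3/Makefile"

# Step 1: keep a pristine copy under the common suffix.
cp "$work/cvsps-1.3.3/Makefile" "$work/cvsps-1.3.3/Makefile.fhs"

# Step 2: edit the original.
printf 'mandir = /usr/share/man\n' > "$work/cvsps-1.3.3/Makefile"

# Step 3: from the parent directory, generate the patch.
patch_text=$(cd "$work" && sketch_gendiff cvsps-1.3.3 .fhs)
printf '%s\n' "$patch_text"
```

Because the diff is taken from the directory above the source tree, the file names carry one leading directory, which is what -p1 strips when the patch is applied.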
Odds and Ends
Now that you can make packages, it is generally a good idea to sign them,
especially if you plan to share them with others. The first step to
signing packages is to create a GPG key. This can be somewhat involved,
but basically you should run the gpg --gen-key command
and follow the default options. Once you have a GPG key, you must tell
RPM to use it and which key to use (you could have multiple keys, after
all; RPM must handle the general case). To do this, create a
.rpmmacros in your home directory and add the
following two lines:
%_signature gpg
%_gpg_name email@example.com
where email@example.com is the email
address you used when creating your GPG key (also discoverable via
gpg --list-keys). Now run rpm --resign
/path/to/rpms/ to sign one or more packages in the directory
(both binary and source RPMs can be signed). To see what signatures have
been used to sign a package, run rpm -Kv on the RPM in
question. The lines referencing a DSA signature will have an eight digit
hex string that corresponds to the public key used to sign the
package.
Speaking of .rpmmacros, as mentioned earlier, you can
build from /usr/src/redhat/, but this can be
overridden. To change this, say to
/home/username/rpm/, add the following to your
.rpmmacros file:
%_topdir /home/username/rpm
That tells RPM that the top of its build tree is
/home/username/rpm/ instead of
/usr/src/redhat/. Under this directory, go ahead and
create the subdirectories seen under
/usr/src/redhat/; RPM expects them in many cases and
can fail at odd times if they aren't present.
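Creating those subdirectories is easy to script. The sketch below uses a hypothetical helper, make_rpm_tree, and a scratch directory in place of /home/username/rpm so that it can be run safely; it prints the .rpmmacros line rather than writing it.

```shell
# Create the top-level directories RPM expects beneath a chosen top directory.
make_rpm_tree() {
    top=$1
    for sub in BUILD RPMS SOURCES SPECS SRPMS; do
        mkdir -p "$top/$sub"
    done
}

# Demonstration against a scratch directory; in practice this would be
# something like /home/username/rpm.
topdir=$(mktemp -d)/rpm
make_rpm_tree "$topdir"

# The line that would go in ~/.rpmmacros (printed here, not written).
macro_line="%_topdir $topdir"
printf '%s\n' "$macro_line"
ls "$topdir"
```

Architecture subdirectories under RPMS/, such as noarch, can be added the same way if needed.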
Conclusion
No single article can teach everything about building RPM packages, and
this article has not attempted that. Instead, the goal has been to
provide sufficient information to understand what is going on when RPM is
building packages and get a couple of simple, extensible, and (most
importantly) understandable examples under your belt. Once the basics of
how rpmbuild works are understood, the complexities of
RPM are not quite so mysterious; this does leave the unfortunately
open-ended problem of the diversity of software, though, and packaging a
new piece of code almost always presents new challenges. Armed with the
understanding from this article and with practice, over time those
complexities and challenges will diminish and you will find yourself
packaging everything you can find, be it a few simple scripts or the
entire website for your company. And remember, as with most things open
source, the Art of Theft will serve you well — read as many spec
files as you can, and see what works and what doesn't. When it comes to
building RPMs, once the basic science is learned, the rest is art, and one
can always improve one's skills further.




