Issue #4 February 2005
How I learned to stop worrying and love the command line, part 1
by Chip Turner
- Introduction
- The players
- Stringing commands together
- find
- A taste of perl
- Conclusion
- About the author
Introduction
You've always been told to write maintainable code. All of those fancy books on Extreme Programming and every computer science course you've ever had have emphasized commenting and clarity and all of those other broccoli-is-good-for-you-so-clean-your-plate directives. This article and its second half are about the opposite of that: unreadable code, inscrutable code, and disposable code. But, also, indispensable code. What shapes the way we write that code, however, is the editor we use, and that editor is the bash command line prompt.
The first part of this series focuses on that versatile editor and the magic you can weave by combining fundamental concepts of UNIX with a healthy disregard for public safety. The second part will specialize a bit and focus on combining what you learn in this article with the ubiquitous system administrator survival knife of a language, perl. The goal, however, isn't just to walk on the wrong side of the tracks and live to tell the tale; quite the opposite. The goal is to become more efficient, solve problems that otherwise would be very time consuming, and maybe, just maybe, impress people while we're at it. After all, "any sufficiently advanced technology is indistinguishable from magic."
The players
Before we dive into those specific facilities, though, we should discuss
what bash brings to the party, because if perl is the magic we brew here,
then bash is the cauldron we brew it in. Central to all UNIX shells (and
even non-UNIX shells, though to a lesser extent) is the ability to take
the output of one command and save it to a file or to send it as input to
another program—that is, input and output redirection. Nearly every
trick we use here involves redirection through one or more pipes. A simple
pipeline such as grep https /etc/services | grep -v udp shows
how using grep twice in a row is much
simpler than coming up with a possibly complex regular expression to
achieve the same result: print every line of
/etc/services that contains the string
https but not the string
udp.
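To try the same idea on data you control, here is a minimal sketch. The file name services.sample and its three entries are made up for illustration; only the two-stage grep comes from the example above. It also shows the other half of redirection, saving output to a file with >.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)

# Hypothetical sample data in the style of /etc/services.
printf '%s\n' \
    'http 80/tcp' \
    'https 443/tcp' \
    'https 443/udp' > "$workdir/services.sample"

# The first grep keeps lines containing "https"; the second drops any
# that mention "udp". The final > sends the survivors to a file.
grep https "$workdir/services.sample" | grep -v udp > "$workdir/result.txt"

matches=$(cat "$workdir/result.txt")
echo "$matches"

rm -rf "$workdir"
```

Only the https line that does not mention udp should survive both filters.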
bash isn't just about running external programs, however. Built into bash
is as full-featured a list of programming constructs as you'd expect in
any language—conditionals such as if and
case as well as variables (even arrays) and iteration
constructs such as for and while.
bash even documents itself, via the help
command. If at any time you wonder about the syntax of any of bash's
commands, such as whether if clauses end with
fi or endif, all you need do is
invoke help with that command as a
parameter: help if.
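As a rough illustration of those constructs working together, the sketch below classifies a few made-up file names with case inside a for loop, counts down with while, and picks a summary with if. The names and variables are hypothetical and chosen only to keep the example self-contained.

```shell
text_count=0
other_count=0

# for walks the list; case matches each name against shell patterns.
for name in notes.txt photo.jpg todo.txt
do
    case "$name" in
        *.txt) text_count=$((text_count + 1)) ;;
        *)     other_count=$((other_count + 1)) ;;
    esac
done

# while repeats as long as its condition command succeeds.
countdown=3
while [ "$countdown" -gt 0 ]
do
    countdown=$((countdown - 1))
done

# if runs one branch or the other depending on the test.
if [ "$text_count" -gt "$other_count" ]
then
    summary="mostly text"
else
    summary="mostly other"
fi
echo "$summary ($text_count text, $other_count other)"
```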
For example, a very common task is to perform a set of operations on a number
of files. Quite often, if that operation is simple, the facility may
already be there: rm * for instance removes every (non-hidden)
file in the current directory. However, if your operation is more
complex, such as 'delete all of the symlinks in this directory' or 'delete
every file containing the word "violet" in the current directory' then
chances are no single command will solve that problem.
Using bash constructs such as for and
if, though, makes this easy. Take the first example,
deleting all of the symlinks in the current directory. Quite simple, with
bash:
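# Loop over every name in the current directory.
for FILE in *
do
    # -L is true when the name is a symbolic link.
    if [ -L "$FILE" ]
    then
        rm "$FILE"
    fi
done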
In English, this just means: Iterate over every file in the current directory, assigning the name of the file to the variable FILE. If that file is a symlink, then remove the file.
Perhaps the least obvious part of this construct is the if
[ -L "$FILE" ] statement. Contrary to many languages, the
[ and ]
are not grouping like parentheses; instead,
[ is actually the name of a bash built-in
command, and the ] is simply the final argument that
[ requires to mark the end of the test. The [ command is the same as
the test command, which performs a variety of tests
such as string equality, file existence, and, in this case, whether a
given file is a symlink or not (the operator for that is an uppercase -L,
or -h; a lowercase -l is not a valid file test in bash). The double quotes
around $FILE keep a name containing spaces as a single argument. The full
list of operations can be seen
via help test, which is definitely worth a read to see just
how many checks that may take many lines in other programs are quite
simple with bash.
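A few of those tests can be tried safely in a scratch directory. This is only a sketch: the file names real.txt and link.txt are invented for the demonstration, and the operators shown (-e, -L and string =) are a small sample of what help test lists.

```shell
workdir=$(mktemp -d)
touch "$workdir/real.txt"
ln -s "$workdir/real.txt" "$workdir/link.txt"

# -e: does the file exist?
if [ -e "$workdir/real.txt" ]; then exists=yes; else exists=no; fi

# -L: is the name a symbolic link? True for the link, false for the target.
if [ -L "$workdir/link.txt" ]; then link_is_symlink=yes; else link_is_symlink=no; fi
if [ -L "$workdir/real.txt" ]; then real_is_symlink=yes; else real_is_symlink=no; fi

# test is the same command as [, just without the closing ].
if test "abc" = "abc"; then same=yes; else same=no; fi

echo "$exists $link_is_symlink $real_is_symlink $same"
rm -rf "$workdir"
```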
Sometimes you will see the above shortened into this more compact form:
for FILE in *
do
    [ -L "$FILE" ] && rm "$FILE"
done
or even:
for i in *; do [ -L "$i" ] && rm "$i"; done
Both of these simply make the statement more compact. In the second
form, the && operates much like it would
in C or Perl: if the first condition is true (the file is a
symlink), then evaluate the second (remove the file). Also as in
C and Perl, bash won't evaluate the right hand side of an
&& (or a
||) if it doesn't have to. So if the
conditional check fails, it won't hit the rm command.
The third form simply changes FILE to
i (a common iteration variable) and crams
it all onto one line. Note the placement of semicolons; they are
crucial, else bash won't consider the command well-formed.
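The short-circuit behaviour is easy to watch without deleting anything. In this sketch the true and false commands stand in for a test that succeeds or fails, and the variable log (a name chosen only for this example) records which right-hand sides actually ran.

```shell
log=""
true  && log="$log and-ran"      # left side succeeded, so the right side runs
false && log="$log and-skipped"  # left side failed, so the right side is skipped
false || log="$log or-ran"       # left side failed, so || runs the right side
true  || log="$log or-skipped"   # left side succeeded, so || skips the right side
echo "$log"
```

Only the two entries ending in "ran" should end up in the log.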
Stringing commands together
Another common bash construct you will see is inline expansion, which the bash documentation calls command substitution. Simply put, this runs the given command and places its output inside the current command. For instance:
echo "The current time is: $(date)"
displays something to the effect of:
The current time is: Thu Feb 3 20:50:35 EST 2005
You also often see the so-called backtick operator:
echo "The current time is: `date`"
The two forms basically do the same thing; however, the
$() form allows for nesting and is
considerably easier to read, and is encouraged over the `` form.
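To make the nesting point concrete, here is a small sketch using dirname and basename on a made-up path (the path does not need to exist, since both commands only manipulate the string). The second assignment is the backtick equivalent, which needs its inner backticks escaped.

```shell
# The inner substitution runs first: dirname strips the last path component,
# then basename keeps only the final component of what is left.
parent=$(basename "$(dirname /usr/share/doc/readme.txt)")
echo "The parent directory is: $parent"

# Same thing with backticks; the inner pair must be escaped with backslashes.
parent_bt=`basename \`dirname /usr/share/doc/readme.txt\``
echo "Backtick form gives: $parent_bt"
```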
Since this expansion can occur at any point, it can be used in a
for statement. The basic syntax for the
for statement is for VARIABLE in LIST; do
COMMANDS; done where LIST is a space
separated list of values. Here is an example of how to create ten files,
1.txt through 10.txt:
for i in $(seq 1 10); do touch $i.txt; done
That creates 10 files named 1.txt through
10.txt each of which is empty. This is a quick and
easy way to repeat a command N times as well (there is no requirement that
the variable used for iteration be referenced in the command).
Another use of $() is to provide a list
of files to another command. For instance, to count the lines in each
text file in the current directory that contains the string
fedora, you could simply use:
wc -l $(grep -l fedora *.txt)
This introduces the grep command.
grep is a standard utility and not a bash built-in
command, and it is one of the most important commands you will use when
doing complex scripting. Basically grep searches
files for a given string or pattern. The invocation is
simple—grep PATTERN FILE [ FILE ... ].
PATTERN is a regular expression and can be rather
complex, but for our purposes here, the pattern
fedora simply matches the literal string
fedora. Usually
grep prints both the file and the matching line, but
in this case, we pass the -l option, which tells it
simply to print the filename—quite useful when wanting to operate
on files that contain a pattern.
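Here is a small sandbox version of that command, assuming three made-up files of which two mention fedora. The file names and contents are invented for the demonstration; the grep -l and wc -l usage is the same as above.

```shell
workdir=$(mktemp -d)
printf 'fedora core\nrelease notes\n' > "$workdir/a.txt"
printf 'nothing relevant\n'           > "$workdir/b.txt"
printf 'about fedora\n'               > "$workdir/c.txt"

# grep -l prints only the names of the files that contain a match.
matching=$(cd "$workdir" && grep -l fedora *.txt)
echo "$matching"

# Feed that list of names to wc -l through $().
(cd "$workdir" && wc -l $(grep -l fedora *.txt))

rm -rf "$workdir"
```

b.txt should be missing from both outputs, since it never mentions fedora.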
There is a dirty secret to this, though: like most things when it comes to computers, there is a limit. In this case, the limit is on
how large the kernel allows a single command line to be. Although the
default is quite spacious for normal editing, using operations like
$() and even just normal wildcard expansion can run
past that limit very quickly. If there were, say, 5000 files in the
current directory, and they all contained
fedora, then we could run out of space on
the command line. There is an answer, though, and it, like
grep, isn't part of bash but it is exceedingly useful.
That command is xargs. Although an entire article
could likely be written on xargs alone, in a nutshell,
xargs is simply a way to do $()
without worrying about command line size limits. For example, the
previous grep -l fedora *.txt would transform into:
find . -maxdepth 1 -name '*.txt' | xargs grep -l fedora
Whew, that got more complicated. For the moment, ignore the
find bit and pretend it just lists all of the
.txt files in the current directory and prints them
to stdout (which is the same thing ls *.txt would do,
but remember we can't do *.txt because there are too
many files; find gets around this by being given the
pattern in quotes, so the shell won't expand it, and matching the pattern
on its own). Next comes xargs, followed by the command
we want to run. That part, at least, is fairly simple, but what is going
on under the hood is a bit more involved.
What xargs does is read from stdin then construct a
command to execute. The parameters to xargs determine
what the command starts with. In other words, it
begins a command with grep -l fedora and then begins
tacking on everything it reads from stdin. It knows the limit on the
command line, though, so it packs in as many parameters as it can, and at
the point where the next parameter would exceed that limit, it executes the command.
Once the command completes, it begins again. It repeats this, executing
the command as many times as necessary to process the input.
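The batching is hard to see with real limits, since it takes a great many files to hit them. As a sketch, the -n option can force xargs to use tiny batches; -n 2 here is an artificial limit chosen only to make the repeated executions visible, and the words fed in are arbitrary.

```shell
# Five input words, at most two per command: xargs has to run echo three times.
batches=$(printf '%s\n' one two three four five | xargs -n 2 echo run:)
echo "$batches"

batch_count=$(printf '%s\n' "$batches" | wc -l | tr -d ' ')
first_batch=$(printf '%s\n' "$batches" | head -n 1)
```

Each line of output corresponds to one separate execution of echo, which is what happens with grep in the example above once the input outgrows a single command line.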
But we still don't have the line counts we were looking for—all we
have is a list of files containing
fedora. Ah-hah! That list is the output
of the xargs command. We can take that output and make
it the input of the wc command, once again using
xargs:
find . -maxdepth 1 -name '*.txt' | xargs grep -l fedora | xargs wc -l
There we go, just what we were looking for. It certainly became more complicated, but it also became more robust; it would work on any number of files, be it one or one million.
find
At first glance, the find command above was rather
complicated, not to mention fairly unlike most commands. In particular,
the parameters were specified with one dash, not two. However,
idiosyncrasies aside, find is a tremendously useful
command. Unlike bash and perl, of which there is only one implementation
for each, it is one of those tools that has major variants depending on
your UNIX of choice. The variants divide into two categories,
though— GNU find and everyone else's. GNU
find lets you get away with some laziness, and this
article uses GNU find syntax, but be aware it may not
transition exactly the same to other UNIX-like operating systems.
As the name suggests, find is good for finding things.
In this case, though, things turn out to be files.
find excels at very selectively finding files matching
some set of rules. One particularly useful feature of
find is that it, by default, recurses into
subdirectories. That means:
find /tmp -name '*.txt'
locates all .txt files in /tmp
and in all subdirectories of /tmp. Usually this is
what you want; quite often, you will find yourself working with entire
trees. Sometimes, though, you only want files in the current directory,
or those at most one directory deep. That is where the
-maxdepth option that appeared in a previous example
comes in; basically it is limiting the depth that find
recurses.
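A quick way to see the effect of -maxdepth is to build a tiny tree and compare the results. The directory layout and file names below are invented for the demonstration, and the sketch assumes a find that supports -maxdepth, as GNU find does.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/sub/deeper"
touch "$workdir/top.txt" "$workdir/sub/mid.txt" "$workdir/sub/deeper/low.txt"

# No limit: every level of the tree is searched.
all=$(cd "$workdir" && find . -name '*.txt' | sort)

# -maxdepth 1: only the starting directory itself.
shallow=$(cd "$workdir" && find . -maxdepth 1 -name '*.txt' | sort)

# -maxdepth 2: the starting directory plus one level of subdirectories.
two_levels=$(cd "$workdir" && find . -maxdepth 2 -name '*.txt' | sort)

echo "$all"; echo "--"; echo "$shallow"; echo "--"; echo "$two_levels"
rm -rf "$workdir"
```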
Quite often, you will see find used in conjunction with
xargs, even when maximum command line lengths are
not a concern. It is not easy with bash to specify all of the
.txt files in the current directory and below.
Sure, you could use *.txt */*.txt */*/*.txt but that
will only go three levels deep, and for a deeply nested set of
directories, that simply isn't enough.
There is a hidden trap, though. Suppose we want to compress all
.txt files in an entire subtree. Simple enough, using what
we've learned before:
find /path/to/tree -name '*.txt' | xargs gzip
The trap, though, is what if one of the files had a space in its name?
Whitespace is what xargs uses to delimit files in its
input. Therefore, it treats /foo/bar/a file.txt as
two arguments—/foo/bar/a and
file.txt. Certainly not what we want. As this is a
rather common problem, find and
xargs both come with what is needed to make our example
function properly—using null characters (ASCII 0) to delimit the
files, instead of whitespace.
find /path/to/tree -name '*.txt' -print0 | xargs -0 gzip
To see what is going on, try running the find command
alone; depending on your terminal, you will likely see what looks like one
rather huge single line with all of the filenames smashed together.
Actually, though, there is the hidden null between each filename. The
-0 parameter on xargs tells xargs
that nulls will delimit incoming filenames. Problem solved. In practice,
most files don't contain spaces, but when they occur, it is essential to
know the proper response (much like find and
xargs themselves being the proper response when you run
into command line length limits).
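The difference is easy to measure in a scratch directory. In this sketch, xargs -n 1 echo runs echo once per argument, so counting the output lines shows how many arguments xargs thought it received. The file names are made up, and the scratch path itself is assumed to contain no spaces.

```shell
workdir=$(mktemp -d)
touch "$workdir/a file.txt" "$workdir/plain.txt"

# Whitespace-delimited: "a file.txt" is split into two separate arguments.
split_count=$(find "$workdir" -name '*.txt' | xargs -n 1 echo | wc -l | tr -d ' ')

# Null-delimited: each file name arrives intact.
safe_count=$(find "$workdir" -name '*.txt' -print0 | xargs -0 -n 1 echo | wc -l | tr -d ' ')

echo "without -print0/-0: $split_count arguments; with: $safe_count arguments"
rm -rf "$workdir"
```

Two files go in either way, but only the null-delimited pipeline hands xargs exactly two arguments.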
Another extremely useful way find can sift through
files is to find files created or modified recently. Often you want to
know what has changed recently. For instance, to list all of the files in
your home directory that changed within the past two days:
find ~/ -mtime -2
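To find files that were last modified more than two days ago, you can
change the sign on the -mtime parameter (find counts whole 24-hour
periods and discards any fraction, so +2 in practice means at least
three days ago):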
find ~/ -mtime +2
You can also select files by the last time they were accessed (-atime) or
the last time their inode status changed (-ctime, which, despite the
name, is not a creation time). Like bash's test command,
find has a wide variety of options; reading the manpage
is advised (not just for reference, either; it will give you an idea of
the flexibility of this peculiar command).
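To experiment with -mtime without waiting two days, you can backdate a file with touch -t. The sketch below is only an illustration: the file names and the year-2000 timestamp are arbitrary choices, and -type f is added so that the scratch directory itself is not listed.

```shell
workdir=$(mktemp -d)
touch "$workdir/fresh.txt"                   # modified just now
touch -t 200001010000 "$workdir/stale.txt"   # backdated to 1 Jan 2000

# -mtime -2: modified less than two days ago.
recent=$(find "$workdir" -type f -mtime -2)

# -mtime +2: modified more than two days ago.
old=$(find "$workdir" -type f -mtime +2)

echo "recent: $recent"
echo "old:    $old"
rm -rf "$workdir"
```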
Sometimes output isn't in the order you want it. For instance,
find doesn't print in alphabetical (or, more
accurately, lexicographical) order. Likewise, du
doesn't display files largest to smallest (or vice versa). Instead, we
must use another command to sort such output, a command appropriately
named sort. Effectively, sort can take any sized input (subject to
local disk space) and sort it numerically or lexicographically by any
position in the string (not just starting at the first character of each
line). For example, suppose you want to find the files with the most
lines inside a directory tree:
find /usr -type f | xargs wc -l | sort -n
Here we see our friends find and
xargs; what they do in this case should be fairly clear
this time. Next, though, comes the sort. Looking at the output of
wc -l, we know it produces the number of
lines in the file, then the filename. sort will,
by default, begin sorting based on the first character of each line, and,
by default, it sorts lexicographically. The -n option,
though, tells it to sort numerically. The result here is we see the
files with the fewest lines first, all the way to those with the most at
the very bottom. (wc also prints a total line each time it is given more
than one file, so a few of those appear in the list too.)
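The difference between the two sort orders is easiest to see on a handful of numbers. This sketch uses made-up input; paste -s simply joins the sorted lines onto one line for display, and the last command shows sorting numerically on the second field rather than from the start of the line.

```shell
# Default sort compares character by character, so "100" lands before "9".
lexical=$(printf '%s\n' 10 9 100 | sort | paste -s -d ' ' -)

# -n compares the values as numbers.
numeric=$(printf '%s\n' 10 9 100 | sort -n | paste -s -d ' ' -)

# -k 2,2n sorts numerically on the second whitespace-separated field only.
by_field=$(printf '%s\n' 'a 10' 'b 9' 'c 100' | sort -k 2,2n | paste -s -d ',' -)

echo "lexical:  $lexical"
echo "numeric:  $numeric"
echo "by field: $by_field"
```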
But, that certainly is a lot of output, especially if we just want to
know, say, the five longest files. Fortunately, there is a way to take
the output of a program and throw out all but the first or last few lines.
Respectively, those commands are head and
tail. Both operate basically the same way—they
read from stdin and only produce the first few lines or the last few
lines, respectively. So in this case, we would transform the command to:
find /usr -type f | xargs wc -l | sort -n | tail
That would show the last ten lines of the output: in this case, the files with the most lines, along with any total lines that wc printed.
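If ten lines is not the number you want, both commands accept -n to set the count. A minimal sketch, using seq to generate twenty numbered lines as stand-in input:

```shell
# head -n 2 keeps only the first two lines.
first_two=$(seq 1 20 | head -n 2 | paste -s -d ' ' -)

# tail -n 3 keeps only the last three lines.
last_three=$(seq 1 20 | tail -n 3 | paste -s -d ' ' -)

# With no -n at all, tail falls back to its default of ten lines.
default_tail=$(seq 1 20 | tail | wc -l | tr -d ' ')

echo "first two: $first_two"
echo "last three: $last_three"
echo "default tail length: $default_tail"
```

So for the five longest files mentioned earlier, tail -n 5 at the end of the pipeline would do.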
A taste of perl
The other big standalone program we care about is, of course, perl itself.
The perl executable is not only the binary we use to launch regular perl
scripts, but it also has a number of command line options that make it
very well suited for intermixing on the command line to filter or alter
the input and output of other programs. But first, we need to be able to
execute arbitrary perl, which is the very core of what we will do. This
is done via the -e option:
perl -l -e 'print 1024 * 1024'
The result here is that we see what 1024 times 1024 is (the
-l, which we will almost always use, tells perl to
print a newline by default with every print statement; leave it off to see
what happens). In fact, I frequently just do a quick perl
-le instead of reaching for a calculator when I need a simple
calculation—it almost always is faster to drop to a command line
than start a separate application.
We finish with a small taste of what we will see in the next article. One of the most common changes one might want to make to a file or set of files is to replace one string with another. Certainly, you could open your favorite editor and do such a change to a single file, or even a handful, but what if you need to change dozens, or even hundreds? Thankfully, there is an easy way with a nice command line perl trick:
perl -p -i -e 's/XFree86/x.org/g' file1.txt file2.txt ...
Simply put, this replaces XFree86 any
time it occurs in the listed files with x.org. One can
easily imagine combining this with find and
xargs to change huge sets of files:
find /tmp/library -type f -print0 | xargs -0 perl -p -i -e 's/XFree86/x.org/g'
But what do the -p and the -i
mean? Stay tuned: those questions and more will be answered in the second
part of this article.
Conclusion
Hopefully you now have a taste for some of the kinds of magic tricks that can be performed with creative use of bash and some of the more common utilities one finds in most UNIXes. This article is simply a set of building blocks that you should take and creatively reuse on your own. The second part will build upon this foundation and explore some more advanced tricks, including in-depth coverage of everything you need to become a command line deity.




