Issue #5 March 2005
In part 1 of this article, we explored a number of powerful techniques and commands for bending the command line to our will and quickly extracting the information we wanted. That was a good start, and it proved we could extract quite a bit of valuable information with relatively simple tools, but it was only the appetizer to our main course: Perl.
Perl has been called the Swiss Army knife in a variety of contexts, including sysadminery, and it is a title that fits. On the whole, Perl is an exceedingly versatile language that, unlike many languages, grew and evolved to meet practical needs rather than to satisfy academic curiosity. In particular, it originally came about as a tool for system administrators, as a more robust and powerful supplement to, and replacement for, tools like sed and awk. This type of evolution has its upsides and downsides. On the upside, you end up with a very powerful and flexible language; on the downside, it sometimes tries to please too many audiences and lacks some of the rigor more traditional languages provide.
It is that flexibility we aim to exploit here, and it makes Perl an excellent choice. The concepts we use, though, apply to a class of other languages as well. While you likely will never use Java for command line trickery (bless your heart if you try), other languages are reasonably well suited. In particular, Ruby, a relatively new language, even goes so far as to mimic some of Perl's command line options. Here, though, we focus on Perl.
At the conclusion of our previous article, an invocation of Perl was offered, but with no explanation of how it worked. That command was:
find /tmp/library -print0 | xargs -0 perl -p -i -e 's/XFree86/x.org/g'
The find and xargs are already
familiar to us, as is the -e option to
perl. What was left unexplained were the
-p and the -i options. Let us begin
with -p. Simply put, -p tells Perl
to execute whatever you tell it (either a script or, more commonly, an
expression supplied with -e) on each and every line of
input, and then print out the $_
variable. Let us examine a simpler version:
seq 1 10 | perl -p -e '$_ = "prefix: $_"'
The seq command simply writes the numbers 1 through
10 to standard output, which is piped to Perl. Perl then reads
each line and executes what is specified by -e. In
this case, we overwrite the variable $_ with a new
value, "prefix: $_"; in other words, we put
"prefix: " in front of it. The resulting output is:
prefix: 1
prefix: 2
prefix: 3
... snip ...
prefix: 10
Simple, but important, so let's see another example, one that is perhaps a
bit more practical. In this one, we will process
/etc/passwd, removing everything after (and
including) the first colon:
perl -p -e 's/:.*//' /etc/passwd
This command displays the username of every account listed in
/etc/passwd. In this case, the expression is 's/:.*//', a regular
expression substitution. In Perl, $_ is a magical
variable that many things use if not given another value, and the
substitution is one of them: with nothing else to work on, it edits $_.
When using the -p switch, $_ is set to the entire
current line of input.
By itself, the -p switch is useful for transforming
streams of data, but when combined with -i, it is half
of one of the most powerful command line tricks in any arsenal. In
effect, -i means to perform changes in place. So
instead of -p printing to standard output, Perl
reads a line from a file, evaluates the given expression on the
line, and writes the result back to the original file. This mechanic is the
core of a number of exceedingly powerful time-saving tricks.
We now have the requisite pieces to determine what the command line from the previous article did. Revisiting:
find /tmp/library -print0 | xargs -0 perl -p -i -e 's/XFree86/x.org/g'
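We know that the find/xargs
combination hands the files it finds to the Perl command perl -p -i
-e 's/XFree86/x.org/g'. We also know that
-p and -i together mean that the given
expression is evaluated for each line and that the line is replaced with the
result. The expression in this case replaces the string
XFree86 with x.org, and the trailing g applies it to every
occurrence on a line rather than only the first. The ultimate
result is that every file under /tmp/library will have
XFree86 replaced with x.org. (find also lists directories here; Perl
skips those with a warning, and adding -type f to find avoids the noise.)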
The -p switch has a cousin that acts very similarly:
it still runs the expression on every line of input, but it does not print
anything unless the expression does so itself. That cousin is the
-n switch. It is somewhat useful on its own but
more useful in conjunction with other switches we shall see shortly. Even
so, it has a use with the switches we have seen so far:
perl -n -e '@fields = split /:/; print "$fields[0]\n"' /etc/passwd
This, much like one of the earlier examples, prints all of the users
listed in /etc/passwd. If you think about it a
moment, though, it is very common when working with Linux that you
encounter files, like /etc/passwd, which are
delimited by a common character—a colon in this case. So common, in
fact, that Perl offers you a shortcut. Let us consider the simplest
example—a file with fields delimited by whitespace. One such file
is a typical Apache access_log file:
172.31.29.101 - - [04/Jan/2005:21:56:44 -0500] "GET /favicon.ico HTTP/1.1" 404 291 "-" "Mozilla/5.0"
That's a lot of information! Sometimes we only want some subset of it.
For instance, suppose we want a list of the actual requested URLs.
Counting whitespace, that is the seventh field. We can get at those
fields by combining the -n parameter with
-a, which instructs Perl to automatically split each
line. Now inside of the evaluated expression you can not only access the
entire line via $_ but you can also access each field
through the @F array (counterintuitive, I know). In
this case, the @F array contains:
$F[0] = '172.31.29.101'
$F[1] = '-'
$F[2] = '-'
$F[3] = '[04/Jan/2005:21:56:44'
$F[4] = '-0500]'
$F[5] = '"GET'
$F[6] = '/favicon.ico'
$F[7] = 'HTTP/1.1"'
$F[8] = '404'
$F[9] = '291'
$F[10] = '"-"'
$F[11] = '"Mozilla/5.0"'
So now to get a list of all of the URLs that were hit:
perl -l -a -n -e 'print $F[6]' /var/log/httpd/access_log
The -l and -e are familiar (-l strips the newline from each
line read and adds one back to everything printed); all that
is new is the -a working in conjunction with
-n.
The last example begins to show us the power of combining tools together. Let's build on that example and do some log file exploration.
What are my website's most popular pages?
cat /var/log/httpd/access_log | perl -l -a -n -e 'print $F[6]' | sort | uniq -c | sort -n | tail -10
This builds on the previous example; all that is new is the sequence of
sort, uniq, and
tail. The first sort is merely to
prep for uniq (uniq expects its
input to already be sorted). In this case, though, we're not asking
uniq just for the unique
lines; with the -c parameter we are asking for the number of times
each line occurred. The format of uniq's
output in this case is:
18 /robots.txt
37 /favicon.ico
The first number is the count; the second is the line. We then pipe this
back into sort, this time sorting
numerically, then we use tail to extract the last ten
lines (which in this case are the top ten most popular pages).
Who is attacking my web server?
cat /var/log/httpd/access_log | perl -l -a -n -e 'print $F[0]' | sort | uniq -c | sort -n | tail -10
This is exactly the same as the previous example, except that instead of the seventh field ($F[6]) we extract the first field ($F[0]), which, in this case, is the requesting IP address.
You need to make changes to a large number of files in-place.
find . -type f -name '*.txt' -print0 | xargs -0 perl -p -i -e 's/PLACEHOLDER/new_value/g'
The key to this example is that we use find to select the
files and xargs to hand them to Perl. This is just an
extension of what we saw before with xargs, but now we
use it with Perl to change a large number of files, far more than might fit
on a single command line. The alternative would be to load your favorite
editor (Emacs, of course), maybe make a macro, and execute it on each
file... something you may not have time for if you need to change hundreds
of files, and something that is much more error prone.
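Before running a bulk edit on real files, it can help to rehearse it in a scratch directory. The sketch below uses made-up file names, one of which deliberately contains spaces. It assumes find and xargs support -print0 and -0 (as the GNU and BSD versions do), so that file names are passed NUL-separated.

```shell
workdir=$(mktemp -d)
printf 'value=PLACEHOLDER\n' > "$workdir/a.txt"
printf 'PLACEHOLDER and PLACEHOLDER\n' > "$workdir/file with spaces.txt"
printf 'PLACEHOLDER\n' > "$workdir/untouched.log"

# Only *.txt files are selected; -print0 and -0 keep names with spaces intact.
find "$workdir" -type f -name '*.txt' -print0 |
  xargs -0 perl -p -i -e 's/PLACEHOLDER/new_value/g'

a=$(cat "$workdir/a.txt")
spaced=$(cat "$workdir/file with spaces.txt")
other=$(cat "$workdir/untouched.log")
echo "$a"; echo "$spaced"; echo "$other"

rm -rf "$workdir"
```

The .log file is left alone because find never hands it to Perl, which is the point of letting find do the selecting.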
You suspect there may be multiple users with the same numeric uid in /etc/passwd.
perl -F: -l -a -n -e 'print $F[2]' /etc/passwd | sort | uniq -d
The new concept here is the -F: parameter. This just
tells Perl that, instead of splitting on whitespace, it should split on
colons (or, in general, on any string or regular expression immediately
following the -F parameter). In /etc/passwd,
the user's numeric id is the third field, which is
$F[2] (remember, engineers count from zero!). We then
sort it in preparation to pass to
uniq, where we use the -d switch
(which prints one copy of each duplicated line; uniq certainly is
versatile).
You want to know how much data your webserver transferred today (assuming the log is rotated daily, so that access_log holds only today's requests).
cat /var/log/httpd/access_log | perl -l -a -n -e '$n += $F[9]; } END { print $n'
Things are starting to get weird. The parameters to Perl are all ones we
know, but the expression is peculiar. In fact, it's nearly unreadable
unless you already know the trick behind it.
In effect, the -n switch (with -a doing the splitting) has Perl do this for you:
while (<>) {
# perl's magic to split $_ into @F
# code specified by -e goes here
}
Basically Perl just places what is specified with -e
into a while loop. So in this case, let's perform the substitution
manually:
while (<>) {
# perl's magic to split $_ into @F
$n += $F[9]; } END { print $n
}
Adjusting formatting and we have...
while (<>) {
    # perl's magic to split $_ into @F
    $n += $F[9];
}
END {
    print $n
}
Ah, now it gets to be a little clearer. In Perl, a block specified via
END { ... } will be executed when the Perl interpreter exits. So in this
case, for each line of the input, Perl adds the tenth field to a variable
called $n. When the interpreter exits, we print that
value.
This is a pretty odd one, but it turns out to be extremely useful. Unlike before, where we just manipulated each line, we instead accumulate information and display it at the end.
You have a logfile that has timestamps in epoch time but your brain doesn't read Unix epoch time. An example line is:
1104903117 0.3
cat /tmp/weirdlog | perl -l -a -n -e 'print scalar(localtime $F[0]), " @F[1..$#F]"'
Sometimes, log files have epoch time instead of human readable time. That's okay, considering how complicated timezones and locales can be, but it makes it hard on our brain's wetware to process effectively. So, we resort to another bit of magic. In this case, we see a few new Perl constructs but nothing new on the way we invoke Perl.
The first construct is turning $F[0], which contains a
number like 1104903117, into a human readable date. This is accomplished
by the localtime function.
localtime can return either a list of values
containing the various parts of the date, such as the month and year, or
a nice string representation; we force that string representation
with the scalar function.
The other construct is " @F[1..$#F]". This is
basically the same as "@F" except we only get entries
1, 2, 3, etc., through $#F. $#F is
Perl's way of saying "the last valid index in the @F array." Another way
of accomplishing this would be:
cat /tmp/weirdlog | perl -l -a -n -e '$F[0] = scalar localtime $F[0]; print "@F"'
Same idea, except we just change @F in place and then
print it. One key thing to remember is we aren't using
-i, so we won't actually change
/tmp/weirdlog (though in this case, the use of
cat protects us from that as well; even with
-i, Perl can only change files in place if they were on
the command line and not if they were streamed to it via stdin).
Hopefully you now have a taste for some of the kinds of magic tricks that
can be performed with creative use of Perl and bash. These articles
present a set of building blocks that you should take and creatively reuse
on your own. Next time you find yourself editing a file and making a
large number of the same changes or trying to string together
grep and sed and other commands in
painful ways, think about how you might solve the problem just by bringing
Perl into the mix.
A word of caution, though; with great power comes great responsibility, and the techniques presented here are indeed powerful. One of the main reasons we can get away with such unreadable, arguably ugly code is that no one else will ever read it. We aren't saving it to a file for reuse, and it isn't like someone else will take our bash history and extend it. The corollary of that, though, is that if you are writing a script, you might want to shy away from overly clever constructs like some of what we've seen here. The only thing worse than saving one of these constructs and having to debug it six months later is running across one you didn't write and having to debug it six months later! Have compassion for your fellow coworkers when it comes to scripting.
One of the great things about using a tool like Perl in this case is that everything you learn about Perl, you can apply both in larger, more formal scripts and on the command line. In effect, you double the use of the things you learn, which is a very good thing. The 'scalar localtime' idiom from the last example, for instance, is very common in non-command line Perl, but it ports right over from regular scripting to the command line.
One of the best ways to learn more creative command line magic is to learn
more about Perl itself. There are a variety of resources for this, but
assuming you know basic Perl, then the Perl
Cookbook is a wonderful resource well-suited for tricks like
the ones demonstrated in this article. Likewise, the Perl man pages
(which are excellent in quality) provide quite a bit of useful
information, most especially man perlrun which
documents the various command line switches used in this article.