3 must-know Linux commands for text manipulation
Sysadmins use an untold number of command-line tools, and you probably regularly use the three discussed in this article: grep, sed, and awk. But do you know all the ways you can use them to manipulate text? If not (or you're not sure), continue reading.
Before I get started, here are the origins of the commands' names:
- grep: According to Wikipedia, the name "comes from the ed command g/re/p (globally search for a regular expression and print matching lines), which has the same effect." ed is a "line-oriented text editor." Even for someone who likes the command line, editing files line by line seems too old-fashioned, but people had to start with something in ancient times.
- sed: The name comes from its main use, as a stream editor.
- awk: Its name comes from its authors' initials (Aho, Weinberger, and Kernighan). If the name Kernighan rings any bells (pun intended) for you, it is because this Canadian computer scientist contributed to the creation of Unix and co-authored the first book about the C language.
Tracing the commands' genealogy is interesting, but what really matters is that these commands are very helpful for text manipulation.
In the following examples, I will use a file named quotes.txt to illustrate how to use the commands. Here are the contents of this file:
$ cat quotes.txt
"God does not play dice with the universe."
- Albert Einstein, The Born-Einstein Letters 1916-55
"Not only does God play dice but... he sometimes throws them where they cannot be seen."
- Stephen Hawking
"I regard consciousness as fundamental..."
- Max Planck
"The cosmos is within us. We are made of star-stuff. We are a way for the universe to know itself."
- Carl Sagan
"[T]he atoms or elementary particles themselves are not real; they form a world of potentialities or possibilities rather than one of things or facts."
- Werner Heisenberg
grep
The simplest way to use grep is:
$ grep universe quotes.txt
"God does not play dice with the universe."
"The cosmos is within us. We are made of star-stuff. We are a way for the universe to know itself."
This example provides the string to search for (universe) and the place to look for it (quotes.txt).
If there are spaces in the string you want to search for, you must put quotes around it:
$ grep "the universe" quotes.txt
"God does not play dice with the universe."
"The cosmos is within us. We are made of star-stuff. We are a way for the universe to know itself."
Some common variations when using grep are:
- Ignore case:
grep -i string-to-search filename
- Search in multiple files:
grep -i string-to-search *.txt
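As a quick sketch of these two variations, the following creates two small throwaway files and searches them. The file names (a.txt, b.txt) and their contents are made up for this illustration and are not part of quotes.txt.

```shell
# Work in a temporary directory so nothing in the current directory is touched.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Two hypothetical sample files, invented for this illustration.
printf 'The Universe is big.\nNothing to see here.\n' > a.txt
printf 'Our universe is expanding.\n' > b.txt

# Case-sensitive search: only the lowercase spelling in b.txt matches.
grep universe *.txt

# Case-insensitive search across both files. When more than one file is
# searched, grep prefixes each match with the name of the file it came from.
grep -i universe *.txt
```

The second command should report one match from each file, which makes it easy to see where a string lives when you are searching a whole directory.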
You can search for a regular expression:
$ grep "191[0-9]" quotes.txt
- Albert Einstein, The Born-Einstein Letters 1916-55
If you want to enable extended regexp patterns to use symbols like +, ?, or |, you can use the egrep command, which is a shortcut for adding the -E flag to grep. (Recent GNU grep releases treat egrep as obsolescent and recommend grep -E, which behaves the same way.) This also enables you to search for multiple strings:
$ egrep -i "albe|hawk" quotes.txt
- Albert Einstein, The Born-Einstein Letters 1916-55
- Stephen Hawking
To show lines that include the word "universe" plus the next line (in order to include the author's name):
$ grep -i universe -A 1 quotes.txt
"God does not play dice with the universe."
- Albert Einstein, The Born-Einstein Letters 1916-55
--
"The cosmos is within us. We are made of star-stuff. We are a way for the universe to know itself."
- Carl Sagan
As you can probably guess, you could display more lines by passing a different number. Or you could show the lines before by using the flag -B.
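Here is a minimal sketch of the "lines before" case. It uses a small sample file built on the fly (its contents are shortened, made-up stand-ins for quotes.txt). The -C flag shown at the end is not covered above; it is grep's option for showing context on both sides of a match.

```shell
# Hypothetical sample file, invented for this illustration.
sample=$(mktemp)
printf '%s\n' \
  '"We are made of star-stuff."' \
  '- Carl Sagan' \
  '' \
  '"I regard consciousness as fundamental..."' \
  '- Max Planck' > "$sample"

# Match the author line and also show the one line before it (the quote).
grep -B 1 'Planck' "$sample"

# -C 1 shows one line of context both before and after each match.
grep -C 1 'Sagan' "$sample"
```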
So far, I've shown grep running alone, but it is very common to have it in a chain of commands:
$ echo "Authors who mentioned 'universe'"; cat quotes.txt | grep -i universe -A 1 | grep "^-"
Authors who mentioned 'universe'
- Albert Einstein, The Born-Einstein Letters 1916-55
- Carl Sagan
sed
My favorite use for sed is to replace strings in files. For example:
$ cat quotes.txt | sed 's/universe/Universe/g'
This will replace universe with Universe and send the result to stdout. The g flag means "replace all occurrences of the string in each line."
Some variations for this are:
- Replace the string only if it's found in the first three lines:
sed '1,3 s/universe/Universe/g' quotes.txt
- Replace the n-th occurrence of a pattern in a line (for example, the second occurrence):
sed 's/universe/Universe/2' quotes.txt
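To make the difference between these two variations easier to see, here is a small sketch that runs both against a four-line sample file. The file contents are invented for this illustration so that one line contains the word twice and one match falls outside the first three lines.

```shell
# Hypothetical sample file, invented for this illustration.
sample=$(mktemp)
printf '%s\n' \
  'universe one, universe two' \
  'line two' \
  'line three' \
  'universe four' > "$sample"

# Address range 1,3: only lines 1 to 3 are eligible for the substitution,
# so the match on line 4 is left untouched.
sed '1,3 s/universe/Universe/g' "$sample"

# Numeric flag 2: only the second occurrence on each line is replaced,
# so line 4 (which has a single occurrence) is also left untouched.
sed 's/universe/Universe/2' "$sample"
```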
These examples don't change the original file. If you want sed to change the file in place, use -i:
$ sed -i 's/universe/Universe/g' quotes.txt
If you use the -i flag, make sure that you know exactly what and how many occurrences will be affected, as it will modify the original file. To find out, you can run a grep and search for the pattern first.
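One possible way to follow this advice is sketched below, using a throwaway sample file with made-up contents. The backup suffix passed to -i (here .bak) is an addition not discussed above: when a suffix is attached to -i, sed keeps a copy of the original file under that name.

```shell
# Hypothetical sample file, invented for this illustration.
sample=$(mktemp)
printf '%s\n' 'the universe' 'no match here' 'another universe' > "$sample"

# Count the lines that contain the pattern before changing anything.
grep -c universe "$sample"

# Preview the change: -n suppresses normal output, and the p flag prints
# only the lines on which a substitution was made.
sed -n 's/universe/Universe/gp' "$sample"

# Edit in place, keeping the original content in "$sample.bak".
sed -i.bak 's/universe/Universe/g' "$sample"
```

If the result is not what you expected, the .bak copy can be moved back over the edited file.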
awk
The awk utility is very powerful, offering many options for processing text files.
Most of the situations where I use awk involve processing files with a structure (columns) that is reasonably predictable, including the character used as a column separator.
When awk processes a file, it splits each line using the "field separator" (internal variable FS, which by default is a single space, meaning that fields are separated by runs of spaces or tabs). Each field is assigned to a positional variable: $1 contains the first field, $2 contains the second, and so forth. $0 represents the full line.
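The field splitting described above can be sketched with a single made-up input line that has uneven spacing between its words:

```shell
# With the default separator, runs of spaces count as one delimiter,
# so "beta" is the second field.
echo 'alpha   beta gamma' | awk '{ print $2 }'
# prints: beta

# NF is the number of fields; $0 is the unmodified input line.
echo 'alpha   beta gamma' | awk '{ print NF ": " $0 }'
# prints: 3: alpha   beta gamma
```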
You can also apply filters to each line. For example:
$ cat quotes.txt | awk '/universe/ { print NR " - " $0 }'
1 - "God does not play dice with the universe."
10 - "The cosmos is within us. We are made of star-stuff. We are a way for the universe to know itself."
The commands passed to awk use single quotes (it is like passing a mini-program to be interpreted):
- The /universe/ part tells awk to select only the lines that match this pattern.
- The "main" program goes between the curly brackets.
- NR is the internal variable that contains the number of the current record, which in this case is the current line number.
- I added the " - " string for aesthetics.
Some of the internal variables in awk are:
- NR: The total number of input records seen so far by the command
- NF: The number of fields in the current input record
- FS: The input field separator (a space by default)
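As a small sketch of NR and NF in action, the commands below run against a two-line sample file whose contents are invented for this illustration. The END block used in the second command is standard awk but not covered above; it runs once after all input has been read.

```shell
# Hypothetical sample file, invented for this illustration.
sample=$(mktemp)
printf '%s\n' 'one two three' 'four five' > "$sample"

# NR increases with each record read; NF is recomputed for each record.
awk '{ print "record " NR " has " NF " fields" }' "$sample"
# prints: record 1 has 3 fields
#         record 2 has 2 fields

# In the END block, NR holds the total number of records read.
awk 'END { print NR " records in total" }' "$sample"
# prints: 2 records in total
```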
Here is an example using a more "predictable" file format:
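$ cat /etc/passwd | awk 'BEGIN { FS=":" } /nologin/ { print $1 }'
(output omitted)
...
redis
akmods
cjdns
haproxy
systemd-oom
In this last example:
- BEGIN { FS=":" } sets the field separator to : instead of the default (space). Setting it in a BEGIN block applies it before the first line is read; an assignment to FS inside the main action would only take effect from the following line onward.
- /nologin/ selects only the lines that contain this pattern.
- print $1 prints the first field in each line (considering that the separator is :).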
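It is worth checking when the field separator actually takes effect. The sketch below compares two ways of setting it, using a two-line sample in /etc/passwd format (the entries are made up for this illustration). In POSIX-conforming implementations such as gawk, a record is split with the FS value in force when it is read, so an assignment inside an action should only affect later records. The -F option is awk's standard command-line way of setting FS up front.

```shell
# Hypothetical sample in /etc/passwd format, invented for this illustration.
sample=$(mktemp)
printf '%s\n' \
  'daemon:x:2:2:daemon:/sbin:/sbin/nologin' \
  'redis:x:990:990:Redis:/var/lib/redis:/sbin/nologin' > "$sample"

# FS assigned inside the action: the first matching record has already
# been split on whitespace, so its $1 is expected to be the whole line.
awk '/nologin/ { FS=":"; print $1 }' "$sample"

# -F: sets the separator before any input is read, so every record,
# including the first, is split on colons.
awk -F: '/nologin/ { print $1 }' "$sample"
```

With -F, both account names (daemon and redis) should be printed on their own lines.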
Learn more
Those were some simple examples for using grep, sed, and awk.
If you read the man pages for each, you will notice plenty of additional parameters and uses for these handy commands.
For simple use cases and things you do only once in a while, it is always good to have tools like these in your toolbox.
If the required action is more complex, it is worth considering whether these tools still make sense for you to use. For a corporate use case or managing "everything-as-code," I recommend using Ansible. Ansible modules have similar features that let you emulate the operations described above, with the advantage that Ansible modules are usually idempotent and that the full process will be documented somewhere (such as in your internal Git repo).
Roberto Nozaki
Roberto Nozaki (RHCSA/RHCE/RHCA) is an Automation Principal Consultant at Red Hat Canada where he specializes in IT automation with Ansible.