
A beginner's guide to gawk

The gawk command is a standard sysadmin tool. Learn to use it to extract information from files and your system.

gawk is the GNU implementation of the Awk programming language, first developed for the UNIX operating system in the 1970s. The Awk programming language specializes in dealing with data formatting in text files, particularly text data organized in columns.

Using the Awk programming language, you can manipulate or extract data, generate reports, match patterns, perform calculations, and more, with great flexibility. Awk often lets you accomplish moderately complex tasks with a single line of code. Achieving the same results in general-purpose programming languages such as C or Python typically requires more effort and more lines of code.

gawk also refers to the command-line utility available by default with many Linux distributions. Many of those distributions also provide a symbolic link for awk pointing to gawk. For simplicity, from now on, we'll refer to the utility only as awk.

awk reads data from standard input (STDIN) by default. A common pattern is to pipe the output of other programs into awk to extract and print data, but awk can also process data from files.

In this article, you'll use awk to analyze data from a file with space-separated columns. Let's start by reviewing the sample data.

Example data

For the examples in this guide, let's use the output of the command ps ux saved in the file psux.out. Here's a sample of the data in the file:

$ head psux.out
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
ricardo     1446  0.0  0.2  21644 11536 ?        Ss   Sep10   0:00 /usr/lib/systemd/systemd --user
ricardo     1448  0.0  0.1  49212  5848 ?        S    Sep10   0:00 (sd-pam)
ricardo     1459  0.0  0.1 447560  7148 ?        Sl   Sep10   0:00 /usr/bin/gnome-keyring-daemon --daemonize --login
ricardo     1467  0.0  0.1 369144  6080 tty2     Ssl+ Sep10   0:00 /usr/libexec/gdm-wayland-session /usr/bin/gnome-session
ricardo     1469  0.0  0.1 277692  4112 ?        Ss   Sep10   0:00 /usr/bin/dbus-broker-launch --scope user
ricardo     1471  0.0  0.1   6836  4408 ?        S    Sep10   0:00 dbus-broker --log 4 --controller 11 --machine-id 16355057c7274843823dd747f8e2978b --max-bytes 100000000000000 --max-fds 25000000000000 --max-matches 5000000000
ricardo     1474  0.0  0.3 467744 14132 tty2     Sl+  Sep10   0:00 /usr/libexec/gnome-session-binary
ricardo     1531  0.0  0.1 297456  4280 ?        Ssl  Sep10   0:00 /usr/libexec/gnome-session-ctl --monitor
ricardo     1532  0.0  0.3 1230908 12920 ?       S<sl Sep10   0:01 /usr/bin/pulseaudio --daemonize=no

You can download the complete file using this command:

$ curl -o psux.out https://gitlab.com/-/snippets/2013935/raw\?inline\=false

If you decide to use the output of ps ux on your system, adjust the values shown in the examples to match your results.

Next, let's use awk to view data from the sample file.

Basic usage

A basic awk program consists of a pattern followed by an action enclosed in curly braces. You can provide a program to the awk utility inline by enclosing it in single quotation marks, like this:

$ awk 'pattern { action }'

awk processes the input data—standard input or file—line by line, executing the given action for each line—or record—that matches the pattern. If the pattern is omitted, awk executes the action on all records. An action can be as simple as printing data from the line or as complex as a full program. For example, to print all lines from the example file, use this command:

$ awk '{ print }' psux.out
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
ricardo     1446  0.0  0.2  21644 11536 ?        Ss   Sep10   0:00 /usr/lib/systemd/systemd --user
.... OUTPUT TRUNCATED ....

While this example isn't very useful on its own, it illustrates the basic usage of the awk command.

If you're using the command ps ux on your machine, you can pipe its output directly into awk, instead of providing the input file name:

$ ps ux | awk '{ print }'
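
As a minimal sketch of the two input modes, the following snippet feeds the same data to awk once through a pipe and once through a file. The three-line sample and the variable names are made up for illustration and are not part of the psux.out data.

```shell
# Hypothetical three-line sample; the names and numbers are made up.
sample='alice 10
bob 20
carol 30'

# 1) Read from standard input through a pipe.
from_stdin=$(printf '%s\n' "$sample" | awk '{ print }')

# 2) Read the same data from a file named on the command line.
tmpfile=$(mktemp)
printf '%s\n' "$sample" > "$tmpfile"
from_file=$(awk '{ print }' "$tmpfile")
rm -f "$tmpfile"

# Both modes should produce identical output.
printf '%s\n' "$from_stdin"
[ "$from_stdin" = "$from_file" ] && echo "same output either way"
```

Because the program itself is identical in both cases, you can develop a one-liner against a saved file and later switch to a live pipe without changing it.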

Next, let's use awk's column processing capabilities to extract part of the data from the sample file.

Printing fields

The power of awk starts to become evident when you use its column processing features. awk automatically splits each line—or record—into fields. By default, it treats any run of whitespace (spaces or tabs) as the separator between fields, but you can change that by providing the command-line option -F followed by the desired separator.

After splitting, awk assigns each field to a numbered variable, starting with the character $. For example, the first field is $1, the second $2, and so on. The special variable $0 contains the entire record before splitting.
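
To illustrate the -F option and the field variables together, here is a small sketch that splits colon-separated records. The records below are invented for this example; only -F, $1, $3, and $0 come from the description above.

```shell
# Made-up colon-separated records: name, shell, home directory.
data='alice:/bin/bash:/home/alice
bob:/bin/zsh:/home/bob'

# -F: tells awk to split each record on the colon character,
# so $1 is the name and $3 is the home directory.
first_and_third=$(printf '%s\n' "$data" | awk -F: '{ print $1, $3 }')

# $0 still holds the whole record, regardless of the separator.
whole=$(printf '%s\n' "$data" | awk -F: '{ print $0 }')

printf '%s\n' "$first_and_third"
```

This should print the name and home directory of each record separated by a single space, because the comma in print joins its arguments with a space by default.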

By using the field variables, you can extract data from the input. For example, to print only the command name from the sample file, use the variable $11 because the command name is the eleventh column on each line:

$ awk '{ print $11 }' psux.out
COMMAND
/usr/lib/systemd/systemd
(sd-pam)
/usr/bin/gnome-keyring-daemon
.... OUTPUT TRUNCATED ....

You can also print multiple fields by separating them with commas. For example, to print the command name and the CPU utilization on column three, use this command:

$ awk '{ print $11, $3 }' psux.out
COMMAND %CPU
/usr/lib/systemd/systemd 0.0
(sd-pam) 0.0
/usr/bin/gnome-keyring-daemon 0.0
.... OUTPUT TRUNCATED ....

Finally, use the built-in printf function to format the output and align the columns. The format %-40s left-justifies the first column in a 40-character-wide field, padding it on the right to accommodate longer command names:

$ awk '{ printf("%-40s %s\n", $11, $3) }' psux.out
COMMAND                                  %CPU
/usr/lib/systemd/systemd                 0.0
(sd-pam)                                 0.0
/usr/bin/gnome-keyring-daemon            0.0
/usr/libexec/gdm-wayland-session         0.0
.... OUTPUT TRUNCATED ....
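
The width specifiers are easier to see on a tiny input. This sketch uses two made-up rows and a visible | character between the columns; the widths 24 and 5 are arbitrary choices for the illustration, not values from the example above.

```shell
# Made-up two-column input: a command name and a CPU percentage.
rows='short 1.5
a-much-longer-command 12.0'

# %-24s left-justifies the text in a 24-character field (padding on the right).
# %5s right-justifies the value in a 5-character field (padding on the left).
aligned=$(printf '%s\n' "$rows" | awk '{ printf("%-24s|%5s\n", $1, $2) }')

printf '%s\n' "$aligned"
```

Every output line ends up the same length, so the | separators line up vertically. Dropping the minus sign from %-24s would push the padding to the left of the command name instead.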

Now that you can manipulate and extract individual fields from each record, let's apply the pattern feature to filter the records.


Pattern matching

In addition to manipulating fields, awk allows you to filter which records to execute actions on through a powerful pattern matching feature. In its most basic usage, provide a regular expression enclosed by slash / characters to match records. For example, to filter by records that match firefox, use /firefox/:

$ awk '/firefox/ { print $11, $3 }' psux.out
/usr/lib64/firefox/firefox 66.2
/usr/lib64/firefox/firefox 8.3
/usr/lib64/firefox/firefox 15.6
/usr/lib64/firefox/firefox 9.0
/usr/lib64/firefox/firefox 31.5
/usr/lib64/firefox/firefox 20.6
/usr/lib64/firefox/firefox 31.0
/usr/lib64/firefox/firefox 0.0
/usr/lib64/firefox/firefox 0.0
/usr/lib64/firefox/firefox 0.0
/usr/lib64/firefox/firefox 0.0
/usr/lib64/firefox/firefox 0.0
/usr/lib64/firefox/firefox 0.0

You can also use fields and a comparison expression as pattern matching criteria. For example, to print data from the process that matches PID 6685, compare field $2, like this:

$ awk '$2==6685 { print $11, $3 }' psux.out
/usr/lib64/firefox/firefox 0.0

awk treats fields that look like numbers as numeric values, allowing you to use relative comparisons like greater than or less than. For example, to show all processes that use over 5% CPU, use $3 > 5:

$ awk '$3 > 5 { print $11, $3 }' psux.out
/usr/bin/gnome-shell 5.1
/usr/lib64/firefox/firefox 66.2
/usr/lib64/firefox/firefox 8.3
/usr/lib64/firefox/firefox 15.6
/usr/lib64/firefox/firefox 9.0
/usr/lib64/firefox/firefox 31.5
/usr/lib64/firefox/firefox 20.6
/usr/lib64/firefox/firefox 31.0
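
The difference between a numeric and a text comparison can be sketched with three made-up values. Concatenating an empty string to a field is a common awk idiom for forcing a string comparison; it is shown here only for contrast and is not used elsewhere in this guide.

```shell
# Made-up single-column input.
values='10
9
3'

# Numeric comparison: 10 and 9 are both greater than 5.
numeric=$(printf '%s\n' "$values" | awk '$1 > 5 { print $1 }')

# String comparison: "10" sorts before "5" because "1" comes before "5",
# so only "9" passes the test.
as_text=$(printf '%s\n' "$values" | awk '($1 "") > "5" { print $1 }')

printf 'numeric: %s\n' $numeric
printf 'as text: %s\n' $as_text
```

The numeric form is what you want for columns such as %CPU or PID; the string form gives the kind of surprising ordering you would get if the values were treated as plain text.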

You can combine patterns with operators. For example, to show all processes that match firefox and use over 5% CPU, combine both patterns with the && operator for a logical AND:

$ awk '/firefox/ && $3 > 5 { print $11, $3 }' psux.out
/usr/lib64/firefox/firefox 66.2
/usr/lib64/firefox/firefox 8.3
/usr/lib64/firefox/firefox 15.6
/usr/lib64/firefox/firefox 9.0
/usr/lib64/firefox/firefox 31.5
/usr/lib64/firefox/firefox 20.6
/usr/lib64/firefox/firefox 31.0
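
Besides &&, awk also supports || for a logical OR and ! for negation. The following sketch applies all three to a made-up, two-column process list (name and CPU percentage), so the field numbers differ from the psux.out examples.

```shell
# Made-up process list: name and CPU percentage.
procs='firefox 66.2
firefox 0.0
gnome-shell 5.1
bash 0.0'

# AND: firefox records that also use more than 5% CPU.
both=$(printf '%s\n' "$procs" | awk '/firefox/ && $2 > 5 { print $1, $2 }')

# OR: records that match firefox, or use more than 5% CPU, or both.
either=$(printf '%s\n' "$procs" | awk '/firefox/ || $2 > 5 { print $1, $2 }')

# NOT: records that neither match firefox nor use more than 5% CPU.
neither=$(printf '%s\n' "$procs" | awk '!/firefox/ && !($2 > 5) { print $1, $2 }')

printf 'both:\n%s\neither:\n%s\nneither:\n%s\n' "$both" "$either" "$neither"
```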

Finally, because you're using pattern matching, awk no longer prints the header line. You can add your own header line by using the BEGIN pattern to execute a single action before processing any records:

$ awk 'BEGIN { printf("%-26s %s\n", "Command", "CPU%")} $3 > 10 { print $11, $3 }' psux.out
Command                    CPU%
/usr/lib64/firefox/firefox 66.2
/usr/lib64/firefox/firefox 15.6
/usr/lib64/firefox/firefox 31.5
/usr/lib64/firefox/firefox 20.6
/usr/lib64/firefox/firefox 31.0
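
The header above lines up with the data only because every matching command name happens to be 26 characters long. One possible way to keep the columns aligned for names of different lengths is to reuse a single printf format for the header and the rows. This is a sketch on made-up, two-column input; the variable name fmt and the width 30 are arbitrary choices for the illustration.

```shell
# Made-up input: command name and CPU percentage.
procs='/usr/bin/gnome-shell 5.1
/usr/lib64/firefox/firefox 66.2'

# Store the format once in BEGIN, then reuse it for the header and each row.
report=$(printf '%s\n' "$procs" | awk '
  BEGIN { fmt = "%-30s %s\n"; printf(fmt, "Command", "CPU%") }
  $2 > 5 { printf(fmt, $1, $2) }
')

printf '%s\n' "$report"
```

With a shared format, the second column should start at the same position on every line, including the header.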

Next, let's manipulate the data in individual fields.

Field manipulation

As we discussed in the previous section, awk understands numeric fields. This allows you to perform data manipulation, including numeric calculations. For example, consider printing the memory utilization on column six for all firefox processes:

$ awk '/firefox/ { print $11, $6 }' psux.out
/usr/lib64/firefox/firefox 301212
/usr/lib64/firefox/firefox 118220
/usr/lib64/firefox/firefox 168468
/usr/lib64/firefox/firefox 101520
/usr/lib64/firefox/firefox 194336
/usr/lib64/firefox/firefox 111864
/usr/lib64/firefox/firefox 163440
/usr/lib64/firefox/firefox 38496
/usr/lib64/firefox/firefox 174636
/usr/lib64/firefox/firefox 37264
/usr/lib64/firefox/firefox 30608
/usr/lib64/firefox/firefox 174636
/usr/lib64/firefox/firefox 174660

The command ps ux displays memory utilization in kilobytes, which is hard to read. Let's convert it to megabytes by dividing the field value by 1024:

$ awk '/firefox/ { print $11, $6/1024 }' psux.out
/usr/lib64/firefox/firefox 294.152
/usr/lib64/firefox/firefox 115.449
/usr/lib64/firefox/firefox 164.52
/usr/lib64/firefox/firefox 99.1406
/usr/lib64/firefox/firefox 189.781
/usr/lib64/firefox/firefox 109.242
/usr/lib64/firefox/firefox 159.609
/usr/lib64/firefox/firefox 37.5938
/usr/lib64/firefox/firefox 170.543
/usr/lib64/firefox/firefox 36.3906
/usr/lib64/firefox/firefox 29.8906
/usr/lib64/firefox/firefox 170.543
/usr/lib64/firefox/firefox 170.566

You can also round the values to the nearest whole number and add the suffix MB using printf to improve readability:

$ awk '/firefox/ { printf("%s %4.0f MB\n", $11, $6/1024) }' psux.out
/usr/lib64/firefox/firefox  294 MB
/usr/lib64/firefox/firefox  115 MB
/usr/lib64/firefox/firefox  165 MB
/usr/lib64/firefox/firefox   99 MB
/usr/lib64/firefox/firefox  190 MB
/usr/lib64/firefox/firefox  109 MB
/usr/lib64/firefox/firefox  160 MB
/usr/lib64/firefox/firefox   38 MB
/usr/lib64/firefox/firefox  171 MB
/usr/lib64/firefox/firefox   36 MB
/usr/lib64/firefox/firefox   30 MB
/usr/lib64/firefox/firefox  171 MB
/usr/lib64/firefox/firefox  171 MB
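
Note that the %.0f format rounds to the nearest whole number rather than always rounding up or down. As a short sketch, the snippet below compares it with awk's built-in int() function, which truncates instead. The two input values are taken from the RSS column shown earlier; int() is not used in the original examples.

```shell
# Two RSS values in kilobytes from the firefox output above.
kb='301212
168468'

# %.0f rounds to the nearest whole number: 294.152 -> 294, 164.52 -> 165.
rounded=$(printf '%s\n' "$kb" | awk '{ printf("%.0f\n", $1/1024) }')

# int() drops the fractional part: 294.152 -> 294, 164.52 -> 164.
truncated=$(printf '%s\n' "$kb" | awk '{ print int($1/1024) }')

printf 'rounded:   %s\n' $rounded
printf 'truncated: %s\n' $truncated
```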

Finally, combine this idea with the BEGIN and END patterns to perform more advanced data manipulation. For example, let's calculate the total memory utilization for all firefox processes by defining a variable sum in the BEGIN action, adding the value of column six $6 for each line that matches firefox to the sum variable, and then printing it out with the END action in Megabytes:

$ awk 'BEGIN { sum=0 } /firefox/ { sum+=$6 } END { printf("Total Firefox memory: %.0f MB\n", sum/1024) }' psux.out
Total Firefox memory: 1747 MB
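
The same accumulate-then-report idea extends to other aggregates. As a sketch, this variation also counts the matching records and prints an average. It runs on made-up, two-column input built from a few of the RSS values above, and the variable name count is an assumption introduced for the example. Uninitialized awk variables start out as zero, so the BEGIN block is optional here.

```shell
# Made-up input: process name and RSS in kilobytes.
rss='firefox 301212
firefox 118220
bash 4112
firefox 168468'

summary=$(printf '%s\n' "$rss" | awk '
  # Accumulate the total and the number of matching records.
  /firefox/ { sum += $2; count++ }
  # Report only if something matched, to avoid dividing by zero.
  END {
    if (count > 0)
      printf("%d processes, %.0f MB total, %.0f MB average\n",
             count, sum/1024, sum/count/1024)
  }
')

printf '%s\n' "$summary"
```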


What's next?

gawk is a powerful and flexible tool to process text data, particularly data arranged in columns. This article provided a few useful examples of using this tool to extract and manipulate data, but gawk can do much more. For additional information about gawk, consult the manual pages in your Linux distribution.

The Awk language has many more features than the ones explored in this guide. For detailed information about them, consult the official GNU Awk User's Guide.


Ricardo Gerardi

Ricardo Gerardi is a Senior Consultant at Red Hat Canada where he specializes in IT automation with Ansible and OpenShift.
