Introducing regular expressions

October 14, 2019 | 3-minute read | Linux

We have all used file globbing with wildcard characters like * and ? as a means to select specific files or lines of data from a data stream. These tools are powerful and I use them many times a day. Yet, there are things that cannot be done with wildcards.

Regular expressions (regexes or REs) provide us with more complex and flexible pattern matching capabilities. Just as certain characters take on special meaning when using file globbing, REs also have special characters. There are two main types of REs: Basic Regular Expressions (BREs) and Extended Regular Expressions (EREs).
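As a rough illustration of that extra flexibility, here is a minimal sketch that compares a plain shell glob with an ERE. The file names are invented for this example, and it assumes a POSIX-style shell and grep. The glob `file*.txt` accepts anything between "file" and ".txt", while the regex can insist on one or more digits, something a plain (non-extended) glob has no operator for.

```shell
# Invented file names, held in a variable so nothing is written to disk.
names=$(printf '%s\n' 'file1.txt' 'file22.txt' 'file.txt' 'filex.txt')

# Glob: in "file*.txt" the * stands for any run of characters,
# so all four names match.
glob_hits=$(printf '%s\n' "$names" | while IFS= read -r name; do
  case $name in
    (file*.txt) printf '%s\n' "$name" ;;
  esac
done)

# ERE: [0-9]+ means "one or more digits", and ^ and $ anchor the
# pattern to the whole line, so only the numbered names match.
regex_hits=$(printf '%s\n' "$names" | grep -E '^file[0-9]+\.txt$')

printf 'glob matches:\n%s\n\n' "$glob_hits"
printf 'regex matches:\n%s\n' "$regex_hits"
```

Note that `*` means different things in the two notations: in a glob it matches any characters on its own, while in a regex it only repeats whatever precedes it.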

The first thing we need is a definition. There are many definitions for the term regular expressions, but many are dry and uninformative. Here is mine.

Regular expressions are strings of literal characters and metacharacters that can be used as patterns by various Linux utilities to match strings of ASCII plain text data in a data stream. When a match occurs, it can be used to extract or eliminate a line of data from the stream, or to modify the matched string in some way.
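The three uses named in that definition (extract, eliminate, modify) can be sketched with grep and sed. This is only an illustration: the three-line data stream and the variable names are made up for the example.

```shell
# Made-up data stream used only for illustration.
stream=$(printf '%s\n' 'error: disk full' 'info: backup done' 'error: no route')

# Extract: keep only the lines that begin with "error".
extracted=$(printf '%s\n' "$stream" | grep '^error')

# Eliminate: -v inverts the match, so the same lines are dropped instead.
remaining=$(printf '%s\n' "$stream" | grep -v '^error')

# Modify: replace the matched string on every line where it occurs.
modified=$(printf '%s\n' "$stream" | sed 's/^error/ERROR/')

printf 'extracted:\n%s\n\n' "$extracted"
printf 'remaining:\n%s\n\n' "$remaining"
printf 'modified:\n%s\n' "$modified"
```

In all three commands the pattern is the same regex, `^error`; only the action taken on a match changes.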

Basic Regular Expressions (BREs) and Extended Regular Expressions (EREs) are not significantly different in terms of functionality. (See the grep info page’s Section 3.6, "Basic vs. Extended Regular Expressions.") The primary difference is in the syntax used and how metacharacters are specified. In basic regular expressions, the metacharacters ?, +, {, |, (, and ) lose their special meaning. Instead, it is necessary to use the backslashed versions: \?, \+, \{, \|, \(, and \). The ERE syntax is believed by many to be easier to use.
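A minimal sketch of that syntax difference, assuming GNU grep (the backslashed `\|` in a BRE is a GNU extension) and a made-up word list: the same alternation is written as `\|` in a BRE and as a bare `|` in an ERE.

```shell
# Made-up sample data; the last line contains a literal | character.
sample=$(printf '%s\n' 'cat' 'dog' 'bird' 'cat|dog')

# BRE (plain grep): a bare | is an ordinary character, so only the
# line containing the literal text "cat|dog" matches.
bre_literal=$(printf '%s\n' "$sample" | grep 'cat|dog')

# BRE with the backslashed form: \| means "or" (GNU grep).
bre_alt=$(printf '%s\n' "$sample" | grep 'cat\|dog')

# ERE (grep -E): the bare | is the alternation metacharacter.
ere_alt=$(printf '%s\n' "$sample" | grep -E 'cat|dog')

printf 'BRE, bare |:\n%s\n\n' "$bre_literal"
printf 'BRE, \\|:\n%s\n\n' "$bre_alt"
printf 'ERE, bare |:\n%s\n' "$ere_alt"
```

The last two commands select the same lines; only the way the metacharacter is written differs.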

Note: When I talk about regular expressions in a general sense, I usually mean both basic and extended regular expressions. If a distinction needs to be made, I will use the acronyms BRE for basic regular expressions or ERE for extended regular expressions.

Regular expressions (REs) take the concept of using metacharacters to match patterns in data streams much further than file globbing, and give us even more control over the items we select from a data stream. REs are used by various tools to parse a data stream to match patterns of characters in order to perform some transformation on the data.

Note: One general meaning of parse is to examine something by studying its component parts. For our purposes, we parse a data stream to locate sequences of characters that match a specified pattern.

Regular expressions have a reputation for being obscure and arcane incantations that only those with special wizardly sysadmin powers use. This single line of code below (that I used to transform a file that was sent to me into a usable form) would seem to confirm this:

$ cat Experiment_6-1.txt | grep -v Team | grep -v "^\s*$" | sed -e "s/[Ll]eader//" -e "s/\[//g" -e "s/\]//g" -e "s/)//g" | awk '{print $1" "$2" <"$3">"}' > addresses.txt

This command pipeline appears to be an intractable sequence of meaningless gibberish to anyone without knowledge of regexes. It certainly seemed that way to me the first time I encountered something similar early in my career. As you will see, regexes are relatively simple once they are explained.
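To make the pipeline less mysterious, here is a sketch that runs the same stages against a few invented lines. The real contents of Experiment_6-1.txt are not shown in this article, so the sample input below is purely an assumption about what such a file might look like. The sketch also writes the blank-line test as `[[:space:]]`, a portable equivalent of the `\s` shorthand that GNU grep accepts, and feeds the data in with printf rather than cat.

```shell
# Invented sample input; the real Experiment_6-1.txt is not shown here.
sample_input=$(printf '%s\n' \
  'Team 1' \
  'Leader Virginia Jones vjones88@example.com' \
  '' \
  'Frank Brown FBrown398@example.com' \
  '[leader] Marcus Thomas mthomas@example.com)')

addresses=$(printf '%s\n' "$sample_input" |
  grep -v Team |                # drop lines that contain "Team"
  grep -v '^[[:space:]]*$' |    # drop empty or whitespace-only lines
  sed -e 's/[Ll]eader//' \
      -e 's/\[//g' -e 's/\]//g' -e 's/)//g' |
  awk '{print $1" "$2" <"$3">"}')  # first name, last name, <address>

printf '%s\n' "$addresses"
```

Each stage does one small job: the two grep commands discard lines, sed deletes the unwanted word and punctuation, and awk rearranges the three remaining fields into a "First Last <address>" form.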

We can only begin to touch upon the possibilities opened to us by regular expressions in a single article (or even in a single series). There are entire books devoted exclusively to regular expressions, so we will explore the basics in a series of articles here on Enable Sysadmin over the coming week. By the end, you will know just enough to get started with tasks common to sysadmins. Hopefully, you’ll be hungry to learn more on your own after that.

Note: This article is a slightly modified version of Chapter 6 from Volume 2 of my Linux book, Using and Administering Linux: Zero to SysAdmin, due out from Apress in late 2019.

About the author

David Both

David Both is an open source software and GNU/Linux advocate, trainer,
writer, and speaker who lives in Raleigh, NC. He is a strong
proponent of and evangelist for the "Linux Philosophy."

David has been in the IT industry for over 50 years. He has taught RHCE
classes for Red Hat and has worked at MCI Worldcom, Cisco, and the State
of North Carolina. He has been working with Linux and open source
software for over 20 years.

David likes to purchase the components and build his own computers from
scratch to ensure that each new computer meets his exacting
specifications. His primary workstation is an ASUS TUF X299 motherboard
and an Intel i9 CPU with 16 cores (32 CPUs) and 64GB of RAM in a
CoolerMaster MasterFrame 700.

David has written articles for magazines including Linux Magazine and
Linux Journal. His article "Complete Kickstart," co-authored with a
colleague at Cisco, was ranked 9th in the Linux Magazine Top Ten Best
System Administration Articles list for 2008. David currently writes
prolifically for OpenSource.com and Enable Sysadmin.

David currently has five books published with Apress: "The Linux
Philosophy for SysAdmins"; the three-volume self-study training
course "Using and Administering Linux: Zero to SysAdmin," released
in late 2019; and "Linux for Small Business Owners" with co-author
Cyndi Bulka.

David can be reached at LinuxGeek46@both.org or on Twitter @LinuxGeek46.
