We have all used file globbing with wildcard characters like * and ? as a means to select specific files or lines of data from a data stream. These tools are powerful and I use them many times a day. Yet, there are things that cannot be done with wildcards.
Regular expressions (regexes or REs) provide us with more complex and flexible pattern matching capabilities. Just as certain characters take on special meaning when using file globbing, REs also have special characters. There are two main types of regular expressions: Basic Regular Expressions (BREs) and Extended Regular Expressions (EREs).
First, we need some definitions. There are many definitions for the term regular expressions, but many are dry and uninformative. Here are mine.
Regular expressions are strings of literal characters and metacharacters that can be used as patterns by various Linux utilities to match strings of ASCII plain text data in a data stream. When a match occurs, it can be used to extract or eliminate a line of data from the stream, or to modify the matched string in some way.
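As a minimal sketch of those three uses (extract, eliminate, modify), the commands below run one pattern through grep, grep -v, and sed. The log lines and the log helper function are hypothetical, invented only for this illustration.

```shell
# Hypothetical log lines, invented for this illustration.
log() {
  printf '%s\n' \
    'INFO  service started' \
    'ERROR disk full' \
    'INFO  user login' \
    'ERROR timeout'
}

# Extract: keep only the lines that begin with ERROR.
extracted=$(log | grep '^ERROR')

# Eliminate: drop the lines that begin with ERROR.
eliminated=$(log | grep -v '^ERROR')

# Modify: rewrite the matched string and leave the rest of each line alone.
modified=$(log | sed 's/^ERROR/FAIL /')

printf '%s\n' '--- extracted' "$extracted"
printf '%s\n' '--- eliminated' "$eliminated"
printf '%s\n' '--- modified' "$modified"
```

The ^ metacharacter anchors the pattern to the start of the line, so a line that merely mentions ERROR somewhere in the middle would not be selected.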
Basic Regular Expressions (BREs) and Extended Regular Expressions (EREs) are not significantly different in terms of functionality. (See the grep info page’s Section 3.6, "Basic vs. Extended Regular Expressions.") The primary difference is in the syntax used and how metacharacters are specified. In basic regular expressions, the metacharacters ?, +, {, |, (, and ) lose their special meaning. Instead, it is necessary to use the backslashed versions: \?, \+, \{, \|, \(, and \). The ERE syntax is believed by many to be easier to use.
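The difference is easiest to see by running the same pattern both ways. This is a small sketch that assumes GNU grep, since the backslashed forms are the ones its documentation describes; the sample words and the words helper function are hypothetical.

```shell
# Hypothetical sample lines, invented for this illustration.
words() {
  printf '%s\n' 'color' 'colour' 'colouur' 'colou?r'
}

# BRE (plain grep): a bare ? is an ordinary character,
# so only the line containing a literal "?" matches.
bre_literal=$(words | grep 'colou?r')

# BRE with the backslashed \? (assumes GNU grep): the "u" is optional.
bre_escaped=$(words | grep 'colou\?r')

# ERE (grep -E): the bare ? is the metacharacter.
ere=$(words | grep -E 'colou?r')

printf '%s\n' '--- BRE, bare ?' "$bre_literal"
printf '%s\n' '--- BRE, backslashed \?' "$bre_escaped"
printf '%s\n' '--- ERE, bare ?' "$ere"
```

The first command prints only colou?r, while the second and third both print color and colour. The two syntaxes select the same lines here; only the spelling of the metacharacter changes.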
Note: When I talk about regular expressions in a general sense, I usually mean to include both basic and extended regular expressions. If there is a distinction to be made, I will use the acronyms BRE for basic regular expressions or ERE for extended regular expressions.
Regular expressions (REs) take the concept of using metacharacters to match patterns in data streams much further than file globbing, and give us even more control over the items we select from a data stream. REs are used by various tools to parse a data stream to match patterns of characters in order to perform some transformation on the data.
Note: One general meaning of parse is to examine something by studying its component parts. For our purposes, we parse a data stream to locate sequences of characters that match a specified pattern.
Regular expressions have a reputation for being obscure and arcane incantations that only those with special wizardly sysadmin powers use. The single line of code below, which I used to transform a file that was sent to me into a usable form, would seem to confirm this:
$ cat Experiment_6-1.txt | grep -v Team | grep -v "^\s*$" | sed -e "s/[Ll]eader//" -e "s/\[//g" -e "s/\]//g" -e "s/)//g" | awk '{print $1" "$2" <"$3">"}' > addresses.txt
This command pipeline appears to be an intractable sequence of meaningless gibberish to anyone without knowledge of regexes. It certainly seemed that way to me the first time I encountered something similar early in my career. As you will see, regexes are relatively simple once they are explained.
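To take some of the mystery out of it, here is a sketch of the same pipeline with each stage annotated. The contents of Experiment_6-1.txt are not shown in this article, so the roster function below is a hypothetical stand-in with invented names and addresses, and format_addresses is just a wrapper name chosen for this illustration.

```shell
# Hypothetical stand-in for Experiment_6-1.txt; the real file is not shown here.
roster() {
  printf '%s\n' \
    'Team 1' \
    'Leader Virginia Jones [vjones88@example.com]' \
    'Frank Brown [fbrown398@example.com]' \
    '' \
    'Team 2' \
    'leader Cindy Williams cinwill@example.com)'
}

# The stages of the pipeline, in order:
#   1. grep -v Team      drops every line that contains "Team".
#   2. grep -v "^\s*$"   drops empty and whitespace-only lines
#                        (\s is a GNU extension).
#   3. sed               deletes "Leader" or "leader", then every [ ] and ).
#   4. awk               prints field 1, field 2, and field 3 inside < >.
format_addresses() {
  grep -v Team |
    grep -v "^\s*$" |
    sed -e "s/[Ll]eader//" -e "s/\[//g" -e "s/\]//g" -e "s/)//g" |
    awk '{print $1" "$2" <"$3">"}'
}

addresses=$(roster | format_addresses)
printf '%s\n' "$addresses"
# Prints:
# Virginia Jones <vjones88@example.com>
# Frank Brown <fbrown398@example.com>
# Cindy Williams <cinwill@example.com>
```

Reading stdin instead of using cat on a file keeps the sketch self-contained; the regexes themselves are unchanged from the original command.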
A single article, or even a single series, can only begin to touch on the possibilities that regular expressions open up. There are entire books devoted exclusively to regular expressions, so we will explore the basics in a series of articles here on Enable Sysadmin over the coming week. By the end, you will know just enough to get started with tasks common to sysadmins. Hopefully, you’ll be hungry to learn more on your own after that.
Note: This article is a slightly modified version of Chapter 6 from Volume 2 of my Linux book, Using and Administering Linux: Zero to SysAdmin, due out from Apress in late 2019.
About the author
David Both is an open source software and GNU/Linux advocate, trainer, writer, and speaker who lives in Raleigh, NC. He is a strong proponent of and evangelist for the "Linux Philosophy."
David has been in the IT industry for over 50 years. He has taught RHCE classes for Red Hat and has worked at MCI Worldcom, Cisco, and the State of North Carolina. He has been working with Linux and open source software for over 20 years.
David likes to purchase the components and build his own computers from scratch to ensure that each new computer meets his exacting specifications. His primary workstation is an ASUS TUF X299 motherboard and an Intel i9 CPU with 16 cores (32 CPUs) and 64GB of RAM in a CoolerMaster MasterFrame 700.
David has written articles for magazines including Linux Magazine and Linux Journal. His article "Complete Kickstart," co-authored with a colleague at Cisco, was ranked 9th in the Linux Magazine Top Ten Best System Administration Articles list for 2008. David currently writes prolifically for OpenSource.com and Enable Sysadmin.
David currently has five books published with Apress: "The Linux Philosophy for SysAdmins," the three-volume self-study training course "Using and Administering Linux: Zero to SysAdmin," which was released in late 2019, and "Linux for Small Business Owners" with co-author Cyndi Bulka.
David can be reached at LinuxGeek46@both.org or on Twitter @LinuxGeek46.