
Sysadmin toolbox: How to use the sort command to process text in Linux

As a handy sysadmin tool, sort makes it easy to rearrange text data in various ways without changing the original files.

The sort command prints the lines of its input in a specified order. It takes the content of a file or the output of another command, reorders the lines, and writes the result to standard output, which makes the data easier to read. It is especially useful when a large amount of information needs to be arranged in alphabetical order or in ascending or descending numerical order.

In alphabetical sorting, the command compares lines character by character, starting with the first character of each line, and arranges the lines in alphabetical order.

In numerical sorting, the command reads the number at the start of each line (or of the selected column) and arranges the lines in ascending or descending order. By default, the smallest number appears at the top of the output. Do not confuse sort with grep: the sort command reorders every line of its input, whereas the grep command filters the input and displays only the lines that match (or, with -v, do not match) a pattern.

In short, sort is useful when a big file or list is not arranged in a helpful order and has become tough and time-consuming to read. Use it to organize the content of files or lists in the order you need. The sort command treats its input as lines of text and compares them using the collating rules of the current locale, so the same file can sort differently on systems with different locale settings. Several options change how sort orders its output. Some examples are given below, along with the syntax of the command.

Syntax

sort [OPTION]... [FILE]...

sort [OPTION]... --files0-from=F

Examples

In the first example, we use the sort command without any options. This arranges the lines in alphabetical order, comparing them character by character from the start of each line. Note: In the locale used for these examples, a lowercase letter sorts before the same letter in uppercase, so budhha comes first and Budhha second. This ordering depends on the locale; in the C (POSIX) locale, for example, all uppercase letters sort before all lowercase letters.

$ cat test.txt

Dr.B.R.Ambedkar
MahatmaJyotibaPhule
Budhha
ChatrapatiShahuMaharaj
budhha
Ramaai
$ sort test.txt

budhha
Budhha
ChatrapatiShahuMaharaj
Dr.B.R.Ambedkar
MahatmaJyotibaPhule
Ramaai
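
The effect of the locale on letter case can be checked with a short sketch like the one below. It recreates the sample data in a temporary file (the tmpdir and c_sorted names are only illustrative) and sets LC_ALL=C for a single command, which asks sort to compare raw byte values instead of using the locale's collating rules.

```shell
# Recreate the sample data from the article in a temporary file.
tmpdir=$(mktemp -d)
printf '%s\n' Dr.B.R.Ambedkar MahatmaJyotibaPhule Budhha \
    ChatrapatiShahuMaharaj budhha Ramaai > "$tmpdir/test.txt"

# LC_ALL=C forces plain byte-value comparison for this one command,
# so uppercase letters (65-90) sort ahead of lowercase ones (97-122).
c_sorted=$(LC_ALL=C sort "$tmpdir/test.txt")
printf '%s\n' "$c_sorted"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

With byte-value comparison, budhha moves from the first line to the last, after Ramaai. Setting LC_ALL=C this way is also a common method for getting the same order on every machine.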

Sometimes we need the data in reverse order, that is, the opposite of alphabetical order. The -r option does this, as seen below:

$ sort test.txt

budhha
Budhha
ChatrapatiShahuMaharaj
Dr.B.R.Ambedkar
MahatmaJyotibaPhule
Ramaai
$ sort -r test.txt

Ramaai
MahatmaJyotibaPhule
Dr.B.R.Ambedkar
ChatrapatiShahuMaharaj
Budhha
budhha

Like letters, numbers can be sorted as well. The -n option sorts lines by numeric value, and it can be combined with the -r option to reverse the result. Below, the -n option arranges the numbers in ascending order, so the smallest number is at the top and the largest is at the bottom. Adding -r to -n (written together as -nr) reverses the output and displays the largest number at the top.

$ cat numeric.txt

14
04
34
1891
938
378
2356
$ sort -n numeric.txt

04
14
34
378
938
1891
2356
$ sort -nr numeric.txt

2356
1891
938
378
34
14
04
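
To see why -n matters, it may help to sort the same numbers with and without it. The following is a minimal sketch; the variable names are illustrative, and LC_ALL=C is set only to keep the text comparison predictable.

```shell
# The same numbers as in numeric.txt, one per line.
numbers=$(printf '%s\n' 14 04 34 1891 938 378 2356)

# Without -n the lines are compared as text, character by character,
# so 1891 lands before 34 because the character 1 sorts before 3.
text_order=$(printf '%s\n' "$numbers" | LC_ALL=C sort)

# With -n the number on each line is compared by its value.
numeric_order=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n)

echo "text order:"
printf '%s\n' "$text_order"
echo "numeric order:"
printf '%s\n' "$numeric_order"
```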

You can also sort on a specific column. To do so, use the -k option along with a column number; columns are separated by whitespace by default. Note that the example below uses only the -k option, so sort compares the second column as text, character by character, not as a number. Thus, the line containing 278 displays before the line containing 28, because after the shared leading 2 the character 7 sorts before 8. If we add the -n option to -k, the column is compared by numeric value, and the output runs from the smallest number to the largest:

$ cat file2.txt

Advika 1
Amit 30
Ajit 28
Abhi 278
Chirag 2
$ sort -k 2 file2.txt

Advika 1
Chirag 2
Abhi 278
Ajit 28
Amit 30
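
For comparison, here is a sketch of the same data sorted numerically on the second column. It uses the key form -k 2,2n, which limits the key to column 2 and marks it as numeric; the data is fed in through a variable (the names are illustrative) rather than read from file2.txt.

```shell
# The same rows as in file2.txt.
rows=$(printf '%s\n' 'Advika 1' 'Amit 30' 'Ajit 28' 'Abhi 278' 'Chirag 2')

# -k 2,2 starts and ends the sort key at column 2;
# the trailing n compares that key by numeric value.
by_number=$(printf '%s\n' "$rows" | LC_ALL=C sort -k 2,2n)
printf '%s\n' "$by_number"
```

Now 28 and 30 come before 278. Writing the key as 2,2 rather than just 2 keeps the key from running on to the end of the line, which matters once a file has more than two columns.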

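Here, we use column selection and numerical sorting together. In the following example, we sort the output of ls -l by its fifth column (the file size in bytes) in ascending numerical order. The total line has no fifth column, so its key counts as zero and it stays at the top.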

$ ls -l

total 0
-rw-r--r-- 1 amwaghma hpcapp 42 Aug 20 19:30 file2.txt
-rw-r--r-- 1 amwaghma hpcapp 31 Aug 20 19:51 months.txt
-rw-r--r-- 1 amwaghma hpcapp 27 Aug 20 19:20 numeric.txt
-rw-r--r-- 1 amwaghma hpcapp 73 Aug 20 19:49 test.txt
$ ls -l | sort -nk 5

total 0
-rw-r--r-- 1 amwaghma hpcapp 27 Aug 20 19:20 numeric.txt
-rw-r--r-- 1 amwaghma hpcapp 31 Aug 20 19:51 months.txt
-rw-r--r-- 1 amwaghma hpcapp 42 Aug 20 19:30 file2.txt
-rw-r--r-- 1 amwaghma hpcapp 73 Aug 20 19:49 test.txt
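
The numeric and reverse options can also be attached to a single key. The sketch below reuses the listing above as fixed sample text, so the result does not depend on the files in your own directory, and sorts it by size with the largest file first. The variable names are illustrative.

```shell
# Sample listing copied from the ls -l output above (the "total" line is omitted).
listing='-rw-r--r-- 1 amwaghma hpcapp 42 Aug 20 19:30 file2.txt
-rw-r--r-- 1 amwaghma hpcapp 31 Aug 20 19:51 months.txt
-rw-r--r-- 1 amwaghma hpcapp 27 Aug 20 19:20 numeric.txt
-rw-r--r-- 1 amwaghma hpcapp 73 Aug 20 19:49 test.txt'

# -k 5,5 limits the key to the fifth column; the n and r modifiers
# make that key numeric and descending.
by_size_desc=$(printf '%s\n' "$listing" | LC_ALL=C sort -k 5,5nr)
printf '%s\n' "$by_size_desc"

# The first line of the result is the largest file.
largest=$(printf '%s\n' "$by_size_desc" | head -n 1)
```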

Often, a file contains duplicate lines. These can be eliminated with the -u option, which prints only one copy of each line. The following example shows the -u option removing the duplicate entries:

$ cat test.txt

Dr.B.R.Ambedkar
MahatmaJyotibaPhule
ChatrapatiShahuMaharaj
Dr.B.R.Ambedkar
budhha
Ramaai
Dr.B.R.Ambedkar
$ sort test.txt

budhha
ChatrapatiShahuMaharaj
Dr.B.R.Ambedkar
Dr.B.R.Ambedkar
Dr.B.R.Ambedkar
MahatmaJyotibaPhule
Ramaai
$ sort -u test.txt

budhha
ChatrapatiShahuMaharaj
Dr.B.R.Ambedkar
MahatmaJyotibaPhule
Ramaai
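
When whole lines are compared, as here, sort -u should give the same result as piping sort into the separate uniq command, which drops adjacent repeated lines. A small sketch to confirm this on the sample data (variable names are illustrative):

```shell
# The same lines as in test.txt, including the repeated entry.
names=$(printf '%s\n' Dr.B.R.Ambedkar MahatmaJyotibaPhule \
    ChatrapatiShahuMaharaj Dr.B.R.Ambedkar budhha Ramaai Dr.B.R.Ambedkar)

# One step: sort and drop duplicates.
with_u=$(printf '%s\n' "$names" | LC_ALL=C sort -u)

# Two steps: sort first so duplicates become adjacent, then let uniq drop them.
with_uniq=$(printf '%s\n' "$names" | LC_ALL=C sort | uniq)

if [ "$with_u" = "$with_uniq" ]; then
    echo "same result"
fi
```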

There is one interesting option that checks whether a file is already sorted. With the -c option, the sort command reports the first out-of-place line. If the file is already sorted, sort produces no output. It compares each line with the one before it, and when it finds a line that is out of order, it prints a message and stops. In the example below, sort -c first compares the first two lines and finds them in order, because D sorts before M. It then compares the third line with the second and finds that C does not sort after M. Therefore, it reports the third line as the first disordered line, along with its line number.

$ cat test.txt

Dr.B.R.Ambedkar
MahatmaJyotibaPhule
ChatrapatiShahuMaharaj
budhha
Ramaai
$ sort test.txt

budhha
ChatrapatiShahuMaharaj
Dr.B.R.Ambedkar
MahatmaJyotibaPhule
Ramaai
$ sort -c test.txt

sort: test.txt:3: disorder: ChatrapatiShahuMaharaj
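
Because sort -c also sets its exit status (zero when the input is sorted, nonzero when it is not), it can be used as a test in a script. The sketch below assumes two small temporary files, and check_sorted is a hypothetical helper name chosen for this example.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' Dr.B.R.Ambedkar MahatmaJyotibaPhule ChatrapatiShahuMaharaj > "$tmpdir/unsorted.txt"
printf '%s\n' apple banana cherry > "$tmpdir/sorted.txt"

# Hypothetical helper: report whether the file named in $1 is sorted.
# The "disorder" message goes to standard error, so it is discarded here.
check_sorted() {
    if LC_ALL=C sort -c "$1" 2>/dev/null; then
        echo "sorted"
    else
        echo "not sorted"
    fi
}

result_sorted=$(check_sorted "$tmpdir/sorted.txt")
result_unsorted=$(check_sorted "$tmpdir/unsorted.txt")
echo "sorted.txt: $result_sorted"
echo "unsorted.txt: $result_unsorted"

rm -r "$tmpdir"
```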

Just like text or numbers, we can also arrange month names in calendar order. Do this by using the -M option, and reverse the order by adding the -r option.

$ cat months.txt

February
December
January
July
$ sort -M months.txt

January
February
July
December
$ sort -Mr months.txt

December
July
February
January
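
Month sorting works from the three-letter abbreviation at the start of each line, so full names and abbreviations can be mixed. The sketch below assumes GNU sort, as found on most Linux systems, and sets LC_ALL=C so that the English month names are the ones recognized.

```shell
# A mix of full month names and three-letter abbreviations.
months=$(printf '%s\n' March Jan December Feb)

# -M orders lines by the month named at the start of each line.
months_sorted=$(printf '%s\n' "$months" | LC_ALL=C sort -M)
printf '%s\n' "$months_sorted"
```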

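We can also sort more than one file at once by giving the file names as arguments separated by spaces. The sort command reads all of the files and sorts their lines together as a single stream; it does not sort each file separately. Options apply to the whole command, so the -n option below affects both files, not just the second one. The names still appear first because lines that do not begin with a number count as zero under -n, and lines with equal values are then ordered by an ordinary text comparison.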

$ cat test.txt numeric.txt

Dr.B.R.Ambedkar
MahatmaJyotibaPhule
ChatrapatiShahuMaharaj
budhha
Ramaai
14
04
34
1891
938
378
2356
$ sort test.txt -n numeric.txt

budhha
ChatrapatiShahuMaharaj
Dr.B.R.Ambedkar
MahatmaJyotibaPhule
Ramaai
04
14
34
378
938
1891
2356
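
A sketch with two small text files makes the merging behavior easier to see. The file names a.txt and b.txt and their contents are made up for this example.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' pear apple > "$tmpdir/a.txt"
printf '%s\n' orange banana > "$tmpdir/b.txt"

# Both files are read and their lines are sorted together,
# so lines from a.txt and b.txt end up interleaved.
merged=$(LC_ALL=C sort "$tmpdir/a.txt" "$tmpdir/b.txt")
printf '%s\n' "$merged"

rm -r "$tmpdir"
```

The result is a single sorted list in which banana (from b.txt) sits between apple and pear (both from a.txt).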

You can also save the sorted output to another file, either with shell redirection (>) or with sort's own -o option:

$ sort test.txt > sortfile

OR

$ sort -o sortfile test.txt

$ cat sortfile

budhha
Budhha
ChatrapatiShahuMaharaj
Dr.B.R.Ambedkar
MahatmaJyotibaPhule
Ramaai
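
One practical difference between the two forms: -o may name the same file that is being sorted, whereas redirecting with > onto the input file empties it before sort gets to read it. The sketch below sorts a made-up temporary file in place. Note that, unlike the other examples, this does overwrite the original content.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' pear apple orange > "$tmpdir/fruit.txt"

# sort reads all of its input before writing, so the output file
# given to -o can safely be the input file itself.
LC_ALL=C sort -o "$tmpdir/fruit.txt" "$tmpdir/fruit.txt"

in_place=$(cat "$tmpdir/fruit.txt")
printf '%s\n' "$in_place"

rm -r "$tmpdir"
```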

Wrap up

Using the above options, we see that sort works on file content or on the output of any command. It makes it easy to arrange large data sets in ascending or descending order, and its many options let us rearrange the data in other ways as well. Best of all, sort writes its result to standard output or to the file we name, so the original files stay unchanged unless we explicitly name one of them as the output file. Therefore our data is safe.


Amit Waghmare

I'm a techie guy with lots of love for Linux. I started my career with a US-based project as a Linux administrator. Later, I got an opportunity to work with HPC clusters, where I learned several other products.
