How to use the uniq command to process lists in Linux

October 6, 2020 | Amit Waghmare | 3-minute read

We covered the sort command in a previous article. Sorting a file often leaves many duplicate lines next to each other, which makes the output hard to read.

In this scenario, the uniq command helps: it collapses each run of adjacent identical lines into a single line, keeping the first line of the run and discarding the repeats. The result is much easier to read.

Some traditional Unix implementations document that input lines for the uniq command can neither exceed 2048 bytes in length (including the newline character) nor contain null characters. GNU uniq, which the examples in this article use, does not impose a fixed line-length limit.

Syntax

uniq [OPTION]... [INPUT [OUTPUT]]
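
As a minimal sketch of the optional OUTPUT operand in the syntax above, the snippet below writes the filtered lines to a second file instead of standard output. The file names and the sample data are made up for illustration.

```shell
# Hypothetical sample data in a temporary directory
workdir=$(mktemp -d)
printf '%s\n' apple apple banana banana banana cherry > "$workdir/in.txt"

# First operand is INPUT, second operand is OUTPUT (the file is overwritten)
uniq "$workdir/in.txt" "$workdir/out.txt"

# Read the result back and show it
result=$(cat "$workdir/out.txt")
printf '%s\n' "$result"

# Clean up the temporary directory
rm -rf "$workdir"
```

Note that the second operand is an output file, not a second input file, so an existing file given there is replaced.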

Examples

Below are a series of examples, beginning with no options. We'll walk through several use cases. Some involve only uniq, and others rely on additional commands.

Without any option

Below is a file named file2, which contains some data. Note that this file is not sorted, so the duplicate lines are not adjacent to each other. Because uniq compares only adjacent lines, running it on the original file prints the content unchanged, much like cat. Sorting the file first places the duplicates next to each other, so piping the output of sort into uniq removes them:

$ cat file2
ChhatrapatiShahuMaharaj
Dr.B.R.Ambedkar
Budhha
Dr.B.R.Ambedkar
Budhha
Dr.B.R.Ambedkar
Budhha

$ uniq file2
ChhatrapatiShahuMaharaj
Dr.B.R.Ambedkar
Budhha
Dr.B.R.Ambedkar
Budhha
Dr.B.R.Ambedkar
Budhha

$ sort file2
Budhha
Budhha
Budhha
ChhatrapatiShahuMaharaj
Dr.B.R.Ambedkar
Dr.B.R.Ambedkar
Dr.B.R.Ambedkar

$ sort file2 | uniq
Budhha
ChhatrapatiShahuMaharaj
Dr.B.R.Ambedkar
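
The adjacency rule can be checked with a short, self-contained sketch. The sample lines here are invented and are not the article's file2; the variable names are only for illustration.

```shell
# Hypothetical sample: duplicates exist but are never adjacent
sample=$(mktemp)
printf '%s\n' red blue red blue red > "$sample"

# uniq alone removes nothing here; sorting first groups the duplicates
without_sort=$(uniq "$sample")
with_sort=$(sort "$sample" | uniq)

# sort -u de-duplicates in a single step
sort_u=$(sort -u "$sample")

printf 'uniq only:\n%s\n\nsort | uniq:\n%s\n' "$without_sort" "$with_sort"
rm -f "$sample"
```

When only the de-duplicated list is needed, sort -u gives the same result as sort piped into uniq. The uniq command is still needed for the counting and filtering options described below.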

With -c, --count option

In the next example, we use the -c option to count how many times each line occurs. The uniq command prints that count as a prefix to the line. The output below shows that the first line occurs three times, the second line once, and the third line three times:

$ sort file2 | uniq -c
    3 Budhha
    1 ChhatrapatiShahuMaharaj
    3 Dr.B.R.Ambedkar
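
A common use of the counts is to rank lines by frequency by sorting the output of uniq -c numerically. The sketch below assumes a made-up list of page names; the exact padding in front of each count can differ between uniq implementations, so the fields are read with awk rather than by column position.

```shell
# Hypothetical list of requested pages, one per line
pages=$(mktemp)
printf '%s\n' /home /about /home /contact /home /about > "$pages"

# Count each distinct line, then sort numerically by count, highest first
ranking=$(sort "$pages" | uniq -c | sort -rn)
printf '%s\n' "$ranking"

# Most frequent entry only (second field of the first line)
top=$(printf '%s\n' "$ranking" | awk 'NR == 1 { print $2 }')
printf 'most frequent: %s\n' "$top"
rm -f "$pages"
```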

With -d, --repeated option

The -d option prints only lines that are repeated, one copy per group of duplicates, and discards lines that occur only once. Therefore, the line ChhatrapatiShahuMaharaj has been discarded in the example below:

$ sort file2 | uniq -d
Budhha
Dr.B.R.Ambedkar

In the example below, I've combined -d with the -c option to confirm that -d prints only the repeated lines:

$ sort file2 | uniq -cd
    3 Budhha
    3 Dr.B.R.Ambedkar

With -D, --all-repeated option

The -D option prints every copy of each repeated line and discards lines that occur only once. Unlike -d, which prints one line per group of duplicates, -D prints all of them:

$ sort file2 | uniq -D
Budhha
Budhha
Budhha
Dr.B.R.Ambedkar
Dr.B.R.Ambedkar
Dr.B.R.Ambedkar

With -u, --unique option

The -u option does the opposite: it prints only unique lines, that is, lines that are not repeated. Therefore, in the example below, it prints only ChhatrapatiShahuMaharaj:

$ sort file2 | uniq -u
ChhatrapatiShahuMaharaj
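
To compare the -d, -D, and -u filters side by side, here is a small sketch on invented data that is already sorted. It assumes a uniq that supports -D, as GNU uniq does.

```shell
# Hypothetical sorted input: "a" three times, "b" once, "c" twice
data=$(mktemp)
printf '%s\n' a a a b c c > "$data"

one_per_group=$(uniq -d "$data")   # one copy of each repeated line
all_copies=$(uniq -D "$data")      # every copy of each repeated line
singletons=$(uniq -u "$data")      # lines that occur exactly once

printf -- '-d:\n%s\n\n-D:\n%s\n\n-u:\n%s\n' \
    "$one_per_group" "$all_copies" "$singletons"
rm -f "$data"
```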

With -i, --ignore-case option

Using the -i option, uniq ignores differences in case when comparing lines. Below is the output of the uniq command with and without the -i option for comparison:

$ cat file3
aaaa
aaaa
AAAA
AAAA
bbbb
BBBB

$ uniq file3
aaaa
AAAA
bbbb
BBBB

$ uniq -i file3
aaaa
bbbb
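
When the input is not already grouped, case variants also need to be made adjacent before uniq -i can merge them. One way to sketch this is to sort with -f, which folds case while ordering. The addresses below are invented, and which spelling of each address survives may depend on the locale.

```shell
# Hypothetical address list with mixed-case duplicates
emails=$(mktemp)
printf '%s\n' Alice@example.com bob@example.com \
    alice@example.com BOB@example.com > "$emails"

# sort -f places case variants next to each other; uniq -i treats them as equal
deduped=$(sort -f "$emails" | uniq -i)
printf '%s\n' "$deduped"

# Number of distinct addresses, ignoring case
count=$(printf '%s\n' "$deduped" | wc -l | tr -d ' ')
rm -f "$emails"
```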

With -f, --skip-fields=N option

Sometimes we need to ignore leading fields when looking for duplicate lines. This is possible using the -f option, where fields are separated by blanks. In the following example, we skip the first field (the first column), so lines are compared starting from the second field. I've given both examples, with and without the -f option, to show the difference:

$ cat file5
Amit aaaa
Ajit aaaa
Advi bbbb
Kaju bbbb

$ uniq file5
Amit aaaa
Ajit aaaa
Advi bbbb
Kaju bbbb

$ uniq -f 1 file5
Amit aaaa
Advi bbbb
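
A typical case for -f is a log in which every line starts with a timestamp. The sketch below uses an invented log format with the time in the first field; only the first line of each run of identical messages is kept.

```shell
# Hypothetical log: field 1 is a timestamp, the rest is the message
log=$(mktemp)
cat > "$log" <<'EOF'
10:00:01 disk full
10:00:02 disk full
10:00:03 disk full
10:00:09 link down
10:00:10 disk full
EOF

# Skip field 1 when comparing, so consecutive identical messages collapse
collapsed=$(uniq -f 1 "$log")
printf '%s\n' "$collapsed"
rm -f "$log"
```

The last "disk full" line is kept because it is not adjacent to the earlier run. The input is not sorted here, since sorting would reorder the log.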

With -s, --skip-chars=N option

Just as with fields, we can skip leading characters by using the -s option. With -s 2, uniq ignores the first two characters of each line, so 22aa and 33aa compare as equal, as do 44bb and 55bb. Keep in mind that the uniq command prints only the first line of each group of duplicates and discards the rest. Therefore, 33aa and 55bb have been discarded. Here is the example:

$ cat file4
22aa
33aa
44bb
55bb

$ uniq file4
22aa
33aa
44bb
55bb

$ uniq -s 2 file4
22aa
44bb
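
The same idea applies to lines that begin with a fixed-width prefix. In this sketch, the four-character record prefix (three digits and a hyphen) is an invented format.

```shell
# Hypothetical records: a 4-character prefix followed by a name
ids=$(mktemp)
printf '%s\n' 001-apple 002-apple 003-pear 004-pear 005-apple > "$ids"

# Ignore the first 4 characters of each line when comparing
kept=$(uniq -s 4 "$ids")
printf '%s\n' "$kept"
rm -f "$ids"
```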

With -w, --check-chars=N option

Whereas -s skips leading characters, the -w option limits the comparison to the first N characters of each line. In the example below, only the first two characters are compared:

$ cat file6
aa12
aa34
bb56
bb78

$ uniq file6
aa12
aa34
bb56
bb78

$ uniq -w 2 file6
aa12
bb56
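
Combined with -c, the -w option can count lines that share a common prefix. The sketch below assumes GNU uniq (the -w option is a GNU extension, not part of POSIX) and an invented event list whose lines start with a date in YYYY-MM-DD form.

```shell
# Hypothetical events, already ordered by date
events=$(mktemp)
printf '%s\n' '2020-09-28 backup' '2020-09-30 reboot' \
    '2020-10-01 backup' '2020-10-06 patch' '2020-10-09 backup' > "$events"

# Compare only the first 7 characters (YYYY-MM) and count each run;
# awk prints the month and the count regardless of uniq's padding
per_month=$(uniq -c -w 7 "$events" | awk '{ print substr($2, 1, 7), $1 }')
printf '%s\n' "$per_month"
rm -f "$events"
```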

With --version option

Use the --version option to check the version of the uniq command.

$ uniq --version
uniq (GNU coreutils) 8.4
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Richard M. Stallman and David MacKenzie.

Wrap up

The uniq command does not detect repeated lines unless they are adjacent, which is why it is usually paired with sort. It can count how many times each line occurs (-c), print only repeated lines (-d, -D) or only unique lines (-u), and ignore case (-i). It can also skip leading fields (-f) or characters (-s) before comparing lines, or limit the comparison to the first N characters (-w).

Having reviewed the uniq command's options, I'd like to share a small image that you can keep for reference.

About the author

Amit Waghmare

I'm a techie guy with lots of love for Linux. I started my career on a US-based project as a Linux Administrator. Later, I got an opportunity to work with HPC clusters, where I learned several other products. I love to teach, write blogs, troubleshoot complex issues, and write scripts to automate tasks. I also love to read books and watch movies and web series.
