How to find duplicated words in a text file

Duplicated words on the same line
Duplicated words on one-word-per-line files

You can find and remove duplicated words using one-line commands.

If you want to find duplicated files, check Finding duplicated files: several command-line methods.

Duplicated words on the same line

Let’s assume we have this text file:

$ cat dupl.txt
Hi, how are are you?
I'm fine, thanks.

I'm working on on a new project.

We have some duplicated adyacent words, like ‘are’ and ‘on’. You can use awk (GNU awk) to find these words and delete them.

awk -v RS='[[:space:]]' '$1 != D {printf "%s", $1 RT; D=$1}' < dupl.txt > new.txt

-v RS='[[:space:]]': assign variable RS (the input record separator, by default a newline) to the ‘space’ character.
'$1 != D: means “if the word is not equal to ‘D’”, a variable that is defined at the end.
{printf "%s", $1 RT;: print the word and the variable ‘RT’ (the record terminator. Gawk sets RT to the input text that matched the character or regular expression specified by RS, in this case the ‘space’ character).
D=$1: assign ‘D’ to the word, so in the next iteration it can check if the word is the same.

This assumes that every line ends in a newline character, If not, you can modify the command to add the newline character manually:

awk -v RS='[[:space:]]' '$1 != D {printf "%s", $1 (RT?RT:ORS); D=$1}' < dupl.txt > new.txt

Duplicated words on one-word-per-line files

We have a wordlist file like this one:

$ cat wordlist.txt
one
two
three
two
four

We want to remove repeated words, in this case ‘two’. If you don’t mind about the line order, you can run sort.

sort -u wordlist.txt > newwordlist.txt

If repeated words are always in adyacent lines, you can run uniq:

uniq wordlist.txt > newwordlist.txt

In order to keep line order, we can use cat -n to print the line number, then sort the words and remove the duplicates, and finally reorder the lines.

cat -n wordlist.txt | sort -k2 | uniq -f1 | sort -n | cut -f2

Test with this online terminal:

If you have any suggestion, feel free to contact me via social media or email.

How to find duplicated words in a text file

Table of Contents

Duplicated words on the same line

Duplicated words on one-word-per-line files

HestiaCP: a free and open source CPanel alternative

Create multiple WordPress sites easily with AWS Lightsail

Creating static websites with Astro

Speech Note: Text-To-Speech, Speech-To-Text and Translations within the same application

Fixing WebGL issues in LibreWolf

How to run CLI scripts inside a GUI environment

Open source projects to follow (XI)

Convert between several markup formats with Pandoc

timeout: run a command with a time limit

Export a manpage to (almost) any format

RSS readers: read feeds with these graphical and command line tools

RS1 Linux News: news aggregator focused on Linux and open source

How to set system language and other regional settings

Poppler: command-line PDF tools

Bliss OS: Android on your PC

How to run Linux commands on a Google Colab notebook

scan4all: a new vulnerability scanner

Popular alternative repositories

(ES) Configurar Linux para usar el DNI electrónico

List of Linux and FOSS websites