awk, sed, cut and tr: processing a text file

When you want to process a text file from the command line, awk, sed, cut and tr are among the most commonly used programs. Here you will learn the basic use of each one.

awk

This powerful tool can handle both simple and complex text processing tasks. It reads the input line by line, looks for lines that match a pattern and performs an action on them. The set of patterns and actions is called a ‘program’. When you run awk, you specify an awk ‘program’.

awk 'program' <input files>

When the program is long, it is more convenient to put it in a file and then run it like this:
```
awk -f programfile <input files>
```

A program consists of one or several ‘rules’. On the command line, the program should be enclosed in single quotes so that the shell does not interpret characters such as $. A rule consists of a pattern and an action:

pattern { action }

You can omit the pattern (the action then runs on every line) or the action (matching lines are printed in full). You can add several rules separated by newlines or ;.
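As a small sketch of the three rule forms described above (the sample data and variable names are made up for illustration):

```shell
# Hypothetical sample data: one fruit per line.
data='apple
banana
cherry'

# Pattern only: the default action prints the whole matching line.
pattern_only=$(printf '%s\n' "$data" | awk '/an/')

# Action only: the action runs on every line ($0 is the whole line).
action_only=$(printf '%s\n' "$data" | awk '{ print "-> " $0 }')

# Two rules separated by ";": each line is tested against both.
two_rules=$(printf '%s\n' "$data" | awk '/apple/ { print "found apple" }; /rr/ { print "found a double r" }')

printf '%s\n' "$pattern_only" "$action_only" "$two_rules"
```

The first command prints only banana, the second prints all three lines with a prefix, and the third prints one message for apple and one for cherry.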

A basic use of awk is for printing (showing) all contents of a file.

awk '{print}' text.txt
# or
awk '{print $0}' text.txt

Pipes

You can pipe the output of another command to awk.

df -h | awk '{print $0}'

awk '{print}' < text.txt

Export the output to a file.
```
df -h | awk '{print}' > output.txt
```

Columns

Print the first column of a file (or of a command's output). By default, awk splits each line on runs of spaces and tabs.
```
df -h | awk '{print $1}'
```
Print last column
```
awk '{print $NF}' text.txt
```
- NF holds the number of fields in the current line, so $NF is the last field.
Print several columns.
```
df -h | awk '{print $1,$3}'
```

Using a different separator.

awk -F ":" '{print $1}' /etc/passwd

You can also set the field separator variable FS in a BEGIN rule, which runs before any input is read:

awk 'BEGIN {FS=":"}; {print $1}' /etc/passwd

You can add several code blocks inside the single quotes (awk '{code block}; {code block}')

Rows

Print a specific line
```
awk 'NR==3 {print}' text.txt
```
- NR holds the number of records (lines) read so far, that is, the current line number.
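NR and NF are easy to mix up, so here is a minimal sketch that prints both for every line (the input text is invented for the example):

```shell
# Hypothetical input: lines with a different number of words.
text='one
two three
four five six'

# NR is the current line number, NF the number of fields on that line,
# and $NF the content of the last field.
report=$(printf '%s\n' "$text" | awk '{ print NR ": " NF " fields, last is " $NF }')
printf '%s\n' "$report"
```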

Formatting the output

Sum column values row by row (same for other mathematical operations).
```
$ awk -F ':' '{print $3+$4}' /etc/passwd
0
2
4
```

Separate column values by some character (like a tab).

$ awk -F ',' '{print NR"\t"$1"\t"$2}' < test.csv
1	fechalectura	lectura
2	2021-06-04	177
3	2021-08-10	184
4	2021-10-07	190

Print the total number of lines (records).

# END: this code block runs when all input has been processed
awk -F ':' 'END {print NR}' /etc/passwd

Using printf.

# Print second column as a floating number with two decimals
awk -F ',' '{printf "%.2f\n",$2}' < testfile

Define the OFS (Output Field Separator).

awk -F ',' '{OFS="\t"; $1=$1; print}' < test.csv
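The $1=$1 assignment in the command above may look redundant. The following sketch, using a made-up two-line CSV, suggests why it is there: awk only rebuilds the line with OFS once a field has been modified.

```shell
# Hypothetical CSV content.
csv='date,reading
2021-06-04,177'

# Without touching a field, awk prints the record unchanged, commas included.
unchanged=$(printf '%s\n' "$csv" | awk 'BEGIN { FS = ","; OFS = "\t" } { print }')

# Assigning $1 to itself makes awk rebuild $0 using OFS.
rebuilt=$(printf '%s\n' "$csv" | awk 'BEGIN { FS = ","; OFS = "\t" } { $1 = $1; print }')

printf '%s\n' "$unchanged" "$rebuilt"
```

Setting FS and OFS in a BEGIN rule is just one way to write it; it avoids assigning OFS again on every line.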

Conditions

Search for a RegExp.

# look for 'sshd'
awk -F ':' '/sshd/ {print $1,$7}' /etc/passwd

Standard “if” (always inside ‘action’ section).

# Print column 2, skipping the first line (the header)
awk -F ',' '{if (NR != 1) print $2}' test.csv

# Split the file based on the third column value
awk -F ',' '{if ($3 >= 1000) print $0}' < test.csv > testmore1000.csv
awk -F ',' '{if ($3 < 1000) print $0}' < test.csv > testless1000.csv

Condition outside curly braces.

# Print odd lines
awk 'NR % 2 != 0 {print}' file

# Print lines where $1 contains 'a'
awk '$1 ~ /a/ {print}' urls.txt

# Print lines that contain both abc AND xyz
awk '/abc/ && /xyz/' file
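Both styles shown in this section (an "if" inside the action, or a condition in front of the curly braces) can usually express the same thing. A quick sketch with an invented CSV:

```shell
# Hypothetical CSV with a header row.
csv='date,reading
2021-06-04,177
2021-08-10,184'

# Condition written as an "if" inside the action...
with_if=$(printf '%s\n' "$csv" | awk -F ',' '{ if (NR != 1) print $2 }')

# ...and the same condition used as the pattern of the rule.
with_pattern=$(printf '%s\n' "$csv" | awk -F ',' 'NR != 1 { print $2 }')

printf '%s\n' "$with_if" "$with_pattern"
```

Both commands print 177 and 184. The pattern form tends to be shorter; the "if" form is handy when you need an else branch.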

Built-in functions

tolower(<string>), toupper(<string>): convert a string to lowercase or to uppercase.
```
$ awk '{print tolower($0)}' <<< 'Hello World'
hello world
```

length(<string>): show the length of a string.

$ awk '{print length($0)}' <<< 'Hello World'
11

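mktime(<date spec>): transform a date spec (in the form "YYYY MM DD HH MM SS") into a timestamp. This function is not part of POSIX awk (it is available in gawk), and the result depends on your time zone.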

$ awk '{print mktime($0)}' <<< '2022 01 01 02 00 00'
1640998800

strftime(<format>, <timestamp>): format a timestamp (not part of POSIX awk, available in gawk).

$ awk '{print strftime("%d-%m-%Y",$0)}' <<< '1640998800'
01-01-2022

substr(<string>, <start>, <length>): return the substring of <length> characters that starts at position <start> (the first character is position 1).
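A short sketch of substr, reusing the 'Hello World' string from the examples above:

```shell
# Positions start at 1, so this returns the first five characters.
first=$(echo 'Hello World' | awk '{ print substr($0, 1, 5) }')

# Without a length, substr returns everything from the start position on.
rest=$(echo 'Hello World' | awk '{ print substr($0, 7) }')

printf '%s\n' "$first" "$rest"
```

This prints Hello and then World.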

Some examples

Sum values for each year

$ awk -F ',' '/2022/ {sum22 += $2} ; /2021/ {sum21 += $2} END {print "2022: " sum22 ", 2021: " sum21}' file.csv
2022: 627, 2021: 748
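The command above needs one rule per year. As a possible variation (not the approach used above), an awk array keyed by the year avoids hard-coding the years. This sketch assumes the date is in the first column and starts with a four-digit year; the data is invented.

```shell
# Hypothetical data in a "date,value" layout.
csv='2021-06-04,10
2021-08-10,20
2022-01-05,5'

# Use the first four characters of the date as the array key.
# Piping to sort gives a stable order, since "for (key in array)" has none.
totals=$(printf '%s\n' "$csv" | awk -F ',' '{ sum[substr($1, 1, 4)] += $2 } END { for (year in sum) print year ": " sum[year] }' | sort)
printf '%s\n' "$totals"
```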

Add a new calculated column and new column names

# BEGIN: this will run before file processing
# current, diff and prev are variables

$ awk -F ',' 'BEGIN {print "date,m3,diff"}; {current = $2}; {diff = current - prev}; {prev = $2}; {if (NR != 1) print $1","$2","diff}' ../file.csv | column -s ',' -t
date        m3   diff
2021-06-04  177  177
2021-08-10  184  7
2021-10-07  190  6
2021-12-06  197  7

Remove two lines (and add line numbers)

$ awk -F ',' '{if (NR==1 || NR==3) {} else {print NR " "$0}}' test.csv | head -n5
2 2021-07-21 00:00,247
4 2021-07-21 02:00,136
5 2021-07-21 03:00,82
6 2021-07-21 04:00,84
7 2021-07-21 05:00,115

Search multiple patterns (&&)

$ ps aux | awk '/pts/ && /bash/'
ricardo     1521  0.0  0.0  11528  5892 pts/1    Ss   abr19   0:00 /bin/bash
ricardo     9482  0.0  0.0  11132  5392 pts/39   Ss+  12:45   0:00 /bin/bash
ricardo    15174  0.0  0.0  12224  3756 pts/1    S+   14:02   0:00 awk /pts/ && /bash/

$ awk -F ',' '/2021/ && NR==2 {start21=$2}; /2021/ {end21=$2}; /2022/ {end22=$2}; END {print "2021: "end21-start21"; 2022: "end22-end21}' file.csv
2021: 20; 2022: 17

Display a random line (note the double quotes: they let the shell expand $RANDOM and the wc command before awk runs)

$ awk "NR==$(($RANDOM % `wc -l < urls.txt`))+1 {print}"  urls.txt
https://eldiario.es
$ awk "NR==$(($RANDOM % `wc -l < urls.txt`))+1 {print}"  urls.txt
https://google.com

sed

sed is the ‘Stream EDitor’. Use it to transform text.

Replace/delete text

Replace “word1” with “word2”.
```
sed 's/word1/word2/' text.txt
```
This command does not change the file; it only prints the result. You can redirect the output to a new file with > or use -i to edit the original file in place (use it with caution, it’s always safer to create a new file).
```
sed 's/word1/word2/' text.txt > newtext.txt
```
```
sed -i 's/word1/word2/' text.txt
```
Also, it only changes the first occurrence in each line. To change all occurrences:
```
sed 's/word1/word2/g' text.txt
```
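To see the effect of the g flag, here is a minimal sketch on a single line that contains the word three times:

```shell
line='word1 word1 word1'

# Without a flag, only the first match on each line is replaced.
first_only=$(echo "$line" | sed 's/word1/word2/')

# With g, every match on the line is replaced.
all=$(echo "$line" | sed 's/word1/word2/g')

printf '%s\n' "$first_only" "$all"
```

The first result is "word2 word1 word1" and the second is "word2 word2 word2".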
Find and replace a word in several files at once:
```
sed -i 's/word1/word2/g' *.txt
```
Delete word1.
```
sed 's/word1//g' text.txt
```
Delete first character of every line.
```
sed 's/^.//' text.txt
```
Delete last character of every line.
```
sed 's/.$//' text.txt
```
Replace “o” with “O” only on lines that match a pattern.
```
sed '/root/s/o/O/g' /etc/passwd
```

Delete lines

Delete lines matching a pattern.
```
sed '/root/d' /etc/passwd
```
You can use RegExp when looking for a pattern. This command will delete any empty line.
```
sed '/^$/d' test.txt
```
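- If the file uses Windows line endings (CRLF), “empty” lines still contain a carriage return. With GNU sed you can use '/^\r$/d' for those. Be careful with '/\r/d': it deletes every line that contains a carriage return.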

Print matched lines

sed -n '/pattern/p' file.txt
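The -n option matters here. A small sketch (with invented input) of what happens with and without it:

```shell
text='alpha
beta
gamma'

# With -n, only the lines selected by p are printed.
quiet=$(printf '%s\n' "$text" | sed -n '/beta/p')

# Without -n, sed prints every line once and the matching line a second time.
noisy=$(printf '%s\n' "$text" | sed '/beta/p')

printf '%s\n---\n%s\n' "$quiet" "$noisy"
```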

Prepend/Append lines

Insert a new line of text before every line.
```
sed 'i\new line' test.txt
```
Append text (insert one line after every line).
```
sed 'a\new line' test.txt
```

Specify a line number

If you add a line number before the subcommand letter (a, i, d, etc.), that subcommand will only run on that line. To refer to the last line, type $.

# Delete second line
sed 2d test.txt
# Delete from line 3 to line 6
sed 3,6d test.txt

# Insert a line at the beginning
sed '1i\new line' test.txt
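Two more address forms, sketched on a made-up four-line input: $ for the last line, and a line range in front of the s subcommand.

```shell
text='line 1
line 2
line 3
line 4'

# $ is the address of the last line.
without_last=$(printf '%s\n' "$text" | sed '$d')

# A range address also works with s: replace only on lines 2 to 3.
partial=$(printf '%s\n' "$text" | sed '2,3s/line/LINE/')

printf '%s\n---\n%s\n' "$without_last" "$partial"
```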

Extended RegExp support

Some RegExp features, such as grouping with unescaped parentheses, need the -E option.

sed -E '/(\.com)$/d' urls.txt

Replace between patterns

You can use RegExp to split a line into several groups (using parentheses) and replace only one of them.

$ cat urls.txt
https://elpais.com
https://eldiario.es
https://radiohuesca.com
https://google.com

$ sed -E 's/^(https:\/\/)(.*)(\.com)$/\1test\3/' urls.txt 
https://test.com
https://eldiario.es
https://test.com
https://test.com

You can refer to each group with \ followed by its number: \1 for the first group, \2 for the second, etc. In this case, we keep the first group (https://), write test in place of the second group and keep the third group (.com).
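Groups can also be written back in a different order. A minimal sketch, assuming an input in "surname, name" form (the name is just an example):

```shell
# Hypothetical input in "surname, name" form.
name='Lovelace, Ada'

# Group 1 captures the text before the comma, group 2 the text after it;
# the replacement writes them back in the opposite order.
swapped=$(echo "$name" | sed -E 's/^(.*), (.*)$/\2 \1/')
echo "$swapped"
```

This prints "Ada Lovelace".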

More examples

Merge lines 2 and 3. Replace ‘2’ with the line you want to merge. You can add spaces or any character between the lines.
```
sed '2N;s/\n//' testfile
```
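Here is the same idea as a runnable sketch on invented input, joining the two lines with a space instead of nothing:

```shell
text='first
second
third
fourth'

# On line 2, N appends the next line to the current one (joined by a newline);
# the substitution then turns that newline into a space.
merged=$(printf '%s\n' "$text" | sed '2N;s/\n/ /')
printf '%s\n' "$merged"
```

The output has three lines: "first", "second third" and "fourth".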

Apply a substitution only up to the first line that matches a pattern (from line 0 to /-/; the 0 address is a GNU sed extension)

$ cat text
Hola Mundo
-
Hola Mundo

$ sed '0,/-/{s/H/h/g}' text
hola Mundo
-
Hola Mundo

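Change lowercase to uppercase and vice versa (\U and \l are GNU sed extensions)

# The [[:lower:]] and [[:upper:]] classes also detect accented letters (depending on the locale)
sed 's/[[:lower:]]/\U&/g' lowertoupper.txt
sed 's/[[:upper:]]/\l&/g' uppertolower.txt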

cut

cut is a simpler alternative to awk for one job: splitting text into columns and showing one or several of them.

Print the first column of a file delimited by “:”.
```
cut -d ":" -f1 /etc/passwd
```
Print two columns.
```
cut -d ":" -f1,7 /etc/passwd
```
By default, cut uses the input delimiter as the separator in the output, but with GNU cut you can change it with --output-delimiter=DELIMITER.
```
cut -d ":" -f1,7 --output-delimiter=" " /etc/passwd
```
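One practical difference with awk is worth a sketch: cut treats every single delimiter character as a column boundary, while awk's default splitting collapses runs of spaces. The input line here is invented.

```shell
# Two spaces between the columns (hypothetical input).
line='alpha  beta'

# cut treats every single space as a delimiter, so field 2 is the empty
# string between the two spaces.
with_cut=$(echo "$line" | cut -d ' ' -f2)

# awk's default splitting collapses runs of spaces and tabs.
with_awk=$(echo "$line" | awk '{ print $2 }')

printf 'cut: [%s]\nawk: [%s]\n' "$with_cut" "$with_awk"
```

So for output aligned with a variable number of spaces (like df -h), awk is usually the easier choice.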

tr

tr works similarly to sed, but on single characters: it translates or deletes characters from standard input and writes the result to standard output.

tr 'character' 'substitution' < file

For example, you can change commas to tabs:

tr ',' '\t' < file.csv

You can achieve something similar with column -s ',' -t < file.csv.

Or you can delete all spaces:

echo {a..z} | tr -d ' '

You can change uppercase into lowercase (and vice versa) easily:

$ tr '[:lower:]' '[:upper:]' <<< 'Hello World'
HELLO WORLD
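Keep in mind that tr maps characters, not words. A minimal sketch with an invented string:

```shell
# Each character in the first set is replaced by the character at the same
# position in the second set: a->x, b->y, c->z.
mapped=$(echo 'aabbcc cab' | tr 'abc' 'xyz')
echo "$mapped"
```

This prints "xxyyzz zxy". To replace whole words, sed is the better fit.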

If you have any suggestions, feel free to contact me via social media or email.
