Finding duplicated files: several command-line methods
You can find duplicated files based on their content from the Terminal. Here are some ways to do it.
Using find, md5sum, sort and uniq
find . -type f -exec md5sum {} \; | sort | uniq -w32 -dD
- This uses `find` to search for files (`-type f`) in the current directory and generate an MD5 checksum for each one (`-exec md5sum {} \;`). The output is then sorted by checksum (`sort`), and `uniq` prints only the duplicated entries (`-dD`), comparing just the first 32 characters (`-w32`), which correspond to the MD5 checksum.
- If you use another hash function, you'll need to change the width given to `uniq` to match the digest length. For example, SHA-1 digests are 40 hex characters long:

find . -type f -exec sha1sum {} \; | sort | uniq -w40 -dD
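One limitation of this pipeline is that it can break on unusual file names. As a sketch of a newline-safe variant (assuming GNU coreutils, where `md5sum`, `sort`, and `uniq` all accept a `-z` flag for NUL-terminated records), you can do:

find . -type f -print0 \
  | xargs -0 md5sum -z \
  | sort -z \
  | uniq -z -w32 -dD \
  | tr '\0' '\n'

The final `tr` only converts the NUL separators back to newlines so the result is readable in the terminal.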
fdupes
fdupes -r .
- This will search for duplicates recursively (`-r`), comparing file sizes and MD5 signatures.
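fdupes can also act on the duplicates it finds. The flags below exist in stock fdupes releases, but check `man fdupes` on your system before deleting anything:

fdupes -r -d .      # prompt for which file in each set to keep, delete the rest
fdupes -r -d -N .   # keep the first file in each set, delete the rest without prompting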
Finding duplicated file names (with the same or different content)
find . -type f | awk -F "/" '{print $NF}' | sort | uniq -d | sed 's/\s/\\ /g' | xargs -L 1 find . -name
- This finds files (`find . -type f`), extracts just the filename from each path (`awk -F "/" '{print $NF}'`), sorts them (`sort`), keeps only the repeated ones (`uniq -d`), escapes spaces with a backslash (`sed 's/\s/\\ /g'`), and then looks up the full paths of those duplicated filenames (`xargs -L 1 find . -name`). A more robust variant is sketched below.
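The `sed` step only escapes spaces; filenames containing other shell metacharacters can still trip up `xargs`. As a sketch of an alternative that avoids re-invoking `find` entirely (plain POSIX `awk`; filenames containing newlines would still confuse it), you can group the paths by basename in a single pass:

find . -type f | awk -F/ '
  { count[$NF]++; paths[$NF] = paths[$NF] ORS "  " $0 }
  END { for (name in count) if (count[name] > 1) print name ":" paths[name] }
'

This prints each duplicated filename followed by an indented list of every path where it appears.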