You can find duplicated files based on their content from the terminal. Here are a few ways to do it.

For GUI apps, check Finding duplicated files (II): GUI apps.

Using find, sort and uniq

find . -type f -exec md5sum {} \; | sort | uniq -w32 -dD
  • This uses find to search for files (-type f) in the current directory and its subdirectories and generates an MD5 checksum for each one (-exec md5sum {} \;). It then sorts the output by checksum (sort) and runs uniq to print every line whose checksum is repeated (-dD), comparing only the first 32 characters of each line (-w32), which is the length of an MD5 checksum. Note that -w and -D are GNU extensions and are not part of POSIX uniq.
  • If you use another hash function, you’ll need to change the width passed to uniq -w so that it matches the checksum length, for example 40 characters for SHA-1:
    find . -type f -exec sha1sum {} \; | sort | uniq -w40 -dD
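
Because the -w and -D options of uniq are GNU extensions, the same grouping can also be done with awk, which compares the whole first field and so works for any checksum length. The following is only a sketch of that idea: the function name list_dupes and its two arguments are made up for illustration, and it assumes md5sum (or another tool with the same "checksum, then path" output) is installed and that file names contain no newlines or backslashes.

```shell
# list_dupes [DIR] [HASH_CMD]
# Print every file under DIR whose checksum occurs more than once.
# The function name and its defaults are illustrative assumptions.
list_dupes() {
    dir=${1:-.}
    hash_cmd=${2:-md5sum}

    # "{} +" passes many files to each checksum process instead of one.
    find "$dir" -type f -exec "$hash_cmd" {} + |
        sort |
        awk '
            {
                hash = $1 ""    # first field; force string comparison
                if (NR > 1 && hash == prev_hash) {
                    if (!printed) print prev_line   # first file of the group
                    print
                    printed = 1
                } else {
                    printed = 0
                }
                prev_hash = hash
                prev_line = $0
            }
        '
}

# Usage: list_dupes . sha1sum
```

Using -exec ... {} + instead of -exec ... {} \; also starts far fewer checksum processes, which tends to help on large directory trees.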
    

fdupes

fdupes -r .
  • This searches for duplicates recursively (-r), comparing file sizes and MD5 signatures, followed by a byte-by-byte comparison.

More options:

  • -t: show the modification time of each file.
  • -d: interactive delete; fdupes asks which files to preserve and deletes the others.
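
To illustrate why comparing sizes first pays off, here is a rough sketch of the "sizes, then checksums" strategy written as a plain pipeline. It is not how fdupes is implemented, only an approximation of the idea: files with a unique size cannot have a duplicate, so only the remaining ones are hashed. The function name dupes_size_then_hash is hypothetical, and the sketch assumes GNU find, xargs and uniq, and paths without tabs, newlines or backslashes.

```shell
# dupes_size_then_hash [DIR]
# Hypothetical helper: hash only files that share their size with another file.
dupes_size_then_hash() {
    dir=${1:-.}

    # Step 1: print "size<TAB>path" for every file (GNU find -printf).
    find "$dir" -type f -printf '%s\t%p\n' |
        sort -n |
        # Step 2: keep only the paths whose size occurs more than once.
        awk -F '\t' '
            {
                if (NR > 1 && $1 == prev_size) {
                    if (!printed) print prev_path   # first file of the group
                    print $2
                    printed = 1
                } else {
                    printed = 0
                }
                prev_size = $1
                prev_path = $2
            }
        ' |
        # Step 3: hash the candidates and print the repeated checksums.
        xargs -r -d '\n' md5sum |
        sort |
        uniq -w32 -D
}

# Usage: dupes_size_then_hash .
```

Unlike fdupes, this sketch stops at the checksum and does not do a final byte-by-byte comparison.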

Finding duplicated file names (with the same or different content)

find . -type f | awk -F "/" '{print $NF}' | sort | uniq -d | sed 's/\s/\\ /g' | xargs -L 1 find . -name
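  • This finds files (find . -type f), extracts the file name from each path (awk -F "/" '{print $NF}'), sorts the names (sort), keeps the ones that appear more than once (uniq -d), escapes whitespace with a backslash so that xargs keeps each name as a single argument (sed 's/\s/\\ /g'), and then searches for the paths of each duplicated name (xargs -L 1 find . -name). Note that -name treats its argument as a glob pattern, so names containing characters such as * or [ may match more than intended.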
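
The pipeline above runs find once for every duplicated name. As an alternative sketch, the paths can be grouped by file name in a single pass with awk, which also avoids the escaping step. The function name dupe_names is made up for this example, and it assumes that paths contain no newlines.

```shell
# dupe_names [DIR]
# Hypothetical helper: print the paths of all files whose name
# (last path component) appears more than once under DIR.
dupe_names() {
    dir=${1:-.}

    find "$dir" -type f |
        awk -F/ '
            {
                # $NF is the file name; collect every path seen for it.
                paths[$NF] = paths[$NF] $0 "\n"
                count[$NF]++
            }
            END {
                for (name in count)
                    if (count[name] > 1)
                        printf "%s", paths[name]
            }
        '
}

# Usage: dupe_names .
```

Paths that share a name are printed together, but the order of the groups is whatever awk happens to use when iterating over the array.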

If you have any suggestions, feel free to contact me via social media or email.