Finding duplicated files: several command-line methods
You can find duplicated files based on their content from the Terminal. Here are some ways to do it.
Using find, md5sum, sort and uniq
find . -type f -exec md5sum {} \; | sort | uniq -w32 -dD
- This uses `find` to search for files (`-type f`) in the current directory and generate an MD5 checksum for each one (`-exec md5sum {} \;`). The output is then sorted by checksum (`sort`), and `uniq` prints only the duplicated entries (`-dD`), comparing just the first 32 characters (`-w32`), which correspond to the MD5 checksum.
- If you use another hash function, you'll need to change the width given to `uniq` to match the digest length. For example, SHA-1 digests are 40 hex characters long:

find . -type f -exec sha1sum {} \; | sort | uniq -w40 -dD
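One limitation of this pipeline is that it can break on unusual file names. As a sketch of a newline-safe variant (assuming GNU coreutils, where `md5sum`, `sort`, and `uniq` all accept a `-z` flag for NUL-terminated records), you can do:

find . -type f -print0 \
  | xargs -0 md5sum -z \
  | sort -z \
  | uniq -z -w32 -dD \
  | tr '\0' '\n'

The final `tr` only converts the NUL separators back to newlines so the result is readable in the terminal.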
fdupes
fdupes -r .
- This will search for duplicates recursively (`-r`), comparing file sizes and MD5 signatures.
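fdupes can also act on the duplicates it finds. The flags below exist in stock fdupes releases, but check `man fdupes` on your system before deleting anything:

fdupes -r -d .      # prompt for which file in each set to keep, delete the rest
fdupes -r -d -N .   # keep the first file in each set, delete the rest without prompting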
Finding duplicated file names (with the same or different content)
find . -type f | awk -F "/" '{print $NF}' | sort | uniq -d | sed 's/\s/\\ /g' | xargs -L 1 find . -name
- This finds files (`find . -type f`), extracts just the filename from each path (`awk -F "/" '{print $NF}'`), sorts them (`sort`), keeps only the repeated ones (`uniq -d`), escapes spaces with a backslash (`sed 's/\s/\\ /g'`), and then looks up the full paths of those duplicated filenames (`xargs -L 1 find . -name`). A more robust variant is sketched below.
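The `sed` step only escapes spaces; filenames containing other shell metacharacters can still trip up `xargs`. As a sketch of an alternative that avoids re-invoking `find` entirely (plain POSIX `awk`; filenames containing newlines would still confuse it), you can group the paths by basename in a single pass:

find . -type f | awk -F/ '
  { count[$NF]++; paths[$NF] = paths[$NF] ORS "  " $0 }
  END { for (name in count) if (count[name] > 1) print name ":" paths[name] }
'

This prints each duplicated filename followed by an indented list of every path where it appears.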