There is a program like that, called rdfind:
SYNOPSIS
rdfind [ options ] directory1 | file1 [ directory2 | file2 ] ...
DESCRIPTION
rdfind finds duplicate files across and/or within several directories.
It calculates checksum only if necessary. rdfind runs in O(Nlog(N))
time with N being the number of files.
If two (or more) equal files are found, the program decides which of
them is the original and the rest are considered duplicates. This is
done by ranking the files to each other and deciding which has the
highest rank. See section RANKING for details.
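The two ideas in the description can be illustrated with a minimal Python sketch. It is not rdfind's actual implementation. It shows "calculates checksum only if necessary" (files are sorted by size, and only files that share a size get hashed) and "decides which of them is the original" (the highest-ranked file in a group is kept).

The sketch rests on a few assumptions. Equal SHA-1 digests are treated as equal content. The ranking criteria used here (command-line argument order, then directory depth, then order of discovery) are my reading of rdfind's RANKING section, so check them against the man page. The names Found, duplicate_groups and split_original are hypothetical.

```python
import hashlib
import os
from dataclasses import dataclass
from itertools import groupby
from operator import itemgetter


@dataclass
class Found:
    path: str
    arg_index: int  # index of the command-line argument the file was found under
    depth: int      # directory depth below that argument
    order: int      # position in which the scan encountered the file


def file_digest(path, chunk_size=1 << 16):
    """SHA-1 hex digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_groups(found):
    """Group files with identical content; hash only files whose size is shared."""
    # Sorting by size is the O(N log N) step: equal sizes become adjacent.
    sized = sorted(((os.path.getsize(f.path), f) for f in found), key=itemgetter(0))
    groups = []
    for _, same_size in groupby(sized, key=itemgetter(0)):
        candidates = [f for _, f in same_size]
        if len(candidates) < 2:
            continue  # a file with a unique size cannot have a duplicate
        # Only now are checksums computed, once per candidate file.
        hashed = sorted(((file_digest(f.path), f) for f in candidates), key=itemgetter(0))
        for _, same_digest in groupby(hashed, key=itemgetter(0)):
            group = [f for _, f in same_digest]
            if len(group) > 1:
                groups.append(group)
    return groups


def split_original(group):
    """Return (original, duplicates); the highest-ranked file is the original."""
    # A lower tuple means a higher rank (assumed criteria, applied in this order).
    ranked = sorted(group, key=lambda f: (f.arg_index, f.depth, f.order))
    return ranked[0], ranked[1:]
```

Two files can have the same size but different content, as with "hello" and "world". These are hashed and then correctly kept apart. A file whose size is unique is never read.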
It can delete the duplicates or replace them with symbolic links or hard links.
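Python's standard library is enough to show what replacing a duplicate with a link involves. The following is a sketch only, not how rdfind itself does it. The function name replace_with_link and the ".tmp-link" suffix are hypothetical. Deleting a duplicate instead would just be os.remove(duplicate).

```python
import os


def replace_with_link(original, duplicate, symbolic=False):
    """Replace `duplicate` with a hard link (default) or a symlink to `original`."""
    tmp = duplicate + ".tmp-link"  # hypothetical temporary name in the same directory
    if symbolic:
        # An absolute target keeps the link valid regardless of where it lives.
        os.symlink(os.path.abspath(original), tmp)
    else:
        # Hard links only work within a single filesystem.
        os.link(original, tmp)
    # Renaming over the duplicate swaps it for the link in one step,
    # so the path never goes missing (the rename is atomic on POSIX).
    os.replace(tmp, duplicate)
```

The link is created under a temporary name and then renamed over the duplicate. If linking fails, the duplicate is left untouched rather than deleted.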