dedup(1)

Name

dedup – delete duplicate files

Synopsis

dedup [−x command] [−rfv] {source ... target | −a source ...}
dedup −i [−rfl] {source ... target | −a source ...}
dedup −n [−rf] {source ... target | −a source ...}
dedup −d [−lv] source target

Description

In the first form with target specified, dedup deletes every source that matches target. If target is a directory, dedup also deletes every source that matches any file or subdirectory contained in target at any depth.

When −a is specified instead of target, dedup deletes every source that matches another source preceding it on the command line.

Regular files match if they have the same content, regardless of file names and attributes. Directories match if all the files and subdirectories they contain have the same names and match. Symbolic links are not followed; they match if their target paths are the same. Other special files never match.
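The matching rules for regular files and symbolic links can be approximated with standard tools. The following is only a sketch: paths_match is a hypothetical helper, not part of dedup, and directory matching is left out.

```shell
#!/bin/sh
# Hypothetical helper, not part of dedup: approximates the matching rules
# for regular files and symbolic links only.
paths_match() {
    if [ -L "$1" ] || [ -L "$2" ]; then
        # Symbolic links are not followed; both must be links and their
        # target paths must be the same string.
        [ -L "$1" ] && [ -L "$2" ] &&
            [ "$(readlink "$1")" = "$(readlink "$2")" ]
    elif [ -f "$1" ] && [ -f "$2" ]; then
        # Regular files: same content, names and attributes ignored.
        cmp -s -- "$1" "$2"
    else
        # Directories are not handled in this sketch, and other special
        # files never match.
        return 1
    fi
}
```

Note that two symbolic links with different target paths do not match under this rule, even if the files they point to have the same content.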

The following options control the search:

−r

If source is a directory, recursively delete individual matching files and subdirectories that it contains. Without this option, each source can only match and be deleted as a whole. (target is always searched recursively.)

With −a, all files and subdirectories in every source, including different files and subdirectories in the same source, can match each other and be deleted.

−f

Do not delete directories, only individual files. (Most useful together with −r.)

The following options control what happens when a match is found:

−v

Be verbose, showing files and directories as they are deleted.

−x command

Execute command instead of deleting. The source name is assigned to the positional parameter $1 and the matching target name to $2.

Even if the match is a directory, the command is only executed once for the entire match, not for every file and subdirectory it contains.

(The default behavior is equivalent to −x 'rm -r -- "$1"'.)
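The way the two names reach the command can be tried out without dedup. The sketch below assumes that the command string is run by sh with the names passed as positional parameters; run_on_match is a hypothetical stand-in for dedup, and the file names are made up.

```shell
#!/bin/sh
# Assumption: the command string is run by sh, with the source name as $1
# and the matching target name as $2. run_on_match is a hypothetical
# stand-in for what dedup does when it finds a match.
run_on_match() {
    # $1 = command string, $2 = source name, $3 = matching target name
    sh -c "$1" sh "$2" "$3"
}

# Report the match instead of deleting anything.
run_on_match 'printf "%s duplicates %s\n" "$1" "$2"' 'dir/my file' 'backup/my file'
```

Quoting "$1" and "$2" inside the command string keeps names with spaces intact, and the -- in the default command protects against names that begin with a dash.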

In the second and third forms, instead of deleting anything, dedup only shows the results of the search.

With −i, it lists every match together with the matched target.

With −n, it lists files and directories that do not match any target.

In both forms, superfluous results are omitted when −r is specified. With −i, only the matching directory is listed, not the files and subdirectories it contains. With −n, a directory is listed by itself, without its contents, if it contains two or more files and subdirectories and none of them match. Use −f to list individual files.

In the fourth form, dedup compares a single source tree to a target tree and gives a diff-like output.

The format of the diff-like output is as follows. Files and directories that only exist in one or the other tree are prefixed, respectively, by ‘−’ and ‘+’. Different files and directories that have the same relative path in the two trees are prefixed by ‘∗’. Matching files and directories with different relative paths are listed without a prefix together with their path in the other tree.

The following option controls which search results are displayed in the second and fourth forms:

−l

List all matching targets, not just the first one.

The following option additionally controls the output in the fourth form:

−v

Also list matching files and directories that have the same relative path in the two trees (i.e., unchanged files).

The following options control how comparisons are done in all forms:

−c command

Pipe file contents through command before comparing. The filename is assigned to the positional parameter $1.

(The default behavior is equivalent to −c 'cat'.)
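The effect of a filter on a single pair of files can be checked by hand. The sketch below assumes that the filter receives the file contents on standard input and the file name as $1; same_after_filter is a hypothetical helper, not part of dedup.

```shell
#!/bin/sh
# Hypothetical helper: do two files have the same content after filtering?
# Assumption: the filter reads the file on standard input and also gets
# the file name as $1.
same_after_filter() {
    filter=$1 a=$2 b=$3
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
    sh -c "$filter" sh "$a" < "$a" > "$tmp_a"
    sh -c "$filter" sh "$b" < "$b" > "$tmp_b"
    # Compare the filtered output, not the original files.
    cmp -s -- "$tmp_a" "$tmp_b"
    status=$?
    rm -f -- "$tmp_a" "$tmp_b"
    return "$status"
}

# Usage: same_after_filter 'tr -d "\r"' dos.txt unix.txt && echo match
```

A filter may use either its standard input or "$1", whichever suits the program being run.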

−p command

Compare pairs of files with command. The filenames are assigned to the positional parameters $1 and $2.

(The default behavior is equivalent to −p 'cmp -s -- "$1" "$2"', except that file sizes and checksums are used to select only plausible matches for full comparison.)
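The idea of selecting plausible matches before a full comparison can be sketched as follows. Only the size check is shown, the checksum step is omitted, and plausible_then_cmp is a hypothetical helper that makes no claim about how dedup is implemented.

```shell
#!/bin/sh
# Hypothetical sketch: a cheap size check before the full comparison.
# The checksum step mentioned above is omitted.
plausible_then_cmp() {
    size_a=$(wc -c < "$1") || return 2
    size_b=$(wc -c < "$2") || return 2
    # Files of different sizes cannot have the same content, so the
    # byte-by-byte comparison is skipped for them.
    [ "$size_a" -eq "$size_b" ] || return 1
    cmp -s -- "$1" "$2"
}

# Usage: plausible_then_cmp file1 file2 && echo match
```

The result is the same as calling cmp directly; the size check only avoids work for pairs that cannot match.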

Exit status

dedup exits 0 if no errors occurred, regardless of whether any files were deleted or any matches were listed.

Caveats

dedup does not check whether two matching files are actually the same file. Thus, target should not be a parent directory of source; otherwise, source will always match itself and be deleted.

Examples

Compare two files and delete file1 if they are the same:

dedup file1 file2

Recursively compare two directories and delete dir1 if it is the same as dir2 or some dir2/subdir:

dedup dir1 dir2

Delete both files, because each one matches itself in the current directory (avoid doing this accidentally):

dedup file1 file2 . # caveat

Find all copies of file1, file2, dir, and any files and subdirectories in dir in a collection:

dedup -il file1 file2 collection
dedup -irl dir collection

Find all duplicates in a collection:

dedup -air collection

Find all instances of a file in a collection, even if the file has been gzipped:

dedup -c 'zcat -f' -il file collection

Find duplicate music files, even if their tags have changed:

dedup -c 'ffmpeg -v quiet -i "$1" -f s16le - </dev/null || cat' \
    -air collection

Find duplicate music files, even if they have slight discrepancies:

dedup -p 'wavcmp -sq -- "$1" "$2"' -air collection

Clean up after a directory move that was interrupted in the copying phase (a partially-copied file may remain):

mv dir /mnt
^C
dedup -r /mnt/dir dir

Restore names and timestamps to files (but not directories) using a backup for reference:

dedup -x 'mv -n -- "$1" "`dirname "$1"`/`basename "$2"`"' \
      -rf dir backup
dedup -rfx 'touch -r "$2" -- "$1"' dir backup

Download files from several locations, check whether all versions are the same (−n will list only one file in that case), and delete all the extra copies:

wget -i .../urls.txt
dedup -an -- *
dedup -a -- *

Delete any empty leaf subdirectories in the current directory:

dedup -r . /var/empty

Recursively compare two directories:

dedup -d dir1 dir2

Authors

dedup was written by Andrey Zholos.