dup.rst
author Oleksandr Gavenko <gavenkoa@gmail.com>
Wed, 14 Dec 2011 14:47:36 +0200
changeset 1164 376df9f34507
child 1165 b5e09fc4d751
permissions -rw-r--r--
Search for duplicate files.
.. -*- coding: utf-8; -*-

==============
 Duplication.
==============
.. contents::

Search for duplicate files.
===========================

These utilities only search for duplicate files:

  http://code.google.com/p/fdupes/
                fdupes home page
  http://ru.wikipedia.org/wiki/Fdupes
                fdupes wiki page
  http://duff.sourceforge.net/
                duff home page
  http://freedup.org/
                freedup home page
  http://dupedit.com/
                dupedit home page
  http://rdfind.pauldreik.se/
                Rdfind home page
  http://code.google.com/p/softenido/wiki/FindRepe
                FindRepe home page
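
The tools above differ in details, but a typical way to find exact duplicates is to compare file sizes first (files of different sizes cannot be identical) and hash only the files whose size is shared with another file. Below is a minimal Python sketch of that approach, not the code of any listed tool; the names ``file_digest`` and ``find_duplicates`` are hypothetical and SHA-256 is an arbitrary choice of digest.

.. code-block:: python

   import hashlib
   import os
   from collections import defaultdict


   def file_digest(path, chunk_size=1 << 16):
       """Return the SHA-256 hex digest of a file, read in chunks."""
       h = hashlib.sha256()
       with open(path, "rb") as f:
           for chunk in iter(lambda: f.read(chunk_size), b""):
               h.update(chunk)
       return h.hexdigest()


   def find_duplicates(root):
       """Return sorted groups of paths under ``root`` with identical content."""
       # Pass 1: group regular files (not symlinks) by size.
       by_size = defaultdict(list)
       for dirpath, _dirnames, filenames in os.walk(root):
           for name in filenames:
               path = os.path.join(dirpath, name)
               if os.path.isfile(path) and not os.path.islink(path):
                   by_size[os.path.getsize(path)].append(path)

       # Pass 2: hash only files whose size is shared with another file.
       groups = []
       for paths in by_size.values():
           if len(paths) < 2:
               continue  # a unique size cannot have a duplicate
           by_digest = defaultdict(list)
           for path in paths:
               by_digest[file_digest(path)].append(path)
           groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
       return sorted(groups)

For example, ``find_duplicates(".")`` returns a sorted list of groups, each a sorted list of paths with identical content. The sketch trusts digest equality and treats hard links to the same inode as duplicates; a byte-by-byte comparison inside each group would remove even the theoretical risk of a hash collision.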

freedups.
---------

Freedups searches through the directories you specify. When it finds two or
more identical files, it hard links them together. The files still exist in
their respective directories, but only one copy of the data is stored on disk;
all of their directory entries point to the same data blocks.
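
The hard-linking step can be sketched in Python as follows. This is an illustration only, not the freedups code; ``hardlink_duplicate`` and the ``.hardlink-tmp`` suffix are hypothetical. It checks that both names are on the same file system and hold identical bytes, creates the link under a temporary name and then renames it over the duplicate, so the duplicate's name keeps pointing at valid data if the link cannot be created.

.. code-block:: python

   import filecmp
   import os


   def hardlink_duplicate(keep, dup):
       """Replace ``dup`` with a hard link to ``keep`` if their data is identical.

       Returns True if ``dup`` was replaced, False otherwise.
       """
       st_keep, st_dup = os.stat(keep), os.stat(dup)
       if (st_keep.st_dev, st_keep.st_ino) == (st_dup.st_dev, st_dup.st_ino):
           return False  # already the same inode, nothing to do
       if st_keep.st_dev != st_dup.st_dev:
           return False  # hard links cannot span file systems
       if not filecmp.cmp(keep, dup, shallow=False):
           return False  # contents differ
       # Link under a temporary name (hypothetical suffix), then atomically
       # rename it over the duplicate.
       tmp = dup + ".hardlink-tmp"
       os.link(keep, tmp)
       os.replace(tmp, dup)
       return True

After ``hardlink_duplicate(keep, dup)`` returns ``True``, ``os.stat`` reports the same ``st_ino`` for both names and a link count (``st_nlink``) of at least 2.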

  http://www.stearns.org/freedups/
                freedups home page

dupmerge.
---------

Dupmerge reads a list of files from standard input (e.g., as produced by
``find . -print``) and looks for identical files. When it finds two or more
identical files, all but one are unlinked to reclaim the disk space and
recreated as hard links to the remaining copy.
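
The dupmerge workflow (a list of names on standard input, then unlink and relink) can be sketched in Python. This is an illustration under stated assumptions, not dupmerge itself: ``merge_from_list`` is a hypothetical name, each file is read whole into memory for hashing, and file names containing newlines cannot be represented in this line-based input.

.. code-block:: python

   import filecmp
   import hashlib
   import os
   from collections import defaultdict


   def merge_from_list(stream):
       """Hard link identical regular files named one per line in ``stream``.

       Returns the number of names that were replaced by hard links.
       """
       groups = defaultdict(list)
       seen = set()  # (device, inode) pairs already collected
       for line in stream:
           path = line.rstrip("\n")
           if not path or os.path.islink(path) or not os.path.isfile(path):
               continue  # skip blank lines, symlinks, directories, etc.
           st = os.stat(path)
           if (st.st_dev, st.st_ino) in seen:
               continue  # already a hard link to a file collected earlier
           seen.add((st.st_dev, st.st_ino))
           with open(path, "rb") as f:
               digest = hashlib.sha256(f.read()).hexdigest()
           # Key on the device too: hard links cannot cross file systems.
           groups[(st.st_dev, st.st_size, digest)].append(path)

       replaced = 0
       for paths in groups.values():
           keep, *dups = paths
           for dup in dups:
               if not filecmp.cmp(keep, dup, shallow=False):
                   continue  # digest collision: contents actually differ
               # Unlink the copy, then recreate its name as a hard link.
               os.unlink(dup)
               os.link(keep, dup)
               replaced += 1
       return replaced

In a script, ``merge_from_list(sys.stdin)`` would process the output of ``find . -print`` piped into it. Between ``os.unlink`` and ``os.link`` the duplicate's name briefly does not exist; renaming a temporary link over it avoids that window.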

  https://sourceforge.net/projects/dupmerge/
                dupmerge home page
  http://freecode.com/projects/dupmerge
                dupmerge freecode page

ssdeep.
-------

ssdeep is a program for computing context triggered piecewise hashes (CTPH).
CTPH, also called fuzzy hashes, can match inputs that have homologies. Such
inputs have sequences of identical bytes in the same order, although the bytes
between these sequences may differ in both content and length. This lets
ssdeep find similar files, not only exact duplicates.
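
To illustrate the idea of context triggered piecewise hashing, here is a deliberately simplified Python sketch. It is not ssdeep's algorithm and its signatures are not compatible with ssdeep: it uses a plain moving sum over a 7-byte window as the rolling hash, a fixed ``block_size`` (ssdeep chooses the block size from the input length), 32-bit FNV-1a for each piece, and ``difflib.SequenceMatcher`` instead of ssdeep's own scoring. ``ctph_signature`` and ``similarity`` are hypothetical names. A piece ends wherever the rolling value meets the trigger condition; because that condition depends only on the last few bytes, an insertion changes only the signature characters near it.

.. code-block:: python

   import difflib

   # Assumed constants for this sketch: the base64 alphabet (64 symbols),
   # 32-bit FNV-1a parameters, and a 7-byte rolling window.
   ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
               "0123456789+/")
   FNV_OFFSET = 2166136261
   FNV_PRIME = 16777619
   WINDOW = 7


   def ctph_signature(data, block_size=64):
       """Return a simplified piecewise signature of ``data`` (bytes)."""
       sig = []
       piece = FNV_OFFSET  # FNV-1a hash of the current piece
       pending = False     # True if bytes were hashed since the last trigger
       rolling = 0         # sum of the last WINDOW bytes
       for i, byte in enumerate(data):
           rolling += byte
           if i >= WINDOW:
               rolling -= data[i - WINDOW]
           piece = ((piece ^ byte) * FNV_PRIME) & 0xFFFFFFFF
           pending = True
           # The trigger depends only on the last WINDOW bytes, so an edit
           # moves piece boundaries only locally.
           if rolling % block_size == block_size - 1:
               sig.append(ALPHABET[piece % 64])
               piece, pending = FNV_OFFSET, False
       if pending:
           sig.append(ALPHABET[piece % 64])  # close the final piece
       return "".join(sig)


   def similarity(sig_a, sig_b):
       """Return a 0.0 to 1.0 similarity of two signatures (difflib ratio)."""
       return difflib.SequenceMatcher(None, sig_a, sig_b, autojunk=False).ratio()

For two inputs that differ by a short insertion in the middle, ``similarity`` of their signatures stays close to 1.0, while unrelated random inputs score much lower.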

  http://ssdeep.sourceforge.net/
                ssdeep home page