dup.rst
author Oleksandr Gavenko <gavenkoa@gmail.com>
Fri, 13 Jul 2012 22:32:19 +0300
changeset 1334 9bf0d5a1f0cf
parent 1182 f0fd5e35e832
child 1691 7eeecad00b74
permissions -rw-r--r--
Include common header with quick links.
.. -*- coding: utf-8; -*-
.. include:: HEADER.rst

==============
 Duplication.
==============
.. contents::

Search for duplicate lines.
===========================

  http://en.wikipedia.org/wiki/Duplicate_code
                wiki page
  http://students.cis.uab.edu/tairasr/clones/literature/
                Code Clones Literature

Open source or free licence:

 * http://duplo.sourceforge.net/
 * http://clonedigger.sourceforge.net/
 * http://www.ccfinder.net/ccfinderxos.html

Proprietary or restricted licence:

 * http://www.txl.ca/nicaddownload.html
 * http://www.harukizaemon.com/simian/index.html
 * http://getatomiq.com/

Search for duplicate files.
===========================

These utilities search only for duplicate files:

  http://duff.sourceforge.net/
                duff home page
  http://freedup.org/
                freedup home page
  http://dupedit.com/
                dupedit home page
  http://rdfind.pauldreik.se/
                Rdfind home page
  http://code.google.com/p/softenido/wiki/FindRepe
                FindRepe home page

fdupes.
=======

Install on Debian-based systems::

  $ sudo apt-get install fdupes

See:

  http://code.google.com/p/fdupes/
                fdupes home page
  http://ru.wikipedia.org/wiki/Fdupes
                fdupes wiki page
  http://packages.debian.org/search?keywords=fdupes
                fdupes Debian package

freedups.
---------

Freedups searches through the directories you specify. When it finds two
identical files, it hard links them together. The two or more files still
exist in their respective directories, but only one copy of the data is stored
on disk; all of the directory entries point to the same data blocks.

  http://www.stearns.org/freedups/
                freedups home page
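
What "hard links them together" means can be seen in a short Python sketch.
The helper name ``hardlink_replace`` and the temporary-file suffix are
hypothetical, and this only illustrates the idea; it is not how freedups
itself is implemented. It assumes a filesystem that supports hard links::

  import os
  import tempfile


  def hardlink_replace(keep, dup):
      """Replace ``dup`` with a hard link to ``keep`` (hypothetical helper)."""
      tmp = dup + ".hardlink-tmp"   # assumed not to exist yet
      os.link(keep, tmp)            # second directory entry for keep's inode
      os.replace(tmp, dup)          # atomically swap it in place of the copy


  with tempfile.TemporaryDirectory() as root:
      a = os.path.join(root, "a.txt")
      b = os.path.join(root, "b.txt")
      for path in (a, b):
          with open(path, "w") as f:
              f.write("same content\n")

      before = os.path.samefile(a, b)   # two independent copies
      hardlink_replace(a, b)
      after = os.path.samefile(a, b)    # both names now share one inode
      links = os.stat(a).st_nlink       # directory entries for that inode
      print(before, after, links)

After the call both names refer to the same inode, so the data is stored only
once; on a typical POSIX filesystem the link count would be expected to be 2.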

dupmerge.
---------

Dupmerge reads a list of files from standard input (e.g., as produced by
"find . -print") and looks for identical files. When it finds two or more
identical files, all but one are unlinked to reclaim the disk space and
recreated as hard links to the remaining copy.

  https://sourceforge.net/projects/dupmerge/
                dupmerge home page
  http://freecode.com/projects/dupmerge
                dupmerge freecode page
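
The merge step described above can be sketched in Python. The function names,
the use of SHA-256 to decide that two files are identical, and the choice to
keep the first path of each group are assumptions made for illustration;
dupmerge's own comparison strategy may differ::

  import hashlib
  import os
  from collections import defaultdict


  def file_digest(path, chunk_size=1 << 20):
      """Return the SHA-256 hex digest of a file, read in chunks."""
      h = hashlib.sha256()
      with open(path, "rb") as f:
          for chunk in iter(lambda: f.read(chunk_size), b""):
              h.update(chunk)
      return h.hexdigest()


  def merge_duplicates(paths):
      """Hard link files with identical content; return how many were relinked."""
      groups = defaultdict(list)
      for path in paths:
          if os.path.isfile(path) and not os.path.islink(path):
              # Size is part of the key, so files of different length never match.
              key = (os.path.getsize(path), file_digest(path))
              groups[key].append(path)

      relinked = 0
      for same in groups.values():
          keep = same[0]
          for dup in same[1:]:
              if os.path.samefile(keep, dup):
                  continue          # already a link to the kept copy
              os.unlink(dup)        # drop the redundant copy
              os.link(keep, dup)    # recreate the name as a hard link
              relinked += 1
      return relinked

To mimic reading the list from standard input, the function could be called as
``merge_duplicates(line.rstrip("\n") for line in sys.stdin)``. Hard links
cannot cross filesystems, and the unlink/link pair is not atomic, so a real
tool needs more care than this sketch takes.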

ssdeep.
-------

ssdeep is a program for computing context triggered piecewise hashes (CTPH).
Also called fuzzy hashes, CTPH can match inputs that have homologies. Such
inputs have sequences of identical bytes in the same order, although bytes in
between these sequences may be different in both content and length.

  http://ssdeep.sourceforge.net/
                ssdeep home page
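
The idea behind a context triggered piecewise hash can be illustrated with a
toy Python sketch. Everything here is an assumption made for illustration and
not ssdeep's actual algorithm or signature format: the trigger (sum of the
last few bytes), one hex digit of SHA-256 per piece, the function names, and
scoring with ``difflib``::

  import hashlib
  import random
  from difflib import SequenceMatcher


  def piece_ends(data, window=4, modulus=16):
      """Offsets at which a piece ends, chosen from the local context only."""
      ends = []
      for i in range(len(data)):
          context = data[max(0, i - window + 1):i + 1]
          # Toy trigger: the last `window` bytes decide the cut, so a boundary
          # depends on nearby content and not on the offset in the file.
          if sum(context) % modulus == modulus - 1:
              ends.append(i + 1)
      if data and (not ends or ends[-1] != len(data)):
          ends.append(len(data))    # trailing piece that hit no trigger
      return ends


  def piecewise_signature(data, window=4, modulus=16):
      """Build a signature with one hex digit per piece."""
      digits, start = [], 0
      for end in piece_ends(data, window, modulus):
          digits.append(hashlib.sha256(data[start:end]).hexdigest()[0])
          start = end
      return "".join(digits)


  def similarity(sig_a, sig_b):
      """Score two signatures in the range [0, 1] (stand-in scoring)."""
      return SequenceMatcher(None, sig_a, sig_b, autojunk=False).ratio()


  rng = random.Random(42)           # fixed seed: repeatable sample data
  base = bytes(rng.randrange(256) for _ in range(2000))
  edited = base[:1000] + b"INSERTED" + base[1000:]
  other = bytes(rng.randrange(256) for _ in range(2000))

  sig = piecewise_signature(base)
  print(similarity(sig, piecewise_signature(edited)))   # edited copy
  print(similarity(sig, piecewise_signature(other)))    # unrelated data

Because piece boundaries depend only on nearby bytes, the insertion should
change just the digits around it, so the edited copy is expected to score
close to 1 while unrelated data should score much lower.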