dup.rst
author Oleksandr Gavenko <gavenkoa@gmail.com>
Sat, 10 Feb 2018 01:49:07 +0200
changeset 2230 9e6ad6607a9e
parent 2228 837f1337c59b
permissions -rw-r--r--
Fixed formatting.
.. -*- coding: utf-8; -*-

==============
 Duplication.
==============

.. contents::
   :local:

Search for duplicate lines.
===========================

http://en.wikipedia.org/wiki/Duplicate_code
  Wiki page.
http://students.cis.uab.edu/tairasr/clones/literature/
  Code Clones Literature.
https://moz.com/devblog/near-duplicate-detection/
  MinHash vs SimHash algorithm explanation.

Open source or free licence:

* http://duplo.sourceforge.net/
* http://clonedigger.sourceforge.net/
* http://www.ccfinder.net/ccfinderxos.html

Proprietary or restricted licence:

* http://www.txl.ca/nicaddownload.html
* http://www.harukizaemon.com/simian/index.html
* http://getatomiq.com/

http://stackoverflow.com/questions/191614/how-to-detect-code-duplication-during-development
  How to detect code duplication during development?
https://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis
  List of tools for static code analysis.
http://askubuntu.com/questions/434545/identify-duplicate-lines-in-a-file-without-deleting-them
  Identify duplicate lines in a file without deleting them?
http://stackoverflow.com/questions/13046791/how-to-delete-the-repeat-lines-in-emacs
  How to delete the repeat lines in emacs.
http://emacs.stackexchange.com/questions/13092/how-can-i-highlight-duplicate-lines
  How can I highlight duplicate lines?
https://www.emacswiki.org/emacs/DuplicateLines
  Duplicate Lines.
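
The links above mostly answer the question of reporting repeated lines without deleting them. A minimal sketch of that idea follows; the function name ``find_duplicate_lines`` and the choice to skip blank lines are assumptions made for illustration, not taken from any of the linked tools.

```python
from collections import defaultdict


def find_duplicate_lines(lines):
    """Map each repeated line to the 1-based line numbers where it occurs."""
    positions = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if text.strip():  # assumption: blank lines are not worth reporting
            positions[text].append(number)
    # Keep only the lines that were seen more than once.
    return {text: nums for text, nums in positions.items() if len(nums) > 1}


if __name__ == "__main__":
    sample = ["alpha\n", "beta\n", "alpha\n", "\n", "\n", "gamma\n", "beta\n"]
    for text, numbers in find_duplicate_lines(sample).items():
        print(f"{text!r} repeats on lines {numbers}")
```

An open file object can be passed directly, since iterating over it yields lines. This only catches exact repeats; near-duplicate detection needs the hashing techniques described in the MinHash/SimHash link.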

Search for duplicate files.
===========================

These utilities only search for duplicate files:

http://duff.sourceforge.net/
  duff home page
http://freedup.org/
  freedup home page
http://dupedit.com/
  dupedit home page
http://rdfind.pauldreik.se/
  Rdfind home page
http://code.google.com/p/softenido/wiki/FindRepe
  FindRepe home page
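
A common way to find identical files is to group them by size first and hash only the candidates that share a size. The sketch below illustrates that approach; it is not the algorithm of any particular tool listed above, and the names ``file_digest`` and ``find_duplicate_files`` as well as the choice of SHA-256 are assumptions.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_files(root):
    """Return sorted lists of paths under root whose contents hash identically."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Assumption: symbolic links and special files are ignored.
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return groups
```

Calling ``find_duplicate_files(".")`` returns one list per set of identical files. Equal digests are treated as equal content here; a cautious tool would confirm with a byte-by-byte comparison before deleting anything.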

fdupes
======
::

  $ sudo apt-get install fdupes

See:

http://code.google.com/p/fdupes/
  fdupes home page
http://ru.wikipedia.org/wiki/Fdupes
  fdupes wiki page
http://packages.debian.org/search?keywords=fdupes
  fdupes Debian package

freedups
========

Freedups searches through the directories you specify. When it finds two
identical files, it hard links them together. Now the two or more files still
exist in their respective directories, but only one copy of the data is stored
on disk; both directory entries point to the same data blocks.

http://www.stearns.org/freedups/
  freedups home page
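
What "both directory entries point to the same data blocks" means in practice can be checked with a few standard library calls. The sketch below is only an illustration of hard links, not part of freedups; the function name ``hard_link_demo`` and the file names are hypothetical, and it assumes a filesystem that supports hard links.

```python
import os
import tempfile


def hard_link_demo(root):
    """Create a file plus a hard link to it and report what the OS sees."""
    original = os.path.join(root, "original.txt")  # hypothetical names
    linked = os.path.join(root, "linked.txt")
    with open(original, "w") as handle:
        handle.write("payload\n")

    os.link(original, linked)  # adds a directory entry, copies no data

    # Appending through one name changes what the other name reads,
    # because both names refer to the same inode.
    with open(linked, "a") as handle:
        handle.write("more\n")
    with open(original) as handle:
        content = handle.read()

    link_count = os.stat(original).st_nlink
    same_file = os.path.samefile(original, linked)
    return link_count, same_file, content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(hard_link_demo(scratch))
```

The practical consequence is worth keeping in mind: after files are hard linked, editing one of them in place edits all of them, so this kind of merging suits files that are not expected to diverge.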

dupmerge
========

Dupmerge reads a list of files from standard input (e.g., as produced by
``find . -print``) and looks for identical files. When it finds two or more
identical files, all but one are unlinked to reclaim the disk space and
recreated as hard links to the remaining copy.

https://sourceforge.net/projects/dupmerge/
  dupmerge home page
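
The merge step described above can be sketched as follows. This is not dupmerge's code: the function name ``merge_duplicates`` and the temporary file suffix are hypothetical, and instead of unlinking first the sketch creates the new link under a temporary name and renames it over the duplicate, so the path never disappears if something fails halfway.

```python
import filecmp
import os


def merge_duplicates(paths):
    """Hard link byte-identical files among paths; return the relinked paths."""
    keepers = []   # one representative path per distinct content
    relinked = []
    for path in paths:
        for keeper in keepers:
            if os.path.samefile(keeper, path):
                break  # already the same inode, nothing to reclaim
            if filecmp.cmp(keeper, path, shallow=False):
                temp = path + ".dupmerge-tmp"  # hypothetical temporary name
                os.link(keeper, temp)          # new entry for the kept data
                os.replace(temp, path)         # drop the duplicate's data
                relinked.append(path)
                break
        else:
            keepers.append(path)  # first file seen with this content
    return relinked
```

To mimic reading the list from standard input, the function could be called as ``merge_duplicates(line.rstrip("\n") for line in sys.stdin)``. Each file is compared against every kept representative, which is fine for a sketch but slow for large trees; grouping by size or digest first would reduce the work.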

ssdeep
======

ssdeep is a program for computing context triggered piecewise hashes (CTPH).
Also called fuzzy hashes, CTPH can match inputs that have homologies. Such
inputs have sequences of identical bytes in the same order, although bytes in
between these sequences may be different in both content and length.

http://ssdeep.sourceforge.net/
  ssdeep home page
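
The idea behind CTPH can be shown with a toy implementation. The sketch below is a simplification and does not produce ssdeep-compatible hashes: the names ``toy_ctph`` and ``similarity``, the fixed ``block_size``, the plain byte sum used as the rolling "context" value, and the use of ``difflib`` for scoring are all assumptions made for illustration. A piece ends whenever the last few bytes hit a trigger value, and each piece contributes one character to the signature, so a local edit disturbs only the characters near it.

```python
import string
from difflib import SequenceMatcher

ALPHABET = string.ascii_letters + string.digits + "+/"  # 64 symbols
WINDOW = 7  # bytes of context that decide where a piece ends
FNV_OFFSET = 0x811C9DC5
FNV_PRIME = 0x01000193


def toy_ctph(data, block_size=32):
    """Return a signature with one character per context-triggered piece."""
    signature = []
    piece_hash = FNV_OFFSET
    for i, byte in enumerate(data):
        # Piecewise hash: FNV-1a over the bytes of the current piece.
        piece_hash = ((piece_hash ^ byte) * FNV_PRIME) & 0xFFFFFFFF
        # Context value: depends only on the last WINDOW bytes.
        window = data[max(0, i - WINDOW + 1):i + 1]
        if sum(window) % block_size == block_size - 1:
            signature.append(ALPHABET[piece_hash % 64])
            piece_hash = FNV_OFFSET  # start a new piece
    signature.append(ALPHABET[piece_hash % 64])  # trailing piece
    return "".join(signature)


def similarity(sig_a, sig_b):
    """Score two signatures between 0.0 and 1.0 by their matching runs."""
    return SequenceMatcher(None, sig_a, sig_b, autojunk=False).ratio()
```

Because piece boundaries depend only on nearby bytes, inserting a few bytes into a file shifts the later boundaries along with the content, and most signature characters stay the same. Comparing ``toy_ctph(a)`` with ``toy_ctph(b)`` through ``similarity`` then gives a high score for lightly edited inputs and a low one for unrelated inputs.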

comparator
==========

Available under Cygwin. Find duplication in source files::

  $ comparator -s 5 $dir1 $dir2