author | Oleksandr Gavenko <gavenkoa@gmail.com> |
Sat, 10 Feb 2018 01:49:07 +0200 | |
changeset 2230 | 9e6ad6607a9e |
parent 2228 | 837f1337c59b |
permissions | -rw-r--r-- |
.. -*- coding: utf-8; -*-

==============
 Duplication.
==============
.. contents::
   :local:

Search for duplicate lines.
===========================

http://en.wikipedia.org/wiki/Duplicate_code
  Wikipedia article on duplicate code.
http://students.cis.uab.edu/tairasr/clones/literature/
  Code Clones Literature.
https://moz.com/devblog/near-duplicate-detection/
  Explanation of the MinHash and SimHash algorithms.

Open source or free licence:

* http://duplo.sourceforge.net/
* http://clonedigger.sourceforge.net/
* http://www.ccfinder.net/ccfinderxos.html

Proprietary or restricted licence:

* http://www.txl.ca/nicaddownload.html
* http://www.harukizaemon.com/simian/index.html
* http://getatomiq.com/

http://stackoverflow.com/questions/191614/how-to-detect-code-duplication-during-development
  How to detect code duplication during development?
https://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis
  List of tools for static code analysis.
http://askubuntu.com/questions/434545/identify-duplicate-lines-in-a-file-without-deleting-them
  Identify duplicate lines in a file without deleting them?
http://stackoverflow.com/questions/13046791/how-to-delete-the-repeat-lines-in-emacs
  How to delete the repeat lines in emacs.
http://emacs.stackexchange.com/questions/13092/how-can-i-highlight-duplicate-lines
  How can I highlight duplicate lines?
https://www.emacswiki.org/emacs/DuplicateLines
  Duplicate Lines.

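For a single file, finding duplicate lines reduces to a short pipeline. A
minimal sketch, assuming a POSIX shell with ``sort``, ``uniq`` and ``awk``;
the file name ``sample.txt`` is made up for the illustration::

  # Create a small sample file with some repeated lines.
  printf '%s\n' alpha beta alpha gamma beta alpha > sample.txt

  # Print one copy of every line that occurs more than once.
  sort sample.txt | uniq -d

  # Print every repeated line with its line number, keeping file order.
  # The file is read twice: first to count lines, then to report them.
  awk 'NR == FNR { count[$0]++; next }
       count[$0] > 1 { print FNR ": " $0 }' sample.txt sample.txt

``uniq`` compares only adjacent lines, so its input has to be sorted first.
The ``awk`` variant leaves the original order intact, which helps when the
duplicates have to be located rather than just listed.
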

Search for duplicate files.
===========================

These utilities search only for duplicate files:

http://duff.sourceforge.net/
  duff home page
http://freedup.org/
  freedup home page
http://dupedit.com/
  dupedit home page
http://rdfind.pauldreik.se/
  Rdfind home page
http://code.google.com/p/softenido/wiki/FindRepe
  FindRepe home page

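A search of this kind can be approximated with standard tools. A rough
sketch, assuming GNU coreutils (``sha256sum``, and ``uniq`` with the ``-w``
and ``-D`` options); the ``demo`` directory and its files are made up for the
example::

  # Build a small tree with two identical files and one different file.
  mkdir -p demo/sub
  printf 'same content\n' > demo/a.txt
  printf 'same content\n' > demo/sub/b.txt
  printf 'other content\n' > demo/c.txt

  # Hash every regular file, sort by hash, and keep only the lines whose
  # first 64 characters (the SHA-256 digest) occur more than once.
  find demo -type f -exec sha256sum {} + | sort | uniq -w 64 -D

Matching digests make identical content very likely but do not prove it, so
compare the candidates with ``cmp`` before deleting anything. Hashing every
file is also slow on large trees; grouping files by size first avoids most of
that work.
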

fdupes
======

Install on Debian or Ubuntu::

  $ sudo apt-get install fdupes

See:

http://code.google.com/p/fdupes/
  fdupes home page
http://ru.wikipedia.org/wiki/Fdupes
  fdupes Wikipedia page (in Russian)
http://packages.debian.org/search?keywords=fdupes
  fdupes Debian package

freedups
========

Freedups searches through the directories you specify. When it finds two or
more identical files, it hard links them together. The files still exist in
their respective directories, but only one copy of the data is stored on disk;
all of the directory entries point to the same data blocks.

http://www.stearns.org/freedups/
  freedups home page

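The effect of hard linking can be checked by hand. A small sketch, assuming
GNU ``stat`` (the ``-c`` format option) and made-up file names; it reproduces
the end result for a single pair of files and is not how freedups itself is
invoked::

  # Two separate files with identical content: two inodes, two copies on disk.
  mkdir -p linkdemo
  printf 'payload\n' > linkdemo/one
  printf 'payload\n' > linkdemo/two

  # If the contents match, replace the second file with a hard link to the
  # first. Both files must live on the same filesystem.
  cmp -s linkdemo/one linkdemo/two && ln -f linkdemo/one linkdemo/two

  # Both names now report the same inode number and a link count of 2.
  stat -c '%i %h %n' linkdemo/one linkdemo/two

Because both names refer to the same inode, a change written through one name
is visible through the other, so hard linking suits only files that are meant
to stay identical.
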

dupmerge
========

Dupmerge reads a list of files from standard input (e.g., as produced by
``find . -print``) and looks for identical files. When it finds two or more
identical files, all but one are unlinked to reclaim the disk space and
recreated as hard links to the remaining copy.

https://sourceforge.net/projects/dupmerge/
  dupmerge home page

ssdeep
======

ssdeep is a program for computing context triggered piecewise hashes (CTPH).
Also called fuzzy hashes, CTPH can match inputs that have homologies. Such
inputs have sequences of identical bytes in the same order, although bytes in
between these sequences may be different in both content and length.

http://ssdeep.sourceforge.net/
  ssdeep home page

comparator
==========

Available under Cygwin. Find duplication in source files::

  $ comparator -s 5 $dir1 $dir2