author | Oleksandr Gavenko <gavenkoa@gmail.com> |
Wed, 05 Aug 2015 23:55:34 +0300 | |
changeset 1729 | 22ffd80639c0 |
parent 1691 | 7eeecad00b74 |
child 1905 | fba288d59662 |
permissions | -rw-r--r-- |
.. -*- coding: utf-8; -*-
.. include:: HEADER.rst

==============
Duplication.
==============
.. contents::

Search for duplicate lines.
===========================

http://en.wikipedia.org/wiki/Duplicate_code
  wiki page
http://students.cis.uab.edu/tairasr/clones/literature/
  Code Clones Literature

Open source or free licence:

* http://duplo.sourceforge.net/
* http://clonedigger.sourceforge.net/
* http://www.ccfinder.net/ccfinderxos.html

Proprietary or restricted licence:

* http://www.txl.ca/nicaddownload.html
* http://www.harukizaemon.com/simian/index.html
* http://getatomiq.com/

Search for duplicate files.
===========================

These utilities only search for duplicate files:

http://duff.sourceforge.net/
  duff home page
http://freedup.org/
  freedup home page
http://dupedit.com/
  dupedit home page
http://rdfind.pauldreik.se/
  Rdfind home page
http://code.google.com/p/softenido/wiki/FindRepe
  FindRepe home page

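The general idea behind such utilities can be sketched in a few lines. This is only a minimal illustration, not how any of the listed tools is actually implemented: the function names ``file_digest`` and ``find_duplicates`` are made up for this example, and it assumes that equal size plus equal SHA-256 digest is good enough evidence of identical content.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are identical."""
    # Pass 1: group regular files by size; a file with a unique size
    # cannot have a duplicate, so it never needs to be read.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files whose size occurs more than once.
    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) > 1:
            for path in paths:
                by_digest[(size, file_digest(path))].append(path)

    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Calling ``find_duplicates(".")`` returns one sorted list of paths per group of identical files. Grouping by size first is a cheap filter, so most files are never hashed.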
fdupes.
=======
::

  $ sudo apt-get install fdupes

See:

http://code.google.com/p/fdupes/
  fdupes home page
http://ru.wikipedia.org/wiki/Fdupes
  fdupes wiki page
http://packages.debian.org/search?keywords=fdupes
  fdupes Debian package

freedups.
---------

Freedups searches through the directories you specify. When it finds two
identical files, it hard-links them together. The files still exist in their
respective directories, but only one copy of the data is stored on disk; all
the directory entries point to the same data blocks.

http://www.stearns.org/freedups/
  freedups home page

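What "hard-linking two identical files together" means can be shown with the Python standard library. This is a sketch under stated assumptions, not freedups' own code: the function name ``hard_link_if_identical`` and the ``.link-tmp`` suffix are invented for the example, and both paths are assumed to live on the same filesystem, since hard links cannot cross filesystems.

```python
import filecmp
import os


def hard_link_if_identical(keep, duplicate):
    """Replace `duplicate` with a hard link to `keep` if contents match.

    Returns True if both paths share one inode afterwards.
    """
    if os.path.samefile(keep, duplicate):
        return True  # already hard-linked, nothing to do
    if not filecmp.cmp(keep, duplicate, shallow=False):
        return False  # contents differ, leave both files alone
    # Create the link under a temporary name, then rename it over the
    # duplicate, so the duplicate's path never disappears.
    temporary = duplicate + ".link-tmp"  # hypothetical temp-name scheme
    os.link(keep, temporary)
    os.replace(temporary, duplicate)
    return True
```

After a successful call, ``os.stat(keep).st_nlink`` grows by one and ``os.path.samefile(keep, duplicate)`` is true. Because both names now refer to the same data, writing through one name changes what the other name shows.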
dupmerge.
---------

Dupmerge reads a list of files from standard input (e.g., as produced by
``find . -print``) and looks for identical files. When it finds two or more
identical files, all but one are unlinked to reclaim the disk space and
recreated as hard links to the remaining copy.

https://sourceforge.net/projects/dupmerge/
  dupmerge home page

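The workflow described above (a list of paths in, hard links out) might look roughly like this in Python. It is an illustrative sketch, not dupmerge's implementation: ``merge_duplicates`` is a made-up name, whole files are read into memory for hashing, equal SHA-256 digests are assumed to mean equal content, and file names containing newlines are not handled.

```python
import hashlib
import os
from collections import defaultdict


def merge_duplicates(path_lines):
    """Hard-link identical files named one per line in `path_lines`.

    Returns the number of files replaced by hard links.
    """
    # Group the listed regular files by the digest of their content.
    by_content = defaultdict(list)
    for line in path_lines:
        path = line.rstrip("\n")
        if path and os.path.isfile(path) and not os.path.islink(path):
            with open(path, "rb") as stream:
                digest = hashlib.sha256(stream.read()).hexdigest()
            by_content[digest].append(path)

    replaced = 0
    for paths in by_content.values():
        keep = paths[0]
        for duplicate in paths[1:]:
            if os.path.samefile(keep, duplicate):
                continue  # already the same inode
            os.unlink(duplicate)      # reclaim the duplicate's blocks
            os.link(keep, duplicate)  # recreate the name as a hard link
            replaced += 1
    return replaced
```

Any iterable of lines works as input, so a script could pass ``sys.stdin`` and be fed by ``find . -type f -print``. Unlinking before linking leaves a short window in which the duplicate's name does not exist; a more careful version would link under a temporary name and rename.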
ssdeep.
-------

ssdeep is a program for computing context triggered piecewise hashes (CTPH).
Also called fuzzy hashes, CTPH can match inputs that have homologies. Such
inputs have sequences of identical bytes in the same order, although bytes in
between these sequences may be different in both content and length.

http://ssdeep.sourceforge.net/
  ssdeep home page
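
The idea of a context triggered piecewise hash can be illustrated with a toy version. This is deliberately not ssdeep's algorithm or output format: the rolling sum, the ``window`` and ``modulus`` parameters, the 6-character piece hashes and the Jaccard score are all simplifications chosen for this sketch.

```python
import hashlib


def piecewise_signature(data, window=7, modulus=64):
    """Return short hashes of the pieces between content-defined cuts."""
    pieces = []
    start = 0
    rolling = 0
    for index, byte in enumerate(data):
        # Rolling sum over the last `window` bytes.
        rolling += byte
        if index >= window:
            rolling -= data[index - window]
        # The content itself triggers a cut, so an insertion only
        # disturbs the pieces around it, not the whole signature.
        if rolling % modulus == modulus - 1:
            piece = data[start:index + 1]
            pieces.append(hashlib.sha256(piece).hexdigest()[:6])
            start = index + 1
    if start < len(data):
        pieces.append(hashlib.sha256(data[start:]).hexdigest()[:6])
    return pieces


def similarity(data_a, data_b):
    """Return the Jaccard similarity of the two piece sets (0.0 to 1.0)."""
    pieces_a = set(piecewise_signature(data_a))
    pieces_b = set(piecewise_signature(data_b))
    if not pieces_a and not pieces_b:
        return 1.0
    return len(pieces_a & pieces_b) / len(pieces_a | pieces_b)
```

Because cut points depend only on nearby bytes, two inputs that share long runs of identical bytes produce many identical pieces, even when bytes were inserted or removed in between. An ordinary whole-file hash would differ completely in that case.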