author | Oleksandr Gavenko <gavenkoa@gmail.com> |
Mon, 27 Feb 2023 00:55:27 +0200 | |
changeset 1342 | d6413e1d20b0 |
parent 1233 | 26eb35750bab |
child 1347 | 272ec25b6f12 |
permissions | -rw-r--r-- |
.. -*- coding: utf-8 -*-

======================
 gadict HACKING guide
======================
.. contents::
   :local:

Versioning rules
================

We use the **major.minor** scheme.

Until we reach 5000 words, **major** is 0. **minor** is updated from time to
time.

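The rule above can be sketched as a small check. This is only an illustration:
``check_version`` and passing the word count as an argument are assumptions,
not part of the gadict build, and the guide does not say what **major** becomes
after 5000 words, so the sketch accepts any value there.

```python
# Hypothetical helper, not part of gadict: checks a "major.minor" string
# against the "major is 0 until we reach 5000 words" rule.
WORD_THRESHOLD = 5000  # taken from the rule above


def check_version(version, word_count):
    """Return True if ``version`` ("major.minor") fits the rule."""
    major_text, minor_text = version.split(".")
    major, minor = int(major_text), int(minor_text)
    if major < 0 or minor < 0:
        return False
    if word_count < WORD_THRESHOLD:
        # Below the threshold only major == 0 is allowed.
        return major == 0
    # Nothing is specified for larger dictionaries, so accept any major.
    return True
```

For example, ``check_version("0.12", 3200)`` holds, while
``check_version("1.0", 3200)`` does not.
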
Getting sources
===============

Cloning repository::

  $ hg clone http://hg.defun.work/gadict gadict
  $ hg clone http://hg.code.sf.net/p/gadict/code gadict-hg

Pushing changes::

  $ hg push ssh://$USER@hg.defun.work/gadict
  $ hg push ssh://$USER@hg.code.sf.net/p/gadict/code
  $ hg push https://$USER:$PASS@hg.code.sf.net/p/gadict/code

Browsing sources online
=======================

http://hg.defun.work/gadict
  hgweb at the home page.
http://hg.code.sf.net/p/gadict/code
  hgweb at the old home page (still supported as a mirror).
https://sourceforge.net/p/gadict/code/
  SourceForge Allura interface (not primary, a mirror).

Building project
================

The ``gadict`` project provides dictionaries encoded in a custom format. In
order to process them you need GNU Make and Python 2.7 and possibly other tools.

To produce dictionaries in ``dictd`` format you need to install the ``dictd``
distribution with the ``dictfmt`` and ``dictzip`` utilities::

  $ sudo apt install dictfmt dictzip

and run::

  $ make dict

To make Anki decks, check out the Anki sources::

  $ git clone https://github.com/dae/anki.git
  $ cd anki

and update to a specific revision (before the hard dependency on ``pyaudio``,
which is not available on Cygwin)::

  $ git checkout 1d75cff5e7458c6538a4e75728c16bef8b7adb3e^

  $ git show 1d75cff5e7458c6538a4e75728c16bef8b7adb3e
  commit 1d75cff5e7458c6538a4e75728c16bef8b7adb3e
  Author: Damien Elmes <git@ichi2.net>
  Date:   2016-06-23 12:04:48 +1000

      pyaudio is no longer optional

Previously the build used Python 2 and depended on earlier source revisions
(before the port to Python 3)::

  $ git checkout 15b349e3^

  $ git show 15b349e3
  commit 15b349e3a8b34bf80c134b406c9b90f61250ee9e
  Author: Damien Elmes <git@ichi2.net>
  Date:   2016-05-12 14:45:35 +1000

      start port to python 3

Then put the path to the Anki project source directory in
``Makefile.config``::

  ANKI_PY_DIR := $(HOME)/devel/anki

The build command to make Anki decks is::

  $ make anki

Alternative Anki generators
===========================

https://github.com/kerrickstaley/genanki
  A Library for Generating Anki Decks.
https://github.com/lervag/apy
  CLI script for interacting with local Anki collection.
https://github.com/damaru2/ankigenbot/blob/master/src/send_card.py
  Pushes cards to https://ankiweb.net

Dictionary source file format
=============================

In the past the gadict project used the dictd C5 source file format. The C5
format has several issues:

* C5 is not a structured format, so producing other forms and converting to
  other formats is not possible.
* C5 has no markup for links or for anything else.

Before that the project used the dictd TAB file format, which requires placing
an article on a single long line. That format is not suitable for human editing
at all.

Other dictionary source file formats were considered, like TEI, ISO, xdxf, MDF.
XML-like formats are also not meant for human editing. XML also lacks syntax
locality: the full file has to be scanned to validate local changes...

Note that the StarDict, ABBYY Lingvo, Babylon and dictd formats were not
considered because they are all about presentation, not structure. They are
target formats for compilation.

A fancy-looking analog of MDF + C5 was developed.

The beginning of the file describes dictionary information.

Each article is separated by ``\n__\n\n`` and consists of two parts:

* word variations with pronunciation
* word translations, with supplementary information like part of speech,
  synonyms, antonyms, examples of usage

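Splitting a source file on that separator could look roughly like the sketch
below. ``split_articles`` is a hypothetical helper, not the project's parser,
and dividing each article into its two parts at the first blank line is an
assumption made only for this illustration.

```python
# Sketch only: split_articles() is a hypothetical helper. The separator is
# the one named in the text; splitting an article into two parts at the
# first blank line is an assumption.
ARTICLE_SEPARATOR = "\n__\n\n"


def split_articles(text):
    """Return (header, articles) for the text of a dictionary source."""
    header, *bodies = text.split(ARTICLE_SEPARATOR)
    articles = []
    for body in bodies:
        headwords, _, translations = body.partition("\n\n")
        articles.append((headwords.strip(), translations.strip()))
    return header.strip(), articles


# Invented sample input, not taken from a real gadict file.
sample = (
    "Sample dictionary information\n"
    "\n__\n\n"
    "cat\n"
    "\n"
    "n\n"
    "la: feles\n"
    "\n__\n\n"
    "dog\n"
    "\n"
    "n\n"
    "la: canis\n"
)
header, articles = split_articles(sample)
```

Everything before the first separator is treated as the dictionary
information block; each following chunk becomes one article.
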
*Word variations* are:

* *singularity* or *number*: ``s`` - singular, ``pl`` - plural.
* *verb voice* or *verb tense*: ``v1`` - infinitive, ``v2`` - past tense,
  ``v3`` - past participle.
* *gender*: ``male`` or ``female``.
* *comparison*: ``comp`` - comparative or ``super`` - superlative.

*Parts of speech* (ordered by preference):

* ``v`` - verb
* ``n`` - noun
* ``pron`` - pronoun
* ``adv`` - adverb
* ``adj`` - adjective
* ``prep`` - preposition
* ``conj`` - conjunction
* ``num`` - numeral
* ``int`` - interjection
* ``abbr`` - abbreviation
* ``phr`` - phrase
* ``phr.v`` - phrasal verb
* ``contr`` - contraction
* ``prefix`` - word prefix

.. note:: I try to keep word meanings in an article in the above POS order.

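The preferred order can be expressed as a sort key. The sketch below is an
illustration only: the ``(pos, text)`` pairs and the name ``sort_meanings``
are hypothetical, and only the order of the tags comes from the list above.

```python
# Order copied from the parts-of-speech list; the (pos, text) pairs and the
# function name are hypothetical.
POS_ORDER = ["v", "n", "pron", "adv", "adj", "prep", "conj", "num", "int",
             "abbr", "phr", "phr.v", "contr", "prefix"]
POS_RANK = {pos: rank for rank, pos in enumerate(POS_ORDER)}


def sort_meanings(meanings):
    """Sort (pos, text) pairs by the preferred part-of-speech order."""
    # sorted() is stable, so meanings with the same POS keep their order.
    return sorted(meanings, key=lambda meaning: POS_RANK[meaning[0]])
```

An unknown tag raises ``KeyError`` here, which may be a convenient way to
notice typos in POS markers.
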
Each meaning may refer to topics, like:

* ``sci`` - about science
* ``body`` - part of body
* ``math`` - mathematics
* ``chem`` - chemicals
* ``bio`` - biology
* ``music``
* ``meal``, ``office``, etc
* ``size``, ``shape``, ``age``, ``color``
* ``archaic`` - old fashioned, no longer used

*Word relations* (ordered by preference):

* ``topic:`` - topics/tags
* ``ant:`` - antonyms
* ``syn:`` - synonyms
* ``hyper:`` - hypernyms
* ``hypo:`` - hyponyms
* ``rel:`` - related (see also) terms

A translation is marked by a lowercase ISO 639-1 code followed by the ``:``
(colon) character, like:

* ``en:`` - English
* ``ru:`` - Russian
* ``uk:`` - Ukrainian
* ``la:`` - Latin

An example is marked by a lowercase ISO 639-1 code followed by the ``>``
(greater-than) character.

An explanation or glossary is marked by a lowercase ISO 639-1 code followed by
the ``=`` (equals) character.

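The three markers can be told apart with one regular expression. This is a
sketch under assumptions: the marker is taken to start the line and to be
followed by the text, any two lowercase letters are accepted without checking
them against the ISO 639-1 list, and ``parse_line`` is a hypothetical name.

```python
import re

# Sketch under assumptions: marker at the start of the line, text after it.
# MARKER_KINDS and parse_line() are hypothetical names.
MARKER_KINDS = {":": "translation", ">": "example", "=": "explanation"}
MARKER_RE = re.compile(r"^([a-z]{2})([:>=])\s*(.*)$")


def parse_line(line):
    """Return (kind, language, text) or None if the line has no marker."""
    match = MARKER_RE.match(line)
    if match is None:
        return None
    lang, marker, text = match.groups()
    return MARKER_KINDS[marker], lang, text
```

Relation markers such as ``syn:`` have three or more letters before the colon,
so this pattern does not confuse them with language codes.
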
Pronunciation variants are marked by:

* ``Am`` - American
* ``Br`` - British
* ``Au`` - Australian

The ``rare`` attribute on the first headword marks a word as having low
frequency. SRS file writers skip entries marked as ``rare``. I found it
convenient to check frequency with:

https://books.google.com/ngrams/
  Google N-grams from books 1800-2010.

For the cut-off point I chose the word ``beseech``. All less frequent words
receive the ``rare`` marker.

gaphrase & gadialog file formats
================================

``gaphrase`` & ``gadialog`` files keep data for generating one-sided Anki cards.

Both use the same numbering schema, which allows merging updated articles with
the originals without losing learning progress:

* The first line of the file starts with ``## NUM`` to keep track of the latest
  used number.
* Articles are separated by a number line in the format ``# NUM``.

``gadialog`` additionally maintains dialogs: each part is marked by a line
starting with ``- TEXT``.

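The numbering rules above can be illustrated with a minimal Python sketch. It is
not the project's real parser: the function names ``parse_numbered`` and
``dialog_parts`` are hypothetical, and it assumes that each ``# NUM`` line
precedes the article it numbers and that blank lines carry no meaning.

```python
import re

# "## NUM" on the first line: the latest number used in the file.
HEADER_RE = re.compile(r"^## (\d+)")
# "# NUM" on its own line: starts the article with that number (assumption).
ARTICLE_RE = re.compile(r"^# (\d+)\s*$")


def parse_numbered(text):
    """Return (latest_number, {number: [body lines]}) for a gaphrase-like text."""
    lines = text.splitlines()
    header = HEADER_RE.match(lines[0]) if lines else None
    if header is None:
        raise ValueError("first line must start with '## NUM'")
    latest = int(header.group(1))
    articles = {}
    current = None
    for line in lines[1:]:
        match = ARTICLE_RE.match(line)
        if match:
            current = int(match.group(1))
            if current in articles:
                raise ValueError("duplicate article number %d" % current)
            articles[current] = []
        elif current is not None and line.strip():
            articles[current].append(line)
    return latest, articles


def dialog_parts(body_lines):
    """Split a gadialog-like article into parts, each starting with '- '."""
    parts = []
    for line in body_lines:
        if line.startswith("- "):
            parts.append(line[2:])
        elif parts:
            # Continuation of the previous part.
            parts[-1] += "\n" + line
    return parts
```

Because articles are keyed by number rather than by position, an updated file can
be matched against the original article by article, which is what keeps the
learning progress attached to the right card.
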
C5 dictionary source file format
================================

The dictd C5 file format was used as the source file format. See::

  $ man 1 dictfmt

In short:

* Headwords are preceded by 5 or more underscore characters ``_`` and a blank
  line.
* An article may have several headwords; in that case they are placed on one line
  and separated by ``;<SPACE>``.
* All text until the next headword is considered the definition.
* Any leading ``@`` characters are stripped out, but the file is otherwise
  unchanged.
* UTF-8 encoding is supported, at least by GoldenDict.

The gadict project used the C5 format in the past but switched to its own format.

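The C5 rules listed above can be sketched as a small Python reader. This is only
an illustration under stated assumptions, not ``dictfmt`` itself: the name
``parse_c5`` is hypothetical, the headword line is assumed to follow the
underscore line and the blank line directly, and the stripping of leading ``@``
characters is not handled.

```python
import re

# A separator is a line made of 5 or more underscores.
SEPARATOR_RE = re.compile(r"^_{5,}\s*$")


def parse_c5(text):
    """Return a list of (headwords, definition) pairs from C5-like text."""
    lines = text.splitlines()
    entries = []
    i = 0
    n = len(lines)
    while i < n:
        # An entry starts with an underscore line followed by a blank line.
        starts_entry = (
            SEPARATOR_RE.match(lines[i])
            and i + 2 < n
            and not lines[i + 1].strip()
        )
        if starts_entry:
            # Several headwords share one line, separated by "; ".
            headwords = lines[i + 2].split("; ")
            i += 3
            body = []
            # Everything up to the next separator is the definition.
            while i < n and not SEPARATOR_RE.match(lines[i]):
                body.append(lines[i])
                i += 1
            entries.append((headwords, "\n".join(body).strip()))
        else:
            i += 1
    return entries
```

Called on two entries where the first has the headword line ``color; colour``,
the function returns two pairs, the first of which carries both headwords.
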
TODO convention
===============

Entries or parts of text that are not complete are marked by keywords:

TODO
  incomplete
XXX
  urgent incomplete

The Makefile rule ``todo`` finds these occurrences in the sources::

  $ make todo

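What the ``todo`` rule runs is not shown here. As a rough equivalent, a scan for
both keywords could look like the following Python sketch; the name
``find_markers`` and the whole-word matching are assumptions, not the Makefile's
actual behaviour.

```python
import re

# Whole-word match, so that e.g. "TODOS" is not reported.
MARKER_RE = re.compile(r"\b(TODO|XXX)\b")


def find_markers(text):
    """Return (line_number, keyword, stripped_line) for every marker in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in MARKER_RE.finditer(line):
            hits.append((number, match.group(1), line.strip()))
    return hits
```

Sorting the result so that ``XXX`` hits come first would surface the urgent
items before the ordinary ``TODO`` ones.
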
World wide dictionary formats and standards
===========================================

http://en.wikipedia.org/wiki/Dictionary_writing_system
  Dictionary writing system
http://www.sil.org/computing/shoebox/mdf.html
  Multi-Dictionary Formatter (MDF). It defines about 100 data
  field markers.
http://fieldworks.sil.org/flex/
  FieldWorks Language Explorer (or FLEx, for short) is designed
  to help field linguists perform many common language
  documentation and analysis tasks.
http://code.google.com/p/lift-standard/
  LIFT (Lexicon Interchange FormaT) is an XML format for storing
  lexical information, as used in the creation of dictionaries.
  It's not necessarily the format for your lexicon.
http://www.lexiquepro.com/
  Lexique Pro is an interactive lexicon viewer and editor, with
  hyperlinks between entries, category views, dictionary
  reversal, search, and export tools. It's designed to display
  your data in a user-friendly format so you can distribute it
  to others.
http://deb.fi.muni.cz/index.php
  DEBII - Dictionary Editor and Browser

Linguistic sources
==================

Ukrainian linguistics corpora
-----------------------------

**National corpus of Russian language**. There are parallel Russian-Ukrainian
texts. Search by keywords, grammatical function, thesaurus properties and other
properties.

http://www.ruscorpora.ru/search-para-uk.html
  Page for querying online.

**Corpus of mova.info project**. There are literal search and search by word
family.

http://www.mova.info/corpus.aspx
  Page for querying online.

Word lists
==========

Frequency wordlists use several statistics:

* number of word occurrences in the corpus, usually marked by ``F``
* adjusted number of occurrences per 1,000,000 words in the corpus, usually
  marked by ``U``
* Standard Frequency Index (SFI), defined as:

  .. math:: SFI = 40 + 10 \cdot \log_{10}(U)

  === ================
  SFI Freq
  === ================
  90  1 per 10
  80  1 per 100
  70  1 per 1,000
  60  1 per 10,000
  50  1 per 100,000
  40  1 per 1,000,000
  30  1 per 10,000,000
  === ================

* deviation of word frequency across documents in the corpus, usually marked by
  ``D``

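The SFI definition above is easy to check numerically. The sketch below takes
``U`` as occurrences per 1,000,000 words; the function name ``sfi`` is chosen
here only for illustration.

```python
import math


def sfi(per_million):
    """Standard Frequency Index for a frequency given per 1,000,000 words."""
    if per_million <= 0:
        raise ValueError("frequency must be positive")
    return 40 + 10 * math.log10(per_million)
```

A word seen once per 10 words occurs 100,000 times per million, so ``sfi(100000)``
gives 90 and ``sfi(1)`` gives 40, matching the first and the sixth rows of the
table.
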
Sorting numerically (in descending order) on the first column::

  $ sort -k 1nr,2 <$IN >$OUT

https://www.wordandphrase.info/frequencyList.asp
  Word frequency info based on COCA.
https://www.english-corpora.org/coca/
  COCA corpus with word frequency info.

OANC frequency wordlist
-----------------------

The Open American National Corpus (OANC) is a roughly 15 million word subset of
the ANC Second Release that is unrestricted in terms of usage and
redistribution.

I got the OANC from: http://www.anc.org/OANC/OANC-1.0.1-UTF8.zip

Size after unpacking only the ``.txt`` files::

  $ unzip OANC-1.0.1-UTF8.zip '*.txt'
  $ cd OANC; find . -type f | xargs cat | wc
  2090929 14586935 96737202

I built the frequency list with:

http://www.laurenceanthony.net/software/antconc/
  A freeware corpus analysis toolkit for concordancing and text analysis.

Then I manually removed single and double letter words, filtered out misspelled
words with the ``en_US`` ``hunspell`` spell-checker and merged word variations
into their base form using WordNet. See details in ``obsolete/oanc.py``.

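The counting step of such a pipeline can be approximated in a few lines of
Python. This sketch is not the content of ``obsolete/oanc.py``: the name
``frequency_list`` and the tokenizing regular expression are assumptions, and the
``hunspell`` and WordNet steps are left out. It only counts lowercase word forms
and drops single and double letter words.

```python
import re
from collections import Counter

# Lowercase ASCII words, optionally with one apostrophe part ("don't").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def frequency_list(text, min_length=3):
    """Count word forms in text, skipping words shorter than min_length."""
    words = WORD_RE.findall(text.lower())
    counts = Counter(word for word in words if len(word) >= min_length)
    # Most frequent first; ties are broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

For the input ``"The cat and the dog. The end, is it?"`` the result starts with
``("the", 3)``, and the two-letter words ``is`` and ``it`` are dropped.
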
http://www.anc.org/data/oanc/download/
  OANC download page.

http://www.anc.org/data/oanc/
  OANC home page.

https://en.wikipedia.org/wiki/Word_lists_by_frequency

Useful word lists:

https://en.wikipedia.org/wiki/Academic_Word_List
  Academic Word List at Wikipedia.
https://web.archive.org/web/20080212073904/http://language.massey.ac.nz/staff/awl/headwords.shtml
  Academic Word List by Averil Coxhead, created in 2000 as an addition to the
  GSL. It has 570 headwords.

Obsolete or proprietary word lists:

https://en.wikipedia.org/wiki/Basic_English
  850 headword list created in 1930.

General Service List
--------------------

The updated GSL (General Service List) was obtained from:

http://jbauman.com/gsl.html
  A 1995 revised version of the GSL with minor changes by John Bauman. He added
  284 new headwords to the original 2000 word list created by Michael West in
  1953.

The first column represents the number of occurrences per 1,000,000 words of the
Brown corpus, based on counting word families.

https://en.wikipedia.org/wiki/General_Service_List
  General Service List at Wikipedia.
http://jbauman.com/aboutgsl.html
  About the General Service List by John Bauman.

New General Service List
------------------------

The NGSL was obtained from:

http://www.newgeneralservicelist.org/s/NGSL-101-by-band-qq9o.xlsx
  Microsoft Excel (``.xlsx``) file with headword, frequency and SFI.

The first column represents the adjusted frequency per 1,000,000 words, counting
base word families.

c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
411 |
Academic Word List |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
412 |
------------------ |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
413 |
|
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
414 |
The Academic Word List (AWL) was published in the Summer, 2000 issue of the |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
415 |
TESOL Quarterly (v. 34, no. 2). It was developed by Averil Coxhead, of Victoria |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
416 |
University of Wellington, in New Zealand. The AWL is a replacement for the |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
417 |
University Word List (published by Paul Nation in 1984). |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
418 |
|
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
419 |
AWL (Academic Word List) was obtained from: |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
420 |
|
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
421 |
https://web.archive.org/web/20081014065815/http://language.massey.ac.nz/staff/awl/download/awlheadwords.rtf |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
422 |
Original Academic Word List in RTF format. |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
423 |
|
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
424 |
Its structure is a headword followed by a frequency level (from 1 as most frequent |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
425 |
to 10 as least frequent). |
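
The "headword followed by frequency level" structure can be read with a few lines of Python. This is only a sketch: it assumes the RTF file has already been converted to plain text with one whitespace-separated ``headword level`` pair per line, which the source does not state, and ``parse_awl_line`` is a hypothetical helper name.

```python
def parse_awl_line(line):
    """Split one 'headword level' line into (headword, level).

    Assumption: plain-text input, one pair per line, level in 1..10
    (1 is most frequent, 10 is least frequent).
    """
    word, level_text = line.split()
    level = int(level_text)
    if not 1 <= level <= 10:
        raise ValueError("frequency level out of range: %d" % level)
    return word, level
```

Rejecting levels outside 1..10 catches conversion debris early instead of letting it leak into the dictionary data.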
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
426 |
|
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
427 |
New Academic Word List |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
428 |
---------------------- |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
429 |
|
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
430 |
The frequency word list was obtained from: |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
431 |
|
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
432 |
http://www.newacademicwordlist.org/s/NAWL_SFI.csv |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
433 |
CSV with columns ``Word,SFI,U,D``. |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
434 |
|
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
435 |
The ``SFI`` and ``D`` columns were deleted and the ``U`` and ``Word`` columns were swapped. |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
436 |
The data was sorted by the ``U`` column (adjusted frequency per 1,000,000 words). |
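
The column manipulation described above could be reproduced roughly as follows. This is a minimal sketch, not the procedure actually used: the function name ``reorder_nawl`` is hypothetical, and sorting in descending order (most frequent first) is an assumption, since the text only says the data was sorted by ``U``.

```python
import csv


def reorder_nawl(src, dst):
    """Read CSV rows with header Word,SFI,U,D and write 'U,Word' rows.

    SFI and D are dropped, U and Word are swapped, and rows are sorted
    by U. Assumption: descending order, most frequent word first.
    """
    reader = csv.DictReader(src)
    rows = [(row["U"], row["Word"]) for row in reader]
    # Sort numerically, but keep the original text of the U value.
    rows.sort(key=lambda pair: float(pair[0]), reverse=True)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerows(rows)
```

Both arguments are open text file objects, so the same function works on real files and on ``io.StringIO`` buffers.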
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
437 |
|
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
438 |
NAWL headword list with word variations was obtained from: |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
439 |
|
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
440 |
http://www.laurenceanthony.net/software/antwordprofiler/ |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
441 |
Laurence Anthony's AntWordProfiler home page. |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
442 |
|
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
443 |
It was encoded in ``latin-1`` and has been recoded into ``utf-8`` (because of the ``É`` |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
444 |
symbol). |
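
The recoding step can be illustrated with a short Python sketch. The helper name ``recode_latin1_to_utf8`` is made up for this example, and the source does not say which tool was actually used for the conversion.

```python
def recode_latin1_to_utf8(data):
    """Convert bytes encoded in latin-1 into bytes encoded in utf-8."""
    # latin-1 maps every byte to the code point with the same number,
    # so decoding never fails; non-ASCII chars become multi-byte in utf-8.
    return data.decode("latin-1").encode("utf-8")
```

For instance, ``É`` is the single byte ``0xC9`` in ``latin-1`` and the two bytes ``0xC3 0x89`` in ``utf-8``, which is why the file had to be recoded rather than read as is.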
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
445 |
|
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
446 |
See also: |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
447 |
|
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
448 |
http://www.newacademicwordlist.org/ |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
449 |
Home page. |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
450 |
|
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
451 |
Special English word list |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
452 |
------------------------- |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
453 |
|
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
454 |
https://en.wikipedia.org/wiki/Special_English |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
455 |
Special English is a controlled version of the English language used by the |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
456 |
United States broadcasting service Voice of America (VOA). 1557 headwords. |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
457 |
|
654
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
458 |
Business Service List |
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
459 |
--------------------- |
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
460 |
|
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
461 |
The 1700 words of the BSL 1.01 version give up to 97% coverage of general |
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
462 |
business English materials when combined with the 2800 words of the NGSL. |
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
463 |
|
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
464 |
Wordlist with variations was obtained from: |
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
465 |
|
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
466 |
http://www.newgeneralservicelist.org/s/AWPngslbsl-twcg.zip |
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
467 |
In AntWordProfiler compatible format. |
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
468 |
|
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
469 |
http://www.newgeneralservicelist.org/bsl-business-service-list/ |
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
470 |
BSL home & download page. |
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
471 |
|
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
472 |
TOEIC Service List |
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
473 |
------------------ |
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
474 |
|
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
475 |
Based on a 1.5 million word corpus of various TOEIC preparation materials, the |
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
476 |
1200 words of the TSL 1.1 version give up to 99% coverage of TOEIC materials |
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
477 |
and tests when combined with the 2800 words of the NGSL. |
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
478 |
|
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
479 |
Wordlist with variations was obtained from: |
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
480 |
|
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
481 |
http://www.newgeneralservicelist.org/s/AWPngsltsl.zip |
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
482 |
In AntWordProfiler compatible format. |
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
483 |
|
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
484 |
http://www.newgeneralservicelist.org/toeic-list/ |
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
485 |
The TOEIC Service List home page. |
2e7485bc264d
About Business Service List & TOEIC Service List.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
647
diff
changeset
|
486 |
|
1075
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
487 |
KET wordlist |
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
488 |
------------ |
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
489 |
|
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
490 |
The KET Vocabulary List gives teachers a guide to the vocabulary needed when |
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
491 |
preparing students for the KET and KET for Schools examinations. |
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
492 |
|
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
493 |
The list covers vocabulary appropriate to the A2 level on the CEFR. |
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
494 |
|
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
495 |
http://www.cambridgeenglish.org/images/22105-ket-vocabulary-list.pdf |
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
496 |
Key English Test (KET) Vocabulary List © UCLES 2012. |
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
497 |
|
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
498 |
PET wordlist |
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
499 |
------------ |
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
500 |
|
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
501 |
Preliminary and Preliminary for Schools Vocabulary List gives teachers a guide |
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
502 |
to the vocabulary needed when preparing students for the Preliminary and |
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
503 |
Preliminary for Schools examinations. |
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
504 |
|
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
505 |
The list covers vocabulary appropriate to the B1 level on the CEFR. |
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
506 |
|
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
507 |
http://www.cambridgeenglish.org/images/84669-pet-vocabulary-list.pdf |
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
508 |
Preliminary (PET) Wordlist © UCLES 2012. |
a8fad275310b
Added official KET/PET wordlists of 2012.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1010
diff
changeset
|
509 |
|
642
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
510 |
BNC+COCA wordlist |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
511 |
----------------- |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
512 |
|
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
513 |
Paul Nation prepared a frequency wordlist from the combined BNC and COCA corpora: |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
514 |
|
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
515 |
http://www.victoria.ac.nz/lals/about/staff/paul-nation |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
516 |
Paul Nation's home page and list download page. |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
517 |
https://simple.wiktionary.org/wiki/Wiktionary:BNC_spoken_freq |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
518 |
About the list on Wiktionary. |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
519 |
|
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
520 |
It has 25000 basewords (and each baseword comes with variations) split into |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
521 |
chunks of 1000 words. |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
522 |
|
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
523 |
I got the list from: |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
524 |
|
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
525 |
http://www.laurenceanthony.net/software/antwordprofiler/ |
c1032aea6265
Describe word list sources.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
636
diff
changeset
|
526 |
Laurence Anthony's AntWordProfiler home page. |
635
445ee650a9ba
Add OANC frequency wordlist.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
634
diff
changeset
|
527 |
|
1130
44161bb73b60
About Oxford 3000/5000.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1075
diff
changeset
|
528 |
Oxford 3000/5000 |
44161bb73b60
About Oxford 3000/5000.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1075
diff
changeset
|
529 |
---------------- |
44161bb73b60
About Oxford 3000/5000.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1075
diff
changeset
|
530 |
|
44161bb73b60
About Oxford 3000/5000.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1075
diff
changeset
|
531 |
https://www.oxfordlearnersdictionaries.com/wordlists/ |
44161bb73b60
About Oxford 3000/5000.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1075
diff
changeset
|
532 |
Based on extensive corpora and aligned to the CEFR. |
44161bb73b60
About Oxford 3000/5000.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
1075
diff
changeset
|
533 |
|
850
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
534 |
Miscellaneous wordlists |
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
535 |
----------------------- |
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
536 |
|
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
537 |
The Dolch word list is a list of frequently used English words compiled by |
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
538 |
Edward William Dolch. The list was prepared in 1936 and was originally published |
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
539 |
in his book Problems in Reading in 1948. Dolch compiled the list based on |
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
540 |
children's books of his era. The list contains 220 "service words". The |
851
a45ebb513160
Added note that gadict cover miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
850
diff
changeset
|
541 |
compilation excludes nouns, which comprise a separate 95-word list. |
a45ebb513160
Added note that gadict cover miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
850
diff
changeset
|
542 |
|
a45ebb513160
Added note that gadict cover miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
850
diff
changeset
|
543 |
The Dolch wordlist is already covered by ``gadict``. |
850
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
544 |
|
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
545 |
https://en.wikipedia.org/wiki/Dolch_word_list |
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
546 |
Wikipedia article with the list itself. |
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
547 |
|
851
a45ebb513160
Added note that gadict cover miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
850
diff
changeset
|
548 |
The Leipzig-Jakarta list is a 100-word word list used by linguists to test the |
850
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
549 |
degree of chronological separation of languages by comparing words that are |
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
550 |
resistant to borrowing. The Leipzig-Jakarta list became available in 2009. |
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
551 |
|
851
a45ebb513160
Added note that gadict cover miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
850
diff
changeset
|
552 |
The Leipzig-Jakarta wordlist is already covered by ``gadict``. |
a45ebb513160
Added note that gadict cover miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
850
diff
changeset
|
553 |
|
850
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
554 |
https://en.wikipedia.org/wiki/Leipzig%E2%80%93Jakarta_list |
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
555 |
Wikipedia article with the list itself. |
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
556 |
|
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
557 |
The words in the Swadesh lists were chosen for their universal, culturally |
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
558 |
independent availability in as many languages as possible. Swadesh's final list, |
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
559 |
published in 1971, contains 100 terms. |
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
560 |
|
851
a45ebb513160
Added note that gadict cover miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
850
diff
changeset
|
561 |
The Swadesh wordlist is already covered by ``gadict``, except for some rare words. |
a45ebb513160
Added note that gadict cover miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
850
diff
changeset
|
562 |
|
850
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
563 |
https://en.wikipedia.org/wiki/Swadesh_list |
e1ac373d384c
Added info about miscellaneous wordlists.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
814
diff
changeset
|
564 |
|
342
e3d85aeefdec
Remove trailing dot from section titles.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
339
diff
changeset
|
565 |
Typing IPA chars in Emacs |
e3d85aeefdec
Remove trailing dot from section titles.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
339
diff
changeset
|
566 |
========================= |
95
27117b30660d
Move 'IPA chars' section to HACKING file.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
69
diff
changeset
|
567 |
|
27117b30660d
Move 'IPA chars' section to HACKING file.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
69
diff
changeset
|
568 |
To enter IPA chars use the IPA input method. To enable it type:: |
27117b30660d
Move 'IPA chars' section to HACKING file.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
69
diff
changeset
|
569 |
|
27117b30660d
Move 'IPA chars' section to HACKING file.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
69
diff
changeset
|
570 |
C-u C-\ ipa <enter> |
27117b30660d
Move 'IPA chars' section to HACKING file.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
69
diff
changeset
|
571 |
|
27117b30660d
Move 'IPA chars' section to HACKING file.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
69
diff
changeset
|
572 |
All chars from the alphabet are typed as usual. To type special IPA chars use the following key |
246
2c3b02416526
M-x describe-input-method
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
243
diff
changeset
|
573 |
bindings (or read the help in Emacs with ``M-x describe-input-method`` or ``C-h I``). |
95
27117b30660d
Move 'IPA chars' section to HACKING file.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
69
diff
changeset
|
574 |
|
27117b30660d
Move 'IPA chars' section to HACKING file.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
69
diff
changeset
|
575 |
For vowels:: |
27117b30660d
Move 'IPA chars' section to HACKING file.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
69
diff
changeset
|
576 |
|
247
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
577 |
æ ae |
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
578 |
ɑ o| or A |
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
579 |
ɒ |o or /A |
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
580 |
ʊ U |
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
581 |
ɛ /3 or E |
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
582 |
ɔ /c |
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
583 |
ə /e |
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
584 |
ʌ /v |
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
585 |
ɪ I |
95
27117b30660d
Move 'IPA chars' section to HACKING file.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
69
diff
changeset
|
586 |
|
27117b30660d
Move 'IPA chars' section to HACKING file.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
69
diff
changeset
|
587 |
For consonants:: |
27117b30660d
Move 'IPA chars' section to HACKING file.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
69
diff
changeset
|
588 |
|
247
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
589 |
θ th |
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
590 |
ð dh |
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
591 |
ʃ sh |
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
592 |
ʧ tsh |
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
593 |
ʒ zh or 3 |
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
594 |
ŋ ng |
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
595 |
ɡ g |
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
596 |
ɹ /r |
95
27117b30660d
Move 'IPA chars' section to HACKING file.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
69
diff
changeset
|
597 |
|
27117b30660d
Move 'IPA chars' section to HACKING file.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
69
diff
changeset
|
598 |
Special chars:: |
27117b30660d
Move 'IPA chars' section to HACKING file.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
69
diff
changeset
|
599 |
|
247
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
600 |
ː : (colon) |
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
601 |
ˈ ' (quote) |
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
602 |
ˌ ` (back quote) |
95
27117b30660d
Move 'IPA chars' section to HACKING file.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
69
diff
changeset
|
603 |
|
247
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
604 |
Alternatively use the ``ipa-x-sampa`` or ``ipa-kirshenbaum`` input method (for help |
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
605 |
type: ``C-h I ipa-x-sampa RET`` or ``C-h I ipa-kirshenbaum RET``). |
ba56b6c0877b
About ipa-x-sampa and ipa-kirshenbaum.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
246
diff
changeset
|
606 |