author | Oleksandr Gavenko <gavenkoa@gmail.com> |
Fri, 11 Nov 2016 00:19:47 +0200 | |
changeset 667 | 5f69f0776c37 |
.. -*- coding: utf-8 -*-

======================
gadict HACKING guide
======================

.. contents::
   :local:

Versioning rules
================

We use the **major.minor** versioning scheme.

Until the dictionary reaches 5000 words, **major** stays at 0. **minor** is
incremented from time to time.

Getting sources
===============

Clone the repository from the home page or from the SourceForge mirror::

  $ hg clone http://hg.defun.work/gadict gadict
  $ hg clone http://hg.code.sf.net/p/gadict/code gadict-hg

Push changes over SSH to either host, or over HTTPS to the SourceForge mirror::

  $ hg push ssh://$USER@hg.defun.work/gadict
  $ hg push ssh://$USER@hg.code.sf.net/p/gadict/code
  $ hg push https://$USER:$PASS@hg.code.sf.net/p/gadict/code

Browsing sources online
=======================

http://hg.defun.work/gadict
  hgweb at the home page.
http://hg.code.sf.net/p/gadict/code
  hgweb at the old home page (still supported as a mirror).
https://sourceforge.net/p/gadict/code/
  SourceForge Allura interface (a mirror, not primary).

Dictionary file name convention
===============================

BNF form::

  FILE ::= "gadict_" NAME ".gadict"

``NAME`` may have the form ``ISOCODE "-" ISOCODE``, like ``en-ru``, where
``ISOCODE`` is an ISO 639-1 (two-letter) language code.

``NAME`` may also be a dictionary name abbreviation.

During dictionary compilation and release the ``".gadict"`` suffix is changed
to an appropriate one, but the base name should be preserved as
``"gadict_" NAME``.
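
The naming rule above can be checked mechanically. Below is a minimal Python
sketch, not part of the gadict tooling: ``parse_gadict_filename`` and
``release_name`` are hypothetical helpers, and the set of characters allowed in
an abbreviation-style ``NAME`` is an assumption (the BNF only fixes the prefix
and the suffix)::

  import re
  from typing import Optional, Tuple

  # Hypothetical: allowed NAME characters are an assumption, not from the BNF.
  FILE_RE = re.compile(r"gadict_(?P<name>[A-Za-z0-9][A-Za-z0-9_-]*)\.gadict")
  # ISOCODE "-" ISOCODE with lowercase two-letter ISO 639-1 codes.
  PAIR_RE = re.compile(r"(?P<src>[a-z]{2})-(?P<dst>[a-z]{2})")


  def parse_gadict_filename(
      filename: str,
  ) -> Optional[Tuple[str, Optional[Tuple[str, str]]]]:
      """Return (NAME, (src, dst)) for a language pair, (NAME, None) for an
      abbreviation-style NAME, or None if the name breaks the convention."""
      m = FILE_RE.fullmatch(filename)
      if m is None:
          return None
      name = m.group("name")
      pair = PAIR_RE.fullmatch(name)
      if pair is None:
          return name, None
      return name, (pair.group("src"), pair.group("dst"))


  def release_name(filename: str, suffix: str) -> str:
      """Swap the ".gadict" suffix, keeping the "gadict_" NAME base name."""
      parsed = parse_gadict_filename(filename)
      if parsed is None:
          raise ValueError("not a gadict source file: " + filename)
      return "gadict_" + parsed[0] + suffix

For example, ``parse_gadict_filename("gadict_en-ru.gadict")`` returns
``("en-ru", ("en", "ru"))`` and ``release_name("gadict_en-ru.gadict", ".dict")``
returns ``"gadict_en-ru.dict"``.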

Dictionary source file format
=============================

In the past the gadict project used the dictd C5 source file format. The C5
format has several issues:

* C5 is not a structured format, so producing other forms and converting to
  other formats is not possible.
* C5 has no markup for links or for anything else.

Before that, the project used the dictd TAB file format, which requires placing
an article on a single long line. That format is not meant for human editing
at all.

Other dictionary source file formats were considered as options, like TEI, ISO,
xdxf, MDF. XML-like formats are not meant for human editing either. XML also
lacks syntax locality, so the whole file has to be scanned to validate a local
change.

Note that the StarDict, ABBYY Lingvo, Babylon and dictd formats were not
considered because they are all about presentation rather than structure. They
are target formats for compilation.

A fancy-looking analog of MDF + C5 was developed instead.

The beginning of the file holds the dictionary information.

Articles are separated by ``\n__\n\n``; each article consists of two parts:

* word variations with pronunciation
* word translations, with supplementary information like part of speech,
  synonyms, antonyms, examples of usage
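
A minimal Python sketch of splitting a source file on the article separator.
``split_gadict`` is a hypothetical helper; it assumes the dictionary
information is everything before the first separator, and it does not try to
split an article into its two parts, whose exact layout is not described
here::

  from typing import List, Tuple

  ARTICLE_SEPARATOR = "\n__\n\n"


  def split_gadict(text: str) -> Tuple[str, List[str]]:
      """Split gadict source text into (header, articles).

      Assumption: the header is the chunk before the first separator."""
      chunks = text.split(ARTICLE_SEPARATOR)
      header, articles = chunks[0], chunks[1:]
      # Drop empty chunks, e.g. produced by a trailing separator.
      return header, [a for a in articles if a.strip()]

For example, ``split_gadict("header\n__\n\nlove\n__\n\nhate\n")`` returns
``("header", ["love", "hate\n"])``.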

*Word variations* are:

* *number*: ``s`` - singular, ``pl`` - plural.
* *verb form* or *verb tense*: ``v1`` - infinitive, ``v2`` - past tense,
  ``v3`` - past participle.
* *gender*: ``male`` or ``female``.
* *comparison*: ``comp`` - comparative or ``super`` - superlative.

*Parts of speech* are:

* ``v`` - verb
* ``n`` - noun
* ``pron`` - pronoun
* ``adv`` - adverb
* ``adj`` - adjective
* ``prep`` - preposition
* ``conj`` - conjunction
* ``num`` - numeral
* ``int`` - interjection
* ``abbr`` - abbreviation
* ``phr`` - phrase
* ``phr.v`` - phrasal verb
* ``contr`` - contraction
* ``prefix`` - word prefix

.. note:: I try to keep word meanings within an article in the above POS order.
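
Because the POS order is a convention rather than an enforced rule, checking it
is optional. A minimal Python sketch of such a lint, assuming the POS markers of
an article have already been extracted in order of appearance
(``pos_order_violations`` is hypothetical, not an existing gadict check)::

  from typing import Iterable, List, Optional

  # POS markers in the order listed above.
  POS_ORDER: List[str] = [
      "v", "n", "pron", "adv", "adj", "prep", "conj", "num", "int",
      "abbr", "phr", "phr.v", "contr", "prefix",
  ]
  RANK = {pos: i for i, pos in enumerate(POS_ORDER)}


  def pos_order_violations(article_pos: Iterable[str]) -> List[str]:
      """Return messages for meanings that break the preferred POS order."""
      problems: List[str] = []
      prev: Optional[str] = None
      for pos in article_pos:
          if pos not in RANK:
              problems.append("unknown POS marker: " + pos)
              continue
          # Repeating the same POS is fine; going back in the list is not.
          if prev is not None and RANK[pos] < RANK[prev]:
              problems.append(pos + " comes after " + prev)
          prev = pos
      return problems

For example, ``pos_order_violations(["n", "v"])`` returns
``["v comes after n"]``, while ``["v", "v", "n", "adj"]`` yields no messages.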

Each meaning may refer to topics, like:

* ``sci`` - science
* ``body`` - part of the body
* ``math`` - mathematics
* ``chem`` - chemicals
* ``bio`` - biology
* ``music``
* ``meal``, ``office``, etc.
* ``size``, ``shape``, ``age``, ``color``
* ``archaic`` - old-fashioned, no longer used

Word relations:

* ``syn:`` - synonyms
* ``ant:`` - antonyms
* ``hyper:`` - hypernyms
* ``hypo:`` - hyponyms
* ``rel:`` - related (see also) terms
* ``topic:`` - topics/tags

A translation is marked by a lowercase ISO 639-1 code followed by the ``:``
(colon) character, like:

* ``en:`` - English
* ``ru:`` - Russian
* ``uk:`` - Ukrainian
* ``la:`` - Latin

An example is marked by a lowercase ISO 639-1 code followed by the ``>``
(greater-than) character.

An explanation or glossary is marked by a lowercase ISO 639-1 code followed by
the ``=`` (equals) character.
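
A minimal Python sketch of telling these markers apart. It assumes a marker
starts the line and is followed by its text (the spacing is not specified
above), and it treats the word relation markers listed earlier (``syn:``,
``ant:`` and so on) as distinct from two-letter language codes;
``classify_line`` is hypothetical::

  import re
  from typing import Optional, Tuple

  RELATION_MARKERS = {"syn", "ant", "hyper", "hypo", "rel", "topic"}
  KIND_BY_CHAR = {":": "translation", ">": "example", "=": "explanation"}
  # Assumption: marker at line start, optional whitespace, then the text.
  MARKER_RE = re.compile(r"(?P<key>[a-z]+)(?P<char>[:>=])\s*(?P<text>.*)")


  def classify_line(line: str) -> Optional[Tuple[str, str, str]]:
      """Return (kind, key, text) for a marked line, or None."""
      m = MARKER_RE.fullmatch(line)
      if m is None:
          return None
      key, char, text = m.group("key"), m.group("char"), m.group("text")
      if char == ":" and key in RELATION_MARKERS:
          return "relation", key, text
      if len(key) == 2:  # lowercase ISO 639-1 code
          return KIND_BY_CHAR[char], key, text
      return None

For example, ``classify_line("en> I love you.")`` returns
``("example", "en", "I love you.")`` and ``classify_line("syn: like")`` returns
``("relation", "syn", "like")``.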

Pronunciation variants are marked by:

* ``Am`` - American
* ``Br`` - British
* ``Au`` - Australian

The ``rare`` attribute on the first headword marks the word as low frequency.
SRS file writers skip entries marked as ``rare``. I find it convenient to check
word frequency with:

https://books.google.com/ngrams/
  Google N-grams from books 1800-2010.

For the cut-off point I chose the word ``beseech``. All less frequent words
receive the ``rare`` marker.
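
How an SRS writer might apply the ``rare`` filter, as a minimal Python sketch.
The ``Headword`` and ``Article`` classes are a hypothetical in-memory model, not
the project's real data structures; only the first headword's attributes are
consulted, as described above::

  from dataclasses import dataclass, field
  from typing import Iterable, Iterator, List, Set


  @dataclass
  class Headword:
      # Hypothetical model of a headword and its attributes.
      word: str
      attrs: Set[str] = field(default_factory=set)


  @dataclass
  class Article:
      headwords: List[Headword]


  def is_rare(article: Article) -> bool:
      """Only the first headword's "rare" attribute counts."""
      return bool(article.headwords) and "rare" in article.headwords[0].attrs


  def srs_entries(articles: Iterable[Article]) -> Iterator[Article]:
      """Yield the articles an SRS writer would export."""
      for article in articles:
          if not is_rare(article):
              yield article

A ``rare`` attribute on a second or later headword is ignored by this sketch,
matching the rule that the marker belongs to the first headword.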

C5 dictionary source file format
================================

The dictd C5 file format was used as the source file format. See::

  $ man 1 dictfmt

In short:

* Headwords are preceded by 5 or more underscore characters ``_`` and a blank
  line.
* An article may have several headwords; in that case they are placed on one
  line and separated by ``;<SPACE>``.
* All text until the next headword is considered the definition.
* Any leading ``@`` characters are stripped out, but the file is otherwise
  unchanged.
* UTF-8 encoding is supported, at least by GoldenDict.

The gadict project used the C5 format in the past but has switched to its own
format.
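
The C5 rules above are simple enough to sketch a writer for. A minimal Python
sketch, assuming articles are given as (headwords, definition) pairs;
``write_c5`` is hypothetical, not the project's converter, and it does not
guard against definitions that start with ``@`` or contain a line of
underscores::

  from typing import Iterable, List, Tuple


  def write_c5(articles: Iterable[Tuple[List[str], str]]) -> str:
      """Render (headwords, definition) pairs as dictd C5 source text."""
      out: List[str] = []
      for headwords, definition in articles:
          out.append("_____")                # 5 underscores...
          out.append("")                     # ...and a blank line
          out.append("; ".join(headwords))   # several headwords on one line
          out.append(definition.rstrip("\n"))
      return "\n".join(out) + "\n"

For example, ``write_c5([(["colour", "color"], "a hue")])`` returns
``"_____\n\ncolour; color\na hue\n"``, which should be acceptable input for
``dictfmt`` in C5 mode (see ``man 1 dictfmt``).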

TODO convention
===============

Entries or parts of text that are not yet complete are marked by keywords:

TODO
  incomplete
XXX
  urgent incomplete

The Makefile rule ``todo`` finds these occurrences in the sources::

  $ make todo
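
A rough Python equivalent of what ``make todo`` reports, as a sketch only: the
real Makefile rule may be implemented differently (for example with ``grep``),
``find_todos`` is hypothetical, and the ``*.gadict`` glob is an assumption
about which files are scanned::

  import re
  from pathlib import Path
  from typing import Iterable, Iterator, Tuple

  # Word-bounded so that e.g. "XXXL" is not reported.
  MARKER_RE = re.compile(r"\b(TODO|XXX)\b")


  def find_todos(paths: Iterable[Path]) -> Iterator[Tuple[str, int, str, str]]:
      """Yield (path, line number, keyword, line) for each marked line."""
      for path in paths:
          with open(path, encoding="utf-8") as f:
              for lineno, line in enumerate(f, 1):
                  m = MARKER_RE.search(line)
                  if m:
                      yield str(path), lineno, m.group(1), line.rstrip("\n")


  if __name__ == "__main__":
      # Hypothetical file selection; adjust to the real source layout.
      for hit in find_todos(sorted(Path(".").glob("*.gadict"))):
          print("%s:%d: [%s] %s" % hit)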

World wide dictionary formats and standards
===========================================

http://en.wikipedia.org/wiki/Dictionary_writing_system
  Dictionary writing system
http://www.sil.org/computing/shoebox/mdf.html
  Multi-Dictionary Formatter (MDF). It defines about 100 data
  field markers.
http://fieldworks.sil.org/flex/
  FieldWorks Language Explorer (or FLEx, for short) is designed
  to help field linguists perform many common language
  documentation and analysis tasks.
http://code.google.com/p/lift-standard/
  LIFT (Lexicon Interchange FormaT) is an XML format for storing
  lexical information, as used in the creation of dictionaries.
  It's not necessarily the format for your lexicon.
http://www.lexiquepro.com/
  Lexique Pro is an interactive lexicon viewer and editor, with
  hyperlinks between entries, category views, dictionary
  reversal, search, and export tools. It's designed to display
  your data in a user-friendly format so you can distribute it
  to others.
http://deb.fi.muni.cz/index.php
  DEBII (Dictionary Editor and Browser)

Word lists
==========

Frequency wordlists use several statistics:

* the number of occurrences of a word in the corpus, usually denoted ``F``
* the adjusted number of occurrences per 1,000,000 words of the corpus, usually
  denoted ``U``
* the Standard Frequency Index (SFI), defined as:

  .. math:: \mathrm{SFI} = 40 + 10 \log_{10}(U)

  === ================
  SFI Frequency
  === ================
  90  1 per 10
  80  1 per 100
  70  1 per 1,000
  60  1 per 10,000
  50  1 per 100,000
  40  1 per 1,000,000
  30  1 per 10,000,000
  === ================

* the dispersion of a word's frequency across the documents of the corpus,
  usually denoted ``D``
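
The relation between ``U`` and SFI can be checked with a few lines of Python.
This is a sketch: ``per_million`` is a hypothetical helper that only normalizes a
raw count ``F`` by corpus size and does not apply whatever adjustment the
published ``U`` values include.

.. code-block:: python

   import math


   def per_million(count, corpus_size):
       """Raw occurrences per 1,000,000 words (no adjustment applied)."""
       return count * 1_000_000 / corpus_size


   def sfi(u):
       """Standard Frequency Index for a frequency of u per 1,000,000 words."""
       return 40 + 10 * math.log10(u)


   # Rebuild the table rows: "1 per N" corresponds to U = 1,000,000 / N.
   for one_per in (10, 100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000):
       print(f"{sfi(1_000_000 / one_per):.0f}  1 per {one_per:,}")

For example, in a corpus of 15 million words a word seen 15 times has an
unadjusted ``U`` of 1, which gives an SFI of 40.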

Sort numerically, in descending order, on the first column::

  $ sort -k 1,1nr <$IN >$OUT
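
Where a Python pipeline is more convenient, the same ordering can be sketched as
below. It assumes whitespace-separated columns and, like ``sort -n``, treats a
missing or non-numeric first field as zero; unlike GNU ``sort``, ties keep their
input order instead of falling back to a whole-line comparison. The function
names are made up for illustration.

.. code-block:: python

   def first_column_key(line):
       """Numeric value of the first whitespace-separated field, 0.0 if absent."""
       fields = line.split()
       try:
           return float(fields[0])
       except (IndexError, ValueError):
           return 0.0


   def sort_by_first_column(lines):
       """Reverse numeric sort on column 1 (largest value first)."""
       return sorted(lines, key=first_column_key, reverse=True)

With ``import sys``, ``sys.stdout.writelines(sort_by_first_column(sys.stdin))``
turns it into a filter from standard input to standard output.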

OANC frequency wordlist
-----------------------

The Open American National Corpus (OANC) is a roughly 15 million word subset of
the ANC Second Release that is unrestricted in terms of usage and
redistribution.

I downloaded the OANC from http://www.anc.org/OANC/OANC-1.0.1-UTF8.zip

After unpacking only the ``.txt`` files, ``wc`` reports the number of lines,
words and bytes (about 14.6 million words)::

  $ unzip OANC-1.0.1-UTF8.zip '*.txt'
  $ cd OANC; find . -type f | xargs cat | wc
   2090929 14586935 96737202
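
The ``wc`` totals can be cross-checked from Python. A sketch, assuming the
unpacked tree holds only the ``.txt`` files; ``bytes.split()`` splits on ASCII
whitespace only, and ``cat`` can glue the last word of one file to the first
word of the next, so the word total may differ slightly from ``wc``.

.. code-block:: python

   import os


   def count(data):
       """(lines, words, bytes) for a byte string, in the spirit of wc."""
       return data.count(b"\n"), len(data.split()), len(data)


   def wc_totals(root):
       """Sum count() over every file below root."""
       totals = [0, 0, 0]
       for dirpath, _dirnames, filenames in os.walk(root):
           for name in filenames:
               with open(os.path.join(dirpath, name), "rb") as f:
                   for i, n in enumerate(count(f.read())):
                       totals[i] += n
       return tuple(totals)

``wc_totals("OANC")`` returns the three numbers in the same order as ``wc``.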

I built the frequency list with:

http://www.laurenceanthony.net/software/antconc/
  A freeware corpus analysis toolkit for concordancing and text analysis.

Then I manually removed single- and double-letter words, filtered out misspelled
words with the ``en_US`` ``hunspell`` spell-checker and merged word variations
into base forms using WordNet. See details in ``obsolete/oanc.py``.
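
The counting and short-word filtering steps can be sketched in Python as below.
This is not ``obsolete/oanc.py``: the tokenizer regex and the ``min_length``
parameter are assumptions, and the hunspell and WordNet steps are left out.

.. code-block:: python

   import re
   from collections import Counter

   # Assumed tokenizer: lowercase ASCII letters, optionally with one apostrophe
   # part ("don't"). The real script may tokenize differently.
   WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


   def frequency_list(texts, min_length=3):
       """Return (word, count) pairs, most frequent first.

       Words shorter than min_length characters are dropped, which removes
       single- and double-letter words with the default of 3.
       """
       counts = Counter()
       for text in texts:
           for word in WORD_RE.findall(text.lower()):
               if len(word) >= min_length:
                   counts[word] += 1
       return counts.most_common()

The resulting pairs are then ready for spell-checking and lemmatization.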

http://www.anc.org/data/oanc/download/
  OANC download page.

http://www.anc.org/data/oanc/
  OANC home page.

https://en.wikipedia.org/wiki/Word_lists_by_frequency

Useful word lists:

https://en.wikipedia.org/wiki/Academic_Word_List
  Academic Word List at Wikipedia.
https://web.archive.org/web/20080212073904/http://language.massey.ac.nz/staff/awl/headwords.shtml
  Academic Word List by Averil Coxhead, created in 2000 as an addition to the
  GSL; it has 570 headwords.

Obsolete or proprietary word lists:

https://en.wikipedia.org/wiki/Basic_English
  An 850-headword list created in 1930.

General Service List
--------------------

The updated GSL (General Service List) was obtained from:

http://jbauman.com/gsl.html
  A 1995 revised version of the GSL with minor changes by John Bauman. He added
  284 new headwords to the original 2000-word list created by Michael West in
  1953.

The first column is the number of occurrences per 1,000,000 words of the Brown
corpus, counted by word families.

https://en.wikipedia.org/wiki/General_Service_List
  General Service List at Wikipedia.
http://jbauman.com/aboutgsl.html
  About the General Service List, by John Bauman.

New General Service List
------------------------

The NGSL was obtained from:

http://www.newgeneralservicelist.org/s/NGSL-101-by-band-qq9o.xlsx
  Microsoft Excel (``.xlsx``) file with headword, frequency and SFI.

The first column is the adjusted frequency per 1,000,000 words, counted by base
word families.

Academic Word List
------------------

The Academic Word List (AWL) was published in the Summer 2000 issue of the
TESOL Quarterly (v. 34, no. 2). It was developed by Averil Coxhead, of Victoria
University of Wellington, in New Zealand. The AWL is a replacement for the
University Word List (published by Paul Nation in 1984).

The AWL was obtained from:

https://web.archive.org/web/20081014065815/http://language.massey.ac.nz/staff/awl/download/awlheadwords.rtf
  Original Academic Word List in RTF format.

Each entry is a headword followed by its frequency level, from 1 for the most
frequent to 10 for the least frequent.

New Academic Word List
----------------------

The frequency word list was obtained from:

http://www.newacademicwordlist.org/s/NAWL_SFI.csv
  CSV with columns ``Word,SFI,U,D``.

The ``SFI`` and ``D`` columns were deleted and the ``U`` and ``Word`` columns
were swapped. The data was sorted by the ``U`` column (adjusted frequency per
1,000,000 words).
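
The column shuffle can be sketched with Python's ``csv`` module. It assumes the
first CSV row is the ``Word,SFI,U,D`` header and that "sorted by ``U``" means
most frequent first; the function names and the ``U Word`` output layout are
made up for illustration.

.. code-block:: python

   import csv
   import io


   def nawl_to_u_word(csv_text):
       """Keep only (U, Word) pairs, sorted by U in descending order."""
       reader = csv.DictReader(io.StringIO(csv_text))
       # Keep U as the original string so no precision is lost on output.
       rows = [(row["U"].strip(), row["Word"].strip()) for row in reader]
       rows.sort(key=lambda r: float(r[0]), reverse=True)
       return rows


   def format_rows(rows):
       """Render the pairs as space-separated "U Word" lines."""
       return "".join(f"{u} {word}\n" for u, word in rows)

To read the downloaded file, pass
``open("NAWL_SFI.csv", encoding="utf-8-sig").read()``; the ``utf-8-sig`` codec
also strips a byte order mark if there is one.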

The NAWL headword list with word variations was obtained from:

http://www.laurenceanthony.net/software/antwordprofiler/
  Laurence Anthony's AntWordProfiler home page.

It is encoded in ``latin-1`` and was recoded into ``utf-8`` (because of the
``É`` symbol).
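
Recoding can be done with ``iconv -f ISO-8859-1 -t UTF-8`` or from Python. A
minimal Python sketch; the function names are hypothetical.

.. code-block:: python

   def latin1_to_utf8(data):
       """Re-encode latin-1 bytes as UTF-8 (every byte is valid latin-1)."""
       return data.decode("latin-1").encode("utf-8")


   def recode_file(src, dst):
       """Write a UTF-8 copy of the latin-1 file src to dst."""
       with open(src, "rb") as f:
           data = f.read()
       with open(dst, "wb") as f:
           f.write(latin1_to_utf8(data))

For example, ``É`` is the single byte ``0xC9`` in latin-1 and becomes the two
bytes ``0xC3 0x89`` in UTF-8.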

See also:

http://www.newacademicwordlist.org/
  NAWL home page.

Special English word list
-------------------------

https://en.wikipedia.org/wiki/Special_English
  Special English is a controlled version of the English language used by the
  United States broadcasting service Voice of America (VOA). It has 1557
  headwords.

Business Service List
---------------------

The 1700 words of the BSL 1.01 version give up to 97% coverage of general
business English materials when combined with the 2800 words of the NGSL.

The word list with variations was obtained from:

http://www.newgeneralservicelist.org/s/AWPngslbsl-twcg.zip
  In AntWordProfiler-compatible format.

http://www.newgeneralservicelist.org/bsl-business-service-list/
  BSL home and download page.
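
The AntWordProfiler lists can be loaded into a headword lookup table. This
sketch *assumes* a family-list layout of a headword at the start of a line with
its variations on the following lines, indented with whitespace; check it
against the unpacked files before relying on it. The function names are
hypothetical.

.. code-block:: python

   def parse_family_list(lines):
       """Map each headword to the list of its variations.

       Assumed layout: a headword starts at column 0 and each of its variations
       follows on its own line, indented with whitespace (e.g. a tab).
       """
       families = {}
       current = None
       for raw in lines:
           if not raw.strip():
               continue  # ignore blank lines
           word = raw.split()[0].lower()
           if raw[0].isspace():
               if current is not None:
                   families[current].append(word)
           else:
               current = word
               families.setdefault(current, [])
       return families


   def variation_index(families):
       """Reverse lookup from any form (headword included) to its headword."""
       index = {}
       for head, variations in families.items():
           index.setdefault(head, head)
           for form in variations:
               index.setdefault(form, head)
       return index

If a form appears in several families, ``variation_index`` keeps the first
headword it sees. Files that are not valid UTF-8 can be opened with
``encoding="latin-1"``.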

TOEIC Service List
------------------

Based on a 1.5 million word corpus of various TOEIC preparation materials, the
1200 words of the TSL 1.1 version give up to 99% coverage of TOEIC materials
and tests when combined with the 2800 words of the NGSL.

The word list with variations was obtained from:

http://www.newgeneralservicelist.org/s/AWPngsltsl.zip
  In AntWordProfiler-compatible format.

http://www.newgeneralservicelist.org/toeic-list/
  The TOEIC Service List home page.

BNC+COCA wordlist
-----------------

Paul Nation prepared a frequency word list from the combined BNC and COCA
corpora:

http://www.victoria.ac.nz/lals/about/staff/paul-nation
  Paul Nation's home page and list download page.
https://simple.wiktionary.org/wiki/Wiktionary:BNC_spoken_freq
  About the list on Wiktionary.

It has 25000 base words (each with its variations), split into chunks of 1000
words.

I got the list from:

http://www.laurenceanthony.net/software/antwordprofiler/
  Laurence Anthony's AntWordProfiler home page.
635
445ee650a9ba
Add OANC frequency wordlist.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
634
diff
changeset
|
419 |
|
342
e3d85aeefdec
Remove trailing dot from section titles.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
339
diff
changeset
|
420 |
Register gadict dictionaries for dictd under Debian |
e3d85aeefdec
Remove trailing dot from section titles.
Oleksandr Gavenko <gavenkoa@gmail.com>
parents:
339
diff
changeset
|
421 |
=================================================== |
::

  $ su
  $ cat >>/etc/dictd/dictd.order <<EOF
  gadict-dictabbr
  /home/user/usr/share/dictd/
  EOF
  $ dictdconfig --write
  $ /etc/init.d/dictd restart
  $ ^D
  $ dictdconfig --list
  $ dict -d gadict-dictabbr v

Typing IPA chars in Emacs
=========================

To enter IPA chars, use the ``ipa`` input method. To enable it, type::

  C-u C-\ ipa <enter>

Letters are typed as usual. To type special IPA chars, use the following key
bindings (or read the help in Emacs with ``M-x describe-input-method`` or
``C-h I``).

For vowels::

  æ ae
  ɑ o| or A
  ɒ |o or /A
  ʊ U
  ɛ /3 or E
  ɔ /c
  ə /e
  ʌ /v
  ɪ I

For consonants::

  θ th
  ð dh
  ʃ sh
  ʧ tsh
  ʒ zh or 3
  ŋ ng
  ɡ g
  ɹ /r

Special chars::

  ː : (colon)
  ˈ ' (quote)
  ˌ ` (back quote)

Alternatively, use the ``ipa-x-sampa`` or ``ipa-kirshenbaum`` input method (for
help, type ``C-h I ipa-x-sampa RET`` or ``C-h I ipa-kirshenbaum RET``).
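The key sequences from the ``ipa`` tables can also be replayed outside Emacs, for example on a transcription that was typed in plain ASCII. The shell function below is a minimal sketch under stated assumptions and is not part of gadict or Emacs. ``ipa_translit`` is a hypothetical name. It applies the table entries as ``sed`` substitutions, with longer sequences first.

It differs from the interactive input method in one important way: it cannot tell a literal letter from a key sequence. It therefore assumes the input contains nothing but transcription. For example, a bare ``g`` always becomes ``ɡ`` and a bare ``A`` always becomes ``ɑ``.

```shell
#!/bin/sh
# Hypothetical helper (not part of gadict or Emacs): convert ASCII key
# sequences from the "ipa" input method tables into IPA characters.
# sed applies the -e rules in order on each line, so multi-character
# sequences come before the characters they contain:
# tsh before sh, ng before g, /A before A, /3 before 3.
ipa_translit() {
    sed \
        -e 's/tsh/ʧ/g' -e 's/sh/ʃ/g' \
        -e 's/th/θ/g' -e 's/dh/ð/g' \
        -e 's/zh/ʒ/g' -e 's/ng/ŋ/g' \
        -e 's/g/ɡ/g' \
        -e 's/ae/æ/g' \
        -e 's/o|/ɑ/g' -e 's/|o/ɒ/g' \
        -e 's#/A#ɒ#g' -e 's/A/ɑ/g' \
        -e 's#/3#ɛ#g' -e 's/E/ɛ/g' \
        -e 's/3/ʒ/g' \
        -e 's#/c#ɔ#g' -e 's#/e#ə#g' \
        -e 's#/v#ʌ#g' -e 's#/r#ɹ#g' \
        -e 's/U/ʊ/g' -e 's/I/ɪ/g' \
        -e 's/:/ː/g' \
        -e "s/'/ˈ/g" \
        -e 's/`/ˌ/g'
}
```

For example, ``printf '%s\n' 'thIngk' | ipa_translit`` prints ``θɪŋk``. The rules are ASCII-only patterns, so they never match bytes inside already converted UTF-8 characters. This makes the sequential ``sed`` approach safe regardless of locale.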