www/HACKING.rst
author Oleksandr Gavenko <gavenkoa@gmail.com>
Fri, 21 Aug 2020 14:16:22 +0300

.. -*- coding: utf-8 -*-

======================
 gadict HACKING guide
======================
.. contents::
   :local:

Versioning rules
================

We use a **major.minor** versioning scheme.

Until the dictionary reaches 5000 words **major** stays at 0. **minor** is
updated from time to time.

Getting sources
===============

Cloning the repository::

  $ hg clone http://hg.defun.work/gadict gadict
  $ hg clone http://hg.code.sf.net/p/gadict/code gadict-hg

Pushing changes::

  $ hg push ssh://$USER@hg.defun.work/gadict
  $ hg push ssh://$USER@hg.code.sf.net/p/gadict/code
  $ hg push https://$USER:$PASS@hg.code.sf.net/p/gadict/code

Browsing sources online
=======================

  http://hg.defun.work/gadict
    hgweb at the home page.
  http://hg.code.sf.net/p/gadict/code
    hgweb at the old home page (still supported as a mirror).
  https://sourceforge.net/p/gadict/code/
    SourceForge Allura interface (a mirror, not the primary repository).

Building project
================

The ``gadict`` project provides dictionaries encoded in a custom format. To
process them you need GNU Make and Python 2.7, and possibly other tools.

To produce dictionaries in ``dictd`` format you need to install the ``dictd``
distribution with the ``dictfmt`` and ``dictzip`` utilities::

  $ sudo apt install dictfmt dictzip

and run::

  $ make dict

To make Anki decks, check out the Anki sources::

  $ git clone https://github.com/dae/anki.git
  $ cd anki

and update to a specific revision (the one before ``pyaudio``, which is not
available on Cygwin, became a hard dependency)::

  $ git checkout 1d75cff5e7458c6538a4e75728c16bef8b7adb3e^

  $ git show 1d75cff5e7458c6538a4e75728c16bef8b7adb3e
  commit 1d75cff5e7458c6538a4e75728c16bef8b7adb3e
  Author: Damien Elmes <git@ichi2.net>
  Date:   2016-06-23 12:04:48 +1000

      pyaudio is no longer optional

Previously the build used Python 2 and depended on earlier source revisions
(before the port to Python 3)::

  $ git checkout 15b349e3^

  $ git show 15b349e3
  commit 15b349e3a8b34bf80c134b406c9b90f61250ee9e
  Author: Damien Elmes <git@ichi2.net>
  Date:   2016-05-12 14:45:35 +1000

      start port to python 3

Then put the path to the Anki source directory into ``Makefile.config``::

  ANKI_PY_DIR := $(HOME)/devel/anki

The build command to make Anki decks is::

  $ make anki

Alternative Anki generators
===========================

https://github.com/kerrickstaley/genanki
  A Library for Generating Anki Decks.
https://github.com/lervag/apy
  CLI script for interacting with local Anki collection.
https://github.com/damaru2/ankigenbot/blob/master/src/send_card.py
  Pushes cards to https://ankiweb.net

Dictionary source file format
=============================

The gadict project used the dictd C5 source file format in the past. The C5
format has several issues:

 * C5 is not a structured format, so producing other forms and converting to
   other formats is not possible.
 * C5 has no markup for links, nor for anything else.

Before that the project used the dictd TAB file format, which requires placing
each article on a single long line. That format is not suited to human editing
at all.

Other dictionary source file formats were considered, such as TEI, ISO, XDXF
and MDF. XML-like formats are also not meant for human editing. XML also lacks
syntax locality: the full file has to be scanned to validate a local change.

Note that the StarDict, ABBYY Lingvo, Babylon and dictd formats were not
considered because they are all about presentation rather than structure. They
are target formats for compilation.

So a fancy-looking analog of MDF + C5 was developed.

The beginning of the file describes the dictionary information.

Articles are separated by ``\n__\n\n`` and each consists of two parts:

 * word variations with pronunciation
 * word translations, with supplementary information such as part of speech,
   synonyms, antonyms and examples of usage
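
The article separator above can be illustrated with a minimal Python sketch.
It assumes that everything before the first separator is the dictionary
information block; the function name ``split_articles`` and the sample text
are made up for illustration and are not taken from the gadict sources.

```python
# Separator between articles, as described in this guide.
ARTICLE_SEPARATOR = "\n__\n\n"


def split_articles(text):
    """Split dictionary source text into (header, articles).

    Assumption: the chunk before the first separator is the dictionary
    information block; every following chunk is one article.
    """
    header, *articles = text.split(ARTICLE_SEPARATOR)
    return header, articles


# Made-up sample text, only the separator is taken from the guide.
sample = ARTICLE_SEPARATOR.join([
    "Sample dictionary info\n",
    "first article body\n",
    "second article body\n",
])

header, articles = split_articles(sample)
print(len(articles))
```

The sketch does not split an article into its two parts, because this guide
does not state how they are delimited.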

*Word variations* are:

* *singularity* or *number*: ``s`` - singular, ``pl`` - plural.
* *verb form* or *verb tense*: ``v1`` - infinitive, ``v2`` - past tense,
  ``v3`` - past participle.
* *gender*: ``male`` or ``female``.
* *comparison*: ``comp`` - comparative or ``super`` - superlative.

*Parts of speech* (ordered by preference):

* ``v`` - verb
* ``n`` - noun
* ``pron`` - pronoun
* ``adv`` - adverb
* ``adj`` - adjective
* ``prep`` - preposition
* ``conj`` - conjunction
* ``num`` - numeral
* ``int`` - interjection
* ``abbr`` - abbreviation
* ``phr`` - phrase
* ``phr.v`` - phrasal verb
* ``contr`` - contraction
* ``prefix`` - word prefix

.. note:: I try to keep word meanings within an article in the POS order above.
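
The preferred ordering can be checked mechanically. The following Python
sketch sorts meanings by the POS order listed above; representing a meaning
as a ``(pos, text)`` pair and placing unknown tags last are assumptions made
for the example, not something the gadict tools are known to do.

```python
# POS tags in the preferred order given in this guide.
POS_ORDER = [
    "v", "n", "pron", "adv", "adj", "prep", "conj", "num",
    "int", "abbr", "phr", "phr.v", "contr", "prefix",
]
POS_RANK = {pos: rank for rank, pos in enumerate(POS_ORDER)}


def sort_meanings(meanings):
    """Sort (pos, text) pairs by preferred POS order.

    Assumption: tags missing from POS_ORDER are placed last.
    sorted() is stable, so meanings with the same POS keep their order.
    """
    return sorted(meanings, key=lambda m: POS_RANK.get(m[0], len(POS_ORDER)))


# Made-up meanings for illustration.
meanings = [("adj", "not heavy"), ("v", "to ignite"), ("n", "illumination")]
ordered = sort_meanings(meanings)
print([pos for pos, _ in ordered])
```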

Each meaning may refer to topics, like:

* ``sci`` - about science
* ``body`` - part of body
* ``math`` - mathematics
* ``chem`` - chemicals
* ``bio`` - biology
* ``music``
* ``meal``, ``office``, etc
* ``size``, ``shape``, ``age``, ``color``
* ``archaic`` - old fashioned, no longer used

*Word relations* (ordered by preference):

* ``topic:`` - topics/tags
* ``ant:`` - antonyms
* ``syn:`` - synonyms
* ``hyper:`` - hypernyms
* ``hypo:`` - hyponyms
* ``rel:`` - related (see also) terms

A translation is marked by a lowercase ISO 639-1 code with a ``:`` (colon)
character, like:

* ``en:`` - English
* ``ru:`` - Russian
* ``uk:`` - Ukrainian
* ``la:`` - Latin

An example is marked by a lowercase ISO 639-1 code with a ``>`` (greater-than)
character.

An explanation or glossary entry is marked by a lowercase ISO 639-1 code with
an ``=`` (equals) character.
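
The three language markers can be told apart with a small Python sketch. It
assumes that the marker stands at the start of a line, that the two-letter
code is immediately followed by the marker character (as in ``en:``), and that
the rest of the line is the text. The names ``MARKER_RE`` and ``classify`` are
hypothetical and do not come from the gadict sources.

```python
import re

# Two lowercase letters (ISO 639-1 code) directly followed by ':', '>' or '='.
# Assumption: the marker starts the line and the text follows it.
MARKER_RE = re.compile(r"^([a-z]{2})([:>=])\s*(.*)$")
KINDS = {":": "translation", ">": "example", "=": "explanation"}


def classify(line):
    """Return (kind, language, text) for a marked line, else None."""
    match = MARKER_RE.match(line.strip())
    if match is None:
        return None
    lang, mark, text = match.groups()
    return KINDS[mark], lang, text


print(classify("en: house"))
print(classify("syn: home"))
```

Relation markers such as ``syn:`` are longer than two letters, so the pattern
does not confuse them with language codes.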

Pronunciation variants are marked by:

* ``Am`` - American
* ``Br`` - British
* ``Au`` - Australian

The ``rare`` attribute on the first headword marks the word as low-frequency.
SRS (spaced repetition system) file writers skip entries marked as ``rare``. I
found it convenient to check frequency with:

https://books.google.com/ngrams/
  Google N-grams from books 1800-2010.

As the cut-off point I chose the word ``beseech``. All less frequent words
receive the ``rare`` marker.
6ae5399c8087 Add ``rare`` attribute to headword to filter low frequency headwords out from
Oleksandr Gavenko <gavenkoa@gmail.com>
parents: 642
diff changeset
   209

gaphrase & gadialog file formats
================================

``gaphrase`` & ``gadialog`` files keep data for generating one-sided Anki cards.

Both use the same numbering scheme, which allows merging updated articles with
the original without losing learning progress:

* The first line of the file starts with ``## NUM``, to keep track of the latest
  used number.
* Articles are separated by a number line in the format ``# NUM``.

``gadialog`` additionally maintains a dialog; each part is marked by a line
starting with ``- TEXT``.

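
The numbering scheme above can be sketched in Python. This is only an
illustration under stated assumptions, not the project's actual parser: the
function name ``parse_numbered`` and the sample text are hypothetical, and only
the ``## NUM`` header and ``# NUM`` separator rules are implemented.

```python
import re


def parse_numbered(text):
    """Split gaphrase-like text into (last_num, {num: article_lines}).

    Hypothetical helper: it knows only the ``## NUM`` first line and the
    ``# NUM`` article separators described in this guide.
    """
    lines = text.splitlines()
    header = re.match(r"## (\d+)", lines[0]) if lines else None
    if header is None:
        raise ValueError("first line must start with '## NUM'")
    last_num = int(header.group(1))
    articles = {}
    current = None
    for line in lines[1:]:
        sep = re.fullmatch(r"# (\d+)", line.strip())
        if sep:
            current = int(sep.group(1))
            if current in articles:
                raise ValueError("duplicate article number %d" % current)
            articles[current] = []
        elif current is not None and line.strip():
            # Non-empty lines belong to the most recent article.
            articles[current].append(line)
    return last_num, articles


# Made-up sample: article 2 was deleted, so numbers need not be contiguous.
sample = """## 3
# 1
to break the ice
# 3
- How are you?
- Fine, thanks.
"""
last_num, articles = parse_numbered(sample)
print(last_num, sorted(articles))
```

Because every article keeps its number, an updated file can presumably be
matched against existing cards by number rather than by text.
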

C5 dictionary source file format
================================

The dictd C5 file format was used as the source file format. See::

  $ man 1 dictfmt

In short:

 * Headwords are preceded by 5 or more underscore characters ``_`` and a blank
   line.
 * An article may have several headwords; in that case they are placed on one
   line and separated by ``;<SPACE>``.
 * All text until the next headword is considered the definition.
 * Any leading ``@`` characters are stripped out, but the file is otherwise
   unchanged.
 * UTF-8 encoding is supported at least by GoldenDict.

The gadict project used the C5 format in the past but switched to its own
format.

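
The C5 rules listed above can be illustrated with a small Python sketch. The
function name ``parse_c5`` and the sample entries are assumptions made for this
example; it covers only the underscore separator, the ``;<SPACE>`` headword
split and the "text until the next headword" rule, and is not a replacement for
``dictfmt``.

```python
import re


def parse_c5(text):
    """Return a list of (headwords, definition) pairs from C5-like text.

    Hypothetical helper covering only the rules listed in this guide.
    """
    entries = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        # A headword follows a line of 5+ underscores and a blank line.
        if (re.fullmatch(r"_{5,}", lines[i].strip())
                and i + 2 < len(lines) and not lines[i + 1].strip()):
            headwords = lines[i + 2].split("; ")
            entries.append((headwords, []))
            i += 3
            continue
        if entries:
            # Everything up to the next separator is the definition.
            entries[-1][1].append(lines[i])
        i += 1
    return [(hw, "\n".join(body).strip()) for hw, body in entries]


# Made-up sample with one two-headword article and one single-headword one.
sample = """_____

colour; color

n. the property of reflecting light
_____

beseech

v. to ask urgently
"""
result = parse_c5(sample)
for headwords, definition in result:
    print(headwords, "->", definition)
```
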

TODO convention
===============

Entries or parts of text that are not complete are marked by keywords:

  TODO
    incomplete
  XXX
    urgent incomplete

The Makefile rule ``todo`` finds these occurrences in the sources::

  $ make todo

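
How the ``todo`` rule is implemented is not shown here. As a rough illustration
of the idea only, the following Python sketch scans lines for the two keywords;
the names ``TODO_RE`` and ``find_todos`` and the sample lines are hypothetical.

```python
import re

# Match the two convention keywords as whole words.
TODO_RE = re.compile(r"\b(TODO|XXX)\b")


def find_todos(lines):
    """Yield (line_number, keyword, text) for each marked line."""
    for number, line in enumerate(lines, start=1):
        match = TODO_RE.search(line)
        if match:
            yield number, match.group(1), line.rstrip()


# Made-up sample lines standing in for a dictionary source file.
sample = ["word", "  TODO add example", "n", "XXX wrong transcription"]
found = list(find_todos(sample))
for number, keyword, text in found:
    print(number, keyword, text)
```
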

World wide dictionary formats and standards
===========================================

  http://en.wikipedia.org/wiki/Dictionary_writing_system
                Dictionary writing system
  http://www.sil.org/computing/shoebox/mdf.html
                Multi-Dictionary Formatter (MDF). It defines about 100 data
                field markers.
  http://fieldworks.sil.org/flex/
                FieldWorks Language Explorer (or FLEx, for short) is designed
                to help field linguists perform many common language
                documentation and analysis tasks.
  http://code.google.com/p/lift-standard/
                LIFT (Lexicon Interchange FormaT) is an XML format for storing
                lexical information, as used in the creation of dictionaries.
                It's not necessarily the format for your lexicon.
  http://www.lexiquepro.com/
                Lexique Pro is an interactive lexicon viewer and editor, with
                hyperlinks between entries, category views, dictionary
                reversal, search, and export tools. It's designed to display
                your data in a user-friendly format so you can distribute it
                to others.
  http://deb.fi.muni.cz/index.php
                DEBII - Dictionary Editor and Browser


Linguistic sources
==================

Ukrainian linguistics corpora
-----------------------------

**National corpus of Russian language**. It contains parallel Russian-Ukrainian
texts. Search by keywords, grammatical function, thesaurus properties and other
properties.

http://www.ruscorpora.ru/search-para-uk.html
  Page for querying online.

**Corpus of mova.info project**. It offers literal search and search by word
family.

http://www.mova.info/corpus.aspx
  Page for querying online.


Word lists
==========

Frequency wordlists use several statistics:

* number of word occurrences in the corpus, usually marked by ``F``
* adjusted number of occurrences per 1,000,000 words in the corpus, usually
  marked by ``U``
* Standard Frequency Index (SFI), computed from ``U`` as:

  .. math:: SFI = 40 + 10 \cdot \log_{10}(U)

  ===  ================
  SFI       Freq
  ===  ================
  90   1 per 10
  80   1 per 100
  70   1 per 1000
  60   1 per 10,000
  50   1 per 100,000
  40   1 per 1,000,000
  30   1 per 10,000,000
  ===  ================

* deviation of word frequency across documents in the corpus, usually marked by
  ``D``

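
The SFI formula above is easy to check numerically. A minimal Python sketch
(the function name ``sfi`` is chosen for this example) that takes ``U``, the
number of occurrences per 1,000,000 words:

```python
import math


def sfi(u):
    """Standard Frequency Index for U occurrences per 1,000,000 words."""
    if u <= 0:
        raise ValueError("U must be positive")
    return 40 + 10 * math.log10(u)


# U = 100,000 per million is 1 per 10; U = 1 is 1 per million.
for u in (100_000, 1_000, 1, 0.1):
    print(u, round(sfi(u), 2))
```

Each tenfold drop in frequency lowers the SFI by 10, which is the pattern shown
in the table.
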

Sorting numerically on the first column, in descending order::

  $ sort -k 1nr,2 <$IN >$OUT

OANC frequency wordlist
-----------------------

The Open American National Corpus (OANC) is a roughly 15 million word subset of
the ANC Second Release that is unrestricted in terms of usage and
redistribution.

I got the OANC from: http://www.anc.org/OANC/OANC-1.0.1-UTF8.zip

After unpacking only the ``.txt`` files::

  $ unzip OANC-1.0.1-UTF8.zip '*.txt'
  $ cd OANC; find . -type f | xargs cat | wc
  2090929 14586935 96737202

I built the frequency list with:

http://www.laurenceanthony.net/software/antconc/
  A freeware corpus analysis toolkit for concordancing and text analysis.

Then I manually removed one- and two-letter words, filtered out misspelled words
with the ``en_US`` ``hunspell`` spell-checker, and merged word variations into
their base form using WordNet. See details in ``obsolete/oanc.py``.

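
The list itself was built with AntConc and post-processed as described above.
For readers without AntConc, the counting step alone can be approximated in
Python. This sketch is a substitute technique, not the content of
``obsolete/oanc.py``: the names ``WORD_RE`` and ``count_words`` and the sample
sentences are hypothetical, and the spell-checking and WordNet merging steps are
left out.

```python
import collections
import re

# Lowercase ASCII words, optionally with one internal apostrophe.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def count_words(texts):
    """Count lowercase word tokens, skipping one- and two-letter words."""
    counts = collections.Counter()
    for text in texts:
        for word in WORD_RE.findall(text.lower()):
            if len(word) > 2:
                counts[word] += 1
    return counts


# Made-up sample standing in for the contents of the corpus .txt files.
sample = ["The cat saw the other cat.", "A cat is a cat."]
counts = count_words(sample)
for word, freq in counts.most_common(2):
    print(freq, word)
```

Printing the count first gives lines that can be ordered by a numeric sort on
the first column.
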

http://www.anc.org/data/oanc/download/
  OANC download page.

http://www.anc.org/data/oanc/
  OANC home page.

https://en.wikipedia.org/wiki/Word_lists_by_frequency
  Word lists by frequency at Wikipedia.

Useful word lists:

https://en.wikipedia.org/wiki/Academic_Word_List
  Academic Word List at Wikipedia.
https://web.archive.org/web/20080212073904/http://language.massey.ac.nz/staff/awl/headwords.shtml
  Academic Word List by Averil Coxhead, created in 2000 as an addition to the
  GSL; it has 570 headwords.

Obsolete or proprietary word lists:

https://en.wikipedia.org/wiki/Basic_English
  850-headword list created in 1930.


General Service List
--------------------

The updated GSL (General Service List) was obtained from:

http://jbauman.com/gsl.html
  A 1995 revised version of the GSL with minor changes by John Bauman. He added
  284 new headwords to the original 2000-word list created by Michael West in
  1953.

The first column represents the number of occurrences per 1,000,000 words of the
Brown corpus, based on counting word families.

https://en.wikipedia.org/wiki/General_Service_List
  General Service List at Wikipedia.
http://jbauman.com/aboutgsl.html
  About the General Service List by John Bauman.

New General Service List
------------------------

The NGSL was obtained from:

http://www.newgeneralservicelist.org/s/NGSL-101-by-band-qq9o.xlsx
  Microsoft Excel (XLSX) file with headword, frequency and SFI.

The first column represents the adjusted frequency per 1,000,000 words, counting
base word families.

Academic Word List
------------------

The Academic Word List (AWL) was published in the Summer 2000 issue of
TESOL Quarterly (v. 34, no. 2). It was developed by Averil Coxhead of Victoria
University of Wellington, New Zealand. The AWL is a replacement for the
University Word List (published by Paul Nation in 1984).

The AWL was obtained from:

https://web.archive.org/web/20081014065815/http://language.massey.ac.nz/staff/awl/download/awlheadwords.rtf
  Original Academic Word List in RTF format.

Each entry is a headword followed by its frequency level (from 1 as most
frequent to 10 as least frequent).

New Academic Word List
----------------------

The frequency word list was obtained from:

http://www.newacademicwordlist.org/s/NAWL_SFI.csv
  CSV with columns ``Word,SFI,U,D``.

The ``SFI`` and ``D`` columns were deleted and the ``U`` and ``Word`` columns
were swapped. The data was sorted by the ``U`` column (adjusted frequency per
1,000,000 words).

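The column reshaping described above can be sketched in Python. This is only an
illustration, not the script actually used for ``gadict``: the function name
``reshape_nawl`` and the sample rows are made up, and sorting from the highest
``U`` value to the lowest is an assumption (the sort direction is not recorded
here).

```python
import csv
import io


def reshape_nawl(csv_text):
    """Keep only U and Word (in that order) and sort rows by U.

    Assumption: rows are sorted from most to least frequent.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Drop SFI and D, put U first and Word second.
    rows = [(float(row["U"]), row["Word"]) for row in reader]
    rows.sort(key=lambda pair: pair[0], reverse=True)
    return rows


# Made-up sample data in the ``Word,SFI,U,D`` layout.
sample = "Word,SFI,U,D\nalpha,50.0,12.5,0.9\nbeta,60.0,40.0,0.8\n"
for u, word in reshape_nawl(sample):
    print(u, word)
```

Reading the file by header name rather than by position keeps the sketch
independent of the column order in the download.
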
The NAWL headword list with word variations was obtained from:

http://www.laurenceanthony.net/software/antwordprofiler/
  Laurence Anthony's AntWordProfiler home page.

It is distributed in ``latin-1`` encoding and was recoded into ``utf-8``
(because of the ``É`` symbol).

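A minimal Python sketch of such a recoding, assuming the whole file fits in
memory. The function name ``recode_latin1_to_utf8`` is made up for this
example; the same result can be had with any recoding tool.

```python
def recode_latin1_to_utf8(data: bytes) -> bytes:
    """Decode bytes as latin-1 and encode the same text as utf-8."""
    return data.decode("latin-1").encode("utf-8")


# "É" is the single byte 0xC9 in latin-1 and two bytes in utf-8.
latin1_bytes = "É".encode("latin-1")
utf8_bytes = recode_latin1_to_utf8(latin1_bytes)
print(latin1_bytes, utf8_bytes)
```

Every byte sequence is valid ``latin-1``, so the decoding step cannot fail. A
file that is already ``utf-8`` would be silently double-encoded, so the
recoding should be applied only once.
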
See also:

http://www.newacademicwordlist.org/
  Home page.

Special English word list
-------------------------

https://en.wikipedia.org/wiki/Special_English
  Special English is a controlled version of the English language used by the
  United States broadcasting service Voice of America (VOA). 1557 headwords.

Business Service List
---------------------

The 1700 words of the BSL 1.01 version give up to 97% coverage of general
business English materials when combined with the 2800 words of the NGSL.

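Coverage here means the share of running words (tokens) of a text that belong
to the wordlist. A rough Python sketch of that measure follows. The function
name ``coverage``, the simplistic tokenizer and the sample data are
assumptions, not how the figures above were computed, and the wordlist set is
expected to contain all word variations, not only headwords.

```python
import re


def coverage(text, wordlist):
    """Return the fraction of tokens in ``text`` found in ``wordlist``.

    Assumption: tokens are runs of ASCII letters and apostrophes,
    compared in lower case.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in wordlist)
    return known / len(tokens)


# Made-up wordlist and text, for illustration only.
known_words = {"the", "market", "is"}
print(coverage("The market is open", known_words))
```
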
Wordlist with variations was obtained from:

http://www.newgeneralservicelist.org/s/AWPngslbsl-twcg.zip
  In AntWordProfiler compatible format.

http://www.newgeneralservicelist.org/bsl-business-service-list/
  BSL home & download page.

TOEIC Service List
------------------

Based on a 1.5 million word corpus of various TOEIC preparation materials, the
1200 words of the TSL 1.1 version give up to 99% coverage of TOEIC materials
and tests when combined with the 2800 words of the NGSL.

Wordlist with variations was obtained from:

http://www.newgeneralservicelist.org/s/AWPngsltsl.zip
  In AntWordProfiler compatible format.

http://www.newgeneralservicelist.org/toeic-list/
  The TOEIC Service List home page.

KET wordlist
------------

The KET Vocabulary List gives teachers a guide to the vocabulary needed when
preparing students for the KET and KET for Schools examinations.

The list covers vocabulary appropriate to the A2 level on the CEFR.

http://www.cambridgeenglish.org/images/22105-ket-vocabulary-list.pdf
  Key English Test (KET) Vocabulary List © UCLES 2012.

PET wordlist
------------

The Preliminary and Preliminary for Schools Vocabulary List gives teachers a
guide to the vocabulary needed when preparing students for the Preliminary and
Preliminary for Schools examinations.

The list covers vocabulary appropriate to the B1 level on the CEFR.

http://www.cambridgeenglish.org/images/84669-pet-vocabulary-list.pdf
  Preliminary (PET) Wordlist © UCLES 2012.

BNC+COCA wordlist
-----------------

Paul Nation prepared a frequency wordlist from the combined BNC and COCA
corpora:

http://www.victoria.ac.nz/lals/about/staff/paul-nation
  Paul Nation's home page and list download page.
https://simple.wiktionary.org/wiki/Wiktionary:BNC_spoken_freq
  About the list on Wikimedia.

It has 25000 basewords (each baseword comes with variations) split into
chunks of 1000 words.

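Splitting a frequency-ordered baseword list into 1000-word chunks can be
sketched as below. The function name ``split_into_bands`` and the placeholder
words are made up; how the original files are actually laid out is not assumed
here.

```python
def split_into_bands(basewords, band_size=1000):
    """Split a frequency-ordered list into consecutive chunks."""
    return [
        basewords[start:start + band_size]
        for start in range(0, len(basewords), band_size)
    ]


# Placeholder data standing in for 25000 basewords.
basewords = ["word%d" % i for i in range(25000)]
bands = split_into_bands(basewords)
print(len(bands), len(bands[0]))
```
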
The list was obtained from:

http://www.laurenceanthony.net/software/antwordprofiler/
  Laurence Anthony's AntWordProfiler home page.

Oxford 3000/5000
----------------

https://www.oxfordlearnersdictionaries.com/wordlists/
  Based on extensive corpora and aligned to the CEFR.

Miscellaneous wordlists
-----------------------

The Dolch word list is a list of frequently used English words compiled by
Edward William Dolch. The list was prepared in 1936 and was originally published
in his book Problems in Reading in 1948. Dolch compiled the list based on
children's books of his era. The list contains 220 "service words". The
compilation excludes nouns, which comprise a separate 95-word list.

The Dolch wordlist is already covered by ``gadict``.

https://en.wikipedia.org/wiki/Dolch_word_list
  Wikipedia article with the list itself.

The Leipzig-Jakarta list is a 100-word word list used by linguists to test the
degree of chronological separation of languages by comparing words that are
resistant to borrowing. The Leipzig-Jakarta list became available in 2009.

The Leipzig-Jakarta wordlist is already covered by ``gadict``.

https://en.wikipedia.org/wiki/Leipzig%E2%80%93Jakarta_list
  Wikipedia article with the list itself.

The words in the Swadesh lists were chosen for their universal, culturally
independent availability in as many languages as possible. Swadesh's final list,
published in 1971, contains 100 terms.

The Swadesh wordlist is already covered by ``gadict``, except for some rare
words.

https://en.wikipedia.org/wiki/Swadesh_list

Typing IPA chars in Emacs
=========================

To enter IPA chars, use the ``ipa`` input method. To enable it, type::

  C-u C-\ ipa <enter>

Letters of the Latin alphabet are typed as usual. To type special IPA chars, use
the following key bindings (or read the help in Emacs with
``M-x describe-input-method`` or ``C-h I``).

For vowels::

  æ  ae
  ɑ  o| or A
  ɒ  |o or /A
  ʊ  U
  ɛ  /3 or E
  ɔ  /c
  ə  /e
  ʌ  /v
  ɪ  I

For consonants::

  θ  th
  ð  dh
  ʃ  sh
  ʧ  tsh
  ʒ  zh or 3
  ŋ  ng
  ɡ  g
  ɹ  /r

Special chars::

  ː  : (colon)
  ˈ  ' (quote)
  ˌ  ` (back quote)

Alternatively, use the ``ipa-x-sampa`` or ``ipa-kirshenbaum`` input method (for
help type ``C-h I ipa-x-sampa RET`` or ``C-h I ipa-kirshenbaum RET``).
