www/HACKING.rst
changeset 348 f089cd68ea7b
parent 347 05df277da404
child 360 cb0b59398e25
--- a/www/HACKING.rst	Sun Mar 13 18:43:54 2016 +0200
+++ b/www/HACKING.rst	Sun Mar 13 19:46:38 2016 +0200
@@ -52,6 +52,74 @@
 During dictionaries compilation and releases ``".gadict"`` suffix changed to
 appropriated but base name should be preserved as ``"gadict_" NAME``.
 
+Dictionary source file format
+=============================
+
+The gadict project used the dictd C5 source file format in the past. The C5
+format has several issues:
+
+ * C5 is not a structured format, so producing other forms and converting to
+   other formats is not possible.
+ * C5 has no markup for links or for anything else.
+
+Before that, the project used the dictd TAB file format, which requires placing
+each article on a single long line. That format is not suited to human editing.
+
+Other dictionary source file formats were considered, such as TEI, ISO, xdxf,
+and MDF. XML-like formats are also not suited to human editing. XML also lacks
+syntax locality: the full file must be scanned to validate a local change.
+
+Note that the StarDict, ABBYY Lingvo, Babylon, and dictd formats were not
+considered because they describe presentation, not structure. They are target
+formats for compilation.
+
+A fancy-looking analog of MDF + C5 was developed instead.
+
+The beginning of the file describes the dictionary information.
+
+Each article is separated by ``\n__\n\n`` and consists of two parts:
+
+ * word variations with pronunciation
+ * word translations, with supplementary information like part of speech,
+   synonyms, antonyms, and examples of usage
+
+*Word variations* are:
+
+* *singularity* or *number*: ``s`` - singular, ``pl`` - plural.
+* *verb voice* or *verb tense*: ``v1`` - infinitive, ``v2`` - past tense,
+  ``v3`` - past participle.
+* *gender*: ``male`` or ``female``
+* *comparison*: ``comp`` - comparative or ``super`` - superlative
+
+*Parts of speech* are:
+
+* ``n`` - noun
+* ``pron`` - pronoun
+* ``adj`` - adjective
+* ``v`` - verb
+* ``adv`` - adverb
+* ``prep`` - preposition
+* ``conj`` - conjunction
+* ``int`` - interjection
+
+Each meaning may refer to topics, like:
+
+* ``sci`` - about science
+* ``body`` - part of body
+* ``math`` - mathematics
+* ``chem`` - chemicals
+* ``bio`` - biology
+* ``music``
+* ``meal``, ``office``, etc
+* ``size``, ``shape``, ``age``, ``color``
+
+Synonyms are marked by ``syn:``, antonyms by ``ant:``.
+
+Translations are marked by lowercase ISO 639-1 codes, like ``en:``, ``ru:``, ``uk:``.
+
+Pronunciation variants are marked by ``Am`` - American, ``Br`` - British,
+``Au`` - Australian.
+
 C5 dictionary source file format
 ================================
 
@@ -70,11 +138,7 @@
    unchanged.
  * UTF-8 encoding is supported at least by Goldendict.
 
-gadict project used C5 format in the past but switched to own format due to:
-
- * C5 is not structural format. So producing another forms and conversion to
-   other formats is not possible.
- * C5 have no markup for links neither for any other markups.
+The gadict project used the C5 format in the past but switched to its own format.
 
 TODO convention
 ===============