www/HACKING.rst
author Oleksandr Gavenko <gavenkoa@gmail.com>
Sun, 13 Mar 2016 18:41:22 +0200
Reason for switching to another dictionary source format.

.. -*- coding: utf-8 -*-

======================
 gadict HACKING guide
======================
.. contents::
   :local:

Versioning rules
================

We use a **major.minor** scheme.

Until we reach 5000 words, **major** stays at 0. **minor** is updated from time
to time.

Getting sources
===============

Cloning repository::

  $ hg clone http://hg.defun.work/gadict gadict
  $ hg clone http://hg.code.sf.net/p/gadict/code gadict-hg

Pushing changes::

  $ hg push ssh://$USER@hg.defun.work/gadict
  $ hg push ssh://$USER@hg.code.sf.net/p/gadict/code
  $ hg push https://$USER:$PASS@hg.code.sf.net/p/gadict/code

Browsing sources online
=======================

  http://hg.defun.work/gadict
    hgweb at home page.
  http://hg.code.sf.net/p/gadict/code
    hgweb at old home page (but supported as mirror).
  https://sourceforge.net/p/gadict/code/
    SourceForge Allura interface (not primary, a mirror).

C5 dictionary source file format
================================

The dictd C5 file format was used as the source file format. See::

  $ man 1 dictfmt

In short:

 * Headwords are preceded by 5 or more underscore characters ``_`` and a blank
   line.
 * An article may have several headwords; in that case they are placed on one
   line and separated by ``;<SPACE>``.
 * All text until the next headword is considered the definition.
 * Any leading ``@`` characters are stripped out, but the file is otherwise
   unchanged.
 * UTF-8 encoding is supported at least by Goldendict.
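
The rules above can be sketched in Python. This is only an illustration and not
part of gadict: the name ``parse_c5`` is hypothetical, and the sketch ignores
the ``@`` stripping rule::

  import re

  # A separator is a line made of 5 or more underscore characters.
  SEPARATOR = re.compile(r"^_{5,}\s*$")

  def parse_c5(text):
      """Return a list of (headwords, definition) pairs."""
      articles = []
      lines = text.splitlines()
      i = 0
      while i < len(lines):
          if not SEPARATOR.match(lines[i]):
              i += 1
              continue
          i += 1
          # Skip blank lines between the separator and the headword line.
          while i < len(lines) and not lines[i].strip():
              i += 1
          if i >= len(lines):
              break
          # Several headwords share one line, separated by ";<SPACE>".
          headwords = [h.strip() for h in lines[i].split("; ")]
          i += 1
          # Everything up to the next separator is the definition.
          body = []
          while i < len(lines) and not SEPARATOR.match(lines[i]):
              body.append(lines[i])
              i += 1
          articles.append((headwords, "\n".join(body).strip()))
      return articles

  sample = "_____\n\ncolor; colour\n  a hue\n_____\n\ncat\n  an animal\n"
  for headwords, definition in parse_c5(sample):
      print(headwords, definition)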

The gadict project used the C5 format in the past but switched to its own
format because:

 * C5 is not a structured format, so deriving other forms and converting to
   other formats is not possible.
 * C5 has no markup for links or for anything else.

TODO convention
===============

Entries or parts of text that are not complete are marked with keywords:

  TODO
    incomplete
  XXX
    urgent incomplete

The Makefile rule ``todo`` finds these occurrences in the sources::

  $ make todo
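
A similar search can be sketched in Python. This is not how the Makefile rule is
implemented, only an illustration of the convention. The name ``find_todos`` is
hypothetical::

  import re

  # Match the two keywords as whole words.
  MARKER = re.compile(r"\b(TODO|XXX)\b")

  def find_todos(text):
      """Yield (line_number, keyword, line) for every marked line."""
      for lineno, line in enumerate(text.splitlines(), start=1):
          match = MARKER.search(line)
          if match:
              yield lineno, match.group(1), line.strip()

  sample = "word\n  TODO add example\nXXX wrong transcription\n"
  for lineno, keyword, line in find_todos(sample):
      print(lineno, keyword, line)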

Dictionary file name convention
===============================

BNF form::

  FILE ::= "gadict_" NAME ".gadict"

``NAME`` may have the form ``ISOCODE "-" ISOCODE``, like ``en-ru``, where
``ISOCODE`` is a two-letter ISO 639-1 language code.

``NAME`` may also be a dictionary abbreviation.

During dictionary compilation and release the ``".gadict"`` suffix is replaced
by an appropriate one, but the base name ``"gadict_" NAME`` should be preserved.
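
The convention can be checked with a short Python sketch. The function names
and the ``.dict`` suffix in the example are hypothetical, and the pattern only
checks the shape of the language codes, not that they are real ISO 639-1
codes::

  import re

  FILE_RE = re.compile(r"^gadict_(?P<name>.+)\.gadict$")
  ISO_PAIR = re.compile(r"^[a-z]{2}-[a-z]{2}$")

  def parse_filename(filename):
      """Return a dict describing NAME, or None if the file name is invalid."""
      match = FILE_RE.match(filename)
      if match is None:
          return None
      name = match.group("name")
      if ISO_PAIR.match(name):
          source, target = name.split("-")
          return {"name": name, "languages": (source, target)}
      # Otherwise NAME is treated as a dictionary abbreviation.
      return {"name": name, "languages": None}

  def output_name(filename, suffix):
      """Replace the ".gadict" suffix and keep the "gadict_" NAME base."""
      if FILE_RE.match(filename) is None:
          raise ValueError("not a gadict source file name: " + filename)
      return filename[:-len(".gadict")] + suffix

  print(parse_filename("gadict_en-ru.gadict"))
  print(output_name("gadict_en-ru.gadict", ".dict"))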

Worldwide dictionary formats and standards
==========================================

  http://en.wikipedia.org/wiki/Dictionary_writing_system
                Dictionary writing system
  http://www.sil.org/computing/shoebox/mdf.html
                Multi-Dictionary Formatter (MDF). It defines about 100 data
                field markers.
  http://fieldworks.sil.org/flex/
                FieldWorks Language Explorer (or FLEx, for short) is designed
                to help field linguists perform many common language
                documentation and analysis tasks.
  http://code.google.com/p/lift-standard/
                LIFT (Lexicon Interchange FormaT) is an XML format for storing
                lexical information, as used in the creation of dictionaries.
                It's not necessarily the format for your lexicon.
  http://www.lexiquepro.com/
                Lexique Pro is an interactive lexicon viewer and editor, with
                hyperlinks between entries, category views, dictionary
                reversal, search, and export tools. It's designed to display
                your data in a user-friendly format so you can distribute it
                to others.
  http://deb.fi.muni.cz/index.php
                DEBII — Dictionary Editor and Browser

Register gadict dictionaries for dictd under Debian
===================================================
::

  $ su
  $ cat >>/etc/dictd/dictd.order <<EOF
  gadict-dictabbr
  /home/user/usr/share/dictd/
  EOF
  $ dictdconfig --write
  $ /etc/init.d/dictd restart
  $ ^D
  $ dictdconfig --list
  $ dict -d gadict-dictabbr v

Typing IPA chars in Emacs
=========================

To enter IPA chars use the ``ipa`` input method. To enable it type::

  C-u C-\ ipa <enter>

Ordinary letters are typed as usual. To type special IPA chars use the following
key bindings (or read the help in Emacs with ``M-x describe-input-method`` or
``C-h I``).

For vowels::

  æ  ae
  ɑ  o| or A
  ɒ  |o  or /A
  ʊ  U
  ɛ  /3 or E
  ɔ  /c
  ə  /e
  ʌ  /v
  ɪ  I

For consonants::

  θ  th
  ð  dh
  ʃ  sh
  ʧ  tsh
  ʒ  zh or 3
  ŋ  ng
  ɡ  g
  ɹ  /r

Special chars::

  ː  : (colon)
  ˈ  ' (quote)
  ˌ  ` (back quote)

Alternatively use the ``ipa-x-sampa`` or ``ipa-kirshenbaum`` input methods (for
help type ``C-h I ipa-x-sampa RET`` or ``C-h I ipa-kirshenbaum RET``).