www/HACKING.rst
changeset 345 ca5a7d9e7a4b
parent 344 904e71e64fbc
child 346 738da7eddaca
344:904e71e64fbc 345:ca5a7d9e7a4b
    35   http://hg.code.sf.net/p/gadict/code
    35   http://hg.code.sf.net/p/gadict/code
    36     hgweb at old home page (but supported as mirror).
    36     hgweb at old home page (but supported as mirror).
    37   https://sourceforge.net/p/gadict/code/
    37   https://sourceforge.net/p/gadict/code/
    38     SourceForge Allura interface (not primary, a mirror).
    38     SourceForge Allura interface (not primary, a mirror).
    39 
    39 
    40 Dictionary source file format
    40 C5 dictionary source file format
    41 =============================
    41 ================================
    42 
    42 
    43 The dictd C5 file format is used as the source file format. See::
    43 The dictd C5 file format is used as the source file format. See::
    44 
    44 
    45   $ man 1 dictfmt
    45   $ man 1 dictfmt
    46 
    46 
    47 In short:
    47 In short:
    48 
    48 
    49  * Headwords are preceded by 5 or more underscore characters ``_`` and a blank
    49  * Headwords are preceded by 5 or more underscore characters ``_`` and a blank
    50    line.
    50    line.
    51  * All text until the next headword is considered the definition.
    51  * An article may have several headwords; in that case they are placed on one line
       
    52    and separated by ``;<SPACE>``.
       
    53  * All text until the next headword is considered as the definition.
    52  * Any leading ``@`` characters are stripped out, but the file is otherwise
    54  * Any leading ``@`` characters are stripped out, but the file is otherwise
    53    unchanged.
    55    unchanged.
       
    56  * UTF-8 encoding is supported at least by Goldendict.
    54 
    57 
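The C5 rules listed above can be illustrated with a minimal Python sketch. It
is not part of gadict or dictd; the name ``parse_c5`` is hypothetical, and the
sketch assumes that leading ``@`` characters are stripped from every line and
that several headwords share one line separated by ``;<SPACE>``.

```python
import re

# A separator is a line made of 5 or more underscores (assumption: nothing
# else on that line except trailing whitespace).
SEPARATOR = re.compile(r"^_{5,}\s*$")


def parse_c5(text):
    """Split C5 source text into (headwords, definition) pairs."""
    lines = text.splitlines()
    articles = []
    i = 0
    while i < len(lines):
        if not SEPARATOR.match(lines[i]):
            i += 1
            continue
        i += 1
        # Skip the blank line(s) that follow the separator.
        while i < len(lines) and not lines[i].strip():
            i += 1
        if i >= len(lines):
            break
        if SEPARATOR.match(lines[i]):
            # Two separators in a row: no headword here, start over.
            continue
        # Several headwords are placed on one line, separated by "; ".
        headwords = lines[i].lstrip("@").split("; ")
        i += 1
        # Everything until the next separator is the definition.
        body = []
        while i < len(lines) and not SEPARATOR.match(lines[i]):
            body.append(lines[i].lstrip("@"))
            i += 1
        articles.append((headwords, "\n".join(body).strip()))
    return articles


if __name__ == "__main__":
    sample = "_____\n\ncat; kitty\n@a small animal\n\n_______\n\ndog\nbarks\n"
    for headwords, definition in parse_c5(sample):
        print(headwords, "->", definition)
```

The sketch keeps each definition as plain text, which reflects the limitation
noted for C5: the format carries no structure beyond headwords and free-form
definition text.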
    55 For convenience, the following assumptions are also used:
    58 The gadict project used the C5 format in the past but switched to its own format because:
    56 
    59 
    57  * Headwords are separated by ``;<SPACE>`` (and all are placed on a single line).
    60  * C5 is not a structured format, so producing other forms and conversion to
    58  * UTF-8 encoding was used.
    61    other formats is not possible.
    59  * Lines starting with ``#`` are stripped out (comment syntax).
    62  * C5 has no markup for links or for anything else.
    60  * The first line with ``ABOUT:`` is used as the description of the dictionary.
       
    61  * The first URL (line with ``http://``) is used as the dictionary home page.
       
    62 
       
    63 Comment syntax convention
       
    64 =========================
       
    65 
       
    66 As 'dictd -c5' format does not support comment syntax we filter out all
       
    67 lines that start with '#'.
       
    68 
    63 
    69 TODO convention
    64 TODO convention
    70 ===============
    65 ===============
    71 
    66 
    72 Entries or parts of text that are not completed are marked by keywords:
    67 Entries or parts of text that are not completed are marked by keywords: