ocr.rst
author Oleksandr Gavenko <gavenkoa@gmail.com>
Mon, 22 Feb 2016 12:46:36 +0200
changeset 1905 fba288d59662
parent 1346 a2fbf50a43f4
child 1912 8b81a8f0f692
permissions -rw-r--r--
Include only local subsections into TOC. This prevent duplication of TOC when build single page HTML document. Also this make unnecessary CSS hack to hide document title as top level section.
.. -*- coding: utf-8; -*-
.. include:: HEADER.rst

======
 OCR.
======
.. contents::
   :local:

gocr.
=====

Using::

  $ gocr $IN.pnm >$OUT.txt

ocrfeeder.
==========

Document layout analysis and optical character recognition system. Installing::

  $ sudo apt-get install ocrfeeder

Using::

  $ ocrfeeder-cli --o $OUTDIR --format HTML --images $IN.pnm

tesseract.
==========

Installing::

  $ sudo apt-get install tesseract-ocr

Using::

  $ tesseract $IN.tif $OUT
  $ cat $OUT.txt

ocropus.
========

Using::

  $ ocropus hocr-to-text screen.ppm

ocrad
=====

Optical character recognition program. Installing::

  $ sudo apt-get install ocrad

Misc.
=====

unpapper