.. -*- coding: utf-8; -*-
.. ocr.rst, author Oleksandr Gavenko <gavenkoa@gmail.com>, Fri, 13 Jul 2012 22:32:19 +0300
.. include:: HEADER.rst

======
 OCR.
======

gocr.
=====

Using::

  $ gocr $IN.pnm >$OUT.txt
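
The command above takes a PNM file. For a PNG source, one option is to convert
on the fly. A minimal sketch, assuming ``pngtopnm`` from netpbm is installed;
the helper name ``png2text`` is made up for illustration::

  # png2text IN.png OUT.txt
  png2text() {
    # pngtopnm writes PNM to stdout; "gocr -i -" reads the image from stdin
    # and "-o" names the output text file.
    pngtopnm "$1" | gocr -i - -o "$2"
  }

After sourcing it, ``png2text scan.png scan.txt`` should leave the recognized
text in ``scan.txt`` (both file names are placeholders).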

ocrfeeder.
==========

Document layout analysis and optical character recognition system.

Installing::

  $ sudo apt-get install ocrfeeder

Using::

  $ ocrfeeder-cli --o $OUTDIR --format HTML --images $IN.pnm

tesseract.
==========

Installing::

  $ sudo apt-get install tesseract-ocr

Using (``tesseract`` appends ``.txt`` to the output base ``$OUT``)::

  $ tesseract $IN.tif $OUT
  $ cat $OUT.txt
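
For more than one image the same call can be looped. A sketch, assuming the
images end in ``.tif`` and the requested language data (``eng`` by default here)
is installed; ``ocr_dir`` is a hypothetical helper name::

  # ocr_dir DIR [LANG]: run tesseract on every *.tif in DIR.
  ocr_dir() {
    dir=$1
    lang=${2:-eng}
    for img in "$dir"/*.tif; do
      # With no match the glob stays literal, so skip non-existing names.
      [ -e "$img" ] || continue
      # Output base is the image path without ".tif"; tesseract adds ".txt".
      tesseract "$img" "${img%.tif}" -l "$lang"
    done
  }

``ocr_dir scans deu`` would then write ``scans/NAME.txt`` next to each
``scans/NAME.tif``. The variables are not local, which keeps the sketch within
plain POSIX sh.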

ocropus.
========

Using::

  $ ocropus hocr-to-text screen.ppm

ocrad.
======

Optical Character Recognition program.

Installing::

  $ sudo apt-get install ocrad
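
No usage is recorded here. As far as I know ``ocrad`` reads PNM images and
prints the text to stdout unless ``-o`` names a file. A sketch with a
hypothetical wrapper ``ocrad_text``::

  # ocrad_text IN.pnm OUT.txt
  ocrad_text() {
    # -o sends the recognized text to a file instead of stdout.
    ocrad -o "$2" "$1"
  }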

Misc.
=====

unpapper