ocr.rst
author Oleksandr Gavenko <gavenkoa@gmail.com>
Sat, 28 Nov 2020 14:37:48 +0200

.. -*- coding: utf-8; -*-

======
 OCR.
======
.. contents::
   :local:

gocr.
=====
::

  $ gocr $IN.pnm >$OUT.txt
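
gocr reads one image and prints the recognized text on standard output. Converting a whole directory of scans is therefore just a loop over the files. The following is a minimal sketch that assumes the pages are already ``*.pnm`` files in a single directory. The ``gocr_dir`` function name is hypothetical and is not part of gocr::

  # gocr_dir is a hypothetical helper, not part of gocr.
  # For every DIR/NAME.pnm it writes the text gocr prints to DIR/NAME.txt.
  gocr_dir() {
    dir=${1:-.}
    for img in "$dir"/*.pnm; do
      # With no matching files the glob stays literal; skip it.
      [ -e "$img" ] || continue
      gocr "$img" >"${img%.pnm}.txt"
    done
  }

For example, ``gocr_dir scans`` writes ``scans/page1.txt`` next to ``scans/page1.pnm``.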

ocrfeeder.
==========

Document layout analysis and optical character recognition system::

  $ sudo apt-get install ocrfeeder

Usage::

  $ ocrfeeder-cli --o $OUTDIR --format HTML --images $IN.pnm

tesseract.
==========

Installing::

  $ sudo apt-get install tesseract-ocr

Usage::

  $ tesseract $IN.tif $OUT
  $ cat $OUT.txt
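
As the commands above show, tesseract takes an output *base* name and appends ``.txt`` to it, so ``$OUT`` must not carry the extension. The following is a minimal sketch of a wrapper that hides the intermediate file and prints the text to standard output instead. It relies only on the two-argument form used above. The ``tesseract_cat`` name is hypothetical::

  # tesseract_cat is a hypothetical helper, not part of tesseract.
  # Usage: tesseract_cat IMAGE   (recognized text goes to stdout)
  tesseract_cat() {
    base=$(mktemp) || return 1
    # tesseract writes BASE.txt; anything it prints itself is sent to
    # stderr so that only the recognized text reaches our stdout.
    tesseract "$1" "$base" >&2 && cat "$base.txt"
    rc=$?
    rm -f "$base" "$base.txt"
    return $rc
  }

With it, the two commands above collapse into ``tesseract_cat $IN.tif``.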

ocropus.
========
::

  $ ocropus hocr-to-text screen.ppm

ocrad.
======

Optical Character Recognition program::

  $ sudo apt-get install ocrad
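
ocrad reads netpbm images (PBM, PGM or PPM) and prints the recognized text on standard output. The following is a minimal sketch of a wrapper that checks the two-byte netpbm magic number (``P1`` to ``P6``) before calling ocrad, so that a JPEG or TIFF passed by mistake gives a clear message. It assumes the input is meant to be netpbm. The ``ocrad_txt`` name is hypothetical::

  # ocrad_txt is a hypothetical helper, not part of ocrad.
  # Usage: ocrad_txt IMAGE.pnm >TEXT.txt
  ocrad_txt() {
    # Fails (status 1) if the file cannot be read.
    magic=$(head -c 2 "$1") || return 1
    case $magic in
      P[1-6]) ocrad "$1" ;;
      *) echo "ocrad_txt: $1: not a PBM/PGM/PPM image" >&2; return 2 ;;
    esac
  }

Called as ``ocrad_txt $IN.pnm >$OUT.txt``.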

Misc.
=====

unpaper
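
unpaper post-processes scanned pages before OCR, for example removing dark edges and straightening skewed sheets. The following is a minimal sketch of chaining it with the tesseract invocation shown earlier. It assumes unpaper's basic ``unpaper INPUT OUTPUT`` form with PNM files and a tesseract build that can read PNM. The ``scan_to_text`` name is hypothetical::

  # scan_to_text is a hypothetical helper, not part of unpaper or tesseract.
  # Usage: scan_to_text SCAN.pnm OUTBASE   (writes OUTBASE.txt)
  scan_to_text() {
    tmpdir=$(mktemp -d) || return 1
    # A fresh directory, so unpaper never finds an existing output file.
    unpaper "$1" "$tmpdir/clean.pnm" && tesseract "$tmpdir/clean.pnm" "$2"
    rc=$?
    rm -rf "$tmpdir"
    return $rc
  }

For example, ``scan_to_text $IN.pnm $OUT`` leaves the text in ``$OUT.txt``.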