ocr.rst
author Oleksandr Gavenko <gavenkoa@gmail.com>
Mon, 22 Feb 2016 12:46:36 +0200
changeset 1905 fba288d59662
Include only local subsections in the TOC. This prevents duplication of the TOC when building a single-page HTML document. It also makes the CSS hack that hid the document title as a top-level section unnecessary.

.. -*- coding: utf-8; -*-
.. include:: HEADER.rst

======
 OCR.
======
.. contents::
   :local:

gocr.
=====

Using::

  $ gocr $IN.pnm >$OUT.txt
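
The command above expects PNM input. A minimal sketch for PNG sources, assuming netpbm's ``pngtopnm`` is installed; ``txt_name`` and ``ocr_png`` are hypothetical helper names, not part of gocr:

```shell
# Hypothetical helper: derive the text file name from an image name.
txt_name() {
  printf '%s.txt\n' "${1%.*}"
}

# Convert a PNG to PNM (assumes netpbm's pngtopnm) and feed it to gocr.
# "gocr -" reads the image from standard input.
ocr_png() {
  src=$1
  dst=$(txt_name "$src")
  pngtopnm "$src" | gocr - >"$dst"
}
```

Calling ``ocr_png page.png`` would then write the recognized text to ``page.txt``.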

ocrfeeder.
==========

Document layout analysis and optical character recognition system.

Installing::

  $ sudo apt-get install ocrfeeder

Using::

  $ ocrfeeder-cli --o $OUTDIR --format HTML --images $IN.pnm

tesseract.
==========

Installing::

  $ sudo apt-get install tesseract-ocr

Using::

  $ tesseract $IN.tif $OUT
  $ cat $OUT.txt
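
The same call can be looped over a directory of scans. A rough sketch, assuming the English language data is installed (``-l eng``); ``out_base`` and ``ocr_dir`` are hypothetical helper names:

```shell
# Hypothetical helper: tesseract appends ".txt" itself, so the output
# base is the input name without its extension.
out_base() {
  printf '%s\n' "${1%.*}"
}

# Recognize every TIFF in a directory (default: current directory).
ocr_dir() {
  dir=${1:-.}
  for img in "$dir"/*.tif; do
    [ -e "$img" ] || continue   # skip when the glob matches nothing
    tesseract "$img" "$(out_base "$img")" -l eng
  done
}
```

Running ``ocr_dir scans`` should leave a ``.txt`` file next to each ``.tif`` in ``scans``.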

ocropus.
========

Using::

  $ ocropus hocr-to-text screen.ppm

ocrad.
======

Optical character recognition program.

Installing::

  $ sudo apt-get install ocrad
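
A possible usage sketch: ocrad reads PNM (pbm, pgm, ppm) images and writes the text to the file given with ``-o``. The wrapper name ``ocrad_to_txt`` and the default output name ``out.txt`` are assumptions made for illustration:

```shell
# Hypothetical wrapper: run ocrad on a PNM image and write the text to a
# file (second argument, default: out.txt).
ocrad_to_txt() {
  src=$1
  dst=${2:-out.txt}
  if ! command -v ocrad >/dev/null 2>&1; then
    echo "ocrad is not installed" >&2
    return 127
  fi
  ocrad -o "$dst" "$src"
}
```

For example, ``ocrad_to_txt $IN.pnm $OUT.txt`` followed by ``cat $OUT.txt``.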

Misc.
=====

unpaper