ocr.rst
author Oleksandr Gavenko <gavenkoa@gmail.com>
Mon, 09 Oct 2017 10:49:36 +0300

.. -*- coding: utf-8; -*-

======
 OCR.
======
.. contents::
   :local:

gocr.
=====
::

  $ gocr $IN.pnm >$OUT.txt

ocrfeeder.
==========

Document layout analysis and optical character recognition system::

  $ sudo apt-get install ocrfeeder

Using::

  $ ocrfeeder-cli --o $OUTDIR --format HTML --images $IN.pnm

tesseract.
==========

Installing::

  $ sudo apt-get install tesseract-ocr

Using::

  $ tesseract $IN.tif $OUT
  $ cat $OUT.txt
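
The single-file call above extends to a whole directory of scans. The following is a minimal sketch, not a tested recipe: ``ocr_dir`` is a hypothetical helper name, and the ``TESSERACT`` variable is an assumption added here so that the binary can be swapped for a stub. It relies only on the behaviour shown above, namely that tesseract appends ``.txt`` to the output base.

```shell
# Hypothetical helper: OCR every .tif file in a directory with tesseract.
# TESSERACT is an assumed override hook; it defaults to the real binary.
ocr_dir() {
    dir=$1
    for img in "$dir"/*.tif; do
        # Skip the unexpanded pattern when the directory has no .tif files.
        [ -e "$img" ] || continue
        # Pass the image name without its extension as the output base;
        # tesseract adds .txt to it on its own.
        "${TESSERACT:-tesseract}" "$img" "${img%.tif}" || return 1
    done
}
```

It could be called as ``ocr_dir ./scans``, after which each ``page.tif`` should have a ``page.txt`` next to it.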

ocropus.
========

::

  $ ocropus hocr-to-text screen.ppm

ocrad.
======

Optical Character Recognition program::

  $ sudo apt-get install ocrad
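
For usage, ocrad reads a PNM image and prints the recognized text to standard output, much like the gocr call earlier in these notes. A small sketch under that assumption: ``ocrad_to_txt`` is a hypothetical helper name, and the ``OCRAD`` variable is an assumed hook for substituting another binary.

```shell
# Hypothetical helper: run ocrad on one PNM image and save the text
# next to it, using the same name with a .txt extension.
# OCRAD is an assumed override hook; it defaults to the real binary.
ocrad_to_txt() {
    in=$1
    # Strip the last extension (.pnm, .pgm, ...) and append .txt.
    out=${in%.*}.txt
    "${OCRAD:-ocrad}" "$in" > "$out"
}
```

It could be called as ``ocrad_to_txt page.pnm``, which would write ``page.txt``.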

Misc.
=====

unpapper: post-processing tool for scanned pages (deskewing, border removal), useful before OCR.