Tesseract OCR

“The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but it is probably one of the most accurate open source OCR engines available. The source code will read a binary, grey or color image and output text. Image input is managed by the Leptonica Image Processing Library which can read a wide variety of image formats.” (source)

As so often, the Ubuntu repos are out of date :( Via apt-get you can only install version 2.

Version 3 is also available and works much better than v2. Installation notes are here: notes #1, notes #2.

The two versions can co-exist. V2 is installed to /usr/bin/tesseract while v3 is installed to /usr/local/bin/tesseract.
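
Since the two binaries live at different paths, a script can pick the version it wants explicitly. Here is a minimal sketch of that idea. The two paths are the ones mentioned above; the function name `find_tesseract` and the `exists` parameter are my own, made up for illustration, and your install locations may differ.

```python
import os

# Install locations as described in this post (assumption: your system
# uses the same paths; adjust them if it does not).
TESSERACT_PATHS = {
    2: "/usr/bin/tesseract",
    3: "/usr/local/bin/tesseract",
}


def find_tesseract(preferred=(3, 2), exists=os.path.isfile):
    """Return (version, path) for the first preferred version found.

    `exists` is a hypothetical hook so the lookup can be tried out
    without Tesseract being installed. Returns None if nothing is found.
    """
    for version in preferred:
        path = TESSERACT_PATHS[version]
        if exists(path):
            return version, path
    return None


if __name__ == "__main__":
    print(find_tesseract())
```

By default it prefers v3 and falls back to v2; pass `preferred=(2,)` to insist on the packaged version.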

There is a Python library for Tesseract called pytesser.
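
To give an idea of what such a wrapper has to do, here is a rough sketch that calls the `tesseract` command-line tool from Python. This is not pytesser's actual code or API. The function names `build_command` and `image_file_to_text` are hypothetical. It only relies on the basic CLI form `tesseract <image> <outputbase> [-l <lang>]`, which writes the recognized text to `<outputbase>.txt`.

```python
import os
import shutil
import subprocess
import tempfile


def build_command(image_path, output_base, binary="tesseract", lang=None):
    """Build the argument list for: tesseract <image> <outputbase> [-l <lang>]."""
    cmd = [binary, image_path, output_base]
    if lang:
        cmd += ["-l", lang]
    return cmd


def image_file_to_text(image_path, binary="tesseract", lang=None):
    """Run tesseract on an image file and return the recognized text.

    Tesseract appends ".txt" to the output base name itself, so we
    hand it a base name inside a temporary directory and read the
    resulting file back.
    """
    tmp_dir = tempfile.mkdtemp()
    output_base = os.path.join(tmp_dir, "out")
    try:
        subprocess.check_call(build_command(image_path, output_base, binary, lang))
        with open(output_base + ".txt", encoding="utf-8") as f:
            return f.read()
    finally:
        # Clean up the temporary directory whether or not OCR succeeded.
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

Passing `binary="/usr/local/bin/tesseract"` would select v3 if it is installed where this post describes.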

I’ve integrated pytesser into my jabbapylib library, which supports both Tesseract v2 and v3.

Update (20120505)
Installing tesseract v3 from source is now integrated in jabbatron.
