Tesseract OCR
“The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but it is probably one of the most accurate open source OCR engines available. The source code will read a binary, grey or color image and output text. Image input is managed by the Leptonica Image Processing Library which can read a wide variety of image formats.” (source)
As is so often the case, the Ubuntu repos are out of date again :( Via apt-get you can only install version 2.
Version 3 is also available and works much better than v2. Installation notes are here: notes #1, notes #2.
The two versions can coexist: v2 is installed to /usr/bin/tesseract, while v3 is installed to /usr/local/bin/tesseract.
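Since both binaries can live side by side, a script may want to prefer v3 and fall back to v2. Here is a minimal sketch of that idea, assuming the two install paths mentioned above. The function name `pick_tesseract` and its `is_executable` parameter are made up for this illustration and are not part of any library.

```python
import os

# Install locations as described in the text; adjust them if yours differ.
V2_PATH = "/usr/bin/tesseract"
V3_PATH = "/usr/local/bin/tesseract"


def pick_tesseract(candidates=(V3_PATH, V2_PATH), is_executable=None):
    """Return the first candidate that is an executable file, or None.

    `is_executable` is a hypothetical hook that lets you swap in your own
    check (handy for testing without Tesseract installed).
    """
    if is_executable is None:
        def is_executable(path):
            return os.path.isfile(path) and os.access(path, os.X_OK)

    for path in candidates:
        if is_executable(path):
            return path
    return None
```

Calling `pick_tesseract()` with no arguments checks the v3 path first, so v3 wins whenever both versions are installed.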
There is a Python library for Tesseract called pytesser.
I’ve integrated pytesser into my jabbapylib library, which supports both Tesseract v2 and v3.
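To give a rough idea of what wrapping the command-line tool from Python can look like, here is a small sketch. It is not the actual API of pytesser or jabbapylib; the names `build_command` and `image_file_to_text` are hypothetical. It relies only on the basic invocation `tesseract <image> <outputbase> [-l <lang>]`, which writes the recognized text to `<outputbase>.txt`.

```python
import io
import os
import shutil
import subprocess
import tempfile


def build_command(binary, image_path, output_base, lang=None):
    """Build the argument list: tesseract <image> <outputbase> [-l <lang>]."""
    cmd = [binary, image_path, output_base]
    if lang:
        cmd += ["-l", lang]
    return cmd


def image_file_to_text(image_path, binary="tesseract", lang=None):
    """Run Tesseract on an image file and return the recognized text.

    `binary` can be a full path such as /usr/local/bin/tesseract to
    force a specific version.
    """
    tmpdir = tempfile.mkdtemp()
    output_base = os.path.join(tmpdir, "ocr")
    try:
        # Raises CalledProcessError if Tesseract exits with an error.
        subprocess.check_call(build_command(binary, image_path, output_base, lang))
        # Tesseract appends ".txt" to the output base name.
        with io.open(output_base + ".txt", encoding="utf-8") as f:
            return f.read()
    finally:
        # Clean up the temporary directory whether or not OCR succeeded.
        shutil.rmtree(tmpdir, ignore_errors=True)
```

Usage would be something like `image_file_to_text("scan.tif", binary="/usr/local/bin/tesseract", lang="eng")`, where `scan.tif` stands for whatever image you want to process.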
Update (20120505)
Installing Tesseract v3 from source is now integrated into jabbatron.