Free OCR solutions

Suppose you have an image that contains some text or digits and you want to convert it to plain text, so that you can then process it easily with a program.

Use an OCR (optical character recognition) tool. Let’s look at two free solutions:

(1) tesseract
“Tesseract is probably the most accurate open source OCR engine available. Combined with the Leptonica Image Processing Library it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google. It is released under the Apache License 2.0.” (source)

Under Ubuntu you can install it with the good old apt-get.
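
If you are not sure whether it is already on your machine, a small check like the one below can help. This is only a sketch: it assumes the Ubuntu package is named tesseract-ocr and that the binary is called tesseract. The variable name "status" is just for illustration.

```shell
# Check whether the tesseract binary is on the PATH.
# Assumption: on Ubuntu the package is called "tesseract-ocr".
if command -v tesseract >/dev/null 2>&1; then
    status="installed"
else
    status="missing"
    echo "tesseract not found; try: sudo apt-get install tesseract-ocr"
fi
echo "tesseract is $status"
```

The script only prints the install command instead of running it, so it is safe to try without root rights.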

Let’s take the following image (numbers.jpg):

Print the result to the standard output:

$ tesseract numbers.jpg stdout

Print the result to a file:

$ tesseract numbers.jpg output
$ cat output.txt 

As can be seen, the extension “.txt” is added automatically.
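
Once the text is on the standard output, processing it with a program is a matter of piping. Here is a minimal sketch, assuming the image contains whole numbers that you want to add up. The function names extract_numbers and sum_numbers are made up for this example and are not part of tesseract.

```shell
# Keep only the runs of digits from the OCR output, one number per line.
extract_numbers() {
    grep -oE '[0-9]+'
}

# Add up the numbers read from the standard input (prints 0 if there are none).
sum_numbers() {
    awk '{ total += $1 } END { print total + 0 }'
}

# Usage with the example image from above:
#   tesseract numbers.jpg stdout | extract_numbers | sum_numbers
```

OCR output is rarely perfect, so it is worth looking at the raw text first before trusting the sum.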

(2) online OCR
Just google “online OCR” :) Free OCR worked pretty well for me: upload your image, fill out a captcha, and there you go.

