Tesseract is a great, easy-to-use OCR (Optical Character Recognition) tool with support for more than 100 languages.

Installation

Install the tesseract package with your package manager. English language data is usually installed along with it. To install other languages, look for packages whose names start with tesseract-ocr, tesseract-data or tesseract-langpack (the naming depends on your Linux distribution).

# On Debian/Ubuntu: install the Spanish language data
sudo apt install tesseract-ocr-spa

Usage

Usage is simple: run tesseract with the text language (-l), the input image and the base name of the output file. Tesseract appends .txt to that base name, so the command below writes output-text.txt. If you don’t specify a language, it assumes English (eng).

tesseract -l spa image.jpg output-text
  • You can replace the output file base name with - to write the text to standard output.
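
To process several images in one go, you can wrap the command above in a small shell function. This is only a sketch: ocr_batch is a made-up name, not part of Tesseract, and it assumes every file name has an extension (such as .jpg) that can be stripped to build the output base name.

```shell
# Hypothetical helper (not part of Tesseract): OCR every image passed in.
# Usage: ocr_batch LANG IMAGE...
ocr_batch() {
    lang=$1
    shift
    for image in "$@"; do
        # Strip the extension to get the output base name:
        # scan1.jpg -> scan1, and Tesseract then writes scan1.txt
        base=${image%.*}
        tesseract -l "$lang" "$image" "$base"
    done
}
```

For example, ocr_batch spa scan1.jpg scan2.png should leave scan1.txt and scan2.txt next to the images.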

You can check which languages are installed with tesseract --list-langs. Language data is usually located in /usr/share/tessdata, although the exact path varies by distribution and Tesseract version.
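
If a script depends on a particular language, it may be worth checking that its data is installed before running the OCR. A minimal sketch, assuming --list-langs prints one language code per line; has_lang is a hypothetical helper name, not a Tesseract command.

```shell
# Hypothetical helper (not part of Tesseract): succeed if the given
# language code appears in the output of `tesseract --list-langs`.
# Usage: has_lang CODE
has_lang() {
    # Some older releases print the list on stderr, so merge both streams.
    # -x matches the whole line, -F treats the code as a fixed string.
    tesseract --list-langs 2>&1 | grep -qxF -- "$1"
}
```

A script could then start with something like: has_lang spa || echo "Spanish language data is missing" >&2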

For more information about tesseract usage, see its man page (man tesseract).

If you have any suggestions, feel free to contact me via social media or email.