Tesseract: extract text from images
Tesseract is a great, easy-to-use OCR (Optical Character Recognition) tool with support for many languages.
Install the tesseract package with your package manager. English language data will probably be installed automatically. To install other languages, look for tesseract-langpack packages (the exact name depends on the Linux distribution).
# For Debian/Ubuntu, installing Spanish language data
sudo apt install tesseract-ocr-spa
Usage is very simple: run tesseract, specifying the text language (with -l), the input image and the basename of the output file (tesseract appends .txt to it). If you don’t specify a language, it defaults to English.
tesseract -l spa image.jpg output-text
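The same command can be wrapped in a loop to process a whole directory. The following is only a sketch: ocr_dir is a hypothetical helper name, and it assumes the images are .jpg files that should each get a .txt file next to them.

```shell
# ocr_dir is a hypothetical helper, not part of tesseract itself.
# Usage: ocr_dir DIRECTORY [LANGUAGE]
ocr_dir() {
    dir=$1
    lang=${2:-eng}   # fall back to English, as tesseract does without -l
    for img in "$dir"/*.jpg; do
        # Skip the unexpanded pattern when the directory has no .jpg files
        [ -e "$img" ] || continue
        # Pass the path without its extension as the output basename;
        # tesseract adds .txt on its own
        tesseract -l "$lang" "$img" "${img%.jpg}"
    done
}
```

For example, ocr_dir ~/scans spa would write ~/scans/page1.txt for ~/scans/page1.jpg, and so on (the directory and file names here are made up).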
- You can replace the output file basename with a single hyphen (-) to use standard output.
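Writing to standard output makes it possible to pipe the recognized text into other tools. As a small sketch, the hypothetical helper ocr_grep below (the name and argument order are assumptions, not anything tesseract provides) prints only the lines of an image's text that contain a given word.

```shell
# ocr_grep is a hypothetical helper, not part of tesseract itself.
# Usage: ocr_grep PATTERN IMAGE [LANGUAGE]
ocr_grep() {
    pattern=$1
    img=$2
    lang=${3:-eng}   # fall back to English, as tesseract does without -l
    # "-" as the output basename sends the recognized text to standard
    # output, where grep filters it case-insensitively
    tesseract -l "$lang" "$img" - | grep -i -- "$pattern"
}
```

For example, ocr_grep total receipt.jpg spa would show the lines of receipt.jpg that mention "total" (the file name is made up).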
You can check which languages are installed with tesseract --list-langs.
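To test for one specific language before running OCR, the output of tesseract --list-langs can be filtered. This is a minimal sketch: has_lang is a hypothetical helper name, and it assumes the command prints one language code per line after a header line.

```shell
# has_lang is a hypothetical helper, not part of tesseract itself.
# Usage: has_lang LANGUAGE   (succeeds if the language data is installed)
has_lang() {
    # stderr is merged in as a precaution, in case a version prints the
    # list there. -x matches whole lines only and -F takes the code
    # literally, so "spa" does not match "spa_old".
    tesseract --list-langs 2>&1 | grep -qxF -- "$1"
}
```

It could be used as: has_lang spa || echo "Spanish language data is missing" >&2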
For more info about tesseract usage, check its man page.