Tesseract OCR Engine for Image-to-Text Workflows
Tesseract OCR is a widely used open source optical character recognition engine with command line and library interfaces. It can extract text from images and scanned documents, supports more than 100 languages, and outputs plain text, hOCR, TSV, and PDF variants.
Prerequisites
go
Installation
Requirements and caveats from upstream:
- NOTE: This software depends on other packages that may be licensed under different open source licenses.
Basic usage or getting-started notes:
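- Running the legacy OCR engine (--oem 0) requires traineddata files that support the legacy engine, for example those from the tessdata repository.
- Basic command line usage follows the form tesseract imagename outputbase [options...] [configfile...]. Examples can be found in the documentation.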
Extracted from upstream docs: https://raw.githubusercontent.com/tesseract-ocr/tesseract/HEAD/README.md
Quality
Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills · 8 GitHub stars · SKILL.md body (1,372 chars)