Tesseract OCR Document Extractor
Extracts structured text from scanned documents and images using Tesseract OCR with custom LSTM training data. Supports table detection via OpenCV contour analysis and PDF/A output generation.
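One way to get structured text out of Tesseract is its tsv output config. It writes one row per page, block, paragraph, line and word, each with a bounding box and a confidence value. The following is a minimal sketch, not this extractor's actual code. The function name tsv_to_lines and the min_conf threshold are hypothetical, and it assumes the standard TSV header that Tesseract writes (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text). It keeps only word rows (level 5) and joins them per line.

```python
import csv
import io
from typing import Dict, List, Tuple


def tsv_to_lines(tsv_text: str, min_conf: float = 0.0) -> List[str]:
    """Join word rows from Tesseract TSV output into one string per text line.

    Hypothetical helper; assumes Tesseract's standard TSV header:
    level, page_num, block_num, par_num, line_num, word_num,
    left, top, width, height, conf, text
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    # Dicts keep insertion order, so lines come out in the order Tesseract wrote them.
    lines: Dict[Tuple[int, int, int, int], List[str]] = {}
    for row in reader:
        if row["level"] != "5":  # level 5 rows are individual words
            continue
        text = (row["text"] or "").strip()
        if not text or float(row["conf"]) < min_conf:
            continue  # skip empty words and low-confidence words
        key = (
            int(row["page_num"]),
            int(row["block_num"]),
            int(row["par_num"]),
            int(row["line_num"]),
        )
        lines.setdefault(key, []).append(text)
    return [" ".join(words) for words in lines.values()]
```

The left, top, width and height columns are left unused here. They remain available if words later need to be matched against detected table cells.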
Installation
Requirements and caveats from upstream:
- NOTE: This software depends on other packages that may be licensed under different open source licenses.
Basic usage or getting-started notes:
- The legacy OCR engine mode (--oem 0) also needs traineddata files that support the legacy engine, for example those from the tessdata repository.
- Basic command line usage: tesseract imagename outputbase [options...] [configfile...]
- Examples can be found in the documentation.
Extracted from upstream docs: https://raw.githubusercontent.com/tesseract-ocr/tesseract/HEAD/README.md
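Tesseract's documented command line form is tesseract imagename outputbase [options...] [configfile...]. It can be wrapped as in the sketch below. This is an illustration under stated assumptions, not part of the upstream tool. build_tesseract_cmd and run_tesseract are hypothetical helper names, and any tessdata path you pass is whatever your installation uses. The options used (-l, --tessdata-dir, --oem, --psm) and the config names in the comment are standard Tesseract ones. Selecting --oem 0 only works when the tessdata directory holds traineddata files with legacy-engine support.

```python
import subprocess
from typing import List, Optional, Sequence


def build_tesseract_cmd(
    image: str,
    outputbase: str,
    lang: str = "eng",
    oem: Optional[int] = None,
    psm: Optional[int] = None,
    tessdata_dir: Optional[str] = None,
    configfiles: Sequence[str] = (),
) -> List[str]:
    """Hypothetical helper: argv for `tesseract imagename outputbase [options...] [configfile...]`."""
    cmd = ["tesseract", image, outputbase, "-l", lang]
    if tessdata_dir is not None:
        # Directory holding the *.traineddata files to load.
        cmd += ["--tessdata-dir", tessdata_dir]
    if oem is not None:
        # 0 = legacy only, 1 = LSTM only, 2 = legacy + LSTM, 3 = default.
        cmd += ["--oem", str(oem)]
    if psm is not None:
        cmd += ["--psm", str(psm)]  # page segmentation mode
    cmd += list(configfiles)  # e.g. "txt", "tsv", "hocr", "pdf"
    return cmd


def run_tesseract(cmd: List[str]) -> None:
    # Needs the tesseract binary on PATH; raises CalledProcessError on failure.
    subprocess.run(cmd, check=True)
```

Passing argv as a list rather than a shell string avoids quoting problems with file names that contain spaces.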
Quality
Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills; 8 GitHub stars; SKILL.md body of 1,172 characters.