Tesseract OCR Data Extractor
Extracts structured data from scanned documents using the Tesseract OCR engine with LSTM models. Supports table detection via OpenCV contour analysis and writes results to CSV, JSON, or pandas DataFrames.
What it does
Extracts structured data from scanned documents using the Tesseract OCR engine with LSTM models. Supports table detection via OpenCV contour analysis and writes results to CSV, JSON, or pandas DataFrames.
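As a rough illustration of the output step, the sketch below turns Tesseract's word-level TSV output into line records and serializes them to JSON and CSV using only the Python standard library. The function names (parse_tsv_lines, to_json, to_csv) and the sample data are hypothetical and are not taken from this skill; only the TSV column layout comes from Tesseract itself.

```python
import csv
import io
import json
from collections import defaultdict

# Column names of Tesseract's TSV output (produced by the "tsv" config).
TSV_FIELDS = ["level", "page_num", "block_num", "par_num", "line_num",
              "word_num", "left", "top", "width", "height", "conf", "text"]


def parse_tsv_lines(tsv_text):
    """Group word-level TSV rows into one record per text line (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    lines = defaultdict(list)
    for row in reader:
        text = (row.get("text") or "").strip()
        # Level 5 rows are individual words; lower levels describe layout only.
        if row.get("level") != "5" or not text:
            continue
        key = (int(row["page_num"]), int(row["block_num"]),
               int(row["par_num"]), int(row["line_num"]))
        lines[key].append((int(row["word_num"]), text, float(row["conf"])))

    records = []
    for key in sorted(lines):
        words = sorted(lines[key])  # restore reading order by word number
        records.append({
            "page": key[0], "block": key[1], "paragraph": key[2], "line": key[3],
            "text": " ".join(word[1] for word in words),
            "mean_conf": round(sum(word[2] for word in words) / len(words), 2),
        })
    return records


def to_json(records):
    """Serialize line records as a JSON array."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Serialize line records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["page", "block", "paragraph", "line", "text", "mean_conf"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Made-up sample input: one page-level row followed by three word rows.
SAMPLE_TSV = "\n".join([
    "\t".join(TSV_FIELDS),
    "\t".join(["1", "1", "0", "0", "0", "0", "0", "0", "100", "50", "-1", ""]),
    "\t".join(["5", "1", "1", "1", "1", "1", "10", "10", "30", "12", "96", "Invoice"]),
    "\t".join(["5", "1", "1", "1", "1", "2", "45", "10", "20", "12", "90", "No."]),
    "\t".join(["5", "1", "1", "1", "2", "1", "10", "30", "30", "12", "80", "Total"]),
])
```

If pandas is installed, the same records can be passed to pandas.DataFrame(records) to get a DataFrame. The table detection step is not covered by this sketch.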
Prerequisites
Tesseract OCR, OpenCV
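A quick way to confirm both prerequisites is sketched below. It assumes the Tesseract executable is named tesseract and that OpenCV is used through its Python bindings (import name cv2). The listing does not state either detail, and check_prerequisites is a hypothetical helper.

```python
import importlib.util
import shutil


def check_prerequisites(binary="tesseract", module="cv2"):
    """Report whether the Tesseract CLI and the OpenCV Python module can be found.

    Both default names are assumptions; adjust them to match your installation.
    """
    return {
        binary: shutil.which(binary) is not None,               # executable on PATH?
        module: importlib.util.find_spec(module) is not None,   # module importable?
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```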
Installation
Requirements and caveats from upstream:
- NOTE: This software depends on other packages that may be licensed under different open source licenses.
Basic usage or getting-started notes:
- The legacy engine also needs traineddata files that support it, for example those from the tessdata repository.
- Basic command line usage is described in the upstream README; examples can be found in the documentation.
Extracted from upstream docs: https://raw.githubusercontent.com/tesseract-ocr/tesseract/HEAD/README.md
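For the basic command line usage mentioned above, the upstream README gives the general form tesseract imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...]. The sketch below builds that argument list in Python and runs it only if the executable is found. The helper names and the file names scan.png and out are hypothetical, chosen for illustration.

```python
import shutil
import subprocess


def build_tesseract_command(image, output_base, lang=None, oem=None, psm=None, configs=()):
    """Assemble a tesseract argument list; optional flags are added only when given."""
    cmd = ["tesseract", str(image), str(output_base)]
    if lang:
        cmd += ["-l", lang]          # language(s), e.g. "eng"
    if oem is not None:
        cmd += ["--oem", str(oem)]   # OCR engine mode
    if psm is not None:
        cmd += ["--psm", str(psm)]   # page segmentation mode
    cmd += list(configs)             # config file names, e.g. "tsv"
    return cmd


def run_tesseract(image, output_base, **options):
    """Run tesseract, raising FileNotFoundError if it is not on PATH."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    cmd = build_tesseract_command(image, output_base, **options)
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    # Hypothetical file names; only prints the command, does not run it.
    print(build_tesseract_command("scan.png", "out", lang="eng", oem=1, psm=6, configs=["tsv"]))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.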
Quality
Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills · 8 GitHub stars · SKILL.md body (1,268 chars)