OCRmyPDF Searchable PDF OCR Pipeline
OCRmyPDF is an open source tool that adds a searchable OCR text layer to scanned PDFs. It is useful when an agent needs to turn image-based documents into text-searchable files without rebuilding a full document pipeline.
What it does
It takes a scanned, image-only PDF and writes a new PDF with an OCR text layer added, so the page content can be searched and copied. Text recognition is done by the Tesseract OCR engine, and the output can be required to be PDF/A.
Installation
Use the upstream install or setup path that matches your environment:
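- macOS (Homebrew): brew install ocrmypdf installs the tool; brew install tesseract-lang adds Tesseract's additional language packs.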
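Before running anything, it can help to confirm that the install actually put the expected executables on PATH. The following is a minimal sketch, not part of upstream: the function name missing_tools and the REQUIRED_TOOLS list are hypothetical, and the assumption that both ocrmypdf and tesseract should be visible on PATH may not hold for Docker-based setups.

```python
import shutil

# Executables this sketch assumes a working install exposes on PATH.
# The list is illustrative, not taken from the upstream docs.
REQUIRED_TOOLS = ["ocrmypdf", "tesseract"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("ocrmypdf and tesseract are available.")
```

An agent could call missing_tools() once at startup and skip the OCR step, or report the missing names, when the returned list is not empty.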
Requirements and caveats from upstream:
- Linux, Windows, macOS and FreeBSD are supported. Docker images are also available, for both x64 and ARM.
Basic usage or getting-started notes:
- Add an OCR layer and require PDF/A output.
Extracted from upstream docs: https://raw.githubusercontent.com/ocrmypdf/OCRmyPDF/HEAD/README.md
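The basic case of adding an OCR layer and requiring PDF/A can be sketched as a small wrapper around the ocrmypdf command line. This is an illustration under stated assumptions rather than upstream's own example: the helper names build_ocr_command and run_ocr are hypothetical, and it relies only on the --output-type pdfa and -l options of the ocrmypdf CLI.

```python
import shutil
import subprocess


def build_ocr_command(input_pdf, output_pdf, language=None):
    """Build the argv for an ocrmypdf run that requests PDF/A output."""
    cmd = ["ocrmypdf", "--output-type", "pdfa"]
    if language:
        # Tesseract language code(s), e.g. "eng" or "eng+fra".
        cmd += ["-l", language]
    cmd += [input_pdf, output_pdf]
    return cmd


def run_ocr(input_pdf, output_pdf, language=None):
    """Run ocrmypdf; raises CalledProcessError on a non-zero exit code."""
    if shutil.which("ocrmypdf") is None:
        raise FileNotFoundError("ocrmypdf not found on PATH")
    subprocess.run(build_ocr_command(input_pdf, output_pdf, language), check=True)
```

For example, run_ocr("scan.pdf", "scan_ocr.pdf") would write a searchable copy next to the input; both file names are placeholders. Keeping the command construction separate from the subprocess call makes it possible to log or inspect the exact arguments before anything is executed.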