Apache Tika Document Parser Agent
Extracts text and metadata from more than 1,000 file formats using the Apache Tika server REST API. Handles PDF OCR via the Tesseract integration, Office document parsing, and email archive extraction with MIME type detection.
What it does
- Extracts text and metadata from more than 1,000 file formats through the Apache Tika server REST API.
- Runs OCR on PDFs via the Tesseract integration.
- Parses Office documents.
- Extracts email archives, with MIME type detection.
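The capabilities above correspond to a small set of Tika server endpoints. The following is a minimal sketch, assuming a Tika server reachable at a base URL such as http://localhost:9998 (the server's default port) and using only the JDK's `java.net.http` classes. The class `TikaRequests` and its method names are hypothetical and not part of this agent or of Tika. The endpoint paths (`/tika`, `/meta`, `/detect/stream`) and the `X-Tika-PDFOcrStrategy` and `X-Tika-OCRLanguage` headers are taken from the Tika server documentation and should be checked against the server version in use.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: builds requests for a Tika server but sends nothing itself. */
public class TikaRequests {

    /** PUT /tika: request the extracted text as plain text. */
    public static HttpRequest extractText(String baseUrl, byte[] document) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/tika"))
                .header("Accept", "text/plain")
                .PUT(BodyPublishers.ofByteArray(document))
                .build();
    }

    /** PUT /meta: request the document metadata as JSON. */
    public static HttpRequest extractMetadata(String baseUrl, byte[] document) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/meta"))
                .header("Accept", "application/json")
                .PUT(BodyPublishers.ofByteArray(document))
                .build();
    }

    /** PUT /detect/stream: request the detected MIME type of the uploaded bytes. */
    public static HttpRequest detectMimeType(String baseUrl, byte[] document) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/detect/stream"))
                .PUT(BodyPublishers.ofByteArray(document))
                .build();
    }

    /** PUT /tika with OCR headers; assumes Tesseract is installed alongside the server. */
    public static HttpRequest extractTextWithOcr(String baseUrl, byte[] pdf, String language) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/tika"))
                .header("Accept", "text/plain")
                .header("X-Tika-PDFOcrStrategy", "ocr_only") // assumed strategy value
                .header("X-Tika-OCRLanguage", language)      // e.g. "eng"
                .PUT(BodyPublishers.ofByteArray(pdf))
                .build();
    }

    public static void main(String[] args) {
        // Builds one request and prints its method and target; no network traffic occurs.
        byte[] sample = "hello".getBytes(StandardCharsets.UTF_8);
        HttpRequest request = extractTextWithOcr("http://localhost:9998", sample, "eng");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Keeping request construction separate from sending makes each endpoint easy to inspect and test without a running server; any `java.net.http.HttpClient` can then send the built request.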
Installation
Requirements and caveats from upstream:
- N.B. Docker is used for tests in tika-integration-tests. If Docker is not installed, those tests are skipped.
Basic usage or getting-started notes:
Parse a file in Java:
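A minimal sketch of parsing one file through the server REST API that this agent describes, rather than through the embedded Tika library. It assumes a Tika server is already running at http://localhost:9998 and uses only the JDK's `java.net.http` client. The class `ParseFile` and its methods are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical example: sends one file to a Tika server and prints the extracted text. */
public class ParseFile {

    /** Builds the PUT /tika request for a file; kept separate so it can be checked offline. */
    public static HttpRequest buildRequest(URI server, Path file) throws IOException {
        return HttpRequest.newBuilder(server.resolve("/tika"))
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(Files.readAllBytes(file)))
                .build();
    }

    /** Sends the file and returns the response body, failing on any non-200 status. */
    public static String parse(URI server, Path file) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(server, file), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Tika server returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java ParseFile.java <file>");
            return;
        }
        // Assumed server address; adjust to wherever the Tika server is listening.
        System.out.println(parse(URI.create("http://localhost:9998"), Path.of(args[0])));
    }
}
```

With a server running, `java ParseFile.java report.pdf` would print the extracted text, where `report.pdf` stands in for any local file.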
Source: https://github.com/apache/tika
Extracted from upstream docs: https://raw.githubusercontent.com/apache/tika/HEAD/README.md
Quality
Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills · 8 GitHub stars · SKILL.md body (792 chars)