{"id":"5d827102-9158-4cd2-a188-1bd6dc1a4522","shortId":"8cb9U6","kind":"skill","title":"Apache Tika Document Extractor","tagline":"Wraps Apache Tika Server REST API for extracting structured text from PDFs, DOCX, PPTX, and 1,200+ file formats. Outputs clean markdown with metadata preservation using Tika /rmeta/text endpoint and recursive parsing mode.","description":"# Apache Tika Document Extractor\n\nWraps Apache Tika Server REST API for extracting structured text from PDFs, DOCX, PPTX, and 1,200+ file formats. Outputs clean markdown with metadata preservation using Tika /rmeta/text endpoint and recursive parsing mode.\n\n## Installation\n\nRequirements and caveats from upstream:\n- **N.B.** [Docker](https://www.docker.com/products/personal) is used for tests in tika-integration-tests. If Docker is not installed, those tests are skipped.\n\nBasic usage or getting-started notes:\n- ===========\n- **Parse a file in Java:**\n- java\n\n- Source: https://github.com/apache/tika\n- Extracted from upstream docs: https://raw.githubusercontent.com/apache/tika/HEAD/README.md\n\n## Source\n\n- [Agent Skill Exchange](https://agentskillexchange.com/skills/apache-tika-document-extractor/)","tags":["apache","tika","document","extractor","skills","agentskillexchange","agent-skills","ai-agents","ai-tools","awesome-list","claude-code","codex"],"capabilities":["skill","source-agentskillexchange","skill-apache-tika-document-extractor","topic-agent-skills","topic-ai-agents","topic-ai-tools","topic-awesome-list","topic-claude-code","topic-codex","topic-cursor","topic-llm","topic-mcp","topic-npx-skills","topic-openclaw","topic-skills-catalog"],"categories":["skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/agentskillexchange/skills/apache-tika-document-extractor","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add agentskillexchange/skills","source_repo":"https://github.com/agentskillexchange/skills","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (805 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:09:23.148Z","embedding":null,"createdAt":"2026-05-18T13:15:05.969Z","updatedAt":"2026-05-18T19:09:23.148Z","lastSeenAt":"2026-05-18T19:09:23.148Z","tsv":"'/apache/tika':120 '/apache/tika/head/readme.md':127 '/products/personal)':85 '/rmeta/text':32,69 '/skills/apache-tika-document-extractor/)':134 '1':20,57 '200':21,58 'agent':129 'agentskillexchange.com':133 'agentskillexchange.com/skills/apache-tika-document-extractor/)':132 'apach':1,6,38,43 'api':10,47 'basic':104 'caveat':78 'clean':25,62 'doc':124 'docker':82,96 'document':3,40 'docx':17,54 'endpoint':33,70 'exchang':131 'extract':12,49,121 'extractor':4,41 'file':22,59,113 'format':23,60 'get':108 'getting-start':107 'github.com':119 'github.com/apache/tika':118 'instal':75,99 'integr':93 'java':115,116 'markdown':26,63 'metadata':28,65 'mode':37,74 'n.b':81 'note':110 'output':24,61 'pars':36,73,111 'pdfs':16,53 'pptx':18,55 'preserv':29,66 'raw.githubusercontent.com':126 'raw.githubusercontent.com/apache/tika/head/readme.md':125 'recurs':35,72 'requir':76 'rest':9,46 'server':8,45 'skill':130 'skill-apache-tika-document-extractor' 'skip':103 'sourc':117,128 'source-agentskillexchange' 'start':109 'structur':13,50 'test':89,94,101 'text':14,51 'tika':2,7,31,39,44,68,92 'tika-integration-test':91 'topic-agent-skills' 'topic-ai-agents' 'topic-ai-tools' 'topic-awesome-list' 'topic-claude-code' 'topic-codex' 'topic-cursor' 'topic-llm' 'topic-mcp' 'topic-npx-skills' 'topic-openclaw' 'topic-skills-catalog' 'upstream':80,123 'usag':105 'use':30,67,87 'wrap':5,42 'www.docker.com':84 'www.docker.com/products/personal)':83","prices":[{"id":"e7635182-4120-46ad-b4a1-36e939e667f7","listingId":"5d827102-9158-4cd2-a188-1bd6dc1a4522","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"agentskillexchange","category":"skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:15:05.969Z"}],"sources":[{"listingId":"5d827102-9158-4cd2-a188-1bd6dc1a4522","source":"github","sourceId":"agentskillexchange/skills/apache-tika-document-extractor","sourceUrl":"https://github.com/agentskillexchange/skills/tree/main/skills/apache-tika-document-extractor","isPrimary":false,"firstSeenAt":"2026-05-18T13:15:05.969Z","lastSeenAt":"2026-05-18T19:09:23.148Z"}],"details":{"listingId":"5d827102-9158-4cd2-a188-1bd6dc1a4522","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"agentskillexchange","slug":"apache-tika-document-extractor","github":{"repo":"agentskillexchange/skills","stars":8,"topics":["agent-skills","ai-agents","ai-tools","awesome-list","claude-code","codex","cursor","llm","mcp","npx-skills","openclaw","skills-catalog"],"license":"mit","html_url":"https://github.com/agentskillexchange/skills","pushed_at":"2026-05-18T19:02:17Z","description":"The open catalog of AI agent skills — 2,000+ security-scanned skills for Claude Code, Cursor, Codex, and more.","skill_md_sha":"cf81a14ea8b5e85b94a21bcc41684e558a8de9af","skill_md_path":"skills/apache-tika-document-extractor/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/agentskillexchange/skills/tree/main/skills/apache-tika-document-extractor"},"layout":"multi","source":"github","category":"skills","frontmatter":{"name":"Apache Tika Document Extractor","description":"Wraps Apache Tika Server REST API for extracting structured text from PDFs, DOCX, PPTX, and 1,200+ file formats. Outputs clean markdown with metadata preservation using Tika /rmeta/text endpoint and recursive parsing mode."},"skills_sh_url":"https://skills.sh/agentskillexchange/skills/apache-tika-document-extractor"},"updatedAt":"2026-05-18T19:09:23.148Z"}}