{"id":"4e73d642-1bd7-44e1-b39a-5b5f4f8d8aa7","shortId":"Lj9tjW","kind":"skill","title":"Tesseract OCR Document Extractor","tagline":"Extracts structured text from scanned documents and images using Tesseract OCR with custom LSTM training data. Supports table detection via OpenCV contour analysis and PDF/A output generation.","description":"# Tesseract OCR Document Extractor\n\nExtracts structured text from scanned documents and images using Tesseract OCR with custom LSTM training data. Supports table detection via OpenCV contour analysis and PDF/A output generation.\n\n## Installation\n\nRequirements and caveats from upstream:\n- **NOTE**: This software depends on other packages that may be licensed under different open source licenses.\n\nBasic usage or getting-started notes:\n- It also needs [traineddata](https://tesseract-ocr.github.io/tessdoc/Data-Files.html) files which support the legacy engine, for example those from the [tessdata](https://github.com/tesseract-ocr/tessdata) repository.\n- Basic **[command line usage](https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html)**:\n- Examples can be found in the [documentation](https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html#simplest-invocation-to-ocr-an-image).\n\n- Source: https://github.com/tesseract-ocr/tesseract\n- Extracted from upstream docs: https://raw.githubusercontent.com/tesseract-ocr/tesseract/HEAD/README.md\n\n## Source\n\n- [Agent Skill Exchange](https://agentskillexchange.com/skills/tesseract-ocr-document-extractor/)","tags":["tesseract","ocr","document","extractor","skills","agentskillexchange","agent-skills","ai-agents","ai-tools","awesome-list","claude-code","codex"],"capabilities":["skill","source-agentskillexchange","skill-tesseract-ocr-document-extractor","topic-agent-skills","topic-ai-agents","topic-ai-tools","topic-awesome-list","topic-claude-code","topic-codex","topic-cursor","topic-llm","topic-mcp","topic-npx-skills","topic-openclaw","topic-skills-catalog"],"categories":["skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/agentskillexchange/skills/tesseract-ocr-document-extractor","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add agentskillexchange/skills","source_repo":"https://github.com/agentskillexchange/skills","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,172 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:12:49.908Z","embedding":null,"createdAt":"2026-05-18T13:19:55.481Z","updatedAt":"2026-05-18T19:12:49.908Z","lastSeenAt":"2026-05-18T19:12:49.908Z","tsv":"'/skills/tesseract-ocr-document-extractor/)':149 '/tessdoc/command-line-usage.html#simplest-invocation-to-ocr-an-image).':131 '/tessdoc/command-line-usage.html)**:':121 '/tessdoc/data-files.html)':98 '/tesseract-ocr/tessdata)':113 '/tesseract-ocr/tesseract':135 '/tesseract-ocr/tesseract/head/readme.md':142 'agent':144 'agentskillexchange.com':148 'agentskillexchange.com/skills/tesseract-ocr-document-extractor/)':147 'also':93 'analysi':27,58 'basic':85,115 'caveat':66 'command':116 'contour':26,57 'custom':17,48 'data':20,51 'depend':72 'detect':23,54 'differ':81 'doc':139 'document':3,10,34,41,128 'engin':104 'exampl':106,122 'exchang':146 'extract':5,36,136 'extractor':4,35 'file':99 'found':125 'generat':31,62 'get':89 'getting-start':88 'github.com':112,134 'github.com/tesseract-ocr/tessdata)':111 'github.com/tesseract-ocr/tesseract':133 'imag':12,43 'instal':63 'legaci':103 'licens':79,84 'line':117 'lstm':18,49 'may':77 'need':94 'note':69,91 'ocr':2,15,33,46 'open':82 'opencv':25,56 'output':30,61 'packag':75 'pdf/a':29,60 'raw.githubusercontent.com':141 'raw.githubusercontent.com/tesseract-ocr/tesseract/head/readme.md':140 'repositori':114 'requir':64 'scan':9,40 'skill':145 'skill-tesseract-ocr-document-extractor' 'softwar':71 'sourc':83,132,143 'source-agentskillexchange' 'start':90 'structur':6,37 'support':21,52,101 'tabl':22,53 'tessdata':110 'tesseract':1,14,32,45 'tesseract-ocr.github.io':97,120,130 'tesseract-ocr.github.io/tessdoc/command-line-usage.html#simplest-invocation-to-ocr-an-image).':129 'tesseract-ocr.github.io/tessdoc/command-line-usage.html)**:':119 'tesseract-ocr.github.io/tessdoc/data-files.html)':96 'text':7,38 'topic-agent-skills' 'topic-ai-agents' 'topic-ai-tools' 'topic-awesome-list' 'topic-claude-code' 'topic-codex' 'topic-cursor' 'topic-llm' 'topic-mcp' 'topic-npx-skills' 'topic-openclaw' 'topic-skills-catalog' 'train':19,50 'traineddata':95 'upstream':68,138 'usag':86,118 'use':13,44 'via':24,55","prices":[{"id":"dba4ceee-ed8e-45b4-ab0c-93073d92a67d","listingId":"4e73d642-1bd7-44e1-b39a-5b5f4f8d8aa7","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"agentskillexchange","category":"skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:19:55.481Z"}],"sources":[{"listingId":"4e73d642-1bd7-44e1-b39a-5b5f4f8d8aa7","source":"github","sourceId":"agentskillexchange/skills/tesseract-ocr-document-extractor","sourceUrl":"https://github.com/agentskillexchange/skills/tree/main/skills/tesseract-ocr-document-extractor","isPrimary":false,"firstSeenAt":"2026-05-18T13:19:55.481Z","lastSeenAt":"2026-05-18T19:12:49.908Z"}],"details":{"listingId":"4e73d642-1bd7-44e1-b39a-5b5f4f8d8aa7","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"agentskillexchange","slug":"tesseract-ocr-document-extractor","github":{"repo":"agentskillexchange/skills","stars":8,"topics":["agent-skills","ai-agents","ai-tools","awesome-list","claude-code","codex","cursor","llm","mcp","npx-skills","openclaw","skills-catalog"],"license":"mit","html_url":"https://github.com/agentskillexchange/skills","pushed_at":"2026-05-18T19:02:17Z","description":"The open catalog of AI agent skills — 2,000+ security-scanned skills for Claude Code, Cursor, Codex, and more.","skill_md_sha":"0071371c3cc9c2d6b2a3f20f479d3db72a0add6f","skill_md_path":"skills/tesseract-ocr-document-extractor/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/agentskillexchange/skills/tree/main/skills/tesseract-ocr-document-extractor"},"layout":"multi","source":"github","category":"skills","frontmatter":{"name":"Tesseract OCR Document Extractor","description":"Extracts structured text from scanned documents and images using Tesseract OCR with custom LSTM training data. Supports table detection via OpenCV contour analysis and PDF/A output generation."},"skills_sh_url":"https://skills.sh/agentskillexchange/skills/tesseract-ocr-document-extractor"},"updatedAt":"2026-05-18T19:12:49.908Z"}}