{"id":"dda3abea-e248-4d54-8cf9-dc4f3643342c","shortId":"LuXQjV","kind":"skill","title":"Tesseract OCR Data Extractor","tagline":"Extracts structured data from scanned documents using Tesseract OCR engine with LSTM models. Supports table detection via OpenCV contour analysis and outputs to CSV, JSON, or Pandas DataFrames.","description":"# Tesseract OCR Data Extractor\n\nExtracts structured data from scanned documents using Tesseract OCR engine with LSTM models. Supports table detection via OpenCV contour analysis and outputs to CSV, JSON, or Pandas DataFrames.\n\n## Prerequisites\n\nTesseract OCR, OpenCV\n\n## Installation\n\nRequirements and caveats from upstream:\n- **NOTE**: This software depends on other packages that may be licensed under different open source licenses.\n\nBasic usage or getting-started notes:\n- It also needs [traineddata](https://tesseract-ocr.github.io/tessdoc/Data-Files.html) files which support the legacy engine, for example those from the [tessdata](https://github.com/tesseract-ocr/tessdata) repository.\n- Basic **[command line usage](https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html)**:\n- Examples can be found in the [documentation](https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html#simplest-invocation-to-ocr-an-image).\n\n- Source: https://github.com/tesseract-ocr/tesseract\n- Extracted from upstream docs: https://raw.githubusercontent.com/tesseract-ocr/tesseract/HEAD/README.md\n\n## Documentation\n\n- https://tesseract-ocr.github.io/tessdoc/\n\n## Source\n\n- [Agent Skill Exchange](https://agentskillexchange.com/skills/tesseract-ocr-data-extractor/)","tags":["tesseract","ocr","data","extractor","skills","agentskillexchange","agent-skills","ai-agents","ai-tools","awesome-list","claude-code","codex"],"capabilities":["skill","source-agentskillexchange","skill-tesseract-ocr-data-extractor","topic-agent-skills","topic-ai-agents","topic-ai-tools","topic-awesome-list","topic-claude-code","topic-codex","topic-cursor","topic-llm","topic-mcp","topic-npx-skills","topic-openclaw","topic-skills-catalog"],"categories":["skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/agentskillexchange/skills/tesseract-ocr-data-extractor","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add agentskillexchange/skills","source_repo":"https://github.com/agentskillexchange/skills","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,268 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:12:49.817Z","embedding":null,"createdAt":"2026-05-18T13:19:55.384Z","updatedAt":"2026-05-18T19:12:49.817Z","lastSeenAt":"2026-05-18T19:12:49.817Z","tsv":"'/skills/tesseract-ocr-data-extractor/)':159 '/tessdoc/':152 '/tessdoc/command-line-usage.html#simplest-invocation-to-ocr-an-image).':137 '/tessdoc/command-line-usage.html)**:':127 '/tessdoc/data-files.html)':104 '/tesseract-ocr/tessdata)':119 '/tesseract-ocr/tesseract':141 '/tesseract-ocr/tesseract/head/readme.md':148 'agent':154 'agentskillexchange.com':158 'agentskillexchange.com/skills/tesseract-ocr-data-extractor/)':157 'also':99 'analysi':24,56 'basic':91,121 'caveat':72 'command':122 'contour':23,55 'csv':28,60 'data':3,7,35,39 'datafram':32,64 'depend':78 'detect':20,52 'differ':87 'doc':145 'document':10,42,134,149 'engin':14,46,110 'exampl':112,128 'exchang':156 'extract':5,37,142 'extractor':4,36 'file':105 'found':131 'get':95 'getting-start':94 'github.com':118,140 'github.com/tesseract-ocr/tessdata)':117 'github.com/tesseract-ocr/tesseract':139 'instal':69 'json':29,61 'legaci':109 'licens':85,90 'line':123 'lstm':16,48 'may':83 'model':17,49 'need':100 'note':75,97 'ocr':2,13,34,45,67 'open':88 'opencv':22,54,68 'output':26,58 'packag':81 'panda':31,63 'prerequisit':65 'raw.githubusercontent.com':147 'raw.githubusercontent.com/tesseract-ocr/tesseract/head/readme.md':146 'repositori':120 'requir':70 'scan':9,41 'skill':155 'skill-tesseract-ocr-data-extractor' 'softwar':77 'sourc':89,138,153 'source-agentskillexchange' 'start':96 'structur':6,38 'support':18,50,107 'tabl':19,51 'tessdata':116 'tesseract':1,12,33,44,66 'tesseract-ocr.github.io':103,126,136,151 'tesseract-ocr.github.io/tessdoc/':150 'tesseract-ocr.github.io/tessdoc/command-line-usage.html#simplest-invocation-to-ocr-an-image).':135 'tesseract-ocr.github.io/tessdoc/command-line-usage.html)**:':125 'tesseract-ocr.github.io/tessdoc/data-files.html)':102 'topic-agent-skills' 'topic-ai-agents' 'topic-ai-tools' 'topic-awesome-list' 'topic-claude-code' 'topic-codex' 'topic-cursor' 'topic-llm' 'topic-mcp' 'topic-npx-skills' 'topic-openclaw' 'topic-skills-catalog' 'traineddata':101 'upstream':74,144 'usag':92,124 'use':11,43 'via':21,53","prices":[{"id":"2c0da479-6593-480c-adc0-fb8028824118","listingId":"dda3abea-e248-4d54-8cf9-dc4f3643342c","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"agentskillexchange","category":"skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:19:55.384Z"}],"sources":[{"listingId":"dda3abea-e248-4d54-8cf9-dc4f3643342c","source":"github","sourceId":"agentskillexchange/skills/tesseract-ocr-data-extractor","sourceUrl":"https://github.com/agentskillexchange/skills/tree/main/skills/tesseract-ocr-data-extractor","isPrimary":false,"firstSeenAt":"2026-05-18T13:19:55.384Z","lastSeenAt":"2026-05-18T19:12:49.817Z"}],"details":{"listingId":"dda3abea-e248-4d54-8cf9-dc4f3643342c","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"agentskillexchange","slug":"tesseract-ocr-data-extractor","github":{"repo":"agentskillexchange/skills","stars":8,"topics":["agent-skills","ai-agents","ai-tools","awesome-list","claude-code","codex","cursor","llm","mcp","npx-skills","openclaw","skills-catalog"],"license":"mit","html_url":"https://github.com/agentskillexchange/skills","pushed_at":"2026-05-18T19:02:17Z","description":"The open catalog of AI agent skills — 2,000+ security-scanned skills for Claude Code, Cursor, Codex, and more.","skill_md_sha":"2a2b0303aa4a9087ad370692e98ea8c9566cc361","skill_md_path":"skills/tesseract-ocr-data-extractor/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/agentskillexchange/skills/tree/main/skills/tesseract-ocr-data-extractor"},"layout":"multi","source":"github","category":"skills","frontmatter":{"name":"Tesseract OCR Data Extractor","description":"Extracts structured data from scanned documents using Tesseract OCR engine with LSTM models. Supports table detection via OpenCV contour analysis and outputs to CSV, JSON, or Pandas DataFrames."},"skills_sh_url":"https://skills.sh/agentskillexchange/skills/tesseract-ocr-data-extractor"},"updatedAt":"2026-05-18T19:12:49.817Z"}}