{"id":"a907c57e-8a1c-472d-80fd-60a18adcedb0","shortId":"29pkKQ","kind":"skill","title":"Unstructured Document ETL Toolkit","tagline":"Unstructured is an open source document ETL toolkit for converting PDFs, HTML, emails, and office files into structured data. This skill covers how to use the real Unstructured project for partitioning documents, normalizing content, and feeding downstream agent or RAG pipelines.","description":"# Unstructured Document ETL Toolkit\n\nUnstructured is an open source document ETL toolkit for converting PDFs, HTML, emails, and office files into structured data. This skill covers how to use the real Unstructured project for partitioning documents, normalizing content, and feeding downstream agent or RAG pipelines.\n\n## Prerequisites\n\nPython (or Docker, for the container-based setup)\n\n## Installation\n\nUse the upstream install or setup path that matches your environment. To run the library in a container:\n- Pull the image: `docker pull downloads.unstructured.io/unstructured-io/unstructured:latest`\n- Start a detached container: `docker run -dt --name unstructured downloads.unstructured.io/unstructured-io/unstructured:latest`\n- Open a shell inside it: `docker exec -it unstructured bash`\n- Alternatively, build the image locally from a repository checkout: `make docker-build`\n\nGetting-started notes:\n- Quick Start: [Run the library in a container](https://github.com/Unstructured-IO/unstructured#run-the-library-in-a-container)\n\n- Source: https://github.com/Unstructured-IO/unstructured\n- Extracted from upstream 
docs: https://raw.githubusercontent.com/Unstructured-IO/unstructured/HEAD/README.md\n\n## Documentation\n\n- https://docs.unstructured.io\n\n## Source\n\n- [Agent Skill Exchange](https://agentskillexchange.com/skills/unstructured-document-etl-toolkit/)","tags":["unstructured","document","etl","toolkit","skills","agentskillexchange","agent-skills","ai-agents","ai-tools","awesome-list","claude-code","codex"],"capabilities":["skill","source-agentskillexchange","skill-unstructured-document-etl-toolkit","topic-agent-skills","topic-ai-agents","topic-ai-tools","topic-awesome-list","topic-claude-code","topic-codex","topic-cursor","topic-llm","topic-mcp","topic-npx-skills","topic-openclaw","topic-skills-catalog"],"categories":["skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/agentskillexchange/skills/unstructured-document-etl-toolkit","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add agentskillexchange/skills","source_repo":"https://github.com/agentskillexchange/skills","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,779 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:12:59.220Z","embedding":null,"createdAt":"2026-05-18T13:20:09.335Z","updatedAt":"2026-05-18T19:12:59.220Z","lastSeenAt":"2026-05-18T19:12:59.220Z","tsv":"'/naereen/badges/](https://badgen.net/badge/open%20source%20%3f/yes%21/blue?icon=github)':140 '/pypi/unstructured/](https://img.shields.io/pypi/l/unstructured.svg)':134 '/pypi/unstructured/](https://img.shields.io/pypi/pyversions/unstructured.svg)':137 
'/skills/unstructured-document-etl-toolkit/)':189 '/unstructured-io/unstructured':173 '/unstructured-io/unstructured#run-the-library-in-a-container)':162 '/unstructured-io/unstructured/head/readme.md':180 '/unstructured-io/unstructured:latest':109,117 'agent':42,87,184 'agentskillexchange.com':188 'agentskillexchange.com/skills/unstructured-document-etl-toolkit/)':187 'bash':122 'basic':141 'black':150 'build':126 'caveat':129 'contain':159,169 'content':38,83 'convert':14,59 'cover':26,71 'data':23,68 'doc':177 'docker':105,110,118,125 'docker-build':124 'docs.unstructured.io':182 'document':2,10,36,47,55,81,181 'downloads.unstructured.io':108,116 'downloads.unstructured.io/unstructured-io/unstructured:latest':107,115 'downstream':41,86 'dt':112 'eight':148 'email':17,62 'environ':104 'etl':3,11,48,56 'exchang':186 'exec':119 'extract':174 'feed':40,85 'file':20,65 'get':145 'getting-start':144 'github.com':139,161,172 'github.com/naereen/badges/](https://badgen.net/badge/open%20source%20%3f/yes%21/blue?icon=github)':138 'github.com/unstructured-io/unstructured':171 'github.com/unstructured-io/unstructured#run-the-library-in-a-container)':160 'html':16,61 'instal':93,97 'librari':156,166 'make':123 'match':102 'name':113 'normal':37,82 'note':147 'offic':19,64 'open':8,53 'partit':35,80 'path':100 'pdfs':15,60 'pipelin':45,90 'point':149 'prerequisit':91 'project':33,78 'pull':106 'pypi.python.org':133,136 'pypi.python.org/pypi/unstructured/](https://img.shields.io/pypi/l/unstructured.svg)':132 'pypi.python.org/pypi/unstructured/](https://img.shields.io/pypi/pyversions/unstructured.svg)':135 'python':92 'quick':152 'rag':44,89 'raw.githubusercontent.com':179 'raw.githubusercontent.com/unstructured-io/unstructured/head/readme.md':178 'real':31,76 'requir':127 'run':111,154,164 'setup':99 'skill':25,70,185 'skill-unstructured-document-etl-toolkit' 'sourc':9,54,170,183 'source-agentskillexchange' 'star':151 'start':146,153 'structur':22,67 'toolkit':4,12,49,57 
'topic-agent-skills' 'topic-ai-agents' 'topic-ai-tools' 'topic-awesome-list' 'topic-claude-code' 'topic-codex' 'topic-cursor' 'topic-llm' 'topic-mcp' 'topic-npx-skills' 'topic-openclaw' 'topic-skills-catalog' 'unstructur':1,5,32,46,50,77,114,121 'upstream':96,131,176 'usag':142 'use':29,74,94","prices":[{"id":"caf5b672-d11c-4dd4-9e55-36ac80b81e6c","listingId":"a907c57e-8a1c-472d-80fd-60a18adcedb0","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"agentskillexchange","category":"skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:20:09.335Z"}],"sources":[{"listingId":"a907c57e-8a1c-472d-80fd-60a18adcedb0","source":"github","sourceId":"agentskillexchange/skills/unstructured-document-etl-toolkit","sourceUrl":"https://github.com/agentskillexchange/skills/tree/main/skills/unstructured-document-etl-toolkit","isPrimary":false,"firstSeenAt":"2026-05-18T13:20:09.335Z","lastSeenAt":"2026-05-18T19:12:59.220Z"}],"details":{"listingId":"a907c57e-8a1c-472d-80fd-60a18adcedb0","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"agentskillexchange","slug":"unstructured-document-etl-toolkit","github":{"repo":"agentskillexchange/skills","stars":8,"topics":["agent-skills","ai-agents","ai-tools","awesome-list","claude-code","codex","cursor","llm","mcp","npx-skills","openclaw","skills-catalog"],"license":"mit","html_url":"https://github.com/agentskillexchange/skills","pushed_at":"2026-05-18T19:02:17Z","description":"The open catalog of AI agent skills — 2,000+ security-scanned skills for Claude Code, Cursor, Codex, and 
more.","skill_md_sha":"3565824d943b3517be80a2e504d251b7b23add04","skill_md_path":"skills/unstructured-document-etl-toolkit/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/agentskillexchange/skills/tree/main/skills/unstructured-document-etl-toolkit"},"layout":"multi","source":"github","category":"skills","frontmatter":{"name":"Unstructured Document ETL Toolkit","description":"Unstructured is an open source document ETL toolkit for converting PDFs, HTML, emails, and office files into structured data. This skill covers how to use the real Unstructured project for partitioning documents, normalizing content, and feeding downstream agent or RAG pipelines."},"skills_sh_url":"https://skills.sh/agentskillexchange/skills/unstructured-document-etl-toolkit"},"updatedAt":"2026-05-18T19:12:59.220Z"}}