{"id":"25a95549-2d8c-48ae-9e6a-2ecf9b87e344","shortId":"t3Xu5q","kind":"skill","title":"Unstructured Document Partitioning and ETL Library for LLM Pipelines","tagline":"Unstructured is an open-source library for ingesting and partitioning PDFs, HTML, Office documents, emails, and other unstructured inputs into structured elements and metadata. It is commonly used as a preprocessing layer for RAG, search, extraction, and downstream AI pipelines.","description":"# Unstructured Document Partitioning and ETL Library for LLM Pipelines\n\nUnstructured is an open-source library for ingesting and partitioning PDFs, HTML, Office documents, emails, and other unstructured inputs into structured elements and metadata. It is commonly used as a preprocessing layer for RAG, search, extraction, and downstream AI pipelines.\n\n## Prerequisites\n\nPython 3.11+\n\n## Installation\n\nUse the upstream install or setup path that matches your environment:\n- docker pull downloads.unstructured.io/unstructured-io/unstructured:latest\n- docker run -dt --name unstructured downloads.unstructured.io/unstructured-io/unstructured:latest\n- docker exec -it unstructured bash\n- make docker-build\n\nRequirements and caveats from upstream:\n- <a href=\"https://github.com/Unstructured-IO/unstructured/blob/main/LICENSE.md\">![https://pypi.python.org/pypi/unstructured/](https://img.shields.io/pypi/l/unstructured.svg)</a>\n- <a href=\"https://pypi.python.org/pypi/unstructured/\">![https://pypi.python.org/pypi/unstructured/](https://img.shields.io/pypi/pyversions/unstructured.svg)</a>\n- <a href=\"https://pypi.python.org/pypi/unstructured/\">![https://github.com/Naereen/badges/](https://badgen.net/badge/Open%20Source%20%3F/Yes%21/blue?icon=github)</a>\n\nBasic usage or getting-started notes:\n- ## :eight_pointed_black_star: Quick Start\n- [Run the library in a container](https://github.com/Unstructured-IO/unstructured#run-the-library-in-a-container) or\n- ### Run the library in a container\n\n- Source: https://github.com/Unstructured-IO/unstructured\n- Extracted from upstream docs: https://raw.githubusercontent.com/Unstructured-IO/unstructured/HEAD/README.md\n\n## Documentation\n\n- https://docs.unstructured.io/\n\n## Source\n\n- [Agent Skill Exchange](https://agentskillexchange.com/skills/unstructured-document-partitioning-etl-library-llm-pipelines/)","tags":["unstructured","document","partitioning","etl","library","llm","pipelines","skills","agentskillexchange","agent-skills","ai-agents","ai-tools"],"capabilities":["skill","source-agentskillexchange","skill-unstructured-document-partitioning-etl-library-llm-pipelines","topic-agent-skills","topic-ai-agents","topic-ai-tools","topic-awesome-list","topic-claude-code","topic-codex","topic-cursor","topic-llm","topic-mcp","topic-npx-skills","topic-openclaw","topic-skills-catalog"],"categories":["skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/agentskillexchange/skills/unstructured-document-partitioning-etl-library-llm-pipelines","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add agentskillexchange/skills","source_repo":"https://github.com/agentskillexchange/skills","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,847 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:12:59.329Z","embedding":null,"createdAt":"2026-05-18T13:20:09.454Z","updatedAt":"2026-05-18T19:12:59.329Z","lastSeenAt":"2026-05-18T19:12:59.329Z","tsv":"'/naereen/badges/](https://badgen.net/badge/open%20source%20%3f/yes%21/blue?icon=github)':151 '/pypi/unstructured/](https://img.shields.io/pypi/l/unstructured.svg)':145 '/pypi/unstructured/](https://img.shields.io/pypi/pyversions/unstructured.svg)':148 '/skills/unstructured-document-partitioning-etl-library-llm-pipelines/)':200 '/unstructured-io/unstructured':184 '/unstructured-io/unstructured#run-the-library-in-a-container)':173 '/unstructured-io/unstructured/head/readme.md':191 '/unstructured-io/unstructured:latest':120,128 '3.11':103 'agent':195 'agentskillexchange.com':199 'agentskillexchange.com/skills/unstructured-document-partitioning-etl-library-llm-pipelines/)':198 'ai':49,99 'bash':133 'basic':152 'black':161 'build':137 'caveat':140 'common':37,87 'contain':170,180 'doc':188 'docker':116,121,129,136 'docker-build':135 'docs.unstructured.io':193 'document':2,24,52,74,192 'downloads.unstructured.io':119,127 'downloads.unstructured.io/unstructured-io/unstructured:latest':118,126 'downstream':48,98 'dt':123 'eight':159 'element':32,82 'email':25,75 'environ':115 'etl':5,55 'exchang':197 'exec':130 'extract':46,96,185 'get':156 'getting-start':155 'github.com':150,172,183 'github.com/naereen/badges/](https://badgen.net/badge/open%20source%20%3f/yes%21/blue?icon=github)':149 'github.com/unstructured-io/unstructured':182 'github.com/unstructured-io/unstructured#run-the-library-in-a-container)':171 'html':22,72 'ingest':18,68 'input':29,79 'instal':104,108 'layer':42,92 'librari':6,16,56,66,167,177 'llm':8,58 'make':134 'match':113 'metadata':34,84 'name':124 'note':158 'offic':23,73 'open':14,64 'open-sourc':13,63 'partit':3,20,53,70 'path':111 'pdfs':21,71 'pipelin':9,50,59,100 'point':160 'preprocess':41,91 'prerequisit':101 'pull':117 'pypi.python.org':144,147 'pypi.python.org/pypi/unstructured/](https://img.shields.io/pypi/l/unstructured.svg)':143 'pypi.python.org/pypi/unstructured/](https://img.shields.io/pypi/pyversions/unstructured.svg)':146 'python':102 'quick':163 'rag':44,94 'raw.githubusercontent.com':190 'raw.githubusercontent.com/unstructured-io/unstructured/head/readme.md':189 'requir':138 'run':122,165,175 'search':45,95 'setup':110 'skill':196 'skill-unstructured-document-partitioning-etl-library-llm-pipelines' 'sourc':15,65,181,194 'source-agentskillexchange' 'star':162 'start':157,164 'structur':31,81 'topic-agent-skills' 'topic-ai-agents' 'topic-ai-tools' 'topic-awesome-list' 'topic-claude-code' 'topic-codex' 'topic-cursor' 'topic-llm' 'topic-mcp' 'topic-npx-skills' 'topic-openclaw' 'topic-skills-catalog' 'unstructur':1,10,28,51,60,78,125,132 'upstream':107,142,187 'usag':153 'use':38,88,105","prices":[{"id":"487d30a5-fadd-4894-b1dd-4278cb002333","listingId":"25a95549-2d8c-48ae-9e6a-2ecf9b87e344","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"agentskillexchange","category":"skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:20:09.454Z"}],"sources":[{"listingId":"25a95549-2d8c-48ae-9e6a-2ecf9b87e344","source":"github","sourceId":"agentskillexchange/skills/unstructured-document-partitioning-etl-library-llm-pipelines","sourceUrl":"https://github.com/agentskillexchange/skills/tree/main/skills/unstructured-document-partitioning-etl-library-llm-pipelines","isPrimary":false,"firstSeenAt":"2026-05-18T13:20:09.454Z","lastSeenAt":"2026-05-18T19:12:59.329Z"}],"details":{"listingId":"25a95549-2d8c-48ae-9e6a-2ecf9b87e344","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"agentskillexchange","slug":"unstructured-document-partitioning-etl-library-llm-pipelines","github":{"repo":"agentskillexchange/skills","stars":8,"topics":["agent-skills","ai-agents","ai-tools","awesome-list","claude-code","codex","cursor","llm","mcp","npx-skills","openclaw","skills-catalog"],"license":"mit","html_url":"https://github.com/agentskillexchange/skills","pushed_at":"2026-05-18T19:02:17Z","description":"The open catalog of AI agent skills — 2,000+ security-scanned skills for Claude Code, Cursor, Codex, and more.","skill_md_sha":"c586c38a9b3ca0a686d1777668cc9c53eb66762b","skill_md_path":"skills/unstructured-document-partitioning-etl-library-llm-pipelines/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/agentskillexchange/skills/tree/main/skills/unstructured-document-partitioning-etl-library-llm-pipelines"},"layout":"multi","source":"github","category":"skills","frontmatter":{"name":"Unstructured Document Partitioning and ETL Library for LLM Pipelines","description":"Unstructured is an open-source library for ingesting and partitioning PDFs, HTML, Office documents, emails, and other unstructured inputs into structured elements and metadata. It is commonly used as a preprocessing layer for RAG, search, extraction, and downstream AI pipelines."},"skills_sh_url":"https://skills.sh/agentskillexchange/skills/unstructured-document-partitioning-etl-library-llm-pipelines"},"updatedAt":"2026-05-18T19:12:59.329Z"}}