{"id":"7f1014cf-f25f-4e09-85a2-6c5ee8d44720","shortId":"XDFr4g","kind":"skill","title":"Unstructured Document ETL for LLM Pipelines","tagline":"Unstructured is an open source document processing library that converts PDFs, HTML, Office files, emails, and other formats into structured data for downstream AI workflows. It is a practical intake layer for extraction, chunking, and preprocessing before embeddings, search, or ","description":"# Unstructured Document ETL for LLM Pipelines\n\nUnstructured is an open source document processing library that converts PDFs, HTML, Office files, emails, and other formats into structured data for downstream AI workflows. It is a practical intake layer for extraction, chunking, and preprocessing before embeddings, search, or agent use.\n\n## Prerequisites\n\nbun, python, pip, uv, docker, go\n\n## Installation\n\nUse the upstream install or setup path that matches your environment:\n- docker pull downloads.unstructured.io/unstructured-io/unstructured:latest\n- docker run -dt --name unstructured downloads.unstructured.io/unstructured-io/unstructured:latest\n- docker exec -it unstructured bash\n- make docker-build\n\nRequirements and caveats from upstream:\n- <a href=\"https://github.com/Unstructured-IO/unstructured/blob/main/LICENSE.md\">![https://pypi.python.org/pypi/unstructured/](https://img.shields.io/pypi/l/unstructured.svg)</a>\n- <a href=\"https://pypi.python.org/pypi/unstructured/\">![https://pypi.python.org/pypi/unstructured/](https://img.shields.io/pypi/pyversions/unstructured.svg)</a>\n- <a href=\"https://pypi.python.org/pypi/unstructured/\">![https://github.com/Naereen/badges/](https://badgen.net/badge/Open%20Source%20%3F/Yes%21/blue?icon=github)</a>\n\nBasic usage or getting-started notes:\n- ## :eight_pointed_black_star: Quick Start\n- [Run the library in a container](https://github.com/Unstructured-IO/unstructured#run-the-library-in-a-container) or\n- ### Run the library in a container\n\n- Source: https://github.com/Unstructured-IO/unstructured\n- Extracted from upstream docs: https://raw.githubusercontent.com/Unstructured-IO/unstructured/HEAD/README.md\n\n## Documentation\n\n- https://unstructured-io.github.io/unstructured/installing.html#installation-with-conda-on-windows\n\n## Source\n\n- [Agent Skill Exchange](https://agentskillexchange.com/skills/unstructured-document-etl-for-llm-pipelines/)","tags":["unstructured","document","etl","for","llm","pipelines","skills","agentskillexchange","agent-skills","ai-agents","ai-tools","awesome-list"],"capabilities":["skill","source-agentskillexchange","skill-unstructured-document-etl-for-llm-pipelines","topic-agent-skills","topic-ai-agents","topic-ai-tools","topic-awesome-list","topic-claude-code","topic-codex","topic-cursor","topic-llm","topic-mcp","topic-npx-skills","topic-openclaw","topic-skills-catalog"],"categories":["skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/agentskillexchange/skills/unstructured-document-etl-for-llm-pipelines","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add agentskillexchange/skills","source_repo":"https://github.com/agentskillexchange/skills","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,904 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:12:59.111Z","embedding":null,"createdAt":"2026-05-18T13:20:09.173Z","updatedAt":"2026-05-18T19:12:59.111Z","lastSeenAt":"2026-05-18T19:12:59.111Z","tsv":"'/naereen/badges/](https://badgen.net/badge/open%20source%20%3f/yes%21/blue?icon=github)':149 '/pypi/unstructured/](https://img.shields.io/pypi/l/unstructured.svg)':143 '/pypi/unstructured/](https://img.shields.io/pypi/pyversions/unstructured.svg)':146 '/skills/unstructured-document-etl-for-llm-pipelines/)':200 '/unstructured-io/unstructured':182 '/unstructured-io/unstructured#run-the-library-in-a-container)':171 '/unstructured-io/unstructured/head/readme.md':189 '/unstructured-io/unstructured:latest':118,126 '/unstructured/installing.html#installation-with-conda-on-windows':193 'agent':93,195 'agentskillexchange.com':199 'agentskillexchange.com/skills/unstructured-document-etl-for-llm-pipelines/)':198 'ai':30,76 'bash':131 'basic':150 'black':159 'build':135 'bun':96 'caveat':138 'chunk':40,86 'contain':168,178 'convert':16,62 'data':27,73 'doc':186 'docker':100,114,119,127,134 'docker-build':133 'document':2,12,48,58,190 'downloads.unstructured.io':117,125 'downloads.unstructured.io/unstructured-io/unstructured:latest':116,124 'downstream':29,75 'dt':121 'eight':157 'email':21,67 'embed':44,90 'environ':113 'etl':3,49 'exchang':197 'exec':128 'extract':39,85,183 'file':20,66 'format':24,70 'get':154 'getting-start':153 'github.com':148,170,181 'github.com/naereen/badges/](https://badgen.net/badge/open%20source%20%3f/yes%21/blue?icon=github)':147 'github.com/unstructured-io/unstructured':180 'github.com/unstructured-io/unstructured#run-the-library-in-a-container)':169 'go':101 'html':18,64 'instal':102,106 'intak':36,82 'layer':37,83 'librari':14,60,165,175 'llm':5,51 'make':132 'match':111 'name':122 'note':156 'offic':19,65 'open':10,56 'path':109 'pdfs':17,63 'pip':98 'pipelin':6,52 'point':158 'practic':35,81 'preprocess':42,88 'prerequisit':95 'process':13,59 'pull':115 'pypi.python.org':142,145 'pypi.python.org/pypi/unstructured/](https://img.shields.io/pypi/l/unstructured.svg)':141 'pypi.python.org/pypi/unstructured/](https://img.shields.io/pypi/pyversions/unstructured.svg)':144 'python':97 'quick':161 'raw.githubusercontent.com':188 'raw.githubusercontent.com/unstructured-io/unstructured/head/readme.md':187 'requir':136 'run':120,163,173 'search':45,91 'setup':108 'skill':196 'skill-unstructured-document-etl-for-llm-pipelines' 'sourc':11,57,179,194 'source-agentskillexchange' 'star':160 'start':155,162 'structur':26,72 'topic-agent-skills' 'topic-ai-agents' 'topic-ai-tools' 'topic-awesome-list' 'topic-claude-code' 'topic-codex' 'topic-cursor' 'topic-llm' 'topic-mcp' 'topic-npx-skills' 'topic-openclaw' 'topic-skills-catalog' 'unstructur':1,7,47,53,123,130 'unstructured-io.github.io':192 'unstructured-io.github.io/unstructured/installing.html#installation-with-conda-on-windows':191 'upstream':105,140,185 'usag':151 'use':94,103 'uv':99 'workflow':31,77","prices":[{"id":"19501ee8-4df8-4884-b4f9-04846919ca55","listingId":"7f1014cf-f25f-4e09-85a2-6c5ee8d44720","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"agentskillexchange","category":"skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:20:09.173Z"}],"sources":[{"listingId":"7f1014cf-f25f-4e09-85a2-6c5ee8d44720","source":"github","sourceId":"agentskillexchange/skills/unstructured-document-etl-for-llm-pipelines","sourceUrl":"https://github.com/agentskillexchange/skills/tree/main/skills/unstructured-document-etl-for-llm-pipelines","isPrimary":false,"firstSeenAt":"2026-05-18T13:20:09.173Z","lastSeenAt":"2026-05-18T19:12:59.111Z"}],"details":{"listingId":"7f1014cf-f25f-4e09-85a2-6c5ee8d44720","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"agentskillexchange","slug":"unstructured-document-etl-for-llm-pipelines","github":{"repo":"agentskillexchange/skills","stars":8,"topics":["agent-skills","ai-agents","ai-tools","awesome-list","claude-code","codex","cursor","llm","mcp","npx-skills","openclaw","skills-catalog"],"license":"mit","html_url":"https://github.com/agentskillexchange/skills","pushed_at":"2026-05-18T19:02:17Z","description":"The open catalog of AI agent skills — 2,000+ security-scanned skills for Claude Code, Cursor, Codex, and more.","skill_md_sha":"b5c5c720c2ea2b65b928948b264ca72f9eb271b9","skill_md_path":"skills/unstructured-document-etl-for-llm-pipelines/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/agentskillexchange/skills/tree/main/skills/unstructured-document-etl-for-llm-pipelines"},"layout":"multi","source":"github","category":"skills","frontmatter":{"name":"Unstructured Document ETL for LLM Pipelines","description":"Unstructured is an open source document processing library that converts PDFs, HTML, Office files, emails, and other formats into structured data for downstream AI workflows. It is a practical intake layer for extraction, chunking, and preprocessing before embeddings, search, or agent use."},"skills_sh_url":"https://skills.sh/agentskillexchange/skills/unstructured-document-etl-for-llm-pipelines"},"updatedAt":"2026-05-18T19:12:59.111Z"}}