{"id":"d10bc60c-c4ad-4f3c-9d7f-92d702fa8350","shortId":"jvZfDW","kind":"skill","title":"Turn messy document collections into structured rows with DocETL","tagline":"Define repeatable extraction pipelines that pull fields from large document collections, normalize outputs, and audit failures across the corpus.","description":"# Turn messy document collections into structured rows with DocETL\n\nDefine repeatable extraction pipelines that pull fields from large document collections, normalize outputs, and audit failures across the corpus.\n\n## Prerequisites\n\nPython 3.10+, DocETL, document corpus, extraction configuration\n\n## Installation\n\nUse the upstream install or setup path that matches your environment:\n- Use Docker (recommended for quick start): make docker\n- pip install docetl\n- Run Docker:\n- make docker\n\nRequirements and caveats from upstream:\n- A Python package for running production pipelines from the command line or Python code\n- ### 2. 📦 Python Package (For Production Use)\n- If you want to use DocETL as a Python package:\n\nBasic usage or getting-started notes:\n- ## 🚀 Getting Started\n- DocWrangler is hosted at [docetl.org/playground](https://docetl.org/playground). But to run the playground locally, you can either:\n- OpenAI API key\n\n- Source: https://github.com/ucbepic/docetl\n- Extracted from upstream docs: https://raw.githubusercontent.com/ucbepic/docetl/HEAD/README.md\n\n## Documentation\n\n- https://docetl.org/\n\n## Source\n\n- [Agent Skill Exchange](https://agentskillexchange.com/skills/turn-messy-document-collections-into-structured-rows-with-docetl/)","tags":["turn","messy","document","collections","into","structured","rows","with","docetl","skills","agentskillexchange","agent-skills"],"capabilities":["skill","source-agentskillexchange","skill-turn-messy-document-collections-into-structured-rows-with-docetl","topic-agent-skills","topic-ai-agents","topic-ai-tools","topic-awesome-list","topic-claude-code","topic-codex","topic-cursor","topic-llm","topic-mcp","topic-npx-skills","topic-openclaw","topic-skills-catalog"],"categories":["skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/agentskillexchange/skills/turn-messy-document-collections-into-structured-rows-with-docetl","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add agentskillexchange/skills","source_repo":"https://github.com/agentskillexchange/skills","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,254 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:12:56.725Z","embedding":null,"createdAt":"2026-05-18T13:20:05.843Z","updatedAt":"2026-05-18T19:12:56.725Z","lastSeenAt":"2026-05-18T19:12:56.725Z","tsv":"'/playground](https://docetl.org/playground).':142 '/skills/turn-messy-document-collections-into-structured-rows-with-docetl/)':174 '/ucbepic/docetl':158 '/ucbepic/docetl/head/readme.md':165 '2':111 '3.10':59 'across':26,54 'agent':169 'agentskillexchange.com':173 'agentskillexchange.com/skills/turn-messy-document-collections-into-structured-rows-with-docetl/)':172 'api':153 'audit':24,52 'basic':127 'caveat':94 'code':110 'collect':4,20,32,48 'command':106 'configur':64 'corpus':28,56,62 'defin':10,38 'doc':162 'docetl':9,37,60,87,122 'docetl.org':141,167 'docetl.org/playground](https://docetl.org/playground).':140 'docker':78,84,89,91 'document':3,19,31,47,61,166 'docwrangl':136 'either':151 'environ':76 'exchang':171 'extract':12,40,63,159 'failur':25,53 'field':16,44 'get':131,134 'getting-start':130 'github.com':157 'github.com/ucbepic/docetl':156 'host':138 'instal':65,69,86 'key':154 'larg':18,46 'line':107 'local':148 'make':83,90 'match':74 'messi':2,30 'normal':21,49 'note':133 'openai':152 'output':22,50 'packag':99,113,126 'path':72 'pip':85 'pipelin':13,41,103 'playground':147 'prerequisit':57 'product':102,115 'pull':15,43 'python':58,98,109,112,125 'quick':81 'raw.githubusercontent.com':164 'raw.githubusercontent.com/ucbepic/docetl/head/readme.md':163 'recommend':79 'repeat':11,39 'requir':92 'row':7,35 'run':88,101,145 'setup':71 'skill':170 'skill-turn-messy-document-collections-into-structured-rows-with-docetl' 'sourc':155,168 'source-agentskillexchange' 'start':82,132,135 'structur':6,34 'topic-agent-skills' 'topic-ai-agents' 'topic-ai-tools' 'topic-awesome-list' 'topic-claude-code' 'topic-codex' 'topic-cursor' 'topic-llm' 'topic-mcp' 'topic-npx-skills' 'topic-openclaw' 'topic-skills-catalog' 'turn':1,29 'upstream':68,96,161 'usag':128 'use':66,77,116,121 'want':119","prices":[{"id":"2908eb1e-2e4b-471c-84fc-cb780d6a8877","listingId":"d10bc60c-c4ad-4f3c-9d7f-92d702fa8350","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"agentskillexchange","category":"skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:20:05.843Z"}],"sources":[{"listingId":"d10bc60c-c4ad-4f3c-9d7f-92d702fa8350","source":"github","sourceId":"agentskillexchange/skills/turn-messy-document-collections-into-structured-rows-with-docetl","sourceUrl":"https://github.com/agentskillexchange/skills/tree/main/skills/turn-messy-document-collections-into-structured-rows-with-docetl","isPrimary":false,"firstSeenAt":"2026-05-18T13:20:05.843Z","lastSeenAt":"2026-05-18T19:12:56.725Z"}],"details":{"listingId":"d10bc60c-c4ad-4f3c-9d7f-92d702fa8350","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"agentskillexchange","slug":"turn-messy-document-collections-into-structured-rows-with-docetl","github":{"repo":"agentskillexchange/skills","stars":8,"topics":["agent-skills","ai-agents","ai-tools","awesome-list","claude-code","codex","cursor","llm","mcp","npx-skills","openclaw","skills-catalog"],"license":"mit","html_url":"https://github.com/agentskillexchange/skills","pushed_at":"2026-05-18T19:02:17Z","description":"The open catalog of AI agent skills — 2,000+ security-scanned skills for Claude Code, Cursor, Codex, and more.","skill_md_sha":"29e46fe8d00ddc66d36fa196fcf1c3cce29c4cca","skill_md_path":"skills/turn-messy-document-collections-into-structured-rows-with-docetl/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/agentskillexchange/skills/tree/main/skills/turn-messy-document-collections-into-structured-rows-with-docetl"},"layout":"multi","source":"github","category":"skills","frontmatter":{"name":"Turn messy document collections into structured rows with DocETL","description":"Define repeatable extraction pipelines that pull fields from large document collections, normalize outputs, and audit failures across the corpus."},"skills_sh_url":"https://skills.sh/agentskillexchange/skills/turn-messy-document-collections-into-structured-rows-with-docetl"},"updatedAt":"2026-05-18T19:12:56.725Z"}}