{"id":"8bc8654e-a875-4d37-9d37-38f857c69a82","shortId":"T3CDwY","kind":"skill","title":"Extract structured text, metadata, tables, and images from mixed documents through an MCP server with Kreuzberg","tagline":"Expose one document-extraction surface to MCP-compatible agents so they can normalize PDFs, Office files, images, HTML, and other mixed inputs before downstream review or indexing.","description":"# Extract structured text, metadata, tables, and images from mixed documents through an MCP server with Kreuzberg\n\nExpose one document-extraction surface to MCP-compatible agents so they can normalize PDFs, Office files, images, HTML, and other mixed inputs before downstream review or indexing.\n\n## Prerequisites\n\nKreuzberg install or container image, document files to process, MCP-compatible client\n\n## Installation\n\nUse the upstream install or setup path that matches your environment:\n- npx skills add kreuzberg-dev/kreuzberg\n\nUpstream language packages:\n- Python: `kreuzberg` on PyPI\n- Node.js: [@kreuzberg/node](https://www.npmjs.com/package/@kreuzberg/node) on npm\n\nBasic usage or getting-started notes:\n- Each language binding provides comprehensive documentation with examples and best practices. 
Choose your platform to get started:\n- **Scripting Languages:**\n- **[Ruby](https://github.com/kreuzberg-dev/kreuzberg/tree/main/packages/ruby)** – RubyGems package, idiomatic Ruby API, native bindings\n\n- Source: https://github.com/kreuzberg-dev/kreuzberg\n- Extracted from upstream docs: https://raw.githubusercontent.com/kreuzberg-dev/kreuzberg/HEAD/README.md\n\n## Documentation\n\n- https://github.com/kreuzberg-dev/kreuzberg#readme\n\n## Source\n\n- [Agent Skill Exchange](https://agentskillexchange.com/skills/extract-structured-text-metadata-tables-and-images-from-mixed-documents-through-an-mcp-server-with-kreuzberg/)","tags":["extract","structured","text","metadata","tables","and","images","from","mixed","documents","through","mcp"],"capabilities":["skill","source-agentskillexchange","skill-extract-structured-text-metadata-tables-and-images-from-mixed-documents-through-an-mcp-server-with-kreuzberg","topic-agent-skills","topic-ai-agents","topic-ai-tools","topic-awesome-list","topic-claude-code","topic-codex","topic-cursor","topic-llm","topic-mcp","topic-npx-skills","topic-openclaw","topic-skills-catalog"],"categories":["skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/agentskillexchange/skills/extract-structured-text-metadata-tables-and-images-from-mixed-documents-through-an-mcp-server-with-kreuzberg","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add agentskillexchange/skills","source_repo":"https://github.com/agentskillexchange/skills","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,574 
chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:10:25.022Z","embedding":null,"createdAt":"2026-05-18T13:16:29.590Z","updatedAt":"2026-05-18T19:10:25.022Z","lastSeenAt":"2026-05-18T19:10:25.022Z","tsv":"'/kreuzberg':123 '/kreuzberg-dev/kreuzberg':169 '/kreuzberg-dev/kreuzberg#readme':180 '/kreuzberg-dev/kreuzberg/head/readme.md':176 '/kreuzberg-dev/kreuzberg/tree/main/packages/ruby)**':158 '/skills/extract-structured-text-metadata-tables-and-images-from-mixed-documents-through-an-mcp-server-with-kreuzberg/)':187 'add':119 'agent':27,72,182 'agentskillexchange.com':186 'agentskillexchange.com/skills/extract-structured-text-metadata-tables-and-images-from-mixed-documents-through-an-mcp-server-with-kreuzberg/)':185 'api':163 'basic':129 'best':145 'bind':138,165 'caveat':126 'choos':147 'client':104 'compat':26,71,103 'comprehens':140 'contain':95 'dev':122 'doc':173 'document':10,20,55,65,97,141,177 'document-extract':19,64 'downstream':42,87 'environ':116 'exampl':143 'exchang':184 'expos':17,62 'extract':1,21,46,66,170 'file':34,79,98 'get':133,151 'getting-start':132 'github.com':157,168,179 'github.com/kreuzberg-dev/kreuzberg':167 'github.com/kreuzberg-dev/kreuzberg#readme':178 'github.com/kreuzberg-dev/kreuzberg/tree/main/packages/ruby)**':156 'html':36,81 'idiomat':161 'imag':7,35,52,80,96 'index':45,90 'input':40,85 'instal':93,105,109 'kreuzberg':16,61,92,121 'kreuzberg-dev':120 'languag':137,154 'match':114 'mcp':13,25,58,70,102 'mcp-compat':24,69,101 'metadata':4,49 'mix':9,39,54,84 'nativ':164 'normal':31,76 'note':135 'npx':117 'offic':33,78 'one':18,63 'packag':160 'path':112 'pdfs':32,77 'platform':149 'practic':146 'prerequisit':91 'process':100 'provid':139 
'raw.githubusercontent.com':175 'raw.githubusercontent.com/kreuzberg-dev/kreuzberg/head/readme.md':174 'requir':124 'review':43,88 'rubi':155,162 'rubygem':159 'script':153 'server':14,59 'setup':111 'skill':118,183 'skill-extract-structured-text-metadata-tables-and-images-from-mixed-documents-through-an-mcp-server-with-kreuzberg' 'sourc':166,181 'source-agentskillexchange' 'start':134,152 'structur':2,47 'surfac':22,67 'tabl':5,50 'text':3,48 'topic-agent-skills' 'topic-ai-agents' 'topic-ai-tools' 'topic-awesome-list' 'topic-claude-code' 'topic-codex' 'topic-cursor' 'topic-llm' 'topic-mcp' 'topic-npx-skills' 'topic-openclaw' 'topic-skills-catalog' 'upstream':108,128,172 'usag':130 'use':106","prices":[{"id":"8c85c052-03a6-4861-9909-ba217c0e4b51","listingId":"8bc8654e-a875-4d37-9d37-38f857c69a82","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"agentskillexchange","category":"skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:16:29.590Z"}],"sources":[{"listingId":"8bc8654e-a875-4d37-9d37-38f857c69a82","source":"github","sourceId":"agentskillexchange/skills/extract-structured-text-metadata-tables-and-images-from-mixed-documents-through-an-mcp-server-with-kreuzberg","sourceUrl":"https://github.com/agentskillexchange/skills/tree/main/skills/extract-structured-text-metadata-tables-and-images-from-mixed-documents-through-an-mcp-server-with-kreuzberg","isPrimary":false,"firstSeenAt":"2026-05-18T13:16:29.590Z","lastSeenAt":"2026-05-18T19:10:25.022Z"}],"details":{"listingId":"8bc8654e-a875-4d37-9d37-38f857c69a82","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"agentskillexchange","slug":"extract-structured-text-metadata-tables-and-images-from-mixed-documents-through-an-mcp-server-with-kreuzberg","github"
:{"repo":"agentskillexchange/skills","stars":8,"topics":["agent-skills","ai-agents","ai-tools","awesome-list","claude-code","codex","cursor","llm","mcp","npx-skills","openclaw","skills-catalog"],"license":"mit","html_url":"https://github.com/agentskillexchange/skills","pushed_at":"2026-05-18T19:02:17Z","description":"The open catalog of AI agent skills — 2,000+ security-scanned skills for Claude Code, Cursor, Codex, and more.","skill_md_sha":"ca45ff29016b877308179a260f75ff5a5289a1c0","skill_md_path":"skills/extract-structured-text-metadata-tables-and-images-from-mixed-documents-through-an-mcp-server-with-kreuzberg/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/agentskillexchange/skills/tree/main/skills/extract-structured-text-metadata-tables-and-images-from-mixed-documents-through-an-mcp-server-with-kreuzberg"},"layout":"multi","source":"github","category":"skills","frontmatter":{"name":"Extract structured text, metadata, tables, and images from mixed documents through an MCP server with Kreuzberg","description":"Expose one document-extraction surface to MCP-compatible agents so they can normalize PDFs, Office files, images, HTML, and other mixed inputs before downstream review or indexing."},"skills_sh_url":"https://skills.sh/agentskillexchange/skills/extract-structured-text-metadata-tables-and-images-from-mixed-documents-through-an-mcp-server-with-kreuzberg"},"updatedAt":"2026-05-18T19:10:25.022Z"}}