{"id":"d620897d-5c91-40a9-a728-f80493f8d374","shortId":"Lw4hQu","kind":"skill","title":"Convert dense PDFs into LLM-ready text and page-aligned markdown with olmOCR","tagline":"Use olmOCR when an agent needs to turn scanned or layout-heavy documents into clean markdown or text before chunking, search, extraction, or citation workflows.","description":"# Convert dense PDFs into LLM-ready text and page-aligned markdown with olmOCR\n\nUse olmOCR when an agent needs to turn scanned or layout-heavy documents into clean markdown or text before chunking, search, extraction, or citation workflows.\n\n## Prerequisites\n\nPython 3.11, pip or conda, poppler-utils, optional NVIDIA GPU for local inference\n\n## Installation\n\nUse the upstream install or setup path that matches your environment:\n- conda create -n olmocr python=3.11\n- conda activate olmocr\n- pip install olmocr\n- pip install olmocr[gpu] --extra-index-url https://download.pytorch.org/whl/cu128\n\nRequirements and caveats from upstream:\n- (Based on a 7B parameter VLM, so it requires a GPU)\n- June 17, 2025 - v0.1.75 - Switch from sglang to vllm based inference pipeline, updated docker image to CUDA 12.8.\n- May 23, 2025 - v0.1.70 - Official docker support and images are now available! [See Docker usage](#using-docker)\n\nBasic usage or getting-started notes:\n- #### System Dependencies\n- You will need to install poppler-utils and additional fonts for rendering PDF images.\n- bash\n\n- Source: https://github.com/allenai/olmocr\n- Extracted from upstream docs: https://raw.githubusercontent.com/allenai/olmocr/HEAD/README.md\n\n## Documentation\n\n- https://github.com/allenai/olmocr#readme\n\n## Source\n\n- [Agent Skill Exchange](https://agentskillexchange.com/skills/convert-dense-pdfs-into-llm-ready-text-and-page-aligned-markdown-with-olmocr/)","tags":["convert","dense","pdfs","into","llm","ready","text","and","page","aligned","markdown","with"],"capabilities":["skill","source-agentskillexchange","skill-convert-dense-pdfs-into-llm-ready-text-and-page-aligned-markdown-with-olmocr","topic-agent-skills","topic-ai-agents","topic-ai-tools","topic-awesome-list","topic-claude-code","topic-codex","topic-cursor","topic-llm","topic-mcp","topic-npx-skills","topic-openclaw","topic-skills-catalog"],"categories":["skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/agentskillexchange/skills/convert-dense-pdfs-into-llm-ready-text-and-page-aligned-markdown-with-olmocr","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add agentskillexchange/skills","source_repo":"https://github.com/agentskillexchange/skills","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,438 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:09:56.559Z","embedding":null,"createdAt":"2026-05-18T13:15:51.045Z","updatedAt":"2026-05-18T19:09:56.559Z","lastSeenAt":"2026-05-18T19:09:56.559Z","tsv":"'/allenai/olmocr':213 '/allenai/olmocr#readme':224 '/allenai/olmocr/head/readme.md':220 '/skills/convert-dense-pdfs-into-llm-ready-text-and-page-aligned-markdown-with-olmocr/)':231 '/whl/cu128':132 '12.8':166 '17':150 '2025':151,169 '23':168 '3.11':85,115 '7b':141 'activ':117 'addit':203 'agent':20,61,226 'agentskillexchange.com':230 'agentskillexchange.com/skills/convert-dense-pdfs-into-llm-ready-text-and-page-aligned-markdown-with-olmocr/)':229 'align':12,53 'avail':178 'base':138,158 'bash':209 'basic':185 'caveat':135 'chunk':36,77 'citat':40,81 'clean':31,72 'conda':88,110,116 'convert':1,42 'creat':111 'cuda':165 'dens':2,43 'depend':193 'doc':217 'docker':162,172,180,184 'document':29,70,221 'download.pytorch.org':131 'download.pytorch.org/whl/cu128':130 'environ':109 'exchang':228 'extra':127 'extra-index-url':126 'extract':38,79,214 'font':204 'get':189 'getting-start':188 'github.com':212,223 'github.com/allenai/olmocr':211 'github.com/allenai/olmocr#readme':222 'gpu':94,125,148 'heavi':28,69 'imag':163,175,208 'index':128 'infer':97,159 'instal':98,102,120,123,198 'june':149 'layout':27,68 'layout-heavi':26,67 'llm':6,47 'llm-readi':5,46 'local':96 'markdown':13,32,54,73 'match':107 'may':167 'n':112 'need':21,62,196 'note':191 'nvidia':93 'offici':171 'olmocr':15,17,56,58,113,118,121,124 'option':92 'page':11,52 'page-align':10,51 'paramet':142 'path':105 'pdf':207 'pdfs':3,44 'pip':86,119,122 'pipelin':160 'poppler':90,200 'poppler-util':89,199 'prerequisit':83 'python':84,114 'raw.githubusercontent.com':219 'raw.githubusercontent.com/allenai/olmocr/head/readme.md':218 'readi':7,48 'render':206 'requir':133,146 'scan':24,65 'search':37,78 'see':179 'setup':104 'sglang':155 'skill':227 'skill-convert-dense-pdfs-into-llm-ready-text-and-page-aligned-markdown-with-olmocr' 'sourc':210,225 'source-agentskillexchange' 'start':190 'support':173 'switch':153 'system':192 'text':8,34,49,75 'topic-agent-skills' 'topic-ai-agents' 'topic-ai-tools' 'topic-awesome-list' 'topic-claude-code' 'topic-codex' 'topic-cursor' 'topic-llm' 'topic-mcp' 'topic-npx-skills' 'topic-openclaw' 'topic-skills-catalog' 'turn':23,64 'updat':161 'upstream':101,137,216 'url':129 'usag':181,186 'use':16,57,99,183 'using-dock':182 'util':91,201 'v0.1.70':170 'v0.1.75':152 'vllm':157 'vlm':143 'workflow':41,82","prices":[{"id":"ebab1792-9232-4213-9a44-8bade8170a99","listingId":"d620897d-5c91-40a9-a728-f80493f8d374","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"agentskillexchange","category":"skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:15:51.045Z"}],"sources":[{"listingId":"d620897d-5c91-40a9-a728-f80493f8d374","source":"github","sourceId":"agentskillexchange/skills/convert-dense-pdfs-into-llm-ready-text-and-page-aligned-markdown-with-olmocr","sourceUrl":"https://github.com/agentskillexchange/skills/tree/main/skills/convert-dense-pdfs-into-llm-ready-text-and-page-aligned-markdown-with-olmocr","isPrimary":false,"firstSeenAt":"2026-05-18T13:15:51.045Z","lastSeenAt":"2026-05-18T19:09:56.559Z"}],"details":{"listingId":"d620897d-5c91-40a9-a728-f80493f8d374","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"agentskillexchange","slug":"convert-dense-pdfs-into-llm-ready-text-and-page-aligned-markdown-with-olmocr","github":{"repo":"agentskillexchange/skills","stars":8,"topics":["agent-skills","ai-agents","ai-tools","awesome-list","claude-code","codex","cursor","llm","mcp","npx-skills","openclaw","skills-catalog"],"license":"mit","html_url":"https://github.com/agentskillexchange/skills","pushed_at":"2026-05-18T19:02:17Z","description":"The open catalog of AI agent skills — 2,000+ security-scanned skills for Claude Code, Cursor, Codex, and more.","skill_md_sha":"ec607c29f8458a5532fd1287f4d137561ce5fc0b","skill_md_path":"skills/convert-dense-pdfs-into-llm-ready-text-and-page-aligned-markdown-with-olmocr/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/agentskillexchange/skills/tree/main/skills/convert-dense-pdfs-into-llm-ready-text-and-page-aligned-markdown-with-olmocr"},"layout":"multi","source":"github","category":"skills","frontmatter":{"name":"Convert dense PDFs into LLM-ready text and page-aligned markdown with olmOCR","description":"Use olmOCR when an agent needs to turn scanned or layout-heavy documents into clean markdown or text before chunking, search, extraction, or citation workflows."},"skills_sh_url":"https://skills.sh/agentskillexchange/skills/convert-dense-pdfs-into-llm-ready-text-and-page-aligned-markdown-with-olmocr"},"updatedAt":"2026-05-18T19:09:56.559Z"}}