{"id":"38fc7c36-c480-475a-93b2-8cb798c7b9f8","shortId":"NJdhFX","kind":"skill","title":"arxiv-doc-builder","tagline":"Convert arXiv papers to Markdown documentation. Fetches available materials from arXiv (LaTeX source when available + PDF), converts LaTeX to Markdown via pandoc (happy path). PDF-only papers get a naive single-column fallback — use the specialized PDF scripts for better results.","description":"# arXiv Document Builder\n\nAutomatically converts arXiv papers into structured Markdown documentation for implementation reference.\n\n## Capabilities\n\nThis skill automatically:\n\n1. **Fetches paper materials from arXiv**\n   - Attempts to download LaTeX source (preferred) and PDF (idempotent — skips if cached)\n   - Handles all HTTP requests, extraction, and directory setup\n\n2. **Converts LaTeX source to structured Markdown** (happy path)\n   - LaTeX source → Markdown via pandoc (preserves all math and structure)\n   - Preserves mathematical formulas in MathJax/LaTeX format (`$...$`, `$$...$$`)\n   - Maintains section hierarchy and document structure\n   - Includes abstracts, figures, and references\n\n3. **PDF fallback** (naive — output quality must be verified)\n   - When no LaTeX source is available, `convert-paper` runs `convert_pdf_simple.py` (single-column pdfplumber extraction) as a best-effort fallback\n   - This produces usable output only for simple, single-column papers\n   - For 2-column papers, math-heavy papers, or complex layouts, inspect the output and use the specialized PDF scripts manually (see below)\n\n4. **Generates implementation-ready documentation**\n   - Output saved to `{ARXIV_ID}/{ARXIV_ID}.md` under the output directory (default: current working directory)\n   - Easy to reference during code implementation\n   - Optimized for Claude to read and understand\n\n## When to Use This Skill\n\nInvoke this skill when the user requests:\n- \"Convert arXiv paper {ID} to markdown\"\n- \"Fetch and process paper {ID}\"\n- \"Create documentation for arXiv:{ID}\"\n- \"I need to read/reference paper {ID}\"\n\n## How It Works\n\n### Single Entry Point\n\nUse the main orchestrator script or the globally installed `convert-paper` command:\n\n```bash\n# Using global command (recommended)\nconvert-paper ARXIV_ID [--output-dir DIR]\n\n# Using script directly\nuv run arxiv_doc_builder/convert_paper.py ARXIV_ID [--output-dir DIR]\n```\n\n- `--output-dir`: Directory where `{ARXIV_ID}/{ARXIV_ID}.md` will be created. **Default: current working directory** (not a `papers/` subdirectory).\n- Use absolute paths to control output location precisely.\n\nThe orchestrator:\n1. Calls `fetch_paper.py` to download available materials — source if available + PDF (idempotent — cached files are reused)\n2. Detects available format (LaTeX source or PDF)\n3. Calls the appropriate converter (`convert_latex.py` or `convert_pdf_simple.py`)\n4. Outputs structured Markdown to `{output-dir}/{ARXIV_ID}/{ARXIV_ID}.md`\n\nAll HTTP requests (curl), file extraction (tar), and directory creation (mkdir) are handled automatically.\n\n### Source Detection\n\n- **LaTeX source available**: Converts with pandoc — this is the reliable path\n- **PDF only**: Falls back to naive single-column text extraction. Output quality varies and should be inspected. For better results, use the specialized PDF scripts below\n\n## Output Structure\n\nGenerated Markdown includes:\n- Title, authors, and abstract\n- Full paper content with section hierarchy\n- Inline math: `$f(x) = x^2$`\n- Display math: `$$\\int_0^\\infty e^{-x} dx = 1$$`\n- Preserved LaTeX commands for complex formulas\n- References section\n\nOutput location: `{output-dir}/{ARXIV_ID}/{ARXIV_ID}.md` (default output-dir is current working directory)\n\n## PDF Conversion Scripts\n\n`convert-paper` only calls `convert_pdf_simple.py` as a naive fallback. The other scripts below are for manual or agent-driven use when the naive output is insufficient. Iterate by trying different scripts and inspecting results.\n\n### convert_pdf_simple.py\n\nConvert all pages as single-column layout.\n\n```bash\nuv run arxiv_doc_builder/convert_pdf_simple.py paper.pdf -o output.md\n```\n\n### convert_pdf_double_column.py\n\nConvert all pages as double-column layout (for academic papers).\n\n```bash\nuv run arxiv_doc_builder/convert_pdf_double_column.py paper.pdf -o output.md\n```\n\n### convert_pdf_extract.py\n\nExtract specific pages with optional double-column processing.\n\n```bash\n# Extract specific pages\nuv run arxiv_doc_builder/convert_pdf_extract.py paper.pdf --pages 1-5,10 -o output.md\n\n# Extract with mixed column layouts\nuv run arxiv_doc_builder/convert_pdf_extract.py paper.pdf --pages 1-10 --double-column-pages 3-7 -o output.md\n```\n\n**Note:** `--double-column-pages` must be a subset of `--pages`. Invalid page ranges cause immediate error.\n\n### Architecture\n\nAll three scripts share common conversion logic through `pdf_converter_lib.py`, ensuring consistent behavior while keeping each script focused on its specific use case.\n\n## Advanced: Vision-Based PDF Conversion\n\nFor papers with complex mathematical formulas where text extraction fails, a vision-based approach is available as a manual fallback:\n\n```bash\n# Generate high-resolution images from PDF\npython arxiv_doc_builder/convert_pdf_with_vision.py paper.pdf --dpi 300 --columns 2\n```\n\nThis creates page images (with optional column splitting) that can be read manually with Claude's vision capabilities for maximum accuracy. This is NOT part of the automatic workflow—use it only when automatic conversion produces poor results.\n\n### PDF Conversion Quality\n\nPDF conversion is inherently lossy:\n- Math formulas are not in LaTeX format\n- Complex layouts (2-column with column-spanning elements) may break reading order\n- Tables may need manual fixing\n- References may be malformed\n\nPDF conversion is acceptable when no LaTeX source is available and the paper is primarily text. For math-heavy papers, use the vision-based approach above or keep the PDF as the primary reference.\n\n**Fallback strategy for complex papers:**\n1. Extract structure and text via `convert_pdf_simple.py`\n2. Keep PDF link for reference\n3. Use vision-based conversion for pages with dense math\n4. Focus on readable prose sections\n\n## Troubleshooting: Multiple \\documentclass Files\n\nSome arXiv papers (e.g., PRL with supplemental material) contain multiple `.tex` files, each with its own `\\documentclass`. Automatic selection is unreliable in this case — the canonical example is `1911.04882`, which ships both the main PRL paper and an independent PRL supplement, and either can convert successfully. Since pandoc succeeding is not evidence that the selected file is the correct entry point, `convert-paper` refuses to guess: it fails explicitly with **exit code 2** and lists all candidates.\n\nExample failure output:\n\n```\nError: Found 2 files with \\documentclass in /path/to/1911.04882/source:\n  - /path/to/1911.04882/source/main_paper.tex\n  - /path/to/1911.04882/source/supplemental_material.tex\n\nMain .tex selection is ambiguous. Re-run with --tex-file pointing at the correct file, e.g.:\n  convert-paper <ARXIV_ID> --tex-file /path/to/1911.04882/source/main_paper.tex\n\nIf you originally passed --output-dir, include the same value in the re-run.\n```\n\nTo resolve, re-run `convert-paper` with `--tex-file` pointing at the correct main file. The fetch step is idempotent, so the already-downloaded source is reused without touching the network:\n\n```bash\nconvert-paper 1911.04882 --tex-file /path/to/1911.04882/source/main_paper.tex\n```\n\nIf the original run used `--output-dir`, pass the same value again so that `convert-paper` reconstructs the correct paper directory.\n\n## Troubleshooting: pandoc Conversion Failures\n\nWhen pandoc fails on a LaTeX source, the error may point to `\\end{document}` with `unexpected \\end`. This means pandoc's parser broke down due to a syntax issue elsewhere — `\\end{document}` itself is not the cause. Do NOT attempt broad preprocessing (replacing documentclass, expanding `\\newcommand`, removing environments, etc.) — pandoc handles revtex4/revtex4-2, custom commands, `picture` environments, and theorem environments correctly.\n\n### Diagnosis steps\n\n1. **Binary search for the failing line.** Extract the body (`\\begin{document}` to `\\end{document}`), then test pandoc with increasing prefixes to find the first line that causes failure.\n2. **Check that line for brace mismatches.** The most common cause is an unbalanced `{` or `}` in the LaTeX source. LaTeX's TeX engine silently tolerates these, but pandoc's structured parser does not.\n3. **Fix only the mismatch and re-run `convert-paper`.** A single-character fix (e.g., removing an orphaned `{`) is usually sufficient. The fetch step is idempotent, so the cached source and PDF are reused without network access.\n\n### Example\n\nThe source `(see, e.g., {\\cite{makhlin})` has an unmatched `{`. LaTeX compiles fine but pandoc fails. Fix: remove the stray `{`.\n\n## Directory Structure\n\nOutput is created under `--output-dir` (default: current working directory):\n\n```\n{output-dir}/\n└── {ARXIV_ID}/\n    ├── source/           # LaTeX source files (if available)\n    ├── pdf/              # PDF file\n    ├── {ARXIV_ID}.md     # Generated Markdown output\n    └── figures/          # Extracted figures (if any)\n```","tags":["arxiv","doc","builder","skills","ultimatile","agent-skills"],"capabilities":["skill","source-ultimatile","skill-arxiv-doc-builder","topic-agent-skills"],"categories":["arxiv-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/ultimatile/arxiv-skills/arxiv-doc-builder","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add ultimatile/arxiv-skills","source_repo":"https://github.com/ultimatile/arxiv-skills","install_from":"skills.sh"}},"qualityScore":"0.457","qualityRationale":"deterministic score 0.46 from registry signals: · indexed on github topic:agent-skills · 15 github stars · SKILL.md body (8,814 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-04-22T19:06:34.358Z","embedding":null,"createdAt":"2026-04-18T23:07:08.943Z","updatedAt":"2026-04-22T19:06:34.358Z","lastSeenAt":"2026-04-22T19:06:34.358Z","tsv":"'-10':612 '-5':595 '-7':618 '/path/to/1911.04882/source':943 '/path/to/1911.04882/source/main_paper.tex':944,970,1030 '/path/to/1911.04882/source/supplemental_material.tex':945 '0':463 '1':66,340,468,594,611,821,1120 '10':596 '1911.04882':883,1026 '2':92,171,356,459,704,760,828,928,938,1149 '3':128,364,617,834,1182 '300':702 '4':193,372,845 'absolut':331 'abstract':124,447 'academ':562 'accept':783 'access':1221 'accuraci':725 'advanc':661 'agent':517 'agent-driven':516 'alreadi':1013 'already-download':1012 'ambigu':950 'approach':681,806 'appropri':367 'architectur':638 'arxiv':2,6,15,48,53,71,202,204,241,254,289,300,303,314,316,380,382,482,484,546,567,589,606,697,856,1258,1269 'arxiv-doc-build':1 'attempt':72,1097 'author':445 'automat':51,65,398,732,738,872 'avail':12,19,142,345,349,358,403,683,789,1265 'back':415 'base':664,680,805,838 'bash':281,543,564,583,688,1022 'begin':1130 'behavior':650 'best':156 'best-effort':155 'better':46,431 'binari':1121 'bodi':1129 'brace':1154 'break':768 'broad':1098 'broke':1080 'builder':4,50 'builder/convert_paper.py':302 'builder/convert_pdf_double_column.py':569 'builder/convert_pdf_extract.py':591,608 'builder/convert_pdf_simple.py':548 'builder/convert_pdf_with_vision.py':699 'cach':83,352,1213 'call':341,365,502 'candid':932 'canon':880 'capabl':62,722 'case':660,878 'caus':635,1094,1147,1159 'charact':1197 'check':1150 'cite':1227 'claud':223,719 'code':219,927 'column':38,150,168,172,420,541,559,581,602,615,624,703,711,761,764 'column-span':763 'command':280,284,471,1111 'common':643,1158 'compil':1233 'complex':179,473,670,758,819 'consist':649 'contain':863 'content':450 'control':334 'convers':496,644,666,739,744,747,781,839,1056 'convert':5,21,52,93,144,240,278,287,368,404,499,535,553,899,917,965,993,1024,1047,1192 'convert-pap':143,277,286,498,916,964,992,1023,1046,1191 'convert_latex.py':369 'convert_pdf_double_column.py':552 'convert_pdf_extract.py':573 'convert_pdf_simple.py':147,371,503,534,827 'correct':913,961,1002,1051,1117 'creat':251,321,706,1246 'creation':394 'curl':388 'current':212,323,492,1252 'custom':1110 'default':211,322,487,1251 'dens':843 'detect':357,400 'diagnosi':1118 'differ':529 'dir':293,294,307,308,311,379,481,490,977,1038,1250,1257 'direct':297 'directori':90,210,214,312,325,393,494,1053,1242,1254 'display':460 'doc':3,301,547,568,590,607,698 'document':10,49,58,121,198,252,1071,1089,1131,1134 'documentclass':853,871,941,1101 'doubl':558,580,614,623 'double-column':557,579 'double-column-pag':613,622 'download':74,344,1014 'dpi':701 'driven':518 'due':1082 'dx':467 'e':465 'e.g':858,963,1199,1226 'easi':215 'effort':157 'either':897 'element':766 'elsewher':1087 'end':1070,1074,1088,1133 'engin':1171 'ensur':648 'entri':266,914 'environ':1105,1113,1116 'error':637,936,1066 'etc':1106 'evid':906 'exampl':881,933,1222 'exit':926 'expand':1102 'explicit':924 'extract':88,152,390,422,574,584,599,675,822,1127,1276 'f':456 'fail':676,923,1060,1125,1237 'failur':934,1057,1148 'fall':414 'fallback':39,130,158,507,687,816 'fetch':11,67,246,1006,1207 'fetch_paper.py':342 'figur':125,1275,1277 'file':353,389,854,866,910,939,957,962,969,998,1004,1029,1263,1268 'find':1142 'fine':1234 'first':1144 'fix':775,1183,1198,1238 'focus':655,846 'format':116,359,757 'formula':113,474,672,752 'found':937 'full':448 'generat':194,441,689,1272 'get':33 'global':275,283 'guess':921 'handl':84,397,1108 'happi':27,99 'heavi':176,799 'hierarchi':119,453 'high':691 'high-resolut':690 'http':86,386 'id':203,205,243,250,255,261,290,304,315,317,381,383,483,485,1259,1270 'idempot':80,351,1009,1210 'imag':693,708 'immedi':636 'implement':60,196,220 'implementation-readi':195 'includ':123,443,978 'increas':1139 'independ':893 'infti':464 'inher':749 'inlin':454 'inspect':181,429,532 'instal':276 'insuffici':525 'int':462 'invalid':632 'invok':233 'issu':1086 'iter':526 'keep':652,809,829 'latex':16,22,75,94,101,139,360,401,470,756,786,1063,1166,1168,1232,1261 'layout':180,542,560,603,759 'line':1126,1145,1152 'link':831 'list':930 'locat':336,478 'logic':645 'lossi':750 'main':270,888,946,1003 'maintain':117 'makhlin':1228 'malform':779 'manual':190,514,686,717,774 'markdown':9,24,57,98,103,245,375,442,1273 'materi':13,69,346,862 'math':108,175,455,461,751,798,844 'math-heavi':174,797 'mathemat':112,671 'mathjax/latex':115 'maximum':724 'may':767,772,777,1067 'md':206,318,384,486,1271 'mean':1076 'mismatch':1155,1186 'mix':601 'mkdir':395 'multipl':852,864 'must':134,626 'naiv':35,131,417,506,522 'need':257,773 'network':1021,1220 'newcommand':1103 'note':621 'o':550,571,597,619 'optim':221 'option':578,710 'orchestr':271,339 'order':770 'origin':973,1033 'orphan':1202 'output':132,162,183,199,209,292,306,310,335,373,378,423,439,477,480,489,523,935,976,1037,1244,1249,1256,1274 'output-dir':291,305,309,377,479,488,975,1036,1248,1255 'output.md':551,572,598,620 'page':537,555,576,586,593,610,616,625,631,633,707,841 'pandoc':26,105,406,902,1055,1059,1077,1107,1137,1176,1236 'paper':7,32,54,68,145,169,173,177,242,249,260,279,288,328,449,500,563,668,792,800,820,857,890,918,966,994,1025,1048,1052,1193 'paper.pdf':549,570,592,609,700 'parser':1079,1179 'part':729 'pass':974,1039 'path':28,100,332,411 'pdf':20,30,43,79,129,188,350,363,412,436,495,665,695,743,746,780,811,830,1216,1266,1267 'pdf-on':29 'pdf_converter_lib.py':647 'pdfplumber':151 'pictur':1112 'point':267,915,958,999,1068 'poor':741 'precis':337 'prefer':77 'prefix':1140 'preprocess':1099 'preserv':106,111,469 'primari':814 'primarili':794 'prl':859,889,894 'process':248,582 'produc':160,740 'prose':849 'python':696 'qualiti':133,424,745 'rang':634 're':952,985,990,1189 're-run':951,984,989,1188 'read':225,716,769 'read/reference':259 'readabl':848 'readi':197 'recommend':285 'reconstruct':1049 'refer':61,127,217,475,776,815,833 'refus':919 'reliabl':410 'remov':1104,1200,1239 'replac':1100 'request':87,239,387 'resolut':692 'resolv':988 'result':47,432,533,742 'reus':355,1017,1218 'revtex4/revtex4-2':1109 'run':146,299,545,566,588,605,953,986,991,1034,1190 'save':200 'script':44,189,272,296,437,497,510,530,641,654 'search':1122 'section':118,452,476,850 'see':191,1225 'select':873,909,948 'setup':91 'share':642 'ship':885 'silent':1172 'simpl':165 'sinc':901 'singl':37,149,167,265,419,540,1196 'single-charact':1195 'single-column':36,148,166,418,539 'skill':64,232,235 'skill-arxiv-doc-builder' 'skip':81 'sourc':17,76,95,102,140,347,361,399,402,787,1015,1064,1167,1214,1224,1260,1262 'source-ultimatile' 'span':765 'special':42,187,435 'specif':575,585,658 'split':712 'step':1007,1119,1208 'strategi':817 'stray':1241 'structur':56,97,110,122,374,440,823,1178,1243 'subdirectori':329 'subset':629 'succeed':903 'success':900 'suffici':1205 'supplement':861,895 'syntax':1085 'tabl':771 'tar':391 'test':1136 'tex':865,947,956,968,997,1028,1170 'tex-fil':955,967,996,1027 'text':421,674,795,825 'theorem':1115 'three':640 'titl':444 'toler':1173 'topic-agent-skills' 'touch':1019 'tri':528 'troubleshoot':851,1054 'unbalanc':1162 'understand':227 'unexpect':1073 'unmatch':1231 'unreli':875 'usabl':161 'use':40,185,230,268,282,295,330,433,519,659,734,801,835,1035 'user':238 'usual':1204 'uv':298,544,565,587,604 'valu':981,1042 'vari':425 'verifi':136 'via':25,104,826 'vision':663,679,721,804,837 'vision-bas':662,678,803,836 'without':1018,1219 'work':213,264,324,493,1253 'workflow':733 'x':457,458,466","prices":[{"id":"4226cf93-8bb7-4d81-85eb-90af7a63f004","listingId":"38fc7c36-c480-475a-93b2-8cb798c7b9f8","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"ultimatile","category":"arxiv-skills","install_from":"skills.sh"},"createdAt":"2026-04-18T23:07:08.943Z"}],"sources":[{"listingId":"38fc7c36-c480-475a-93b2-8cb798c7b9f8","source":"github","sourceId":"ultimatile/arxiv-skills/arxiv-doc-builder","sourceUrl":"https://github.com/ultimatile/arxiv-skills/tree/main/skills/arxiv-doc-builder","isPrimary":false,"firstSeenAt":"2026-04-18T23:07:08.943Z","lastSeenAt":"2026-04-22T19:06:34.358Z"}],"details":{"listingId":"38fc7c36-c480-475a-93b2-8cb798c7b9f8","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"ultimatile","slug":"arxiv-doc-builder","github":{"repo":"ultimatile/arxiv-skills","stars":15,"topics":["agent-skills"],"license":"mit","html_url":"https://github.com/ultimatile/arxiv-skills","pushed_at":"2026-04-22T12:01:06Z","description":null,"skill_md_sha":"9f1812713f2cfcb3caa8646952ef040cf759c20d","skill_md_path":"skills/arxiv-doc-builder/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/ultimatile/arxiv-skills/tree/main/skills/arxiv-doc-builder"},"layout":"multi","source":"github","category":"arxiv-skills","frontmatter":{"name":"arxiv-doc-builder","description":"Convert arXiv papers to Markdown documentation. Fetches available materials from arXiv (LaTeX source when available + PDF), converts LaTeX to Markdown via pandoc (happy path). PDF-only papers get a naive single-column fallback — use the specialized PDF scripts for better results."},"skills_sh_url":"https://skills.sh/ultimatile/arxiv-skills/arxiv-doc-builder"},"updatedAt":"2026-04-22T19:06:34.358Z"}}