Skillquality 0.49

ref-downloader

Use when the user asks to batch-download all references for a paper from a DOI or local PDF with ref-downloader. Not for one-off PDFs, paper search, or Zotero import.

Price
free
Protocol
skill
Verified
no

What it does

Ref Downloader — Agent Runbook

Slim entry for agent mode. The full 8-step manual runbook with code snippets, DOI-resolution fallback chain, and PUBLISHER_MAP extension procedure lives in references/agent-runbook.md. Human users see ../../README.md.

<SKILL_DIR> = this folder (skills/ref-downloader in the source repo, or wherever the user copied this skill — e.g. ~/.claude/skills/ref-downloader/). Python scripts live in <SKILL_DIR>/scripts/; config files (config.example.toml, config.local.toml) live at <SKILL_DIR>/.

Install prerequisites (before first invocation)

The skill protocol can't manage Python deps. If python -c "import playwright" fails, the user needs:

cd "<SKILL_DIR>"
pip install playwright pymupdf
playwright install msedge          # downloads Edge driver
cp config.example.toml config.local.toml   # then user edits [crossref].mailto

If the user is developing from the source repo instead of an installed skill copy, they can also install from the repo root with pip install -r requirements.txt -r requirements-dev.txt.

When to invoke

Trigger phrases:

  • "帮我下载 [paper / DOI] 的参考文献" / "批量下载引用文献" / "把这篇论文的所有引用下载下来"
  • "Download all refs of [paper / DOI]" / "Batch-download every reference"
  • User provides a DOI (10.x/y form) or local PDF path and asks for "all references" / "全部参考文献"

Don't invoke for:

  • Downloading one arbitrary PDF (not a reference list)
  • Generic web scraping
  • Paper search / Zotero import — different tools

Primary entry

python "<SKILL_DIR>/scripts/run_ref_downloader.py" <DOI_OR_PDF_PATH>

The wrapper handles DOI resolution (Zotero → fitz fallback), output-dir layout, sequential 3-stage pipeline (extract_refs.pyvalidate_refs.pydownload_refs.py), and end-of-run cleanup.

Useful flags:

  • --yes — non-interactive (CI/batch), overwrite prompts default-yes
  • --auto — forwarded to download_refs.py: skip "press Enter" confirm + shorter challenge wait
  • --output-dir <path> — override default output location
  • --config <path> — alternate TOML config (overrides config.local.toml)

Pre-flight checklist (confirm before running)

  1. DOI correct? Echo back to user: 即将下载参考文献:DOI=<doi>
  2. Edge fully closed? All msedge.exe processes killed (Task Manager check). The script claims the user's persistent Edge profile and needs exclusive access.
  3. Config set? First-run users need <SKILL_DIR>/config.local.toml with [crossref].mailto. Missing config → wrapper prints a WARNING but continues with placeholder defaults.
  4. Output location agreed? Default for DOI input: <cwd>/<project_name>_refs/. For PDF input: <pdf_dir>/<pdf_stem>_refs/. Override with --output-dir.

Output layout

<OUTPUT_DIR>/
├── <PROJECT_NAME>/
│   ├── refs_raw.json           # extract_refs.py output
│   ├── refs_validated.json     # validate_refs.py output
│   ├── download_report.csv     # per-ref status (only on graceful completion)
│   ├── *.pdf                   # reference PDFs
│   └── *_SI.pdf                # supplementary files (where supported)
└── runs/<timestamp>-round-03/
    └── events.jsonl            # full event trace per ref

Interruption note: if the run is interrupted (Ctrl+C / Edge crash / VPN drop), the root download_report.csv may be stale. Trust the latest runs/<timestamp>/events.jsonl + actual files in <PROJECT_NAME>/.

Common failure modes

Status / symptomMeaningAction
manual_pending (auth_redirect)Bounced to institution SSOUser signs in via live Edge tab; re-run (incremental skips done refs)
manual_pending (challenge_timeout)Cloudflare / publisher challenge unsolved in timeRe-run interactively; solve captcha when prompted
manual_pending (elsevier_crasolve_shell)Elsevier viewer stuck in transitionHot-session auto-retry usually picks it up; else manual click in live page
failed (auto)Generic auto path failedCheck events.jsonl for that ref; may need a publisher-specific patch
ignored (ignored_institution_access)DOI listed in [institution].ignored_access_doisSkip-by-design; remove from config to retry
Edge won't launchBackground msedge.exe still holding profileKill all msedge.exe in Task Manager, re-run
ModuleNotFoundError: playwrightInstall prereqs not doneSee "Install prerequisites" section above
WARNING: crossref.mailto is the placeholderFirst-run config uncustomizedEdit <SKILL_DIR>/config.local.toml → set [crossref].mailto to a real email (Crossref polite pool)

Manual / debug mode

If the wrapper fails partway, run the 3 scripts standalone for partial re-execution:

python <SKILL_DIR>/scripts/extract_refs.py <DOI>           # → refs_raw.json
python <SKILL_DIR>/scripts/validate_refs.py <PROJECT>      # → refs_validated.json
python <SKILL_DIR>/scripts/download_refs.py <PROJECT>      # → PDFs + download_report.csv

Full 8-step manual flow with code snippets, DOI-resolution fallback chain (Zotero query → fitz text → user prompt), and the procedure for extending PUBLISHER_MAP when encountering an unknown DOI prefix → references/agent-runbook.md.

See also

Capabilities

skillsource-ltczding-gifskill-ref-downloadertopic-academic-toolstopic-agent-skillstopic-claude-codetopic-codex-skillstopic-crossreftopic-literature-reviewtopic-pdf-downloadertopic-playwrighttopic-pythontopic-zotero

Install

Installnpx skills add ltczding-gif/ref-downloader
Transportskills-sh
Protocolskill

Quality

0.49/ 1.00

deterministic score 0.49 from registry signals: · indexed on github topic:agent-skills · 78 github stars · SKILL.md body (5,836 chars)

Provenance

Indexed fromgithub
Enriched2026-05-18 18:56:49Z · deterministic:skill-github:v1 · v1
First seen2026-05-12
Last seen2026-05-18

Agent access