Search large PDFs and read only the relevant pages before answering
Use pdf-mcp to inspect a PDF, search it, and load only the pages that matter so an agent can answer questions from long documents without brute-forcing the whole file into context.
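The search-then-read pattern described above can be illustrated with a minimal sketch. This is not pdf-mcp's API: the `Hit` class, the `search_pages` and `load_relevant` helpers, and the in-memory `document` list are all hypothetical stand-ins, assuming page text has already been extracted. It only shows why ranking pages first keeps the loaded context small.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    page: int   # 1-based page number
    count: int  # occurrences of the query on that page


def search_pages(pages: list[str], query: str, max_hits: int = 3) -> list[Hit]:
    """Rank pages by case-insensitive occurrences of `query` (hypothetical helper)."""
    needle = query.lower()
    if not needle:
        return []
    hits = [
        Hit(page=i, count=text.lower().count(needle))
        for i, text in enumerate(pages, start=1)
    ]
    hits = [h for h in hits if h.count > 0]
    # Most matches first; ties are broken by page order.
    hits.sort(key=lambda h: (-h.count, h.page))
    return hits[:max_hits]


def load_relevant(pages: list[str], hits: list[Hit]) -> dict[int, str]:
    """Return only the text of the pages named in `hits`."""
    return {h.page: pages[h.page - 1] for h in hits}


# Hypothetical stand-in for the extracted text of a long PDF.
document = [
    "Introduction and scope.",
    "Warranty terms: the warranty lasts two years.",
    "Shipping and returns.",
    "Warranty claims must include a receipt.",
]

hits = search_pages(document, "warranty")
context = load_relevant(document, hits)  # only pages 2 and 4 are loaded
```

In this sketch only two of the four pages reach the agent's context. A real server would presumably do the extraction and ranking itself, so the agent only issues the search and page-read requests.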
Prerequisites
Python 3.10+; an MCP-compatible client; local PDFs or accessible PDF URLs; optional extra dependencies for semantic search.
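A quick way to check these prerequisites before installing might look like the sketch below. It uses only the Python standard library; the `check_environment` function name and the report keys are illustrative assumptions, and the check for `fastembed` assumes that is the importable module name of the optional semantic-search dependency.

```python
import importlib.util
import shutil
import sys


def check_environment() -> dict[str, bool]:
    """Report which prerequisites appear to be satisfied (hypothetical helper)."""
    return {
        # The listing states Python 3.10+ is required.
        "python_3_10_plus": sys.version_info >= (3, 10),
        # OCR on scanned PDFs needs a system Tesseract binary on PATH.
        "tesseract_on_path": shutil.which("tesseract") is not None,
        # Optional: only needed for semantic search.
        "fastembed_installed": importlib.util.find_spec("fastembed") is not None,
    }


report = check_environment()
for name, ok in report.items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```

Only the Python version is a hard requirement here; a missing Tesseract or fastembed simply means OCR or semantic search is unavailable.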
Installation
Pick the upstream install path that matches your environment:
- pip install pdf-mcp (base install)
- pip install 'pdf-mcp[semantic]' (with the semantic search extra)
- brew install tesseract (system Tesseract for OCR, macOS)
- git clone https://github.com/jztan/pdf-mcp.git (source checkout)
Requirements and caveats from upstream:
- A Model Context Protocol (MCP) server that enables AI agents to read, search, and extract content from PDF files. Built with Python and PyMuPDF, with SQLite-based caching for persis...
- OCR on scanned PDFs requires a system Tesseract install.
Basic usage or getting-started notes:
- Semantic search adds fastembed and numpy, with a ~67 MB model download on first use.
- On macOS, Tesseract can be installed with Homebrew.
- Source: https://github.com/jztan/pdf-mcp
- Extracted from upstream docs: https://raw.githubusercontent.com/jztan/pdf-mcp/HEAD/README.md
Quality
Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills · 8 GitHub stars · SKILL.md body (1,518 chars)