PDFMux
What it does
Universal PDF extraction orchestrator that routes pages to the best backend with quality auditing and confidence scoring.
PDFMux inspects each page and sends it to the extraction backend best suited to its content type. Supported backends are PyMuPDF, OpenDataLoader, RapidOCR, Docling, and Surya, and three cost-aware processing modes (economy, balanced, and premium) trade speed and cost against extraction quality. It also offers schema-guided extraction for documents such as invoices and contracts, RAG-ready chunking, and confidence scoring, without requiring a GPU.
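The per-page routing idea can be illustrated with a small, self-contained sketch. Everything in it is hypothetical and is not PDFMux's API: the `PageSignals` fields, the classification thresholds, the `ROUTES` table that maps each mode and page type to a backend name, and the toy `confidence` heuristic are all assumptions chosen to show the shape of a mode-aware router, not how PDFMux actually decides.

```python
from dataclasses import dataclass


# Hypothetical per-page signals; PDFMux's real classifier may use different ones.
@dataclass
class PageSignals:
    text_chars: int        # characters found in the embedded text layer
    image_coverage: float  # fraction of the page area covered by images, 0..1
    table_like: bool       # whether some layout heuristic flagged a table


# Assumed (mode, page type) -> backend mapping. Illustrative only.
ROUTES = {
    "economy":  {"text": "pymupdf", "scanned": "rapidocr", "table": "pymupdf"},
    "balanced": {"text": "pymupdf", "scanned": "rapidocr", "table": "opendataloader"},
    "premium":  {"text": "docling", "scanned": "surya", "table": "docling"},
}


def classify(page: PageSignals) -> str:
    """Label a page 'scanned', 'table', or 'text' using simple thresholds."""
    if page.text_chars < 50 and page.image_coverage > 0.5:
        return "scanned"  # little embedded text, mostly image: needs OCR
    if page.table_like:
        return "table"
    return "text"


def route(pages: list[PageSignals], mode: str = "balanced") -> list[str]:
    """Pick a backend name for each page under the given cost mode."""
    table = ROUTES[mode]
    return [table[classify(p)] for p in pages]


def confidence(text: str) -> float:
    """Toy quality score: share of characters that are alphanumeric or whitespace."""
    if not text:
        return 0.0
    good = sum(ch.isalnum() or ch.isspace() for ch in text)
    return good / len(text)
```

For example, with one digital text page, one scanned page, and one table page, `route(pages, "balanced")` would send them to `pymupdf`, `rapidocr`, and `opendataloader` respectively under this assumed mapping. Keeping the mapping in a plain table makes the cost/quality trade-off of each mode easy to inspect and change independently of the classifier.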
Capabilities
Server
Quality
Deterministic score 0.68, from registry signals: indexed on pulsemcp · has source repo · 63 GitHub stars · registry-generated description present