Docling AI Document Intelligence Pipeline
What it does
Docling is an IBM-backed open-source toolkit that converts PDF, DOCX, PPTX, XLSX, HTML, images, audio, and LaTeX files into structured formats for gen AI workflows. It features advanced PDF layout understanding, OCR, table extraction, and integrations with LangChain, LlamaIndex, and CrewAI.
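As a minimal sketch of the conversion described above, the snippet below turns one local file into Markdown with Docling's `DocumentConverter`. The helper names `is_listed_format` and `convert_to_markdown`, the extension set, and the file name `report.pdf` are illustrative assumptions, not part of the upstream project. The extension set covers only some of the formats named in the description.

```python
from pathlib import Path

# Extensions for some of the formats named in the description above.
# Assumption: the exact set Docling accepts may vary by version.
LISTED_EXTENSIONS = {".pdf", ".docx", ".pptx", ".xlsx", ".html"}


def is_listed_format(source: str) -> bool:
    """Return True if the file extension is in the illustrative set above."""
    return Path(source).suffix.lower() in LISTED_EXTENSIONS


def convert_to_markdown(source: str) -> str:
    """Convert a single document to Markdown (hypothetical helper)."""
    # Imported lazily so this module still loads where docling is not installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)
    return result.document.export_to_markdown()


# Example call (requires docling and a real file on disk):
# print(convert_to_markdown("report.pdf"))
```

Markdown is only one export target. The lazy import keeps the format check usable in environments where Docling is not yet installed.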
Installation
Use the upstream install or setup path that matches your environment:
- pip install docling
Requirements and caveats from upstream:
- Note: Python 3.9 support was dropped in docling version 2.70.0. Please use Python 3.10 or higher.
- Python usage (recommended)
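The requirement above can be checked before installing or importing Docling. The following is a sketch using only the standard library. The function names are hypothetical, and the 3.10 floor applies to docling 2.70.0 and later, per the upstream note.

```python
import sys
from importlib import metadata

# Minimum interpreter version per the upstream note (docling >= 2.70.0).
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_docling_version():
    """Return the installed docling version string, or None if absent."""
    try:
        return metadata.version("docling")
    except metadata.PackageNotFoundError:
        return None


# Report the state of the current environment.
print("Python supported:", python_is_supported())
print("docling version:", installed_docling_version())
```

If the first line prints `False`, upgrade the interpreter before running `pip install docling`.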
Basic usage or getting-started notes:
- Connect to any agent using the MCP server
- MCP server for agentic applications
- Install
Extracted from upstream docs: https://raw.githubusercontent.com/docling-project/docling/HEAD/README.md
Quality
Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills · 8 GitHub stars · SKILL.md body (1,272 chars)