Docling
IBM's open-source document parser for converting PDFs, DOCX, and more into structured formats for AI
Open source (MIT), free to use
About Docling
Docling, from IBM Research, is a document parsing library that converts PDF, DOCX, PPTX, HTML, and other formats into a structured document representation, which can then be exported to formats such as Markdown. It uses layout analysis and table structure recognition to extract clean text, tables, and metadata. Docling is designed for preparing documents for RAG and LLM applications.
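As a rough illustration of the conversion described above, the following is a minimal sketch that converts one document and writes its Markdown export next to it. It assumes Docling is installed with `pip install docling`; the helper names `markdown_path_for` and `convert_to_markdown` are made up for this example and are not part of Docling.

```python
from pathlib import Path


def markdown_path_for(source: str) -> Path:
    """Hypothetical helper: same location and name as the input, with a .md suffix."""
    return Path(source).with_suffix(".md")


def convert_to_markdown(source: str) -> Path:
    """Convert one local document with Docling and write its Markdown export."""
    # Imported lazily so the module loads without Docling installed.
    # Assumption: `pip install docling`; the first conversion may download models.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)  # parse the input document
    out_path = markdown_path_for(source)
    out_path.write_text(result.document.export_to_markdown(), encoding="utf-8")
    return out_path
```

Calling `convert_to_markdown("report.pdf")`, where `report.pdf` is a placeholder file name, would be expected to write `report.md` in the same folder.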
Key Features
- Multi-format parsing
- Table structure recognition
- Layout analysis
- OCR support
- Markdown output
- LangChain/LlamaIndex integration
- Batch processing
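The batch processing and Markdown output features listed above could be combined roughly as follows. This is a sketch under assumptions, not a reference implementation: `select_inputs`, `convert_folder`, and `SUPPORTED_SUFFIXES` are hypothetical names introduced here, the suffix set covers only the formats named on this page, and failed conversions are simply skipped.

```python
from pathlib import Path
from typing import Iterable, List

# Assumption: only the formats named on this page; Docling may accept others.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".pptx", ".html"}


def select_inputs(paths: Iterable[Path]) -> List[Path]:
    """Hypothetical helper: keep files with a supported suffix, in sorted order."""
    return sorted(p for p in paths if p.suffix.lower() in SUPPORTED_SUFFIXES)


def convert_folder(input_dir: Path, output_dir: Path) -> List[Path]:
    """Convert every supported file in input_dir and write one .md file per document."""
    # Imported lazily so the module loads without Docling installed.
    from docling.datamodel.base_models import ConversionStatus
    from docling.document_converter import DocumentConverter

    output_dir.mkdir(parents=True, exist_ok=True)
    converter = DocumentConverter()
    written: List[Path] = []
    # raises_on_error=False lets the batch continue past a failing document.
    for result in converter.convert_all(select_inputs(input_dir.iterdir()), raises_on_error=False):
        if result.status != ConversionStatus.SUCCESS:
            continue  # skip documents that did not convert cleanly
        out_path = output_dir / f"{result.input.file.stem}.md"
        out_path.write_text(result.document.export_to_markdown(), encoding="utf-8")
        written.append(out_path)
    return written
```

Reusing a single converter for the whole folder avoids reloading the models for each file, which matters given the CPU cost noted under Cons.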
Pros
- Excellent PDF parsing
- Developed by IBM Research
- Multi-format support
- Active development
Cons
- Newer tool
- Large model downloads
- CPU-intensive
Tags
document-parsing, pdf, ibm, open-source, rag
Alternatives to Docling
Unstructured
Open-source tools for ingesting and pre-processing unstructured documents for LLM applications
More Developer Infrastructure Tools
Hugging Face
The leading open-source platform for sharing, discovering, and deploying ML models, datasets, and Spaces
LangChain
Open-source framework for building LLM-powered applications with chains, agents, and retrieval-augmented generation
Pinecone
Managed vector database for building high-performance AI applications with similarity search at scale
Replicate
Run and deploy open-source ML models in the cloud with a simple API, no infrastructure needed
Weights & Biases (W&B)
ML experiment tracking, model versioning, and dataset management platform for AI teams
Weaviate
Open-source vector database with built-in vectorization modules and hybrid search capabilities