AIDEX

Docling

by IBM Research

IBM's open-source document parser for converting PDFs, DOCX, and more into structured formats for AI

Open source (MIT), free to use · API

About Docling

Docling by IBM Research is a document parsing library that converts PDF, DOCX, PPTX, HTML, and other formats into structured representations. It uses layout analysis and table structure recognition to extract clean text, tables, and metadata, and it is designed for preparing documents for RAG and LLM applications.
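
As a rough illustration of the conversion described above, here is a minimal sketch that assumes the `docling` Python package is installed and uses its `DocumentConverter` class. The `to_markdown` helper and its optional `converter` parameter are hypothetical names introduced for this example, not part of Docling.

```python
from typing import Any, Optional


def to_markdown(source: str, converter: Optional[Any] = None) -> str:
    """Convert one document (local path or URL) to Markdown text.

    `to_markdown` is an illustrative helper, not a Docling function.
    """
    if converter is None:
        # Imported lazily so the helper can be exercised with a stand-in
        # converter; a real first run may download layout and table models.
        from docling.document_converter import DocumentConverter
        converter = DocumentConverter()
    # convert() returns a result whose .document holds the parsed structure.
    result = converter.convert(source)
    return result.document.export_to_markdown()
```

Calling `to_markdown("report.pdf")` (a hypothetical file name) should return the document as a Markdown string, which can then be chunked for a RAG pipeline.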

Key Features

  • Multi-format parsing
  • Table structure recognition
  • Layout analysis
  • OCR support
  • Markdown output
  • LangChain/LlamaIndex integration
  • Batch processing
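
The batch processing and Markdown output features listed above might be combined roughly as follows. This is a sketch under assumptions: it relies on `DocumentConverter.convert_all` yielding one result per input with `input.file` and `document` attributes, and the `batch_to_markdown` helper and output directory layout are hypothetical choices made for illustration.

```python
from pathlib import Path
from typing import Any, Iterable, List, Optional


def batch_to_markdown(paths: Iterable[str], out_dir: str,
                      converter: Optional[Any] = None) -> List[Path]:
    """Convert several documents and write one .md file per input.

    `batch_to_markdown` is an illustrative helper, not a Docling function.
    """
    if converter is None:
        # Lazy import: only needed when no converter is supplied.
        from docling.document_converter import DocumentConverter
        converter = DocumentConverter()
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    # Assumption: convert_all yields one result per input document.
    for result in converter.convert_all(list(paths)):
        out_path = target / f"{result.input.file.stem}.md"
        out_path.write_text(result.document.export_to_markdown(),
                            encoding="utf-8")
        written.append(out_path)
    return written
```

Reusing a single converter across the batch avoids reloading models for every file, which matters given that conversion is CPU-intensive.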

Pros

  • Excellent PDF parsing
  • Developed by IBM Research
  • Multi-format support
  • Active development

Cons

  • Newer tool
  • Large model downloads
  • CPU-intensive

Tags

document-parsing, pdf, ibm, open-source, rag