AIDEX

Unstructured

by Unstructured

Open-source tools for ingesting and pre-processing unstructured documents for LLM applications

Pricing: Open-source library is free; hosted API has a free tier (1,000 pages); Pro from $10/mo
Access: API, open source

About Unstructured

Unstructured provides tools to extract and transform data from documents (PDFs, HTML, images, Office files) into clean, structured formats ready for LLM applications. The open-source library handles parsing, chunking, and cleaning, while the hosted platform offers API access with OCR and table extraction capabilities.
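To make the parse, clean, and chunk flow described above concrete, here is a minimal standard-library sketch. It is not Unstructured's API: the `Element` class and the `clean_text` and `chunk_by_heading` functions are hypothetical names invented for illustration, and the real library ships its own partitioning, cleaning, and chunking functions. The sketch assumes parsing has already produced a list of typed elements.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for one parsed document element."""
    category: str  # e.g. "Title" or "NarrativeText"
    text: str


def clean_text(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def chunk_by_heading(elements: list[Element], max_chars: int = 200) -> list[str]:
    """Group cleaned elements into chunks.

    A new chunk starts at every "Title" element, or when appending the
    next element would push the current chunk past max_chars. A single
    element longer than max_chars is kept whole rather than split.
    """
    chunks: list[str] = []
    current = ""
    for el in elements:
        text = clean_text(el.text)
        if not text:
            continue  # skip elements that are empty after cleaning
        starts_section = el.category == "Title"
        too_long = bool(current) and len(current) + 1 + len(text) > max_chars
        if current and (starts_section or too_long):
            chunks.append(current)
            current = ""
        current = f"{current}\n{text}" if current else text
    if current:
        chunks.append(current)
    return chunks


# Illustrative input, as if a parser had already split a document.
elements = [
    Element("Title", "  Quarterly   Report "),
    Element("NarrativeText", "Revenue grew\n in the third quarter."),
    Element("Title", "Outlook"),
    Element("NarrativeText", "Growth is expected to continue."),
]
chunks = chunk_by_heading(elements)
for chunk in chunks:
    print(repr(chunk))
```

Breaking at headings keeps each chunk on a single topic, which tends to matter more for retrieval quality than hitting an exact chunk size.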

Key Features

  • Multi-format document parsing
  • Intelligent chunking
  • Table extraction
  • OCR support
  • Metadata extraction
  • Connector framework
  • LangChain/LlamaIndex integration

Pros

  • Handles many document formats
  • Good chunking strategies
  • Active open-source project
  • Essential for RAG

Cons

  • Quality varies by document type
  • OCR can be slow
  • Complex documents sometimes fail

Tags

document-processing, etl, rag, open-source, parsing