Dagster
Data orchestration platform for building, testing, and monitoring data pipelines as software-defined assets
Open source (Apache 2.0), Dagster Cloud from $100/mo, Enterprise custom
API, web
About Dagster
Dagster is a data orchestration platform that treats data assets as first-class citizens. Its asset-based approach (Software-Defined Assets) makes it natural to build and manage data pipelines, including ML feature pipelines and model training workflows. Dagster provides rich testing, observability, and scheduling capabilities.
Key Features
- Software-Defined Assets
- Rich testing framework
- Dagster Cloud hosting
- Scheduling
- Observability
- Partitioned assets
- Integration ecosystem
Pros
- Modern asset-based paradigm
- Great developer experience
- Strong testing
- Good documentation
Cons
- Paradigm shift from Airflow
- Cloud pricing
- Smaller community than Airflow
Tags
data-orchestration, pipelines, mlops, open-source, assets
Alternatives to Dagster
- Prefect: Modern Python workflow orchestration with dynamic workflows and developer-friendly design
- Metaflow: Python framework for building and managing real-life data science and ML projects at scale
More Developer Infrastructure Tools
- Hugging Face: The leading open-source platform for sharing, discovering, and deploying ML models, datasets, and Spaces
- LangChain: Open-source framework for building LLM-powered applications with chains, agents, and retrieval-augmented generation
- Pinecone: Managed vector database for building high-performance AI applications with similarity search at scale
- Replicate: Run and deploy open-source ML models in the cloud with a simple API, no infrastructure needed
- Weights & Biases (W&B): ML experiment tracking, model versioning, and dataset management platform for AI teams
- Weaviate: Open-source vector database with built-in vectorization modules and hybrid search capabilities