- Complete the Quickstart to understand basic operations
- Ensure you have LLM providers configured
- Have a Modal account and tokens configured locally (`modal setup`)
- Create a Modal Secret named `distributed_cognee` with your environment variables
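For example, the Secret can be created from the Modal CLI; the value below is a placeholder, and you should append whichever DB or S3 variables your setup uses:

```bash
# Create the Modal Secret that cognee's distributed runner reads;
# add any DB or S3 variables your configuration needs the same way
modal secret create distributed_cognee LLM_API_KEY=sk-your-key
```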
## What Distributed Execution Does
- Distributes per-item task execution to Modal functions
- Keeps your code unchanged; you can keep using `add` → `cognify` → `search` or custom pipelines
- Scales processing across multiple containers for large datasets
## What is Modal?

Modal is a serverless cloud platform that lets you run compute-intensive applications without thinking about infrastructure. It’s well suited to running generative AI models, large-scale batch workflows, and job queues at scale. When you enable distributed execution, Cognee automatically uses Modal to run your processing tasks across multiple containers, making it much faster for large datasets.

## Prerequisites
Install the extras with Modal support and configure your environment variables (`LLM_API_KEY`, DB configs, S3 creds if used).
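A minimal sketch of the install and local configuration; the `distributed` extra name is an assumption, so check the extras your cognee version ships:

```bash
# Extra name assumed; verify against your cognee version's optional extras
pip install "cognee[distributed]"

# Authenticate the Modal CLI locally (creates/uses your Modal tokens)
modal setup

# Variables cognee reads at runtime
export LLM_API_KEY="sk-your-key"
```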
## Full Working Example

This simple example uses basic text data for demonstration. In practice, you can process large datasets, files, or S3 URIs; the distributed execution scales automatically across Modal containers.
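A minimal end-to-end sketch. The positional `search` query is an assumption, since the exact signature varies across cognee versions:

```python
import os
import asyncio

# Step 1: enable distributed execution before running any pipeline
os.environ["COGNEE_DISTRIBUTED"] = "true"

import cognee


async def main():
    # Step 2: add basic text data (files or S3 URIs work the same way)
    await cognee.add("Cognee turns documents into a queryable knowledge graph.")

    # Step 3: cognify fans tasks out across Modal containers
    await cognee.cognify()

    # Step 4: query the processed data
    results = await cognee.search("What does Cognee do?")
    print(results)


asyncio.run(main())
```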
## What Just Happened
### Step 1: Enable Distribution

The `COGNEE_DISTRIBUTED=true` toggle switches pipeline execution from `run_tasks` to `run_tasks_distributed` (Modal), as shown below.
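The toggle is just an environment variable; export it in the shell or set it in Python before the pipeline runs:

```python
import os

# Must be set before the pipeline starts so cognee selects run_tasks_distributed
os.environ["COGNEE_DISTRIBUTED"] = "true"
```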
### Step 2: Add Your Data

Text data goes in through the `add` function. The same approach works with files, S3 URIs, or large datasets.
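For example, via `asyncio.run` (or `await` inside an async context):

```python
import asyncio
import cognee

# Ingest raw text; file paths or S3 URIs are passed the same way
asyncio.run(cognee.add("Cognee turns documents into a queryable knowledge graph."))
```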
### Step 3: Process Distributed

The `cognify` operation automatically runs distributed across Modal containers when `COGNEE_DISTRIBUTED=true` is set.
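Continuing the example above:

```python
import asyncio
import cognee

# With COGNEE_DISTRIBUTED=true, cognify's tasks run in Modal containers
asyncio.run(cognee.cognify())
```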
### Step 4: Search Your Data
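After `cognify` completes, query the results with `search`. The positional query below is an assumption; check the search signature of your cognee version:

```python
import asyncio
import cognee

# Query the knowledge graph produced by cognify
results = asyncio.run(cognee.search("What does Cognee do?"))
print(results)
```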
## What Happens Under the Hood

When `COGNEE_DISTRIBUTED=true`:
- Tasks are distributed to Modal functions automatically
- Each task runs in its own container
- Results are collected and merged back
- Database schemas are created on first run
- Costs are tracked in your Modal workspace
Start small and confirm costs in your Modal workspace. For non-pipeline first calls that write to DBs, call `await setup()` once.
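A sketch of that one-time call, assuming `setup` is importable from `cognee.low_level` (verify the import path for your version):

```python
import asyncio

# Import path assumed; verify it against your cognee version
from cognee.low_level import setup

# Creates the database schemas once, before any non-pipeline
# call that writes to the databases
asyncio.run(setup())
```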