A minimal guide to running Cognee pipelines across Modal containers with a one-line toggle. A good fit for large batches or slow tasks. Before you start:
  • Complete Quickstart to understand basic operations
  • Ensure you have LLM Providers configured
  • Have a Modal account and tokens configured locally (modal setup)
  • Create a Modal Secret named distributed_cognee with your environment variables

What Distributed Execution Does

  • Distributes per-item task execution to Modal functions
  • Keeps your code unchanged; you can keep using add, cognify, and search, or custom pipelines
  • Scales processing across multiple containers for large datasets

What is Modal?

Modal is a serverless cloud platform that lets you run compute-intensive applications without managing infrastructure. It’s well suited to running generative AI models, large-scale batch workflows, and job queues at scale. When you enable distributed execution, Cognee automatically uses Modal to run your processing tasks across multiple containers, making it much faster for large datasets.

Prerequisites

Install extras with Modal support and configure your environment:
# Install with distributed support
pip install "cognee[distributed]"  # quoted so shells like zsh don't expand the brackets

# Configure Modal (creates account if needed)
modal setup

# Create a Modal Secret with your environment variables
# (the CLI expects KEY=VALUE pairs; placeholder value shown)
modal secret create distributed_cognee LLM_API_KEY=...
Add every environment variable Cognee needs to the Secret (e.g., LLM_API_KEY, database configs, S3 credentials if used); you can also manage the Secret's values from the Modal dashboard.

Full Working Example

import asyncio
import cognee
from cognee import SearchType

async def main():
    # COGNEE_DISTRIBUTED=true, exported in your shell, is picked up automatically
    # 1) Add data (text, files, or S3 URIs)
    await cognee.add([
        "Alice knows Bob. Bob works at ACME.",
        "NLP is a subfield of computer science.",
    ], dataset_name="dist_demo")

    # 2) Build the knowledge graph (runs distributed)
    await cognee.cognify(datasets=["dist_demo"]) 

    # 3) Query
    answers = await cognee.search(
        query_type=SearchType.GRAPH_COMPLETION,
        query_text="Who does Alice know?",
        top_k=5,
    )
    print(answers)

asyncio.run(main())
This simple example uses basic text data for demonstration. In practice, you can process large datasets, files, or S3 URIs; distributed execution scales automatically across Modal containers.
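To actually exercise the fan-out, feed a larger batch. A sketch with synthetic documents (illustrative only; the dataset name is hypothetical):

# 500 synthetic documents; each item becomes a distributable unit of work
docs = [f"Synthetic document {i} about topic {i % 10}." for i in range(500)]
await cognee.add(docs, dataset_name="dist_demo_large")
await cognee.cognify(datasets=["dist_demo_large"])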

What Just Happened

Step 1: Enable Distribution

export COGNEE_DISTRIBUTED=true
python your_script.py
Set the environment variable and run your code as usual. Internally, pipelines switch from run_tasks to run_tasks_distributed (Modal) via this toggle.
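If you prefer toggling it from Python, set the variable before importing cognee so the configuration sees it (a sketch; the exported shell variable is the documented path):

import os

# Must be set before cognee loads its configuration
os.environ["COGNEE_DISTRIBUTED"] = "true"

import cognee  # imported after the toggle is set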

Step 2: Add Your Data

await cognee.add([
    "Alice knows Bob. Bob works at ACME.",
    "NLP is a subfield of computer science.",
], dataset_name="dist_demo")
Add your data using the standard add function. The same approach works with files, S3 URIs, or large datasets.
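For example, with local files or S3 objects (hypothetical paths; S3 credentials must be available in your environment):

# Hypothetical local path and S3 URI; both go through the same add() call
await cognee.add(
    ["/data/reports/q1.pdf", "s3://my-bucket/reports/q2.pdf"],
    dataset_name="dist_demo",
)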

Step 3: Process Distributed

await cognee.cognify(datasets=["dist_demo"])
The cognify operation automatically runs distributed across Modal containers when COGNEE_DISTRIBUTED=true is set.
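Since datasets takes a list, you can also process several datasets in one call, each fanning out across containers (dataset names here are illustrative):

# Process multiple datasets in a single distributed run
await cognee.cognify(datasets=["dist_demo", "papers"])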

Step 4: Search Your Data

answers = await cognee.search(
    query_type=SearchType.GRAPH_COMPLETION,
    query_text="Who does Alice know?",
    top_k=5,
)
Search your processed data using the standard search methods. The results are the same whether processed locally or distributed.
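Other query types from the Quickstart work unchanged; for example, retrieving matching chunks instead of a generated answer (assuming SearchType.CHUNKS is available in your version):

# Return the stored text chunks that match the query
chunks = await cognee.search(
    query_type=SearchType.CHUNKS,
    query_text="ACME",
    top_k=5,
)
print(chunks)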

What Happens Under the Hood

When COGNEE_DISTRIBUTED=true:
  • Tasks are distributed to Modal functions automatically
  • Each task runs in its own container
  • Results are collected and merged back
  • Database schemas are created on first run
  • Costs are tracked in your Modal workspace
Start small and confirm costs in your Modal workspace. Pipelines create database schemas on their first run; if your first call writes to the databases outside a pipeline, call await setup() once beforehand.
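A minimal sketch of that one-time call (the import path follows Cognee's low-level examples; verify it against your installed version):

import asyncio

from cognee.low_level import setup  # assumed import path; check your version

async def init():
    # Creates the relational, vector, and graph schemas up front
    await setup()

asyncio.run(init())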