
# Distributed Execution

> Step-by-step guide to running Cognee pipelines across Modal containers

A minimal guide to running Cognee pipelines across [Modal](https://modal.com/docs) containers with a one-line toggle. It is a good fit for large batches or slow tasks.

**Before you start:**

* Complete [Quickstart](getting-started/quickstart) to understand basic operations
* Ensure you have [LLM Providers](setup-configuration/llm-providers) configured
* Have a Modal account and tokens configured locally (`modal setup`)
* Create a Modal Secret named `distributed_cognee` with your environment variables

## What Distributed Execution Does

* Distributes per-item task execution to Modal functions
* Keeps your code unchanged at the pipeline layer; high-level flows like `remember()` and `recall()` keep working while task execution moves to Modal
* Scales processing across multiple containers for large datasets

## What is Modal?

[Modal](https://modal.com/docs) is a serverless cloud platform for running compute-intensive applications without managing infrastructure. It is well suited to generative AI models, large-scale batch workflows, and job queues.

When you enable distributed execution, Cognee uses Modal to run your processing tasks across multiple containers in parallel, which can shorten processing time for large datasets.

## Prerequisites

Install extras with Modal support and configure your environment:

```bash theme={null}
# Install with distributed support
pip install "cognee[distributed]"  # quotes keep shells like zsh from expanding the brackets

# Configure Modal (creates account if needed)
modal setup

# Create Modal Secret with your environment variables (KEY=VALUE pairs; placeholder value shown)
modal secret create distributed_cognee LLM_API_KEY="your-api-key"
```

Add every environment variable your pipeline needs to the Modal Secret (e.g., `LLM_API_KEY`, database configuration, and S3 credentials if used).
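
When the secret needs several variables, it can help to assemble the command from a dictionary and review it before running it. The sketch below assumes the Modal CLI accepts `KEY=VALUE` pairs after the secret name; `build_secret_command` is a hypothetical helper, not part of Cognee or Modal, and the values are placeholders.

```python
import shlex


def build_secret_command(secret_name: str, env: dict[str, str]) -> list[str]:
    """Return the argv for `modal secret create` with KEY=VALUE pairs.

    Hypothetical helper for illustration; it only builds the command
    and never runs it.
    """
    if not env:
        raise ValueError("at least one environment variable is required")
    # Sort keys so the generated command is stable between runs.
    pairs = [f"{key}={value}" for key, value in sorted(env.items())]
    return ["modal", "secret", "create", secret_name, *pairs]


if __name__ == "__main__":
    # Placeholder value; substitute your real settings.
    argv = build_secret_command(
        "distributed_cognee",
        {"LLM_API_KEY": "sk-placeholder"},
    )
    # shlex.join quotes values where the shell would need it.
    print(shlex.join(argv))
```

Printing the command instead of executing it keeps real credentials out of scripts and lets you check the variable list first.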

## Code in Action

```python theme={null}
import asyncio
import cognee
from cognee import SearchType

async def main():
    # COGNEE_DISTRIBUTED=true is read from the environment; no code changes are needed
    # 1) Remember data (text, files, or S3 URIs)
    await cognee.remember([
        "Alice knows Bob. Bob works at ACME.",
        "NLP is a subfield of computer science.",
    ], dataset_name="dist_demo")

    # 2) Recall from the distributed dataset
    answers = await cognee.recall(
        query_type=SearchType.GRAPH_COMPLETION,
        query_text="Who does Alice know?",
        datasets=["dist_demo"],
        top_k=5,
    )
    print(answers)

asyncio.run(main())
```

<Note>
  This example uses short text strings for demonstration. In practice, you can process large datasets, files, or S3 URIs; the distributed execution scales across Modal containers automatically.
</Note>

## What Just Happened

### Step 1: Enable Distribution

```bash theme={null}
export COGNEE_DISTRIBUTED=true
python your_script.py
```

Set the environment variable and run your code as usual. Internally, pipelines switch from `run_tasks` to `run_tasks_distributed` (Modal) via this toggle.
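
To make the toggle concrete, here is a minimal sketch of how an environment variable can select between two runners. It is not Cognee's implementation: both runner bodies are stand-ins, `pick_runner` and `is_distributed` are hypothetical names, and the rule that only the string `true` enables the toggle is an assumption.

```python
import os


def run_tasks(items):
    """Stand-in for the local runner (not Cognee's implementation)."""
    return [("local", item) for item in items]


def run_tasks_distributed(items):
    """Stand-in for the Modal-backed runner (not Cognee's implementation)."""
    return [("distributed", item) for item in items]


def is_distributed(environ=os.environ) -> bool:
    # Assumption: only the string "true" (any case) enables the toggle.
    value = environ.get("COGNEE_DISTRIBUTED", "false")
    return value.strip().lower() == "true"


def pick_runner(environ=os.environ):
    """Choose the runner once, based on the environment."""
    return run_tasks_distributed if is_distributed(environ) else run_tasks


if __name__ == "__main__":
    runner = pick_runner()
    print(runner.__name__)
```

Because the choice is made in one place, the calling code stays identical in both modes, which is the property the toggle is meant to provide.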

### Step 2: Remember Your Data

```python theme={null}
await cognee.remember([
    "Alice knows Bob. Bob works at ACME.",
    "NLP is a subfield of computer science.",
], dataset_name="dist_demo")
```

Remember your data using the standard v1.0 interface. The same approach works with files, S3 URIs, or large datasets.

### Step 3: Process Distributed

You do not need a separate distributed API call. `remember()` still uses the same ingestion and pipeline layers under the hood, so when `COGNEE_DISTRIBUTED=true` is set the per-item work is routed through `run_tasks_distributed` automatically.

### Step 4: Recall From Your Data

```python theme={null}
answers = await cognee.recall(
    query_type=SearchType.GRAPH_COMPLETION,
    query_text="Who does Alice know?",
    datasets=["dist_demo"],
    top_k=5,
)
```

Recall from your processed dataset using the standard v1.0 API. The results are the same whether the underlying tasks ran locally or on Modal.

## What Happens Under the Hood

When `COGNEE_DISTRIBUTED=true`:

* Tasks are distributed to Modal functions automatically
* Each task runs in its own container
* Results are collected and merged back
* Database schemas are created on first run
* Costs are tracked in your Modal workspace
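
The distribute-then-merge pattern in the list above can be sketched locally with `asyncio`. This is an illustration only: `process_item` and `fan_out` are hypothetical names, the per-item work is a stand-in, and local coroutines take the place of Modal containers.

```python
import asyncio


async def process_item(item: str) -> dict:
    """Stand-in for one per-item task; on Modal this would run remotely."""
    await asyncio.sleep(0)  # yield control, as a remote call would
    return {"item": item, "tokens": len(item.split())}


async def fan_out(items: list[str], max_concurrency: int = 4) -> list[dict]:
    """Run one task per item with bounded concurrency and merge the results."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(item: str) -> dict:
        async with semaphore:
            return await process_item(item)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(item) for item in items))


if __name__ == "__main__":
    results = asyncio.run(fan_out(["Alice knows Bob.", "Bob works at ACME."]))
    print(results)
```

The concurrency bound plays the role that a container limit would play in a real deployment: it caps how much work runs at once without changing the merged result.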

<Note>
  Start with a small dataset and confirm the costs in your Modal workspace before scaling up. If your first call writes to the databases outside a pipeline, call `await setup()` once beforehand so the schemas exist.
</Note>
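
One way to start small is to submit a small first batch, check the cost, and only then continue with larger batches. The helper below is a hypothetical sketch, not a Cognee API; the batch sizes are arbitrary examples.

```python
from typing import Iterator, Sequence


def batches(items: Sequence[str], first: int, rest: int) -> Iterator[list[str]]:
    """Yield one small trial batch, then larger batches of the remainder."""
    if first <= 0 or rest <= 0:
        raise ValueError("batch sizes must be positive")
    if not items:
        return
    # Trial batch: small enough to inspect its cost before continuing.
    yield list(items[:first])
    for start in range(first, len(items), rest):
        yield list(items[start:start + rest])


if __name__ == "__main__":
    documents = [f"document {i}" for i in range(7)]
    for batch in batches(documents, first=1, rest=3):
        print(len(batch), batch)
```

Each yielded batch could then be passed to `cognee.remember(batch, dataset_name="dist_demo")`, pausing after the first one to review the usage in your Modal workspace.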

<Columns cols={3}>
  <Card title="Deploy REST API" icon="server" href="/guides/deploy-rest-api-server">
    Learn about API deployment
  </Card>

  <Card title="Custom Tasks" icon="workflow" href="/guides/custom-tasks-pipelines">
    Learn about custom tasks and pipelines
  </Card>

  <Card title="Core Concepts" icon="brain" href="/core-concepts/overview">
    Understand knowledge graph fundamentals
  </Card>
</Columns>
