
Distributed Processing

Why Modal? A single-machine run over a large amount of data can take hours. Spawning ~50 containers on Modal slashes that to minutes.

Modal is a serverless compute platform that spins up Docker containers on demand. Compared with running Cognee locally or on a single VM, you get:

  • Parallelism – each Cognify task can run in its own container.
  • Elasticity – burst to dozens of CPUs instantly, scale to zero when idle.
  • Cost efficiency – per-second billing and a generous free tier for experimentation.

Step-by-step Example

  1. Upload your raw data to S3 – every worker streams its partition directly from the bucket.
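    For example, with the AWS CLI (the bucket and prefix names here are illustrative):

    aws s3 cp ./my_data s3://my-cognee-bucket/data/ --recursive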

  2. Pass the bucket URI to Cognee – the distributed entry-point reads the S3_BUCKET_PATH env var:

    import os

    s3_bucket_path = os.getenv("S3_BUCKET_PATH")
    s3_data_path = "s3://" + s3_bucket_path
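    The entry-point then feeds s3_data_path into the pipeline. Illustratively (the exact call lives in entrypoint.py; cognee.add and cognee.cognify are the standard SDK calls):

    import asyncio
    import cognee

    async def main():
        await cognee.add(s3_data_path)  # ingest everything under the S3 path
        await cognee.cognify()          # build the knowledge graph

    asyncio.run(main())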
  3. Add secrets to Modal – navigate to Secrets in the Modal dashboard and create one for all environment variables your pipeline needs:

    • S3_BUCKET_PATH
    • LLM_API_KEY, …
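    Workers attach the secret at runtime, roughly like this (the secret name below is illustrative – check entrypoint.py for the name it actually expects):

    import modal

    app = modal.App("cognee-distributed")

    @app.function(secrets=[modal.Secret.from_name("distributed_cognee")])
    def show_env():
        import os
        # Secrets surface inside the container as plain environment variables.
        print(os.environ["S3_BUCKET_PATH"])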
  4. Install the Modal SDK & authenticate:
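    pip install modal
    modal setup   # authenticate – opens a browser to create an API token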

  5. Run the distributed entry-point – clone Cognee and launch from the distributed directory:

    git clone https://github.com/topoteretes/cognee.git
    cd cognee/distributed

    # first run builds the image, later runs start instantly
    modal run entrypoint.py

Tuning worker counts

Open distributed/entrypoint.py:

number_of_graph_saving_workers = 1        # graph nodes & edges
number_of_data_point_saving_workers = 5   # embeddings → vector DB

Increase the second value if embedding uploads are the bottleneck; decrease it if you hit rate limits on your vector database.

Queue logic under the hood

The runner maintains two internal async queues:

  1. graph_saving_queue – every task that creates a node or edge drops its payload here. One or more graph-saving workers dequeue the items and persist them to your graph database.
  2. data_point_saving_queue – holds DataPoint objects that need embedding and insertion into the vector store. Because this step calls the embedding model and writes to an external DB, it is much slower, which is why it defaults to more workers.

Your worker counts simply control how many consumer coroutines Modal spins up for each queue. If you notice graph commits lagging behind, bump number_of_graph_saving_workers. If embeddings are the bottleneck, raise number_of_data_point_saving_workers.
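In miniature, the pattern looks like this (a self-contained sketch of the two-queue fan-out, not the actual entrypoint.py code; persist_graph_item and persist_data_point are hypothetical stand-ins for the real persistence calls):

    import asyncio

    NUM_GRAPH_WORKERS = 1
    NUM_DATA_POINT_WORKERS = 5

    async def persist_graph_item(item):     # stand-in for the graph DB write
        await asyncio.sleep(0.01)

    async def persist_data_point(item):     # stand-in for embed + vector DB insert
        await asyncio.sleep(0.05)

    async def consumer(queue, persist):
        # Dequeue items until a None sentinel signals shutdown.
        while (item := await queue.get()) is not None:
            await persist(item)

    async def main():
        graph_saving_queue = asyncio.Queue()
        data_point_saving_queue = asyncio.Queue()

        # One consumer coroutine per configured worker, per queue.
        workers = [
            asyncio.create_task(consumer(graph_saving_queue, persist_graph_item))
            for _ in range(NUM_GRAPH_WORKERS)
        ] + [
            asyncio.create_task(consumer(data_point_saving_queue, persist_data_point))
            for _ in range(NUM_DATA_POINT_WORKERS)
        ]

        # Producers (the Cognify tasks) drop payloads onto the queues.
        for i in range(100):
            await graph_saving_queue.put(("node", i))
            await data_point_saving_queue.put(("data_point", i))

        # One sentinel per consumer, then wait for everything to drain.
        for _ in range(NUM_GRAPH_WORKERS):
            await graph_saving_queue.put(None)
        for _ in range(NUM_DATA_POINT_WORKERS):
            await data_point_saving_queue.put(None)
        await asyncio.gather(*workers)

    asyncio.run(main())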



Join the Conversation!

Have questions? Join our community now to connect with professionals, share insights, and get your questions answered!