Modal Deployment

Deploy Cognee on Modal for serverless, auto-scaling knowledge graph processing with minimal infrastructure management.
Modal is a cloud platform that lets you run code remotely with automatic scaling, perfect for variable Cognee workloads.

Why Modal?

Serverless Scaling

Automatically scales based on workload without server management

Cost Efficient

Pay only for compute time used, ideal for batch processing

Fast Deployment

Deploy within seconds with minimal configuration

GPU Support

Access to powerful GPUs for LLM processing when needed

Prerequisites

1. Modal Account

Create a free account at modal.com

2. Install Modal CLI

pip install modal
modal token new

3. Environment Variables

Set up your environment variables:
# Required
export OPENAI_API_KEY="your-openai-api-key"

# Optional - for external databases
export POSTGRES_URL="postgresql://user:pass@host:5432/db"
export NEO4J_URL="bolt://user:pass@host:7687"
export QDRANT_URL="http://host:6333"
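A missing API key is a common deployment failure (see Troubleshooting below), so it can help to validate the environment before deploying. The following is a hypothetical pre-flight helper, not part of Cognee or Modal; the variable names match the exports above.

```python
import os

# Hypothetical pre-flight check (not part of Cognee): confirm required
# variables are set and report which optional database URLs are present.
REQUIRED = ["OPENAI_API_KEY"]
OPTIONAL = ["POSTGRES_URL", "NEO4J_URL", "QDRANT_URL"]

def check_env() -> dict:
    missing = [name for name in REQUIRED if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {missing}")
    # Optional variables simply fall back to embedded databases when unset.
    return {name: bool(os.environ.get(name)) for name in OPTIONAL}
```

Run this locally before `modal run`; a RuntimeError here is cheaper than a failed container start.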

Quick Deployment

1. Clone Repository

git clone https://github.com/topoteretes/cognee.git
cd cognee

2. Install Dependencies

# Install with uv (recommended)
uv sync --dev --all-extras --reinstall

# Activate virtual environment
source .venv/bin/activate

3. Deploy to Modal

# Run the Modal deployment script
modal run -d modal_deployment.py

The -d flag runs the deployment in detached mode. Monitor progress in your Modal dashboard.

4. Monitor Deployment

Visit your Modal dashboard to monitor the deployment status and view logs.
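For orientation, the shape of a Modal deployment script looks roughly like the sketch below. The repository's real modal_deployment.py is more complete; the app name, secret name, and process_document function here are illustrative assumptions, not Cognee's actual definitions.

```python
import modal

# Minimal sketch of a Modal deployment script. Names are illustrative,
# not the ones used in Cognee's actual modal_deployment.py.
app = modal.App("cognee-app")

# Container image: a slim Debian base with cognee installed.
image = modal.Image.debian_slim().pip_install("cognee")

@app.function(
    image=image,
    timeout=600,  # raise this if large datasets hit container timeouts
    secrets=[modal.Secret.from_name("openai-api-key")],  # assumed secret name
)
def process_document(text: str) -> None:
    import cognee  # imported inside the container, where it is installed
    # ... call cognee.add(...) / cognee.cognify() here ...

@app.local_entrypoint()
def main() -> None:
    # .remote() runs the function in a Modal container; for many documents,
    # .map() fans out across parallel containers.
    process_document.remote("example document")
```

Storing OPENAI_API_KEY as a Modal secret (rather than baking it into the image) keeps credentials out of your code and logs.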

Configuration Options

  • Basic Setup
  • Production Setup
  • Hybrid Setup

The Basic Setup uses the default configuration with embedded databases, suitable for quick testing:

# modal_deployment.py configuration
GRAPH_DATABASE = "networkx"
VECTOR_DATABASE = "lancedb"
RELATIONAL_DATABASE = "sqlite"

Deployment Architecture

Compute Resources

Modal automatically provisions compute resources based on your workload:
  • CPU: 2-16 cores per container
  • Memory: 4-64 GB RAM per container
  • GPU: Optional NVIDIA GPUs for LLM processing
  • Storage: Ephemeral storage per container

Modal scales your deployment automatically:
  • Cold Start: ~2-5 seconds to spin up new containers
  • Concurrent Processing: Multiple containers for parallel workloads
  • Auto-shutdown: Containers shut down when idle to save costs

Configure persistent storage for your data:
  • Volumes: Modal volumes for persistent file storage
  • External DBs: Connect to managed database services
  • S3 Integration: Direct S3 access for large datasets

Monitoring & Debugging

Modal Dashboard

Real-time Monitoring: View logs, metrics, and container status in the Modal web interface.

Log Streaming

Live Logs: Stream logs directly to your terminal:
modal logs cognee-app

Video Tutorial

Cost Optimization

Batch Processing: Group multiple documents together to maximize container utilization and reduce cold start costs.
Database Costs: Consider using Modal’s built-in storage for development and external managed services for production.
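The batch processing tip above can be sketched in plain Python. This is a hypothetical helper, not a Cognee or Modal API; the idea is that fewer, larger remote calls mean fewer cold starts and better utilization per container.

```python
from typing import Iterable, Iterator

# Hypothetical helper illustrating the batching tip: group documents
# before sending them to Modal so each container processes a full batch.
def batched(docs: Iterable[str], batch_size: int) -> Iterator[list[str]]:
    batch: list[str] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

# Each batch would then be a single remote call, e.g.:
# for batch in batched(documents, 25):
#     process_documents.remote(batch)  # hypothetical Modal function
```

Tune the batch size against container memory: larger batches amortize cold starts but risk the memory errors described under Troubleshooting.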

Troubleshooting

Container Timeout
  • Increase timeout limits in modal_deployment.py
  • Break large datasets into smaller batches

Memory Errors
  • Increase container memory allocation
  • Use streaming processing for large files

Missing API Keys
  • Ensure all required environment variables are set
  • Use Modal secrets for sensitive data

Database Connections
  • Verify database URLs and credentials
  • Check network connectivity from Modal containers
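For the "verify database URLs" step, a stdlib-only sanity check can catch malformed URLs before a container ever tries to connect. This is a hypothetical helper; the schemes and default ports match the example URLs in the Prerequisites section.

```python
from urllib.parse import urlparse

# Hypothetical URL sanity check: parse each database URL and confirm it
# has the scheme and host the target service expects.
EXPECTED_PORTS = {
    "postgresql": 5432,  # POSTGRES_URL
    "bolt": 7687,        # NEO4J_URL
    "http": 6333,        # QDRANT_URL (Qdrant's default REST port)
}

def check_db_url(url: str) -> tuple[str, str, int]:
    parsed = urlparse(url)
    if parsed.scheme not in EXPECTED_PORTS:
        raise ValueError(f"Unexpected scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("URL has no hostname")
    port = parsed.port or EXPECTED_PORTS[parsed.scheme]
    return parsed.scheme, parsed.hostname, port
```

A parse failure here points to a bad URL; if parsing succeeds but connections still fail, check credentials and network reachability from the Modal container.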

Next Steps

Scale Up

Production Deployment: Configure external databases and optimize for production workloads.

Monitor Usage

Track Costs: Monitor compute usage and optimize batch sizes for cost efficiency.

Need Help?

Join our community for Modal deployment support and best practices.