
Deployment facts

These facts anchor the rest of the guide:
| Common assumption | Correct model | Why it matters |
| --- | --- | --- |
| The default graph backend is a separate named service | The default graph backend is embedded, file-backed Kuzu. NetworkX is the in-memory fallback. | The on-disk graph directory is cognee_graph_kuzu. |
| A shared service gives moderate write concurrency | A shared service over file-backed Kuzu is still single-writer. | The service boundary centralizes the writer; it does not make file-backed writes concurrent. |
| Helm includes an API, worker, and queue by default | Cognee ships one FastAPI image. Worker and queue splits are operator patterns. | Add a queue and worker deployment only when your write model needs them. |

Executive summary

Cognee runs either embedded in a Python process or as a FastAPI service. Operationally, each deployment comes down to three independent choices:
  • Writer ownership: who owns writes
  • Storage location: where graph, vector, and relational state live
  • Reader access path: how readers reach memory
The default embedded stack is file-backed: Kuzu for graph storage, LanceDB for vectors, and SQLite for relational metadata. In production, each tier can be externalized independently: Neo4j or the FalkorDB adapter for graph storage, Qdrant/pgvector/Pinecone/ChromaDB for vectors, and Postgres for relational metadata. Deployment patterns are composable. Choose a base shape first, such as Embedded SDK, Compose, Helm, sidecar, or Lambda, then add the write model, read-scaling pattern, and storage backend that match your workload.

Pick-a-path decision tree

Start with writer ownership. If a single writer cannot own every write, the tree pushes you toward queueing or external backends.
Single writer?
|-- yes
|   `-- Few readers?
|       |-- yes -> Embedded SDK
|       `-- no  -> Self-hosted FastAPI service
|                 `-- Add snapshot read replicas when reads need to scale
`-- no
    `-- Concurrent or multi-agent writes?
        |-- Bursty / event-driven -> Queue + single writer worker
        `-- Sustained concurrency -> Externalized backends
                                     Postgres + Neo4j/FalkorDB adapter + Qdrant/pgvector
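The tree above can also be read as a small function. The following is a minimal sketch, not part of Cognee: the function name, the enum, and the returned labels are illustrative and only mirror the branches of the tree.

```python
from enum import Enum
from typing import Optional


class WritePattern(Enum):
    """Hypothetical labels for the two multi-writer branches of the tree."""
    BURSTY = "bursty"        # event-driven spikes
    SUSTAINED = "sustained"  # continuous concurrent writes


def pick_deployment(single_writer: bool,
                    few_readers: bool = True,
                    write_pattern: Optional[WritePattern] = None) -> str:
    """Walk the decision tree and return the suggested starting shape."""
    if single_writer:
        if few_readers:
            return "Embedded SDK"
        # Many readers: put a service boundary in front of the single writer.
        return "Self-hosted FastAPI service"
    if write_pattern is WritePattern.BURSTY:
        return "Queue + single writer worker"
    if write_pattern is WritePattern.SUSTAINED:
        return "Externalized backends"
    raise ValueError("write_pattern is required when writes are not single-writer")


if __name__ == "__main__":
    print(pick_deployment(single_writer=True, few_readers=False))
```

The function asks about writer ownership first and only then about readers, which is the ordering the tree prescribes.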

Storage model

Cognee has three independent storage layers:
| Layer | Selector | Default | Production options |
| --- | --- | --- | --- |
| Graph | GRAPH_DATABASE_PROVIDER | kuzu | kuzu-remote, neo4j, FalkorDB adapter |
| Vector | VECTOR_DB_PROVIDER | lancedb | pgvector, qdrant, pinecone, chromadb |
| Relational | DB_PROVIDER | sqlite | postgres |
Embedded storage is file-backed and easy to move:
~/.cognee/
`-- databases/
    |-- cognee_graph_kuzu
    |-- cognee_graph_kuzu.wal
    |-- lancedb/
    `-- system.db
That makes backups simple, but the same file-backed layout is why one process should own writes. In production, replace each layer with service-backed systems by changing the provider and connection variables.
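Because the three layers are selected independently, a deployment can externalize one tier and leave the others file-backed. The sketch below illustrates that with plain dictionaries. The helper names are hypothetical, the provider values are copied from the storage table, and the connection variables each provider needs are deliberately left out.

```python
from typing import Dict, List

# Defaults and options as listed in the storage table. The FalkorDB adapter is
# omitted because the selector value it expects is not stated in this guide.
DEFAULTS: Dict[str, str] = {
    "GRAPH_DATABASE_PROVIDER": "kuzu",
    "VECTOR_DB_PROVIDER": "lancedb",
    "DB_PROVIDER": "sqlite",
}
ALLOWED: Dict[str, set] = {
    "GRAPH_DATABASE_PROVIDER": {"kuzu", "kuzu-remote", "neo4j"},
    "VECTOR_DB_PROVIDER": {"lancedb", "pgvector", "qdrant", "pinecone", "chromadb"},
    "DB_PROVIDER": {"sqlite", "postgres"},
}
EMBEDDED_PROVIDERS = {"kuzu", "lancedb", "sqlite"}  # the embedded, file-backed defaults


def provider_env(**overrides: str) -> Dict[str, str]:
    """Start from the embedded defaults and override individual tiers."""
    env = dict(DEFAULTS)
    for selector, provider in overrides.items():
        if selector not in ALLOWED:
            raise KeyError(f"unknown selector: {selector}")
        if provider not in ALLOWED[selector]:
            raise ValueError(f"{provider!r} is not listed for {selector}")
        env[selector] = provider
    return env


def file_backed_tiers(env: Dict[str, str]) -> List[str]:
    """Return the selectors that still point at an embedded, file-backed store."""
    return sorted(k for k, v in env.items() if v in EMBEDDED_PROVIDERS)


if __name__ == "__main__":
    env = provider_env(DB_PROVIDER="postgres", VECTOR_DB_PROVIDER="pgvector")
    print(env, file_backed_tiers(env))
```

In this example the graph tier is still Kuzu, so the deployment stays single-writer even though the relational and vector tiers are external.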

Concurrency and the write model

This is the primary production decision. Packaging is secondary.
| Write model | Shape | Maps to | Concurrency |
| --- | --- | --- | --- |
| Single process | One process owns local files | Embedded SDK | Lowest |
| Shared service | Many clients, one Cognee backend, one writer | Compose, Helm, sidecar | Centralized, still one writer |
| Queue-based | Producers enqueue, one worker consumes | SQS, Kafka, RabbitMQ, Redis | Good for bursty writes |
| Managed backends | Graph, vector, and relational tiers are external services | Neo4j/FalkorDB adapter, Qdrant/pgvector, Postgres | Highest |
For concurrent multi-agent writes, do not rely on shared file-backed Kuzu. Use a single writer service, a queue, or an external graph backend.
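The queue-based model can be illustrated in-process. The sketch below is an assumption-laden stand-in: an asyncio.Queue plays the role of SQS, Kafka, RabbitMQ, or Redis, and write_to_memory is a placeholder for the real write call (for example cognee.remember). It is meant to show the shape of the pattern, namely many producers and exactly one consumer that touches storage.

```python
import asyncio
from typing import List


async def write_to_memory(item: str, store: List[str]) -> None:
    """Placeholder for the real write; only the worker ever calls it."""
    await asyncio.sleep(0)  # yield control, as an I/O-bound write would
    store.append(item)


async def writer_worker(queue: asyncio.Queue, store: List[str]) -> None:
    """The single writer: drains the queue one item at a time."""
    while True:
        item = await queue.get()
        try:
            await write_to_memory(item, store)
        finally:
            queue.task_done()


async def producer(name: str, queue: asyncio.Queue, count: int) -> None:
    """An agent that never writes directly; it only enqueues."""
    for i in range(count):
        await queue.put(f"{name}-doc-{i}")


async def run_demo() -> List[str]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=100)  # bounded, so bursts apply backpressure
    store: List[str] = []
    worker = asyncio.create_task(writer_worker(queue, store))
    await asyncio.gather(*(producer(f"agent{n}", queue, 3) for n in range(4)))
    await queue.join()  # wait until every enqueued item has been written
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return store


if __name__ == "__main__":
    print(len(asyncio.run(run_demo())))
```

Four producers enqueue concurrently, but writes are serialized through one worker, which is what keeps a file-backed graph safe under bursty load.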

1. Embedded deployment

Cognee runs inside the calling Python process. Storage defaults to the local Cognee directories and can be moved with DATA_ROOT_DIRECTORY and SYSTEM_ROOT_DIRECTORY. One process owns the writer lock; there is no service boundary or cross-machine sharing unless the directory is mounted or copied.
import os

os.environ["DATA_ROOT_DIRECTORY"] = "/data/.cognee_data"
os.environ["SYSTEM_ROOT_DIRECTORY"] = "/data/.cognee_system"
os.environ["GRAPH_DATABASE_PROVIDER"] = "kuzu"
os.environ["VECTOR_DB_PROVIDER"] = "lancedb"
os.environ["DB_PROVIDER"] = "sqlite"

import cognee

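# Top-level await works in notebooks and async REPLs.
# In a script, wrap these calls in an async function and run it with asyncio.run().
await cognee.remember("...", dataset_name="docs")
await cognee.recall("...")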
Best for notebooks, CLIs, local agents, and single-process jobs. Avoid it for cross-process concurrent writes.
| Pros | Cons |
| --- | --- |
| Zero infrastructure; runs in a notebook, CLI, or local agent. | Single writer; no concurrent writes across processes. |
| Lowest latency because there is no network hop to storage. | No service boundary or shared multi-machine access. |
| Simple backup model: snapshot one data/system directory. | State is only as durable as the local disk or mount. |
import asyncio
import os

os.environ["DATA_ROOT_DIRECTORY"] = "/data/.cognee_data"
os.environ["SYSTEM_ROOT_DIRECTORY"] = "/data/.cognee_system"
os.environ["GRAPH_DATABASE_PROVIDER"] = "kuzu"
os.environ["VECTOR_DB_PROVIDER"] = "lancedb"
os.environ["DB_PROVIDER"] = "sqlite"

import cognee


async def main():
    await cognee.remember("Cognee turns documents into AI memory.", dataset_name="docs")
    results = await cognee.recall("What does Cognee do?")
    print(results)


if __name__ == "__main__":
    asyncio.run(main())

2. Self-hosted service

Use Compose to validate Cognee inside customer or on-prem infrastructure before moving to Kubernetes. Pin the image, persist data to a named volume, expose health checks, and load secrets from a managed source rather than plaintext .env files.
services:
  cognee:
    image: cognee/cognee:1.0.6
    depends_on:
      postgres:
        condition: service_healthy
    environment:
      DB_PROVIDER: postgres
      DB_HOST: postgres
      DB_PORT: "5432"
      DB_NAME: cognee
      DB_USERNAME: cognee
      DB_PASSWORD_FILE: /run/secrets/db_password
      VECTOR_DB_PROVIDER: pgvector
      GRAPH_DATABASE_PROVIDER: kuzu
      DATA_ROOT_DIRECTORY: /data/.cognee_data
      SYSTEM_ROOT_DIRECTORY: /data/.cognee_system
    secrets:
      - db_password
    volumes:
      - cognee_data:/data
    ports:
      - "8000:8000"

  postgres:
    image: pgvector/pgvector:pg17
    environment:
      POSTGRES_USER: cognee
      POSTGRES_DB: cognee
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password
    volumes:
      - pg_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U cognee -d cognee"]
      interval: 10s
      retries: 5

secrets:
  db_password:
    file: ./secrets/db_password.txt

volumes:
  cognee_data: {}
  pg_data: {}
See Docker Deployment for the full Compose workflow.
| Pros | Cons |
| --- | --- |
| Fastest path to a persistent service on customer infrastructure. | Single host; vertical scaling only. |
| Easy to externalize one tier at a time, such as Postgres/pgvector first. | Compose secrets are weaker than a managed secret store. |
| Health checks and named volumes give practical readiness and durability. | Graph writes stay single-writer until the graph tier is externalized. |
services:
  cognee:
    image: cognee/cognee:1.0.6
    depends_on:
      postgres:
        condition: service_healthy
    environment:
      DB_PROVIDER: postgres
      DB_HOST: postgres
      DB_PORT: "5432"
      DB_NAME: cognee
      DB_USERNAME: cognee
      DB_PASSWORD_FILE: /run/secrets/db_password
      VECTOR_DB_PROVIDER: pgvector
      VECTOR_DB_HOST: postgres
      VECTOR_DB_PORT: "5432"
      VECTOR_DB_NAME: cognee
      GRAPH_DATABASE_PROVIDER: kuzu
      DATA_ROOT_DIRECTORY: /data/.cognee_data
      SYSTEM_ROOT_DIRECTORY: /data/.cognee_system
      LLM_PROVIDER: openai
      LLM_API_KEY_FILE: /run/secrets/llm_api_key
    secrets:
      - db_password
      - llm_api_key
    volumes:
      - cognee_data:/data
    ports:
      - "8000:8000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 15s
      retries: 5

  postgres:
    image: pgvector/pgvector:pg17
    environment:
      POSTGRES_USER: cognee
      POSTGRES_DB: cognee
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password
    volumes:
      - pg_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U cognee -d cognee"]
      interval: 10s
      retries: 5

secrets:
  db_password:
    file: ./secrets/db_password.txt
  llm_api_key:
    file: ./secrets/llm_api_key.txt

volumes:
  cognee_data: {}
  pg_data: {}

3. Scale-out patterns

Scale-out patterns are layered on top of a self-hosted deployment. They scale reads or isolate write jobs; they do not replace the write path.
For snapshot read replicas, use one writer to run remember(), publish a snapshot to object storage, and let many readers pull the latest snapshot at startup. This scales reads without a clustered graph database, at the cost of freshness.
writer -> snapshot.tar -> S3 / MinIO
                       |-> reader-1
                       |-> reader-2
                       `-> reader-N
initContainers:
  - name: pull-snapshot
    image: amazon/aws-cli
    command:
      - sh
      - -c
      - aws s3 cp s3://cognee-snapshots/latest.tar /data/latest.tar && tar -xf /data/latest.tar -C /data
containers:
  - name: cognee-reader
    image: cognee/cognee:1.0.6
    env:
      - name: COGNEE_READ_ONLY
        value: "true"
Use this for heavy read traffic and static or slowly changing knowledge. Benchmark snapshot size because it drives reader cold-start time.
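A writer-side packaging step might look like the sketch below. It is a hedged illustration using only the Python standard library: build_snapshot and estimated_pull_seconds are hypothetical helpers, the upload to S3 or MinIO is left out, and the throughput figure is something you would measure rather than assume. The archive keeps top-level entries such as .cognee_data and .cognee_system so that extracting it with tar -xf ... -C /data restores the same layout.

```python
import tarfile
from pathlib import Path


def build_snapshot(data_root: Path, archive_path: Path) -> int:
    """Pack everything under data_root into an uncompressed tar; return its size in bytes.

    Run this only while the writer is idle, and keep archive_path outside data_root.
    """
    with tarfile.open(archive_path, "w") as tar:
        for entry in sorted(data_root.iterdir()):
            # arcname keeps paths relative, so readers can extract into any directory.
            tar.add(entry, arcname=entry.name)
    return archive_path.stat().st_size


def estimated_pull_seconds(size_bytes: int, throughput_mb_per_s: float) -> float:
    """Rough lower bound on reader cold start: download time only, no extraction."""
    return size_bytes / (throughput_mb_per_s * 1_000_000)


if __name__ == "__main__":
    size = build_snapshot(Path("/data"), Path("/tmp/latest.tar"))
    print(size, estimated_pull_seconds(size, throughput_mb_per_s=100))
```

Under these assumptions, a 3 GB snapshot pulled at 100 MB/s adds about 30 seconds to every reader start before extraction, which is why snapshot size is worth benchmarking.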
| Pros | Cons |
| --- | --- |
| Horizontal read scaling without a clustered graph database. | Freshness is bounded by snapshot cadence. |
| Readers are stateless and replaceable. | Snapshot size drives cold-start time. |
| Cheap: object storage instead of a database fleet. | Not for collaborative or strict-freshness workloads. |
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cognee-reader
spec:
  replicas: 6
  selector:
    matchLabels:
      app: cognee-reader
  template:
    metadata:
      labels:
        app: cognee-reader
    spec:
      initContainers:
        - name: pull-snapshot
          image: amazon/aws-cli
          command:
            - sh
            - -c
            - aws s3 cp s3://cognee-snapshots/latest.tar /data/latest.tar && tar -xf /data/latest.tar -C /data
          volumeMounts:
            - name: snapshot
              mountPath: /data
      containers:
        - name: cognee-reader
          image: cognee/cognee:1.0.6
          env:
            - name: COGNEE_READ_ONLY
              value: "true"
            - name: DATA_ROOT_DIRECTORY
              value: /data/.cognee_data
            - name: SYSTEM_ROOT_DIRECTORY
              value: /data/.cognee_system
          volumeMounts:
            - name: snapshot
              mountPath: /data
      volumes:
        - name: snapshot
          emptyDir: {}

4. Serverless and managed

Serverless patterns are useful for HTTP-fronted memory APIs and scheduled jobs. They are rarely the primary on-prem pattern.
For a read-only Lambda, run remember() offline, package the resulting Kuzu/LanceDB files into the deployment artifact or Lambda layer, and open them read-only at runtime.
resource "aws_lambda_function" "cognee_reader" {
  function_name = "cognee-reader"
  package_type  = "Image"
  image_uri     = "${var.ecr_repo}:1.0.6-snapshot"
  memory_size   = 3008
  timeout       = 30

  environment {
    variables = {
      COGNEE_READ_ONLY        = "true"
      DATA_ROOT_DIRECTORY     = "/var/task/.cognee_data"
      GRAPH_DATABASE_PROVIDER = "kuzu"
      VECTOR_DB_PROVIDER      = "lancedb"
    }
  }
}
This scales to zero and is rollback-friendly, but every knowledge update requires rebuilding and redeploying the snapshot.
| Pros | Cons |
| --- | --- |
| No servers; scales to zero with per-request billing. | Read-only; every knowledge update needs a rebuilt artifact. |
| Immutable artifact is reproducible and rollback-friendly. | Cold start scales with artifact size. |
| No runtime database service to operate. | Bounded by Lambda image and runtime limits. |
resource "aws_lambda_function" "cognee_reader" {
  function_name = "cognee-reader"
  package_type  = "Image"
  image_uri     = "${var.ecr_repo}:1.0.6-snapshot"
  memory_size   = 3008
  timeout       = 30

  environment {
    variables = {
      COGNEE_READ_ONLY        = "true"
      DATA_ROOT_DIRECTORY     = "/var/task/.cognee_data"
      SYSTEM_ROOT_DIRECTORY   = "/var/task/.cognee_system"
      GRAPH_DATABASE_PROVIDER = "kuzu"
      VECTOR_DB_PROVIDER      = "lancedb"
      DB_PROVIDER             = "sqlite"
    }
  }
}

5. Externalized backends

Use external services for sustained multi-agent writes and independent scaling of each storage layer. Once the graph tier is externalized, the single-writer ceiling of file-backed Kuzu no longer applies.
resource "aws_db_instance" "cognee" {
  identifier                  = "cognee"
  engine                      = "postgres"
  engine_version              = "17"
  instance_class              = "db.r6g.xlarge"
  allocated_storage           = 100
  max_allocated_storage       = 1000
  storage_type                = "gp3"
  db_name                     = "cognee"
  username                    = "cognee"
  manage_master_user_password = true
  multi_az                    = true
  backup_retention_period     = 14
  storage_encrypted           = true
}
Wire the external services into Helm:
helm upgrade --install cognee ./cognee \
  --set env.DB_PROVIDER=postgres \
  --set env.VECTOR_DB_PROVIDER=pgvector \
  --set env.GRAPH_DATABASE_PROVIDER=neo4j \
  --set externalPostgres.host="$(terraform output -raw host)"
Typical production shape:
  • Postgres or RDS for relational metadata
  • pgvector, Qdrant, Pinecone, or ChromaDB for vectors
  • Neo4j or the FalkorDB adapter for graph writes
  • Cognee API pods configured as storage-backed application nodes with writable local paths for ingestion artifacts and caches
| Pros | Cons |
| --- | --- |
| Highest write concurrency once graph storage is externalized. | Most infrastructure to provision, secure, and pay for. |
| Each tier scales, backs up, and fails over independently. | More moving parts means more failure modes and monitoring. |
| Managed services can provide HA, backups, and secret rotation. | Cross-service latency replaces local file access on the hot path. |
variable "name" {
  default = "cognee"
}

variable "vpc_id" {}
variable "subnet_ids" {
  type = list(string)
}
variable "app_sg" {}

resource "aws_db_subnet_group" "this" {
  name       = "${var.name}-db"
  subnet_ids = var.subnet_ids
}

resource "aws_security_group" "db" {
  name_prefix = "${var.name}-db-"
  vpc_id      = var.vpc_id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [var.app_sg]
  }
}

resource "aws_db_instance" "cognee" {
  identifier                  = var.name
  engine                      = "postgres"
  engine_version              = "17"
  instance_class              = "db.r6g.xlarge"
  allocated_storage           = 100
  max_allocated_storage       = 1000
  storage_type                = "gp3"
  db_name                     = "cognee"
  username                    = "cognee"
  manage_master_user_password = true
  multi_az                    = true
  backup_retention_period     = 14
  storage_encrypted           = true
  db_subnet_group_name        = aws_db_subnet_group.this.name
  vpc_security_group_ids      = [aws_security_group.db.id]
}

# Create the pgvector extension once before using Postgres as a vector store.
# Run this through a migration, pre-deploy hook, or database init job:
# CREATE EXTENSION IF NOT EXISTS vector;

output "host" {
  value = aws_db_instance.cognee.address
}

output "secret_arn" {
  value = aws_db_instance.cognee.master_user_secret[0].secret_arn
}
helm upgrade --install cognee ./cognee \
  --set externalPostgres.host="$(terraform output -raw host)" \
  --set env.DB_PROVIDER=postgres \
  --set env.VECTOR_DB_PROVIDER=pgvector \
  --set env.GRAPH_DATABASE_PROVIDER=neo4j

6. Cloud service mapping

The patterns are cloud-agnostic. The concrete services differ by platform.
| Primitive | AWS | Azure |
| --- | --- | --- |
| Managed Kubernetes | EKS | AKS |
| Writer block storage | EBS gp3 | Managed Disks / Premium SSD |
| Shared filesystem | EFS | Azure Files Premium NFS |
| Snapshots / object store | S3 | Blob Storage |
| Private registry | ECR | ACR |
| Secrets | Secrets Manager / SSM | Key Vault |
| Pod identity | IRSA | Workload Identity |
| Relational tier | RDS / Aurora Postgres | Azure Database for PostgreSQL Flexible Server |
| LLM endpoint | Bedrock or self-hosted vLLM | Azure OpenAI or self-hosted vLLM |
Keep the hot graph on block storage or an external graph service. Use shared filesystems only when the pattern truly requires cross-process file sharing.

7. Production readiness

Schema migrations

  • Relational: pin the Cognee version per environment and run migrations before deploying the live writer.
  • Graph: additive model changes are safest. Renames, removals, and new required fields need a data migration or rebuild from source.
  • Vector: rebuild collections when embedding dimension, distance metric, or metadata schema changes.
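The vector rule in the list above can be expressed as a pre-deploy check. The sketch below is illustrative only: CollectionSpec and rebuild_reasons are hypothetical names, and how you obtain the stored settings depends on your vector backend.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class CollectionSpec:
    """Hypothetical description of how a vector collection was built."""
    embedding_dimension: int
    distance_metric: str
    metadata_fields: FrozenSet[str]


def rebuild_reasons(stored: CollectionSpec, desired: CollectionSpec) -> List[str]:
    """Return the reasons a collection must be rebuilt; an empty list means keep it."""
    reasons: List[str] = []
    if stored.embedding_dimension != desired.embedding_dimension:
        reasons.append(
            f"embedding dimension {stored.embedding_dimension} -> {desired.embedding_dimension}"
        )
    if stored.distance_metric != desired.distance_metric:
        reasons.append(f"distance metric {stored.distance_metric} -> {desired.distance_metric}")
    if stored.metadata_fields != desired.metadata_fields:
        reasons.append("metadata schema changed")
    return reasons


if __name__ == "__main__":
    old = CollectionSpec(1536, "cosine", frozenset({"dataset", "source"}))
    new = CollectionSpec(3072, "cosine", frozenset({"dataset", "source"}))
    print(rebuild_reasons(old, new))
```

Running a check like this before the live writer starts makes a required rebuild an explicit deployment step rather than a runtime surprise.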

Backups, restore, and DR

  • Embedded: route writes through one writer, then snapshot the data directory to object storage.
  • Externalized: back up each tier independently, such as Postgres dumps, managed snapshots, graph dumps, and vector snapshots.
  • Region failure: use cross-region snapshot replication and warm standby. Active-active DR is not a good fit for file-backed Kuzu.
  • Test restore on every release.
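A restore test for the embedded layout can be as small as extracting the backup into a scratch directory and checking that the expected entries are present. The sketch below assumes the archive was taken from inside the databases/ directory shown in the storage model section, so that cognee_graph_kuzu, lancedb, and system.db sit at the archive root. verify_restore is a hypothetical helper, and a real test would also open the stores and run a query.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import List, Sequence

# Entry names taken from the embedded storage layout; adjust if your archive
# is rooted at a different directory.
EXPECTED_ENTRIES = ("cognee_graph_kuzu", "lancedb", "system.db")


def verify_restore(archive_path: Path,
                   expected: Sequence[str] = EXPECTED_ENTRIES) -> List[str]:
    """Extract a trusted backup into a scratch directory and list missing entries."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive_path) as tar:
            tar.extractall(scratch)  # only use on archives you produced yourself
        root = Path(scratch)
        return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = verify_restore(Path("/backups/cognee-latest.tar"))
    if missing:
        raise SystemExit(f"restore check failed, missing: {missing}")
    print("restore check passed")
```

An empty result only proves the files came back; it does not prove the graph and vector stores are mutually consistent, which is why the snapshot should be taken while the single writer is idle.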

Tenant isolation

| Level | Mechanism | Notes |
| --- | --- | --- |
| Logical | dataset_name and user filters | Cheap, but every query must apply the right scope. |
| Backend | ENABLE_BACKEND_ACCESS_CONTROL=true | Per-user and per-dataset storage isolation. |
| Infrastructure | Separate deployments, namespaces, and databases | Use for hard regulatory boundaries. |
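Logical isolation only holds if no query can skip the scope check. One way to enforce that is to route every query through a single guard, as in the sketch below. CallerScope and scoped_query_args are hypothetical names, and how the returned arguments map onto Cognee's query call is not covered here.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class CallerScope:
    """Hypothetical record of who is asking and which datasets they may read."""
    user_id: str
    allowed_datasets: FrozenSet[str]


def scoped_query_args(scope: CallerScope, query: str, dataset_name: str) -> Dict[str, str]:
    """Return query arguments only when the dataset is inside the caller's scope."""
    if dataset_name not in scope.allowed_datasets:
        raise PermissionError(f"{scope.user_id} may not read dataset {dataset_name!r}")
    return {"query": query, "dataset_name": dataset_name, "user_id": scope.user_id}


if __name__ == "__main__":
    alice = CallerScope("alice", frozenset({"docs"}))
    print(scoped_query_args(alice, "What does Cognee do?", "docs"))
```

Application code then never builds query arguments by hand, so a forgotten filter becomes an error instead of a cross-tenant read.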

LLM egress and authentication

  • Use self-hosted vLLM, customer-approved proxies, Bedrock private access, Azure OpenAI in-subscription, or fully air-gapped patterns when egress is restricted.
  • Front the Cognee API with the customer gateway. Terminate OIDC or mTLS there rather than exposing the Cognee port directly.
  • Use service-to-service mTLS or cluster-native identity.
  • Propagate user identity through a gateway-validated header when Cognee needs user-scoped access.
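If identity is propagated in a header, the service behind the gateway needs a way to tell that the gateway actually set it. One option, sketched below, is for the gateway to attach an HMAC of the user ID computed with a shared secret. This is an assumption about your gateway, not a Cognee feature: the header names X-User-Id and X-User-Signature and both helper functions are hypothetical, and mTLS between gateway and service is an equally valid alternative.

```python
import hashlib
import hmac
from typing import Mapping, Optional


def sign_user_id(user_id: str, secret: bytes) -> str:
    """What the gateway would compute after it has authenticated the caller."""
    return hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def trusted_user_id(headers: Mapping[str, str], secret: bytes) -> Optional[str]:
    """Return the user ID only if the signature matches; otherwise None.

    Header lookup is case-sensitive here; a real framework normalizes header names.
    """
    user_id = headers.get("X-User-Id")
    signature = headers.get("X-User-Signature")
    if not user_id or not signature:
        return None
    expected = sign_user_id(user_id, secret)
    # compare_digest avoids leaking how many leading characters matched.
    if hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
        return user_id
    return None


if __name__ == "__main__":
    secret = b"example-only-secret"
    headers = {"X-User-Id": "alice", "X-User-Signature": sign_user_id("alice", secret)}
    print(trusted_user_id(headers, secret))
```

A request that reaches the service without a valid signature is treated as anonymous, so a client that bypasses the gateway cannot pick its own identity.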

Appendix - options at a glance

| Option | Writer | Best for | Skeleton |
| --- | --- | --- | --- |
| Embedded | Single process | Prototypes, notebooks, single-user agents | pip + env vars |
| Docker Compose | Single service | On-prem validation, first production step | Compose + pgvector |
| Helm | Single service | Customers already on Kubernetes | Chart + values |
| Sidecar | Single service | Per-agent private memory | Pod spec |
| Snapshot replicas | One writer, many readers | Heavy read traffic, static knowledge | S3 + reader deployment |
| Queue | One worker | Bursty or event-driven ingestion | SQS + worker |
| Lambda read-only | Offline writer | HTTP memory API, scale-to-zero | Lambda image |
| Lambda + EFS | Single writer plus queue | Serverless writable memory | EFS + Lambda |
| Externalized | External graph-backed writes | Sustained multi-agent writes | RDS + Neo4j/FalkorDB adapter + vector DB |
For provider-specific configuration, see Graph Stores, Vector Stores, and Relational Databases.