Deployment Options - Cognee Documentation

Deployment facts

These facts anchor the rest of the guide:

Common assumption	Correct model	Why it matters
The default graph backend is a separate named service	The default graph backend is embedded, file-backed Kuzu. NetworkX is the in-memory fallback.	The on-disk graph directory is `cognee_graph_kuzu`.
A shared service gives moderate write concurrency	A shared service over file-backed Kuzu is still single-writer.	The service boundary centralizes the writer; it does not make file-backed writes concurrent.
Helm includes an API, worker, and queue by default	Cognee ships one FastAPI image. Worker and queue splits are operator patterns.	Add a queue and worker deployment only when your write model needs them.

Executive summary

Cognee runs either embedded in a Python process or as a FastAPI service. Operationally, each deployment comes down to three independent choices:

Writer ownership: who owns writes
Storage location: where graph, vector, and relational state live
Reader access path: how readers reach memory

The default embedded stack is file-backed: Kuzu for graph storage, LanceDB for vectors, and SQLite for relational metadata. In production, each tier can be externalized independently: Neo4j or the FalkorDB adapter for graph storage, Qdrant/pgvector/Pinecone/ChromaDB for vectors, and Postgres for relational metadata. Deployment patterns are composable. Choose a base shape first, such as Embedded SDK, Compose, Helm, sidecar, or Lambda, then add the write model, read-scaling pattern, and storage backend that match your workload.

Pick-a-path decision tree

Start with writer ownership. If a single writer cannot own all writes, move to a queue or to external backends.

Single writer?
|-- yes
|   `-- Few readers?
|       |-- yes -> Embedded SDK
|       `-- no  -> Self-hosted FastAPI service
|                 `-- Add snapshot read replicas when reads need to scale
`-- no
    `-- Concurrent or multi-agent writes?
        |-- Bursty / event-driven -> Queue + single writer worker
        `-- Sustained concurrency -> Externalized backends
                                     Postgres + Neo4j/FalkorDB adapter + Qdrant/pgvector
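
The tree above can also be read as a small function. This is an illustrative sketch only: `pick_deployment` and its parameters are hypothetical names, not part of Cognee, and the returned strings simply repeat the leaves of the tree.

```python
def pick_deployment(single_writer: bool, few_readers: bool = True,
                    bursty_writes: bool = False) -> str:
    """Walk the decision tree and return the suggested base shape."""
    if single_writer:
        if few_readers:
            return "Embedded SDK"
        # Many readers: put a service boundary in front of the single writer.
        return "Self-hosted FastAPI service"
    if bursty_writes:
        # Bursty or event-driven producers: buffer writes, keep one writer.
        return "Queue + single writer worker"
    # Sustained concurrent writes need backends that coordinate writers themselves.
    return "Externalized backends"


if __name__ == "__main__":
    print(pick_deployment(single_writer=True, few_readers=False))
```

Snapshot read replicas are an add-on to the service shape rather than a separate leaf, so the sketch does not return them.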

Storage model

Cognee has three independent storage layers:

Layer	Selector	Default	Production options
Graph	`GRAPH_DATABASE_PROVIDER`	`kuzu`	`kuzu-remote`, `neo4j`, FalkorDB adapter
Vector	`VECTOR_DB_PROVIDER`	`lancedb`	`pgvector`, `qdrant`, `pinecone`, `chromadb`
Relational	`DB_PROVIDER`	`sqlite`	`postgres`
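
A deploy script can validate provider choices before it starts Cognee. The sketch below is an assumption-laden helper, not a Cognee API: `storage_env` is a hypothetical name, and the allowed values are copied from the table above. The FalkorDB adapter is left out because the table does not give its selector value.

```python
# Selector values copied from the storage table; anything else is rejected.
KNOWN_PROVIDERS = {
    "GRAPH_DATABASE_PROVIDER": {"kuzu", "kuzu-remote", "neo4j"},
    "VECTOR_DB_PROVIDER": {"lancedb", "pgvector", "qdrant", "pinecone", "chromadb"},
    "DB_PROVIDER": {"sqlite", "postgres"},
}

# Embedded defaults from the same table.
DEFAULTS = {
    "GRAPH_DATABASE_PROVIDER": "kuzu",
    "VECTOR_DB_PROVIDER": "lancedb",
    "DB_PROVIDER": "sqlite",
}


def storage_env(**overrides: str) -> dict[str, str]:
    """Return provider env vars, starting from the defaults and applying overrides."""
    env = dict(DEFAULTS)
    for selector, provider in overrides.items():
        if selector not in KNOWN_PROVIDERS:
            raise KeyError(f"unknown selector: {selector}")
        if provider not in KNOWN_PROVIDERS[selector]:
            raise ValueError(f"{provider!r} is not a listed value for {selector}")
        env[selector] = provider
    return env


if __name__ == "__main__":
    # Externalize one tier at a time: relational first, graph and vector unchanged.
    print(storage_env(DB_PROVIDER="postgres"))
```

Because each layer has its own selector, overriding one leaves the other two on their defaults.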

Embedded storage is file-backed and easy to move. If SYSTEM_ROOT_DIRECTORY is unset, Cognee resolves it to .cognee_system, so the default on-disk layout still lands under <SYSTEM_ROOT_DIRECTORY>/databases:

<SYSTEM_ROOT_DIRECTORY>/
`-- databases/
    |-- cognee_graph_kuzu
    |-- cognee_graph_kuzu.wal
    |-- lancedb/
    `-- system.db

Keeping everything under one directory makes backups simple, but the same file-backed layout is why one process should own writes. In production, swap any layer for a service-backed system by changing its provider and connection variables.
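
For backup or inspection scripts, the layout above can be resolved in a few lines. This is a sketch under stated assumptions: `default_layout` is a hypothetical helper, the fallback `.cognee_system` is treated as a path relative to the working directory, and `system.db` is assumed to be the SQLite file.

```python
import os
from pathlib import Path


def default_layout(environ=None) -> dict[str, Path]:
    """Map each embedded store to its expected path under SYSTEM_ROOT_DIRECTORY."""
    environ = os.environ if environ is None else environ
    # Fall back to .cognee_system when the variable is unset or empty.
    system_root = Path(environ.get("SYSTEM_ROOT_DIRECTORY") or ".cognee_system")
    databases = system_root / "databases"
    return {
        "graph": databases / "cognee_graph_kuzu",
        "graph_wal": databases / "cognee_graph_kuzu.wal",
        "vector": databases / "lancedb",
        "relational": databases / "system.db",  # assumed to be the SQLite file
    }


if __name__ == "__main__":
    for name, path in default_layout().items():
        print(f"{name}: {path}")
```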

Concurrency and the write model

This is the primary production decision. Packaging is secondary.

Write model	Shape	Maps to	Concurrency
Single process	One process owns local files	Embedded SDK	Lowest
Shared service	Many clients, one Cognee backend, one writer	Compose, Helm, sidecar	Centralized, still one writer
Queue-based	Producers enqueue, one worker consumes	SQS, Kafka, RabbitMQ, Redis	Good for bursty writes
Managed backends	Graph, vector, and relational tiers are external services	Neo4j/FalkorDB adapter, Qdrant/pgvector, Postgres	Highest

For concurrent multi-agent writes, do not rely on shared file-backed Kuzu. Use a single writer service, a queue, or an external graph backend.
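
Operators sometimes add their own guard so that a second writer process fails fast instead of touching the files. The sketch below is one such guard, using an exclusive lock file. It is an operator-side convention under assumed names (`writer_lock`, the lock path), not a mechanism Cognee provides, and it does not replace the database's own locking.

```python
import os
from contextlib import contextmanager


@contextmanager
def writer_lock(lock_path: str):
    """Hold an exclusive lock file; raise if another writer already holds it."""
    try:
        # O_EXCL makes creation fail when the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"another writer holds {lock_path}") from None
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)


if __name__ == "__main__":
    with writer_lock("writer.lock"):
        print("this process owns the write path")
```

A crashed writer leaves the lock file behind, so a real deployment would pair this with a cleanup step or rely on the orchestrator (one replica, Recreate) instead.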

1. Embedded deployment

Cognee runs inside the calling Python process. Storage defaults to the local Cognee directories and can be moved with DATA_ROOT_DIRECTORY and SYSTEM_ROOT_DIRECTORY. One process owns the writer lock; there is no service boundary or cross-machine sharing unless the directory is mounted or copied.

import os

os.environ["DATA_ROOT_DIRECTORY"] = "/data/.cognee_data"
os.environ["SYSTEM_ROOT_DIRECTORY"] = "/data/.cognee_system"
os.environ["GRAPH_DATABASE_PROVIDER"] = "kuzu"
os.environ["VECTOR_DB_PROVIDER"] = "lancedb"
os.environ["DB_PROVIDER"] = "sqlite"

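import cognee

# Top-level await works in a notebook or async REPL. In a script, wrap these
# calls in an async function and run it with asyncio.run(), as in the full script.
await cognee.remember("...", dataset_name="docs")
await cognee.recall("...")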

Best for notebooks, CLIs, local agents, and single-process jobs. Avoid it for cross-process concurrent writes.

Pros	Cons
Zero infrastructure; runs in a notebook, CLI, or local agent.	Single writer; no concurrent writes across processes.
Lowest latency because there is no network hop to storage.	No service boundary or shared multi-machine access.
Simple backup model: snapshot one data/system directory.	State is only as durable as the local disk or mount.

Full embedded script

import asyncio
import os

os.environ["DATA_ROOT_DIRECTORY"] = "/data/.cognee_data"
os.environ["SYSTEM_ROOT_DIRECTORY"] = "/data/.cognee_system"
os.environ["GRAPH_DATABASE_PROVIDER"] = "kuzu"
os.environ["VECTOR_DB_PROVIDER"] = "lancedb"
os.environ["DB_PROVIDER"] = "sqlite"

import cognee


async def main():
    await cognee.remember("Cognee turns documents into AI memory.", dataset_name="docs")
    results = await cognee.recall("What does Cognee do?")
    print(results)


if __name__ == "__main__":
    asyncio.run(main())

2. Self-hosted service

2a. Docker Compose
2b. Helm on Kubernetes
2c. Sidecar

2a. Docker Compose

Use Compose to validate Cognee inside customer or on-prem infrastructure before moving to Kubernetes. Pin the image, persist data to a named volume, expose health checks, and load secrets from a managed source rather than plaintext .env files.

services:
  cognee:
    image: cognee/cognee:1.0.6
    depends_on:
      postgres:
        condition: service_healthy
    environment:
      DB_PROVIDER: postgres
      DB_HOST: postgres
      DB_PORT: "5432"
      DB_NAME: cognee
      DB_USERNAME: cognee
      DB_PASSWORD_FILE: /run/secrets/db_password
      VECTOR_DB_PROVIDER: pgvector
      GRAPH_DATABASE_PROVIDER: kuzu
      DATA_ROOT_DIRECTORY: /data/.cognee_data
      SYSTEM_ROOT_DIRECTORY: /data/.cognee_system
    secrets:
      - db_password
    volumes:
      - cognee_data:/data
    ports:
      - "8000:8000"

  postgres:
    image: pgvector/pgvector:pg17
    environment:
      POSTGRES_USER: cognee
      POSTGRES_DB: cognee
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password
    volumes:
      - pg_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U cognee -d cognee"]
      interval: 10s
      retries: 5

secrets:
  db_password:
    file: ./secrets/db_password.txt

volumes:
  cognee_data: {}
  pg_data: {}

See Docker Deployment for the full Compose workflow.

Pros	Cons
Fastest path to a persistent service on customer infrastructure.	Single host; vertical scaling only.
Easy to externalize one tier at a time, such as Postgres/pgvector first.	Compose secrets are weaker than a managed secret store.
Health checks and named volumes give practical readiness and durability.	Graph writes stay single-writer until the graph tier is externalized.
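
A client or deploy script may want to wait for the service before sending traffic. The following is a minimal sketch, assuming the API answers on /health at port 8000 as the healthcheck in the full Compose skeleton does. The function names are illustrative and not part of Cognee.

```python
import time
import urllib.error
import urllib.request


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True when the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_healthy(url: str, attempts: int = 5, delay: float = 1.0,
                       probe=http_ok, sleep=time.sleep) -> bool:
    """Poll the health endpoint up to `attempts` times, pausing between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    ready = wait_until_healthy("http://localhost:8000/health")
    print("ready" if ready else "not ready")
```

The `probe` and `sleep` parameters exist so the loop can be exercised without a running service.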

Full Compose skeleton

services:
  cognee:
    image: cognee/cognee:1.0.6
    depends_on:
      postgres:
        condition: service_healthy
    environment:
      DB_PROVIDER: postgres
      DB_HOST: postgres
      DB_PORT: "5432"
      DB_NAME: cognee
      DB_USERNAME: cognee
      DB_PASSWORD_FILE: /run/secrets/db_password
      VECTOR_DB_PROVIDER: pgvector
      VECTOR_DB_HOST: postgres
      VECTOR_DB_PORT: "5432"
      VECTOR_DB_NAME: cognee
      GRAPH_DATABASE_PROVIDER: kuzu
      DATA_ROOT_DIRECTORY: /data/.cognee_data
      SYSTEM_ROOT_DIRECTORY: /data/.cognee_system
      LLM_PROVIDER: openai
      LLM_API_KEY_FILE: /run/secrets/llm_api_key
    secrets:
      - db_password
      - llm_api_key
    volumes:
      - cognee_data:/data
    ports:
      - "8000:8000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 15s
      retries: 5

  postgres:
    image: pgvector/pgvector:pg17
    environment:
      POSTGRES_USER: cognee
      POSTGRES_DB: cognee
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password
    volumes:
      - pg_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U cognee -d cognee"]
      interval: 10s
      retries: 5

secrets:
  db_password:
    file: ./secrets/db_password.txt
  llm_api_key:
    file: ./secrets/llm_api_key.txt

volumes:
  cognee_data: {}
  pg_data: {}

2b. Helm on Kubernetes

Use Helm when the customer already operates Kubernetes. With embedded graph storage, encode the single-writer invariant with one replica, a Recreate strategy, and a ReadWriteOnce (RWO) PVC. Queue and worker templates are add-ons, not defaults.

cognee/
|-- Chart.yaml
|-- values.yaml
`-- templates/
    |-- deployment.yaml
    |-- service.yaml
    |-- pvc.yaml
    |-- secret.yaml
    |-- networkpolicy.yaml
    |-- pdb.yaml
    `-- ingress.yaml

image:
  repository: cognee/cognee
  tag: "1.0.6"

replicas: 1
strategy: Recreate

persistence:
  storageClass: gp3
  size: 8Gi

env:
  GRAPH_DATABASE_PROVIDER: kuzu
  VECTOR_DB_PROVIDER: pgvector
  DB_PROVIDER: postgres
  ENABLE_BACKEND_ACCESS_CONTROL: "true"
  REQUIRE_AUTHENTICATION: "true"

externalPostgres:
  host: cognee.example.rds.amazonaws.com
  port: 5432
  database: cognee

Production hardening should include pinned image tags, managed secrets, resource limits, readiness probes, network policies, and a PodDisruptionBudget for the writer. See Kubernetes (Helm).

Pros	Cons
Production-grade packaging with secrets, network policy, and observability hooks.	Requires a Kubernetes platform and team to operate it.
Each storage tier can be externalized through values.	Queue, worker split, and ExternalSecrets are operator additions.
`Recreate`, RWO PVC, and PDB encode the single-writer invariant.	Single-writer still holds until the graph tier is externalized.

Full Helm skeleton

# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cognee
spec:
  replicas: {{ .Values.replicas }}
  strategy:
    type: {{ .Values.strategy }}
  selector:
    matchLabels:
      app: cognee
  template:
    metadata:
      labels:
        app: cognee
    spec:
      initContainers:
        - name: wait-for-postgres
          image: pgvector/pgvector:pg17
          command:
            - sh
            - -c
            - until pg_isready -h {{ .Values.externalPostgres.host }}; do sleep 2; done
      containers:
        - name: cognee
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          ports:
            - containerPort: 8000
          env:
            - name: DB_PROVIDER
              value: "{{ .Values.env.DB_PROVIDER }}"
            - name: DB_HOST
              value: "{{ .Values.externalPostgres.host }}"
            - name: VECTOR_DB_PROVIDER
              value: "{{ .Values.env.VECTOR_DB_PROVIDER }}"
            - name: GRAPH_DATABASE_PROVIDER
              value: "{{ .Values.env.GRAPH_DATABASE_PROVIDER }}"
            - name: LLM_API_KEY
              valueFrom:
                secretKeyRef:
                  name: cognee-secrets
                  key: llmApiKey
          volumeMounts:
            - name: data
              mountPath: /data
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: cognee-data
---
# templates/networkpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: cognee
spec:
  podSelector:
    matchLabels:
      app: cognee
  policyTypes:
    - Egress
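  egress:
    # DNS: without this rule an Egress policy also blocks name resolution.
    - ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
    # In-cluster Postgres. For an external database, allow its subnet with an ipBlock instead.
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - port: 5432
    # Outbound HTTPS, such as calls to an LLM API.
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - port: 443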
---
# templates/pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cognee
spec:
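  # With one replica, maxUnavailable: 0 blocks voluntary evictions, so a node
  # drain waits until the writer pod is deleted or moved by hand.
  maxUnavailable: 0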
  selector:
    matchLabels:
      app: cognee

2c. Sidecar

Run Cognee next to an agent in the same pod when one agent needs private, low-latency memory. The agent talks to Cognee over localhost; Cognee owns the storage connection and serializes writes.

containers:
  - name: agent
    image: my-agent:1.2.0
    env:
      - name: COGNEE_URL
        value: "http://localhost:8000"
  - name: cognee
    image: cognee/cognee:1.0.6
    ports:
      - containerPort: 8000
    volumeMounts:
      - name: memory
        mountPath: /data
volumes:
  - name: memory
    persistentVolumeClaim:
      claimName: agent-memory

This isolates memory per pod. It does not create a shared multi-agent write path unless storage is externalized.

Pros	Cons
Lowest-latency service boundary over `localhost`.	Memory is scoped to one pod unless storage is shared or externalized.
Per-agent isolation by construction.	Scales with agent pods, not independently.
No separate Cognee service lifecycle to operate.	Wasteful if Cognee is mostly idle beside each agent.

Full sidecar pod skeleton

apiVersion: v1
kind: Pod
metadata:
  name: agent-with-cognee
spec:
  containers:
    - name: agent
      image: my-agent:1.2.0
      env:
        - name: COGNEE_URL
          value: "http://localhost:8000"
    - name: cognee
      image: cognee/cognee:1.0.6
      ports:
        - containerPort: 8000
      env:
        - name: DATA_ROOT_DIRECTORY
          value: /data/.cognee_data
        - name: SYSTEM_ROOT_DIRECTORY
          value: /data/.cognee_system
      volumeMounts:
        - name: memory
          mountPath: /data
  volumes:
    - name: memory
      persistentVolumeClaim:
        claimName: agent-memory

3. Scale-out patterns

Scale-out patterns are layered on top of a self-hosted deployment. They scale reads or isolate write jobs; they do not replace the write path.

3a. Snapshot read replicas
3b. Queue-based write path

3a. Snapshot read replicas

Use one writer to run remember(), publish a snapshot to object storage, and let many readers pull the latest snapshot at startup. This scales reads without a clustered graph database, at the cost of freshness.

writer -> snapshot.tar -> S3 / MinIO
                       |-> reader-1
                       |-> reader-2
                       `-> reader-N

initContainers:
  - name: pull-snapshot
    image: amazon/aws-cli
    command:
      - sh
      - -c
      - aws s3 cp s3://cognee-snapshots/latest.tar /data/latest.tar && tar -xf /data/latest.tar -C /data
containers:
  - name: cognee-reader
    image: cognee/cognee:1.0.6
    env:
      - name: COGNEE_READ_ONLY
        value: "true"

Use this for heavy read traffic and static or slowly changing knowledge. Benchmark snapshot size because it drives reader cold-start time.
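
On the writer side, the snapshot is a tarball of the data directory. A minimal sketch of building one, assuming the writer is paused or idle while the archive is taken and that the output path lives outside the directory being archived. `build_snapshot` is a hypothetical helper, and uploading the result to S3 or MinIO is left to the operator's tooling.

```python
import os
import tarfile
import tempfile


def build_snapshot(source_dir: str, dest_tar: str) -> str:
    """Tar source_dir into dest_tar, replacing any previous snapshot atomically."""
    dest_dir = os.path.dirname(os.path.abspath(dest_tar))
    # Write to a temporary file first so readers never see a partial archive.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tar.tmp")
    os.close(fd)
    try:
        with tarfile.open(tmp_path, "w") as archive:
            # arcname="." keeps member paths relative, so extracting with
            # `tar -xf latest.tar -C /data` restores the tree in place.
            archive.add(source_dir, arcname=".")
        os.replace(tmp_path, dest_tar)  # atomic on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return dest_tar


if __name__ == "__main__":
    print(build_snapshot("/data", "/snapshots/latest.tar"))
```

The size of the resulting file is the number to benchmark, since readers download and unpack it before they can serve traffic.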

Pros	Cons
Horizontal read scaling without a clustered graph database.	Freshness is bounded by snapshot cadence.
Readers are stateless and replaceable.	Snapshot size drives cold-start time.
Cheap: object storage instead of a database fleet.	Not for collaborative or strict-freshness workloads.

Full reader deployment skeleton

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cognee-reader
spec:
  replicas: 6
  selector:
    matchLabels:
      app: cognee-reader
  template:
    metadata:
      labels:
        app: cognee-reader
    spec:
      initContainers:
        - name: pull-snapshot
          image: amazon/aws-cli
          command:
            - sh
            - -c
            - aws s3 cp s3://cognee-snapshots/latest.tar /data/latest.tar && tar -xf /data/latest.tar -C /data
          volumeMounts:
            - name: snapshot
              mountPath: /data
      containers:
        - name: cognee-reader
          image: cognee/cognee:1.0.6
          env:
            - name: COGNEE_READ_ONLY
              value: "true"
            - name: DATA_ROOT_DIRECTORY
              value: /data/.cognee_data
            - name: SYSTEM_ROOT_DIRECTORY
              value: /data/.cognee_system
          volumeMounts:
            - name: snapshot
              mountPath: /data
      volumes:
        - name: snapshot
          emptyDir: {}

3b. Queue-based write path

Use a durable queue when producers are bursty but Cognee still needs one writer. Producers submit jobs; one worker consumes them and owns the write path.

resource "aws_sqs_queue" "cognee_dlq" {
  name = "cognee-ingest-dlq"
}

resource "aws_sqs_queue" "cognee_ingest" {
  name                       = "cognee-ingest"
  visibility_timeout_seconds = 900
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.cognee_dlq.arn
    maxReceiveCount     = 3
  })
}

Run the worker deployment with replicas: 1 so it protects the single-writer path.
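
The worker's control flow is a single consumer loop. The sketch below shows the pattern with an in-process `asyncio.Queue` standing in for SQS, Kafka, RabbitMQ, or Redis. It is not durable, and `single_writer_worker` and `fake_write` are illustrative names; a real worker would call the Cognee ingestion API where `fake_write` appears.

```python
import asyncio


async def single_writer_worker(queue: asyncio.Queue, write) -> None:
    """Consume jobs one at a time so only this task ever calls `write`."""
    while True:
        job = await queue.get()
        try:
            if job is None:  # sentinel: stop the worker
                return
            await write(job)
        finally:
            queue.task_done()


async def demo() -> list:
    written = []

    async def fake_write(job):
        # Stand-in for the real ingestion call.
        await asyncio.sleep(0)
        written.append(job)

    queue = asyncio.Queue()
    worker = asyncio.create_task(single_writer_worker(queue, fake_write))
    # Many producers can enqueue concurrently; none of them touches storage.
    await asyncio.gather(*(queue.put(f"doc-{i}") for i in range(5)))
    await queue.put(None)
    await worker
    return written


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Producers return as soon as the job is enqueued, which is why end-to-end write latency becomes eventual rather than synchronous.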

Pros	Cons
Absorbs write bursts; producers do not block on the writer.	One worker is a throughput ceiling by design.
DLQ and retry give durable, observable ingestion.	Adds a queue to operate and monitor.
Works naturally for event sources such as Jira, Confluence, S3, Kafka, or dlt.	End-to-end write latency becomes eventual, not synchronous.

Full queue and worker skeleton

resource "aws_sqs_queue" "cognee_dlq" {
  name = "cognee-ingest-dlq"
}

resource "aws_sqs_queue" "cognee_ingest" {
  name                       = "cognee-ingest"
  visibility_timeout_seconds = 900
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.cognee_dlq.arn
    maxReceiveCount     = 3
  })
}

output "ingest_queue_url" {
  value = aws_sqs_queue.cognee_ingest.url
}

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cognee-writer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cognee-writer
  template:
    metadata:
      labels:
        app: cognee-writer
    spec:
      containers:
        - name: worker
          image: cognee/cognee:1.0.6
          env:
            - name: INGEST_QUEUE_URL
              valueFrom:
                secretKeyRef:
                  name: cognee-queue
                  key: ingestQueueUrl

4. Serverless and managed

Serverless patterns are useful for HTTP-fronted memory APIs and scheduled jobs. They are rarely the primary on-prem pattern.

4a. Lambda read-only artifact
4b. Lambda mutable graph on EFS

4a. Lambda read-only artifact

Run remember() offline, package the resulting Kuzu/LanceDB files into the deployment artifact or Lambda layer, and open them read-only at runtime.

resource "aws_lambda_function" "cognee_reader" {
  function_name = "cognee-reader"
  package_type  = "Image"
  image_uri     = "${var.ecr_repo}:1.0.6-snapshot"
  memory_size   = 3008
  timeout       = 30

  environment {
    variables = {
      COGNEE_READ_ONLY        = "true"
      DATA_ROOT_DIRECTORY     = "/var/task/.cognee_data"
      SYSTEM_ROOT_DIRECTORY   = "/var/task/.cognee_system"
      GRAPH_DATABASE_PROVIDER = "kuzu"
      VECTOR_DB_PROVIDER      = "lancedb"
    }
  }
}

This scales to zero and is rollback-friendly, but every knowledge update requires rebuilding and redeploying the snapshot.

Pros	Cons
No servers; scales to zero with per-request billing.	Read-only; every knowledge update needs a rebuilt artifact.
Immutable artifact is reproducible and rollback-friendly.	Cold start scales with artifact size.
No runtime database service to operate.	Bounded by Lambda image and runtime limits.

Full read-only Lambda skeleton

resource "aws_lambda_function" "cognee_reader" {
  function_name = "cognee-reader"
  package_type  = "Image"
  image_uri     = "${var.ecr_repo}:1.0.6-snapshot"
  memory_size   = 3008
  timeout       = 30

  environment {
    variables = {
      COGNEE_READ_ONLY        = "true"
      DATA_ROOT_DIRECTORY     = "/var/task/.cognee_data"
      SYSTEM_ROOT_DIRECTORY   = "/var/task/.cognee_system"
      GRAPH_DATABASE_PROVIDER = "kuzu"
      VECTOR_DB_PROVIDER      = "lancedb"
      DB_PROVIDER             = "sqlite"
    }
  }
}

4b. Lambda mutable graph on EFS

Mount EFS and point Cognee at that mount for writable serverless memory. EFS gives shared storage, not write coordination, so concurrent writes still need a queue or external graph backend.

resource "aws_efs_file_system" "cognee" {
  encrypted = true
}

resource "aws_lambda_function" "cognee_efs" {
  function_name = "cognee-efs"
  package_type  = "Image"
  image_uri     = "${var.ecr_repo}:1.0.6"
  timeout       = 120

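  # EFS requires the function to run in a VPC. The vpc_config block and the
  # access point resource are omitted here and shown in the full skeleton.
  file_system_config {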
    arn              = aws_efs_access_point.cognee.arn
    local_mount_path = "/mnt/cognee"
  }

  environment {
    variables = {
      DATA_ROOT_DIRECTORY   = "/mnt/cognee/.cognee_data"
      SYSTEM_ROOT_DIRECTORY = "/mnt/cognee/.cognee_system"
    }
  }
}

Avoid EFS where POSIX locking matters under concurrency. Prefer block storage or an external graph backend for the hot graph path.

Pros	Cons
Writable persistent memory without managing a server.	No write coordination; concurrent writes still need queueing or external graph storage.
EFS survives Lambda redeploys.	EFS latency can hurt the hot graph path.
Shared across invocations.	Requires VPC wiring, security groups, and NAT or private LLM egress.

Full Lambda + EFS skeleton

resource "aws_efs_file_system" "cognee" {
  encrypted = true
}

resource "aws_efs_access_point" "cognee" {
  file_system_id = aws_efs_file_system.cognee.id

  posix_user {
    gid = 1000
    uid = 1000
  }

  root_directory {
    path = "/cognee"
    creation_info {
      owner_gid   = 1000
      owner_uid   = 1000
      permissions = "0755"
    }
  }
}

resource "aws_lambda_function" "cognee_efs" {
  function_name = "cognee-efs"
  package_type  = "Image"
  image_uri     = "${var.ecr_repo}:1.0.6"
  memory_size   = 3008
  timeout       = 120

  vpc_config {
    subnet_ids         = var.private_subnets
    security_group_ids = [var.lambda_sg]
  }

  file_system_config {
    arn              = aws_efs_access_point.cognee.arn
    local_mount_path = "/mnt/cognee"
  }

  environment {
    variables = {
      DATA_ROOT_DIRECTORY   = "/mnt/cognee/.cognee_data"
      SYSTEM_ROOT_DIRECTORY = "/mnt/cognee/.cognee_system"
    }
  }
}

Ephemeral cloud sandbox (Islo)

For a throwaway, HTTP-fronted API instance, Cognee ships a one-command deploy script. It provisions an Islo cloud sandbox (2 vCPU / 4 GB / 10 GB), installs cognee[api] into a dedicated virtualenv, starts the FastAPI server on port 8000, waits for the internal /health endpoint to pass, and then prints a public share URL that expires after 24 hours.

pip install islo
# Mint a key with the Islo CLI, then export it:
export ISLO_API_KEY=...   # islo api-key create cognee-deploy --expires 90 --show
export LLM_API_KEY=sk-...
python distributed/deploy/islo_sandbox.py

The Islo CLI is only used to mint the API key; the deployment itself is driven by the official Islo Python SDK. The sandbox name is fixed (cognee-api), so re-running the script while a previous deployment still exists fails with a name conflict. Delete the old sandbox through the SDK first. To pause a deployment without recreating it, stop the sandbox instead. This option is best for demos and short-lived evaluation rather than durable state, since the share URL and sandbox are ephemeral. See distributed/deploy/README.md for the full runbook, required environment variables, and cleanup commands.

5. Externalized backends

Use external services for sustained multi-agent writes and independent scaling of each storage layer. This removes the file-backed graph single-writer ceiling once the graph tier is externalized.

resource "aws_db_instance" "cognee" {
  identifier                  = "cognee"
  engine                      = "postgres"
  engine_version              = "17"
  instance_class              = "db.r6g.xlarge"
  allocated_storage           = 100
  max_allocated_storage       = 1000
  storage_type                = "gp3"
  db_name                     = "cognee"
  username                    = "cognee"
  manage_master_user_password = true
  multi_az                    = true
  backup_retention_period     = 14
  storage_encrypted           = true
}

Wire the external services into Helm:

helm upgrade --install cognee ./cognee \
  --set env.DB_PROVIDER=postgres \
  --set env.VECTOR_DB_PROVIDER=pgvector \
  --set env.GRAPH_DATABASE_PROVIDER=neo4j \
  --set externalPostgres.host="$(terraform output -raw host)"

Typical production shape:

Postgres or RDS for relational metadata
pgvector, Qdrant, Pinecone, or ChromaDB for vectors
Neo4j or the FalkorDB adapter for graph writes
Cognee API pods that keep durable state in the external backends and use writable local paths only for ingestion artifacts and caches

Pros	Cons
Highest write concurrency once graph storage is externalized.	Most infrastructure to provision, secure, and pay for.
Each tier scales, backs up, and fails over independently.	More moving parts means more failure modes and monitoring.
Managed services can provide HA, backups, and secret rotation.	Cross-service latency replaces local file access on the hot path.

Full externalized backend skeleton

variable "name" {
  default = "cognee"
}

variable "vpc_id" {}
variable "subnet_ids" {
  type = list(string)
}
variable "app_sg" {}

resource "aws_db_subnet_group" "this" {
  name       = "${var.name}-db"
  subnet_ids = var.subnet_ids
}

resource "aws_security_group" "db" {
  name_prefix = "${var.name}-db-"
  vpc_id      = var.vpc_id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [var.app_sg]
  }
}

resource "aws_db_instance" "cognee" {
  identifier                  = var.name
  engine                      = "postgres"
  engine_version              = "17"
  instance_class              = "db.r6g.xlarge"
  allocated_storage           = 100
  max_allocated_storage       = 1000
  storage_type                = "gp3"
  db_name                     = "cognee"
  username                    = "cognee"
  manage_master_user_password = true
  multi_az                    = true
  backup_retention_period     = 14
  storage_encrypted           = true
  db_subnet_group_name        = aws_db_subnet_group.this.name
  vpc_security_group_ids      = [aws_security_group.db.id]
}

# Create the pgvector extension once before using Postgres as a vector store.
# Run this through a migration, pre-deploy hook, or database init job:
# CREATE EXTENSION IF NOT EXISTS vector;

output "host" {
  value = aws_db_instance.cognee.address
}

output "secret_arn" {
  value = aws_db_instance.cognee.master_user_secret[0].secret_arn
}

helm upgrade --install cognee ./cognee \
  --set externalPostgres.host="$(terraform output -raw host)" \
  --set env.DB_PROVIDER=postgres \
  --set env.VECTOR_DB_PROVIDER=pgvector \
  --set env.GRAPH_DATABASE_PROVIDER=neo4j

6. Cloud service mapping

The patterns are cloud-agnostic. The concrete services differ by platform.

Primitive	AWS	Azure
Managed Kubernetes	EKS	AKS
Writer block storage	EBS gp3	Managed Disks / Premium SSD
Shared filesystem	EFS	Azure Files Premium NFS
Snapshots / object store	S3	Blob Storage
Private registry	ECR	ACR
Secrets	Secrets Manager / SSM	Key Vault
Pod identity	IRSA	Workload Identity
Relational tier	RDS / Aurora Postgres	Azure Database for PostgreSQL Flexible Server
LLM endpoint	Bedrock or self-hosted vLLM	Azure OpenAI or self-hosted vLLM

Keep the hot graph on block storage or an external graph service. Use shared filesystems only when the pattern truly requires cross-process file sharing.

7. Production readiness

Schema migrations

Relational: pin the Cognee version per environment and run migrations before deploying the live writer.
Graph: additive model changes are safest. Renames, removals, and new required fields need a data migration or rebuild from source.
Vector: rebuild collections when embedding dimension, distance metric, or metadata schema changes.
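
The vector rule above is easy to check mechanically before a deploy. A minimal sketch, assuming the operator records these three properties per collection; `VectorSchema` and `rebuild_reasons` are hypothetical names, not Cognee types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorSchema:
    embedding_dimension: int
    distance_metric: str
    metadata_fields: frozenset = frozenset()


def rebuild_reasons(old: VectorSchema, new: VectorSchema) -> list[str]:
    """List every change that forces a collection rebuild; empty means none."""
    reasons = []
    if old.embedding_dimension != new.embedding_dimension:
        reasons.append("embedding dimension changed")
    if old.distance_metric != new.distance_metric:
        reasons.append("distance metric changed")
    if old.metadata_fields != new.metadata_fields:
        reasons.append("metadata schema changed")
    return reasons


if __name__ == "__main__":
    current = VectorSchema(1536, "cosine", frozenset({"dataset", "source"}))
    planned = VectorSchema(3072, "cosine", frozenset({"dataset", "source"}))
    print(rebuild_reasons(current, planned))
```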

Backups, restore, and DR

Embedded: route writes through one writer, then snapshot the data directory to object storage.
Externalized: back up each tier independently, such as Postgres dumps, managed snapshots, graph dumps, and vector snapshots.
Region failure: use cross-region snapshot replication and warm standby. Active-active DR is not a good fit for file-backed Kuzu.
Test restore on every release.
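
One way to make the restore test concrete for the embedded layout is to compare a digest of the source directory with a digest of the restored copy. This is a sketch under assumptions: `directory_digest` is a hypothetical helper, and the comparison only makes sense when the writer is quiesced while the snapshot is taken.

```python
import hashlib
from pathlib import Path


def directory_digest(root: str) -> str:
    """Hash relative file paths and contents under root in a stable order."""
    base = Path(root)
    digest = hashlib.sha256()
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        digest.update(path.relative_to(base).as_posix().encode("utf-8"))
        digest.update(b"\0")
        with path.open("rb") as handle:
            # Read in 1 MiB chunks so large database files do not fill memory.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    print(directory_digest(".cognee_system"))
```

Matching digests show the files survived the round trip; they do not replace opening the restored stores and running a query.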

Tenant isolation

Level	Mechanism	Notes
Logical	`dataset_name` and user filters	Cheap, but every query must apply the right scope.
Backend	`ENABLE_BACKEND_ACCESS_CONTROL=true`	Per-user and per-dataset storage isolation.
Infrastructure	Separate deployments, namespaces, and databases	Use for hard regulatory boundaries.
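
At the logical level, the risk is a query that forgets its scope. A thin guard in the calling application can make the scope mandatory. The sketch below is illustrative: `enforce_dataset_scope` is a hypothetical application-side helper, not a Cognee function, and how the allowed set is derived from the caller's identity is left to the application.

```python
def enforce_dataset_scope(requested: list[str], allowed: set[str]) -> list[str]:
    """Return the requested datasets, or raise if any falls outside the caller's scope."""
    if not requested:
        # An empty scope is rejected rather than treated as "all datasets".
        raise ValueError("a query must name at least one dataset")
    denied = sorted(set(requested) - allowed)
    if denied:
        raise PermissionError(f"datasets outside caller scope: {denied}")
    return list(requested)


if __name__ == "__main__":
    print(enforce_dataset_scope(["docs"], allowed={"docs", "tickets"}))
```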

LLM egress and authentication

Use self-hosted vLLM, customer-approved proxies, Bedrock private access, Azure OpenAI in-subscription, or fully air-gapped patterns when egress is restricted.
Front the Cognee API with the customer gateway. Terminate OIDC or mTLS there rather than exposing the Cognee port directly.
Use service-to-service mTLS or cluster-native identity.
Propagate user identity through a gateway-validated header when Cognee needs user-scoped access.

Appendix - options at a glance

Option	Writer	Best for	Skeleton
Embedded	Single process	Prototypes, notebooks, single-user agents	pip + env vars
Docker Compose	Single service	On-prem validation, first production step	Compose + pgvector
Helm	Single service	Customers already on Kubernetes	Chart + values
Sidecar	Single service	Per-agent private memory	Pod spec
Snapshot replicas	One writer, many readers	Heavy read traffic, static knowledge	S3 + reader deployment
Queue	One worker	Bursty or event-driven ingestion	SQS + worker
Lambda read-only	Offline writer	HTTP memory API, scale-to-zero	Lambda image
Lambda + EFS	Single writer plus queue	Serverless writable memory	EFS + Lambda
Islo sandbox	Single process	Ephemeral demos, throwaway API instances	`python distributed/deploy/islo_sandbox.py`
Externalized	External graph-backed writes	Sustained multi-agent writes	RDS + Neo4j/FalkorDB adapter + vector DB

For provider-specific configuration, see Graph Stores, Vector Stores, and Relational Databases.
