Deployment facts
These facts anchor the rest of the guide:

| Common assumption | Correct model | Why it matters |
|---|---|---|
| The default graph backend is a separate named service | The default graph backend is embedded, file-backed Kuzu. NetworkX is the in-memory fallback. | The on-disk graph directory is cognee_graph_kuzu. |
| A shared service gives moderate write concurrency | A shared service over file-backed Kuzu is still single-writer. | The service boundary centralizes the writer; it does not make file-backed writes concurrent. |
| Helm includes an API, worker, and queue by default | Cognee ships one FastAPI image. Worker and queue splits are operator patterns. | Add a queue and worker deployment only when your write model needs them. |

Executive summary
Cognee runs either embedded in a Python process or as a FastAPI service. Operationally, each deployment comes down to three independent choices:
Writer ownership: who owns writes
Storage location: where graph, vector, and relational state live
Reader access path: how readers reach memory
The default embedded stack is file-backed: Kuzu for graph storage, LanceDB for vectors, and SQLite for relational metadata. In production, each tier can be externalized independently: Neo4j or the FalkorDB adapter for graph storage, Qdrant/pgvector/Pinecone/ChromaDB for vectors, and Postgres for relational metadata.
Deployment patterns are composable. Choose a base shape first, such as Embedded SDK, Compose, Helm, sidecar, or Lambda, then add the write model, read-scaling pattern, and storage backend that match your workload.
Pick-a-path decision tree
Start with writer ownership. The first “no” on single-writer ownership pushes you toward queueing or external backends.

```text
Single writer?
|-- yes
|   `-- Few readers?
|       |-- yes -> Embedded SDK
|       `-- no  -> Self-hosted FastAPI service
|                  `-- Add snapshot read replicas when reads need to scale
`-- no
    `-- Concurrent or multi-agent writes?
        |-- Bursty / event-driven -> Queue + single writer worker
        `-- Sustained concurrency -> Externalized backends
                                     (Postgres + Neo4j/FalkorDB adapter + Qdrant/pgvector)
```

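The same branching can be written down as a small function, for example to record the chosen path in an architecture review. This is a minimal sketch that only restates the tree above; the `Workload` fields and the `pick_path` name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical workload description; field names are illustrative."""
    single_writer: bool              # does exactly one process own all writes?
    few_readers: bool = True         # can that one process also serve every reader?
    reads_need_scaling: bool = False
    bursty_writes: bool = False      # bursty/event-driven vs sustained concurrency


def pick_path(w: Workload) -> str:
    """Walk the decision tree and return the suggested base shape."""
    if w.single_writer:
        if w.few_readers:
            return "Embedded SDK"
        if w.reads_need_scaling:
            return "Self-hosted FastAPI service + snapshot read replicas"
        return "Self-hosted FastAPI service"
    if w.bursty_writes:
        return "Queue + single writer worker"
    return "Externalized backends (Postgres + Neo4j/FalkorDB adapter + Qdrant/pgvector)"


if __name__ == "__main__":
    print(pick_path(Workload(single_writer=True)))
    print(pick_path(Workload(single_writer=False, bursty_writes=True)))
```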
Storage model
Cognee has three independent storage layers:

| Layer | Selector | Default | Production options |
|---|---|---|---|
| Graph | GRAPH_DATABASE_PROVIDER | kuzu | kuzu-remote, neo4j, FalkorDB adapter |
| Vector | VECTOR_DB_PROVIDER | lancedb | pgvector, qdrant, pinecone, chromadb |
| Relational | DB_PROVIDER | sqlite | postgres |

Embedded storage is file-backed and easy to move:

```text
~/.cognee/
`-- databases/
    |-- cognee_graph_kuzu
    |-- cognee_graph_kuzu.wal
    |-- lancedb/
    `-- system.db
```

That makes backups simple, but the same file-backed layout is why one process should own writes. In production, replace each layer with service-backed systems by changing the provider and connection variables.
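A minimal sketch of building the provider variables for one storage profile and validating them against the storage table. `storage_env` and `single_writer_required` are hypothetical helpers. The FalkorDB adapter is left out because its provider string is not given here, and the single-writer check is deliberately conservative: any file-backed tier counts.

```python
# Allowed provider values per layer, copied from the storage model table.
# The FalkorDB adapter is omitted: its provider string is not given here.
PROVIDERS = {
    "GRAPH_DATABASE_PROVIDER": {"kuzu", "kuzu-remote", "neo4j"},
    "VECTOR_DB_PROVIDER": {"lancedb", "pgvector", "qdrant", "pinecone", "chromadb"},
    "DB_PROVIDER": {"sqlite", "postgres"},
}

# Defaults that keep state in local files (Kuzu, LanceDB, SQLite).
FILE_BACKED = {"kuzu", "lancedb", "sqlite"}


def storage_env(graph="kuzu", vector="lancedb", relational="sqlite"):
    """Return provider variables for one profile; raise on unknown values."""
    env = {
        "GRAPH_DATABASE_PROVIDER": graph,
        "VECTOR_DB_PROVIDER": vector,
        "DB_PROVIDER": relational,
    }
    for key, value in env.items():
        if value not in PROVIDERS[key]:
            raise ValueError(f"{key}={value!r}; expected one of {sorted(PROVIDERS[key])}")
    return env


def single_writer_required(env):
    """Conservative: any file-backed tier means one process should own writes."""
    return any(value in FILE_BACKED for value in env.values())
```

Apply the result before `import cognee`, for example `os.environ.update(storage_env(vector="pgvector", relational="postgres"))`, and add the matching connection variables for each external tier.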
Concurrency and the write model
This is the primary production decision. Packaging is secondary.

| Write model | Shape | Maps to | Concurrency |
|---|---|---|---|
| Single process | One process owns local files | Embedded SDK | Lowest |
| Shared service | Many clients, one Cognee backend, one writer | Compose, Helm, sidecar | Centralized, still one writer |
| Queue-based | Producers enqueue, one worker consumes | SQS, Kafka, RabbitMQ, Redis | Good for bursty writes |
| Managed backends | Graph, vector, and relational tiers are external services | Neo4j/FalkorDB adapter, Qdrant/pgvector, Postgres | Highest |

For concurrent multi-agent writes, do not rely on shared file-backed Kuzu. Use a single writer service, a queue, or an external graph backend.
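At its core, the single-writer service pattern is many producers and one consumer. A minimal in-process sketch with `asyncio.Queue`: `apply_write` is a hypothetical stand-in for the real write (for example `cognee.remember`), and the counter confirms that no two writes ever overlap. Across processes or machines the queue would be SQS, Kafka, RabbitMQ, or Redis instead.

```python
import asyncio


async def single_writer(queue: asyncio.Queue, apply_write, results: list):
    """Drain the queue and apply writes one at a time, in arrival order."""
    while True:
        item = await queue.get()
        try:
            if item is None:          # sentinel: shut down cleanly
                return
            results.append(await apply_write(item))
        finally:
            queue.task_done()


async def main(num_agents=3, writes_per_agent=4):
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []
    in_flight = 0
    max_in_flight = 0

    async def apply_write(item):
        # Stand-in for the real write; records how many writes overlap.
        nonlocal in_flight, max_in_flight
        in_flight += 1
        max_in_flight = max(max_in_flight, in_flight)
        await asyncio.sleep(0)        # yield control, as real I/O would
        in_flight -= 1
        return item

    writer = asyncio.create_task(single_writer(queue, apply_write, results))

    async def agent(agent_id):
        # Producers only enqueue; they never touch storage directly.
        for n in range(writes_per_agent):
            await queue.put(f"agent-{agent_id}:doc-{n}")

    await asyncio.gather(*(agent(i) for i in range(num_agents)))
    await queue.put(None)
    await writer
    return results, max_in_flight
```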
1. Embedded deployment
Cognee runs inside the calling Python process. Storage defaults to the local Cognee directories and can be moved with DATA_ROOT_DIRECTORY and SYSTEM_ROOT_DIRECTORY. One process owns the writer lock; there is no service boundary or cross-machine sharing unless the directory is mounted or copied.
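
```python
import os

# Configure storage locations and providers before importing cognee.
os.environ["DATA_ROOT_DIRECTORY"] = "/data/.cognee_data"
os.environ["SYSTEM_ROOT_DIRECTORY"] = "/data/.cognee_system"
os.environ["GRAPH_DATABASE_PROVIDER"] = "kuzu"
os.environ["VECTOR_DB_PROVIDER"] = "lancedb"
os.environ["DB_PROVIDER"] = "sqlite"

import cognee

# Top-level await works in notebooks and async REPLs.
# In a script, wrap these calls in asyncio.run() as in the next example.
await cognee.remember("...", dataset_name="docs")
await cognee.recall("...")
```
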
Best for notebooks, CLIs, local agents, and single-process jobs. Avoid it for cross-process concurrent writes.

| Pros | Cons |
|---|---|
| Zero infrastructure; runs in a notebook, CLI, or local agent. | Single writer; no concurrent writes across processes. |
| Lowest latency because there is no network hop to storage. | No service boundary or shared multi-machine access. |
| Simple backup model: snapshot one data/system directory. | State is only as durable as the local disk or mount. |

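To make "one process owns the writer" fail loudly instead of silently, an operator can take an exclusive lock before writing. A sketch, assuming a POSIX host: `WriterLock` and the `.cognee-writer.lock` file name are hypothetical, and this guard sits alongside whatever locking the storage engines do internally rather than replacing it. `flock` behavior on network filesystems varies, so use it on local disk.

```python
import fcntl
import os


class WriterLock:
    """Exclusive, non-blocking flock on a lock file next to the data.

    A second process (or a second open in this process) that tries to take
    the same lock fails fast instead of writing alongside the owner.
    """

    def __init__(self, data_root: str, name: str = ".cognee-writer.lock"):
        self.path = os.path.join(data_root, name)
        self._fd = None

    def acquire(self) -> bool:
        fd = os.open(self.path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            os.close(fd)              # someone else is the writer
            return False
        self._fd = fd
        return True

    def release(self) -> None:
        if self._fd is not None:
            fcntl.flock(self._fd, fcntl.LOCK_UN)
            os.close(self._fd)
            self._fd = None
```

Call `acquire()` once at startup, before the first write, and exit with an error if it returns `False`.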

```python
import asyncio
import os

os.environ["DATA_ROOT_DIRECTORY"] = "/data/.cognee_data"
os.environ["SYSTEM_ROOT_DIRECTORY"] = "/data/.cognee_system"
os.environ["GRAPH_DATABASE_PROVIDER"] = "kuzu"
os.environ["VECTOR_DB_PROVIDER"] = "lancedb"
os.environ["DB_PROVIDER"] = "sqlite"

import cognee


async def main():
    await cognee.remember("Cognee turns documents into AI memory.", dataset_name="docs")
    results = await cognee.recall("What does Cognee do?")
    print(results)


if __name__ == "__main__":
    asyncio.run(main())
```

2. Self-hosted service
Cognee ships one FastAPI image. This section packages it three ways: 2a. Docker Compose, 2b. Helm on Kubernetes, and 2c. Sidecar.
2a. Docker Compose
Use Compose to validate Cognee inside customer or on-prem infrastructure before moving to Kubernetes. Pin the image, persist data to a named volume, expose health checks, and load secrets from a managed source rather than plaintext .env files.

```yaml
services:
  cognee:
    image: cognee/cognee:1.0.6
    depends_on:
      postgres:
        condition: service_healthy
    environment:
      DB_PROVIDER: postgres
      DB_HOST: postgres
      DB_PORT: "5432"
      DB_NAME: cognee
      DB_USERNAME: cognee
      DB_PASSWORD_FILE: /run/secrets/db_password
      VECTOR_DB_PROVIDER: pgvector
      GRAPH_DATABASE_PROVIDER: kuzu
      DATA_ROOT_DIRECTORY: /data/.cognee_data
      SYSTEM_ROOT_DIRECTORY: /data/.cognee_system
    secrets:
      - db_password
    volumes:
      - cognee_data:/data
    ports:
      - "8000:8000"
  postgres:
    image: pgvector/pgvector:pg17
    environment:
      POSTGRES_USER: cognee
      POSTGRES_DB: cognee
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password
    volumes:
      - pg_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U cognee -d cognee"]
      interval: 10s
      retries: 5

secrets:
  db_password:
    file: ./secrets/db_password.txt

volumes:
  cognee_data: {}
  pg_data: {}
```

See Docker Deployment for the full Compose workflow.

| Pros | Cons |
|---|---|
| Fastest path to a persistent service on customer infrastructure. | Single host; vertical scaling only. |
| Easy to externalize one tier at a time, such as Postgres/pgvector first. | Compose secrets are weaker than a managed secret store. |
| Health checks and named volumes give practical readiness and durability. | Graph writes stay single-writer until the graph tier is externalized. |

```yaml
services:
  cognee:
    image: cognee/cognee:1.0.6
    depends_on:
      postgres:
        condition: service_healthy
    environment:
      DB_PROVIDER: postgres
      DB_HOST: postgres
      DB_PORT: "5432"
      DB_NAME: cognee
      DB_USERNAME: cognee
      DB_PASSWORD_FILE: /run/secrets/db_password
      VECTOR_DB_PROVIDER: pgvector
      VECTOR_DB_HOST: postgres
      VECTOR_DB_PORT: "5432"
      VECTOR_DB_NAME: cognee
      GRAPH_DATABASE_PROVIDER: kuzu
      DATA_ROOT_DIRECTORY: /data/.cognee_data
      SYSTEM_ROOT_DIRECTORY: /data/.cognee_system
      LLM_PROVIDER: openai
      LLM_API_KEY_FILE: /run/secrets/llm_api_key
    secrets:
      - db_password
      - llm_api_key
    volumes:
      - cognee_data:/data
    ports:
      - "8000:8000"
    healthcheck:
      # Requires curl inside the image.
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 15s
      retries: 5
  postgres:
    image: pgvector/pgvector:pg17
    environment:
      POSTGRES_USER: cognee
      POSTGRES_DB: cognee
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password
    volumes:
      - pg_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U cognee -d cognee"]
      interval: 10s
      retries: 5

secrets:
  db_password:
    file: ./secrets/db_password.txt
  llm_api_key:
    file: ./secrets/llm_api_key.txt

volumes:
  cognee_data: {}
  pg_data: {}
```

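Whether Cognee itself reads `*_FILE` variables such as `DB_PASSWORD_FILE` and `LLM_API_KEY_FILE` is an assumption to verify for your version. If it does not, a small entrypoint shim can expand them, following the convention the postgres image uses for `POSTGRES_PASSWORD_FILE`. `expand_file_secrets` is a hypothetical helper; call it before the server starts or `cognee` is imported.

```python
import os


def expand_file_secrets(environ=None, suffix="_FILE"):
    """For each NAME_FILE variable, read the file and set NAME to its contents.

    An explicitly set NAME is left untouched. Returns the names filled in.
    """
    env = os.environ if environ is None else environ
    filled = []
    for key, path in list(env.items()):     # snapshot: env is mutated below
        if not key.endswith(suffix) or len(key) == len(suffix):
            continue
        name = key[: -len(suffix)]
        if name in env:
            continue                          # explicit value wins
        with open(path, encoding="utf-8") as fh:
            # Secret files usually end with a newline; drop it.
            env[name] = fh.read().rstrip("\r\n")
        filled.append(name)
    return filled
```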
2b. Helm on Kubernetes
Use Helm when the customer already operates Kubernetes. With embedded graph storage, encode the single-writer invariant with one replica, a Recreate strategy, and a ReadWriteOnce (RWO) PVC. Queue and worker templates are add-ons, not defaults.

```text
cognee/
|-- Chart.yaml
|-- values.yaml
`-- templates/
    |-- deployment.yaml
    |-- service.yaml
    |-- pvc.yaml
    |-- secret.yaml
    |-- networkpolicy.yaml
    |-- pdb.yaml
    `-- ingress.yaml
```

```yaml
# values.yaml
image:
  repository: cognee/cognee
  tag: "1.0.6"
replicas: 1
strategy: Recreate
persistence:
  storageClass: gp3
  size: 8Gi
env:
  GRAPH_DATABASE_PROVIDER: kuzu
  VECTOR_DB_PROVIDER: pgvector
  DB_PROVIDER: postgres
  ENABLE_BACKEND_ACCESS_CONTROL: "true"
  REQUIRE_AUTHENTICATION: "true"
externalPostgres:
  host: cognee.example.rds.amazonaws.com
  port: 5432
  database: cognee
```

Production hardening should include pinned image tags, managed secrets, resource limits, readiness probes, network policies, and a PodDisruptionBudget for the writer. See Kubernetes (Helm).

| Pros | Cons |
|---|---|
| Production-grade packaging with secrets, network policy, and observability hooks. | Requires a Kubernetes platform and team to operate it. |
| Each storage tier can be externalized through values. | Queue, worker split, and ExternalSecrets are operator additions. |
| Recreate, RWO PVC, and PDB encode the single-writer invariant. | Single-writer still holds until the graph tier is externalized. |

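A single values override can break this invariant, so it is worth checking in CI. A sketch that inspects already-parsed values (for example from PyYAML's `yaml.safe_load`); `check_single_writer` is hypothetical and only enforces the rule while the graph provider is `kuzu`.

```python
def check_single_writer(values: dict) -> list:
    """Return violations of the single-writer invariant for parsed values.yaml.

    Skipped when the graph tier is external (anything other than kuzu).
    """
    graph = values.get("env", {}).get("GRAPH_DATABASE_PROVIDER", "kuzu")
    if graph != "kuzu":
        return []
    problems = []
    if values.get("replicas", 1) != 1:
        problems.append(f"replicas is {values.get('replicas')}, expected 1")
    # Kubernetes defaults to RollingUpdate, which briefly runs two pods.
    if values.get("strategy") != "Recreate":
        problems.append(f"strategy is {values.get('strategy')!r}, expected 'Recreate'")
    return problems
```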
```yaml
# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cognee
spec:
  replicas: {{ .Values.replicas }}
  strategy:
    type: {{ .Values.strategy }}
  selector:
    matchLabels:
      app: cognee
  template:
    metadata:
      labels:
        app: cognee
    spec:
      initContainers:
        # Block startup until the external Postgres accepts connections.
        - name: wait-for-postgres
          image: pgvector/pgvector:pg17
          command:
            - sh
            - -c
            - until pg_isready -h {{ .Values.externalPostgres.host }} -p {{ .Values.externalPostgres.port }}; do sleep 2; done
      containers:
        - name: cognee
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          ports:
            - containerPort: 8000
          env:
            # Pass every provider and auth flag from values.env through.
            {{- range $key, $value := .Values.env }}
            - name: {{ $key }}
              value: {{ $value | quote }}
            {{- end }}
            - name: DB_HOST
              value: {{ .Values.externalPostgres.host | quote }}
            - name: DB_PORT
              value: {{ .Values.externalPostgres.port | quote }}
            - name: DB_NAME
              value: {{ .Values.externalPostgres.database | quote }}
            - name: LLM_API_KEY
              valueFrom:
                secretKeyRef:
                  name: cognee-secrets
                  key: llmApiKey
          volumeMounts:
            - name: data
              mountPath: /data
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
          {{- with .Values.resources }}
          resources:
            {{- toYaml . | nindent 12 }}
          {{- end }}
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: cognee-data
---
# templates/networkpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: cognee
spec:
  podSelector:
    matchLabels:
      app: cognee
  policyTypes:
    - Egress
  egress:
    # DNS: without this rule the pod cannot resolve database or LLM hostnames.
    - to:
        - namespaceSelector: {}
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
    # In-cluster Postgres. For an external database such as RDS, replace the
    # podSelector with an ipBlock covering the database subnets.
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - port: 5432
    # HTTPS egress, for example to the LLM provider.
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - port: 443
---
# templates/pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cognee
spec:
  # Blocks voluntary evictions of the single writer; node drains wait until
  # an operator moves the pod deliberately.
  maxUnavailable: 0
  selector:
    matchLabels:
      app: cognee
```

2c. Sidecar
Run Cognee next to an agent in the same pod when one agent needs private, low-latency memory. The agent talks to Cognee over localhost; Cognee owns the storage connection and serializes writes.

```yaml
containers:
  - name: agent
    image: my-agent:1.2.0
    env:
      - name: COGNEE_URL
        value: "http://localhost:8000"
  - name: cognee
    image: cognee/cognee:1.0.6
    ports:
      - containerPort: 8000
    volumeMounts:
      - name: memory
        mountPath: /data
volumes:
  - name: memory
    persistentVolumeClaim:
      claimName: agent-memory
```

This isolates memory per pod. It does not create a shared multi-agent write path unless storage is externalized.

| Pros | Cons |
|---|---|
| Lowest-latency service boundary over localhost. | Memory is scoped to one pod unless storage is shared or externalized. |
| Per-agent isolation by construction. | Scales with agent pods, not independently. |
| No separate Cognee service lifecycle to operate. | Wasteful if Cognee is mostly idle beside each agent. |

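Containers in a pod start in parallel, so the agent can come up before the Cognee sidecar is listening. A sketch of a readiness wait the agent could run at startup, using only the standard library. It assumes the `/health` path used by the probes in this guide; `wait_for_cognee` is a hypothetical name.

```python
import os
import time
import urllib.error
import urllib.request


def wait_for_cognee(url=None, timeout=60.0, interval=1.0) -> bool:
    """Poll <COGNEE_URL>/health until it answers 2xx or the timeout expires."""
    base = (url or os.environ.get("COGNEE_URL", "http://localhost:8000")).rstrip("/")
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base}/health", timeout=interval) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass                      # sidecar not listening yet; retry
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```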
Full sidecar pod skeleton
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: agent-with-cognee
spec:
  containers:
    - name: agent
      image: my-agent:1.2.0
      env:
        - name: COGNEE_URL
          value: "http://localhost:8000"
    - name: cognee
      image: cognee/cognee:1.0.6
      ports:
        - containerPort: 8000
      env:
        - name: DATA_ROOT_DIRECTORY
          value: /data/.cognee_data
        - name: SYSTEM_ROOT_DIRECTORY
          value: /data/.cognee_system
      volumeMounts:
        - name: memory
          mountPath: /data
  volumes:
    - name: memory
      persistentVolumeClaim:
        claimName: agent-memory
```

3. Scale-out patterns
Scale-out patterns are layered on top of a self-hosted deployment. They scale reads or isolate write jobs; they do not replace the write path.
3a. Snapshot read replicas
Use one writer to run remember(), publish a snapshot to object storage, and let many readers pull the latest snapshot at startup. This scales reads without a clustered graph database, at the cost of freshness.

```text
writer -> snapshot.tar -> S3 / MinIO
                            |-> reader-1
                            |-> reader-2
                            `-> reader-N
```

```yaml
initContainers:
  - name: pull-snapshot
    image: amazon/aws-cli
    command:
      - sh
      - -c
      - aws s3 cp s3://cognee-snapshots/latest.tar /data/latest.tar && tar -xf /data/latest.tar -C /data
containers:
  - name: cognee-reader
    image: cognee/cognee:1.0.6
    env:
      - name: COGNEE_READ_ONLY
        value: "true"
```

Use this for heavy read traffic and static or slowly changing knowledge. Benchmark snapshot size because it drives reader cold-start time.

| Pros | Cons |
|---|---|
| Horizontal read scaling without a clustered graph database. | Freshness is bounded by snapshot cadence. |
| Readers are stateless and replaceable. | Snapshot size drives cold-start time. |
| Cheap: object storage instead of a database fleet. | Not for collaborative or strict-freshness workloads. |

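A sketch of the writer-side publish step and the reader-side restore, using Python's `tarfile`. It assumes the writer is paused while the archive is built, writes to a temporary file and renames it so a half-written archive is never picked up, and leaves the upload (for example `aws s3 cp`) as a separate step. `publish_snapshot` and `restore_snapshot` are hypothetical names.

```python
import os
import tarfile
import tempfile


def publish_snapshot(data_root: str, out_path: str) -> str:
    """Tar the contents of data_root into out_path via temp file + rename.

    Run only while the single writer is paused, so graph files and WAL
    are captured at one consistent point.
    """
    out_dir = os.path.dirname(os.path.abspath(out_path))
    fd, tmp = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    os.close(fd)
    try:
        with tarfile.open(tmp, "w") as tar:
            # Relative member names so readers can extract anywhere.
            for entry in sorted(os.listdir(data_root)):
                tar.add(os.path.join(data_root, entry), arcname=entry)
        os.replace(tmp, out_path)    # atomic on the same filesystem
    except BaseException:
        os.unlink(tmp)
        raise
    return out_path


def restore_snapshot(tar_path: str, dest: str) -> None:
    """Extract a snapshot into dest, as the reader init container does."""
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(tar_path) as tar:
        if hasattr(tarfile, "data_filter"):   # safer extraction when available
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
```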
Full reader deployment skeleton
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cognee-reader
spec:
  replicas: 6
  selector:
    matchLabels:
      app: cognee-reader
  template:
    metadata:
      labels:
        app: cognee-reader
    spec:
      initContainers:
        - name: pull-snapshot
          image: amazon/aws-cli
          command:
            - sh
            - -c
            - aws s3 cp s3://cognee-snapshots/latest.tar /data/latest.tar && tar -xf /data/latest.tar -C /data
          volumeMounts:
            - name: snapshot
              mountPath: /data
      containers:
        - name: cognee-reader
          image: cognee/cognee:1.0.6
          env:
            - name: COGNEE_READ_ONLY
              value: "true"
            - name: DATA_ROOT_DIRECTORY
              value: /data/.cognee_data
            - name: SYSTEM_ROOT_DIRECTORY
              value: /data/.cognee_system
          volumeMounts:
            - name: snapshot
              mountPath: /data
      volumes:
        - name: snapshot
          emptyDir: {}
```

3b. Queue + single writer worker
Use a durable queue when producers are bursty but Cognee still needs one writer. Producers submit jobs; one worker consumes them and owns the write path.

```hcl
resource "aws_sqs_queue" "cognee_dlq" {
  name = "cognee-ingest-dlq"
}

resource "aws_sqs_queue" "cognee_ingest" {
  name                       = "cognee-ingest"
  visibility_timeout_seconds = 900
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.cognee_dlq.arn
    maxReceiveCount     = 3
  })
}
```

Run the worker deployment with replicas: 1 so it protects the single-writer path.

| Pros | Cons |
|---|---|
| Absorbs write bursts; producers do not block on the writer. | One worker is a throughput ceiling by design. |
| DLQ and retry give durable, observable ingestion. | Adds a queue to operate and monitor. |
| Works naturally for event sources such as Jira, Confluence, S3, Kafka, or dlt. | End-to-end write latency becomes eventual, not synchronous. |

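The redrive policy above sends a message to the DLQ after it has failed three receives, and SQS delivers at least once, so the worker should tolerate duplicates. Below is a toy in-memory model of those semantics. `InMemoryQueue` and `run_worker` are hypothetical. A real worker would call boto3's `receive_message` and `delete_message` and put `cognee.remember` inside `write`.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    receive_count: int = 0


class InMemoryQueue:
    """Toy SQS stand-in: unacknowledged messages reappear, and once a message
    has been received max_receive_count times it moves to the DLQ."""

    def __init__(self, max_receive_count: int = 3):
        self.max_receive_count = max_receive_count
        self.visible = deque()
        self.dlq = []

    def send(self, body: str) -> None:
        self.visible.append(Message(body))

    def receive(self):
        while self.visible:
            msg = self.visible.popleft()
            if msg.receive_count >= self.max_receive_count:
                self.dlq.append(msg)          # redrive after repeated failures
                continue
            msg.receive_count += 1
            return msg
        return None

    def release(self, msg: Message) -> None:
        """Simulate the visibility timeout expiring without a delete."""
        self.visible.append(msg)


def run_worker(queue: InMemoryQueue, write, processed_ids: set) -> list:
    """Single worker loop: one write at a time, delete on success, skip repeats."""
    written = []
    while (msg := queue.receive()) is not None:
        # Keyed on body for brevity; real code would use a stable document id.
        if msg.body in processed_ids:
            continue                           # duplicate delivery: drop it
        try:
            write(msg.body)
        except Exception:
            queue.release(msg)                 # not deleted, so it is retried
            continue
        processed_ids.add(msg.body)            # success: equivalent to a delete
        written.append(msg.body)
    return written
```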
Full queue and worker skeleton
```hcl
resource "aws_sqs_queue" "cognee_dlq" {
  name = "cognee-ingest-dlq"
}

resource "aws_sqs_queue" "cognee_ingest" {
  name                       = "cognee-ingest"
  visibility_timeout_seconds = 900
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.cognee_dlq.arn
    maxReceiveCount     = 3
  })
}

output "ingest_queue_url" {
  value = aws_sqs_queue.cognee_ingest.url
}
```

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cognee-writer
spec:
  replicas: 1
  # Recreate stops the old pod before starting the new one, so a rollout
  # never runs two writers at once.
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: cognee-writer
  template:
    metadata:
      labels:
        app: cognee-writer
    spec:
      containers:
        - name: worker
          # Cognee ships one FastAPI image; the queue consumer is operator
          # code, so override command/args to run your consumer.
          image: cognee/cognee:1.0.6
          env:
            - name: INGEST_QUEUE_URL
              valueFrom:
                secretKeyRef:
                  name: cognee-queue
                  key: ingestQueueUrl
```
4. Serverless and managed
Serverless patterns are useful for HTTP-fronted memory APIs and scheduled jobs. They are rarely the primary on-prem pattern.
4a. Read-only Lambda
Run remember() offline, package the resulting Kuzu/LanceDB files into the deployment artifact or Lambda layer, and open them read-only at runtime.

```hcl
resource "aws_lambda_function" "cognee_reader" {
  function_name = "cognee-reader"
  package_type  = "Image"
  image_uri     = "${var.ecr_repo}:1.0.6-snapshot"
  memory_size   = 3008
  timeout       = 30

  environment {
    variables = {
      COGNEE_READ_ONLY        = "true"
      DATA_ROOT_DIRECTORY     = "/var/task/.cognee_data"
      GRAPH_DATABASE_PROVIDER = "kuzu"
      VECTOR_DB_PROVIDER      = "lancedb"
    }
  }
}
```

This scales to zero and is rollback-friendly, but every knowledge update requires rebuilding and redeploying the snapshot.

| Pros | Cons |
|---|---|
| No servers; scales to zero with per-request billing. | Read-only; every knowledge update needs a rebuilt artifact. |
| Immutable artifact is reproducible and rollback-friendly. | Cold start scales with artifact size. |
| No runtime database service to operate. | Bounded by Lambda image and runtime limits. |

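A sketch of a handler for the read-only image. It assumes an API Gateway proxy event whose JSON body looks like `{"query": "..."}`, and it uses `cognee.recall` as in the embedded example. `make_handler` is a hypothetical name. Injecting `recall` keeps the handler testable without the snapshot, and the event loop is reused across warm invocations.

```python
import asyncio
import json

# One event loop per execution environment, reused across warm invocations.
_LOOP = asyncio.new_event_loop()


async def _cognee_recall(query):
    # Imported lazily so this module loads where cognee is not installed
    # (for example in unit tests).
    import cognee
    return await cognee.recall(query)


def _response(status, payload):
    return {"statusCode": status, "body": json.dumps(payload, default=str)}


def make_handler(recall=_cognee_recall):
    """Build an API Gateway proxy-style handler around an async recall function."""
    def handler(event, context):
        try:
            body = json.loads(event.get("body") or "{}")
        except json.JSONDecodeError:
            return _response(400, {"error": "invalid JSON body"})
        query = body.get("query") if isinstance(body, dict) else None
        if not query:
            return _response(400, {"error": "missing 'query'"})
        results = _LOOP.run_until_complete(recall(query))
        return _response(200, {"results": results})
    return handler


handler = make_handler()
```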
Full read-only Lambda skeleton
```hcl
resource "aws_lambda_function" "cognee_reader" {
  function_name = "cognee-reader"
  package_type  = "Image"
  image_uri     = "${var.ecr_repo}:1.0.6-snapshot"
  memory_size   = 3008
  timeout       = 30

  environment {
    variables = {
      COGNEE_READ_ONLY        = "true"
      DATA_ROOT_DIRECTORY     = "/var/task/.cognee_data"
      SYSTEM_ROOT_DIRECTORY   = "/var/task/.cognee_system"
      GRAPH_DATABASE_PROVIDER = "kuzu"
      VECTOR_DB_PROVIDER      = "lancedb"
      DB_PROVIDER             = "sqlite"
    }
  }
}
```
4b. Lambda + EFS
Mount EFS and point Cognee at that mount for writable serverless memory. EFS gives shared storage, not write coordination, so concurrent writes still need a queue or external graph backend.

```hcl
resource "aws_efs_file_system" "cognee" {
  encrypted = true
}

# The access point, mount targets, and VPC configuration that EFS requires
# are shown in the full skeleton below.
resource "aws_lambda_function" "cognee_efs" {
  function_name = "cognee-efs"
  package_type  = "Image"
  image_uri     = "${var.ecr_repo}:1.0.6"
  timeout       = 120

  file_system_config {
    arn              = aws_efs_access_point.cognee.arn
    local_mount_path = "/mnt/cognee"
  }

  environment {
    variables = {
      DATA_ROOT_DIRECTORY   = "/mnt/cognee/.cognee_data"
      SYSTEM_ROOT_DIRECTORY = "/mnt/cognee/.cognee_system"
    }
  }
}
```

Avoid EFS where POSIX locking matters under concurrency. Prefer block storage or an external graph backend for the hot graph path.

| Pros | Cons |
|---|---|
| Writable persistent memory without managing a server. | No write coordination; concurrent writes still need queueing or external graph storage. |
| EFS survives Lambda redeploys. | EFS latency can hurt the hot graph path. |
| Shared across invocations. | Requires VPC wiring, security groups, and NAT or private LLM egress. |

Full Lambda + EFS skeleton
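```hcl
resource "aws_efs_file_system" "cognee" {
  encrypted = true
}

# Lambda reaches EFS through a mount target in each private subnet.
# var.efs_sg (hypothetical) must allow NFS (TCP 2049) from var.lambda_sg.
resource "aws_efs_mount_target" "cognee" {
  for_each        = toset(var.private_subnets)
  file_system_id  = aws_efs_file_system.cognee.id
  subnet_id       = each.value
  security_groups = [var.efs_sg]
}

resource "aws_efs_access_point" "cognee" {
  file_system_id = aws_efs_file_system.cognee.id

  posix_user {
    gid = 1000
    uid = 1000
  }

  root_directory {
    path = "/cognee"
    creation_info {
      owner_gid   = 1000
      owner_uid   = 1000
      permissions = "0755"
    }
  }
}

resource "aws_lambda_function" "cognee_efs" {
  function_name = "cognee-efs"
  package_type  = "Image"
  image_uri     = "${var.ecr_repo}:1.0.6"
  memory_size   = 3008
  timeout       = 120

  # One invocation at a time keeps a single writer on the shared files;
  # route bursts through the queue pattern instead.
  reserved_concurrent_executions = 1

  vpc_config {
    subnet_ids         = var.private_subnets
    security_group_ids = [var.lambda_sg]
  }

  file_system_config {
    arn              = aws_efs_access_point.cognee.arn
    local_mount_path = "/mnt/cognee"
  }

  environment {
    variables = {
      DATA_ROOT_DIRECTORY   = "/mnt/cognee/.cognee_data"
      SYSTEM_ROOT_DIRECTORY = "/mnt/cognee/.cognee_system"
    }
  }

  # The mount targets must exist before the function can mount EFS.
  depends_on = [aws_efs_mount_target.cognee]
}
```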
5. Externalized backends
Use external services for sustained multi-agent writes and independent scaling of each storage layer. This removes the file-backed graph single-writer ceiling once the graph tier is externalized.

```hcl
resource "aws_db_instance" "cognee" {
  identifier                  = "cognee"
  engine                      = "postgres"
  engine_version              = "17"
  instance_class              = "db.r6g.xlarge"
  allocated_storage           = 100
  max_allocated_storage       = 1000
  storage_type                = "gp3"
  db_name                     = "cognee"
  username                    = "cognee"
  manage_master_user_password = true
  multi_az                    = true
  backup_retention_period     = 14
  storage_encrypted           = true
}
```

Wire the external services into Helm:

```shell
# Neo4j connection settings (URL and credentials) must also be supplied; see Graph Stores.
helm upgrade --install cognee ./cognee \
  --set env.DB_PROVIDER=postgres \
  --set env.VECTOR_DB_PROVIDER=pgvector \
  --set env.GRAPH_DATABASE_PROVIDER=neo4j \
  --set externalPostgres.host="$(terraform output -raw host)"
```

Typical production shape:
Postgres or RDS for relational metadata
pgvector, Qdrant, Pinecone, or ChromaDB for vectors
Neo4j or the FalkorDB adapter for graph writes
Cognee API pods configured as storage-backed application nodes with writable local paths for ingestion artifacts and caches

| Pros | Cons |
|---|---|
| Highest write concurrency once graph storage is externalized. | Most infrastructure to provision, secure, and pay for. |
| Each tier scales, backs up, and fails over independently. | More moving parts means more failure modes and monitoring. |
| Managed services can provide HA, backups, and secret rotation. | Cross-service latency replaces local file access on the hot path. |

Full externalized backend skeleton
```hcl
variable "name" {
  default = "cognee"
}

variable "vpc_id" {}

variable "subnet_ids" {
  type = list(string)
}

variable "app_sg" {}

resource "aws_db_subnet_group" "this" {
  name       = "${var.name}-db"
  subnet_ids = var.subnet_ids
}

resource "aws_security_group" "db" {
  name_prefix = "${var.name}-db-"
  vpc_id      = var.vpc_id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [var.app_sg]
  }
}

resource "aws_db_instance" "cognee" {
  identifier                  = var.name
  engine                      = "postgres"
  engine_version              = "17"
  instance_class              = "db.r6g.xlarge"
  allocated_storage           = 100
  max_allocated_storage       = 1000
  storage_type                = "gp3"
  db_name                     = "cognee"
  username                    = "cognee"
  manage_master_user_password = true
  multi_az                    = true
  backup_retention_period     = 14
  storage_encrypted           = true
  db_subnet_group_name        = aws_db_subnet_group.this.name
  vpc_security_group_ids      = [aws_security_group.db.id]
}

# Create the pgvector extension once before using Postgres as a vector store.
# Run this through a migration, pre-deploy hook, or database init job:
#   CREATE EXTENSION IF NOT EXISTS vector;

output "host" {
  value = aws_db_instance.cognee.address
}

output "secret_arn" {
  value = aws_db_instance.cognee.master_user_secret[0].secret_arn
}
```

```shell
helm upgrade --install cognee ./cognee \
  --set externalPostgres.host="$(terraform output -raw host)" \
  --set env.DB_PROVIDER=postgres \
  --set env.VECTOR_DB_PROVIDER=pgvector \
  --set env.GRAPH_DATABASE_PROVIDER=neo4j
```
6. Cloud service mapping
The patterns are cloud-agnostic. The concrete services differ by platform.

| Primitive | AWS | Azure |
|---|---|---|
| Managed Kubernetes | EKS | AKS |
| Writer block storage | EBS gp3 | Managed Disks / Premium SSD |
| Shared filesystem | EFS | Azure Files Premium NFS |
| Snapshots / object store | S3 | Blob Storage |
| Private registry | ECR | ACR |
| Secrets | Secrets Manager / SSM | Key Vault |
| Pod identity | IRSA | Workload Identity |
| Relational tier | RDS / Aurora Postgres | Azure Database for PostgreSQL Flexible Server |
| LLM endpoint | Bedrock or self-hosted vLLM | Azure OpenAI or self-hosted vLLM |

Keep the hot graph on block storage or an external graph service. Use shared filesystems only when the pattern truly requires cross-process file sharing.
7. Production readiness
Schema migrations
Relational: pin the Cognee version per environment and run migrations before deploying the live writer.
Graph: additive model changes are safest. Renames, removals, and new required fields need a data migration or rebuild from source.
Vector: rebuild collections when embedding dimension, distance metric, or metadata schema changes.
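The vector rule above can be checked mechanically before deploy. A sketch, assuming you record how each collection was built; `VectorSchema` and `rebuild_reasons` are hypothetical. It also flags an embedding-model change, because vectors from different models are not comparable even when their dimensions match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorSchema:
    """Hypothetical record of how a vector collection was built."""
    embedding_model: str
    dimension: int
    distance: str                    # e.g. "cosine", "dot", "l2"
    metadata_fields: frozenset


def rebuild_reasons(deployed: VectorSchema, wanted: VectorSchema) -> list:
    """List the changes that force a collection rebuild; empty means reuse it."""
    reasons = []
    if deployed.dimension != wanted.dimension:
        reasons.append(f"dimension {deployed.dimension} -> {wanted.dimension}")
    if deployed.distance != wanted.distance:
        reasons.append(f"distance {deployed.distance} -> {wanted.distance}")
    if deployed.metadata_fields != wanted.metadata_fields:
        reasons.append("metadata schema changed")
    if deployed.embedding_model != wanted.embedding_model:
        # Same dimension does not make vectors from two models comparable.
        reasons.append(f"embedding model {deployed.embedding_model} -> {wanted.embedding_model}")
    return reasons
```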
Backups, restore, and DR
Embedded: route writes through one writer, then snapshot the data directory to object storage while that writer is paused, so the graph files and WAL are captured at one consistent point.
Externalized: back up each tier independently, such as Postgres dumps, managed snapshots, graph dumps, and vector snapshots.
Region failure: use cross-region snapshot replication and warm standby. Active-active DR is not a good fit for file-backed Kuzu.
Test restore on every release.
Tenant isolation

| Level | Mechanism | Notes |
|---|---|---|
| Logical | dataset_name and user filters | Cheap, but every query must apply the right scope. |
| Backend | ENABLE_BACKEND_ACCESS_CONTROL=true | Per-user and per-dataset storage isolation. |
| Infrastructure | Separate deployments, namespaces, and databases | Use for hard regulatory boundaries. |

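At the logical level, the main risk is a query that forgets its scope. One way to centralize that is to resolve the dataset from the authenticated user before every call. A minimal sketch; `resolve_dataset`, `ScopeError`, and the `grants` mapping are hypothetical stand-ins for your identity source.

```python
class ScopeError(PermissionError):
    """Raised when a caller asks for a dataset outside its grants."""


def resolve_dataset(user_id: str, requested, grants: dict) -> str:
    """Pick the dataset a query may touch, never trusting the caller alone.

    grants maps user id -> set of dataset names.
    """
    allowed = grants.get(user_id, set())
    if requested is None:
        if len(allowed) == 1:
            return next(iter(allowed))      # unambiguous default
        raise ScopeError(f"user {user_id!r} must name one of {sorted(allowed)}")
    if requested not in allowed:
        raise ScopeError(f"user {user_id!r} may not read dataset {requested!r}")
    return requested
```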
LLM egress and authentication
Use self-hosted vLLM, customer-approved proxies, Bedrock private access, Azure OpenAI in-subscription, or fully air-gapped patterns when egress is restricted.
Front the Cognee API with the customer gateway. Terminate OIDC or mTLS there rather than exposing the Cognee port directly.
Use service-to-service mTLS or cluster-native identity.
Propagate user identity through a gateway-validated header when Cognee needs user-scoped access.
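One way to make a propagated identity header trustworthy is to have the gateway sign it. Below is a sketch using HMAC-SHA256 with a timestamp to limit replay. The header names, signing format, and shared key are all hypothetical. Because this guide does not say which headers Cognee reads, the check belongs in a thin proxy or middleware in front of Cognee. It only holds if Cognee is reachable solely through the gateway.

```python
import hashlib
import hmac
import time

# Hypothetical header names; use whatever your gateway actually emits.
USER_HEADER = "X-Authenticated-User"
TS_HEADER = "X-Authenticated-Timestamp"
SIG_HEADER = "X-Authenticated-Signature"


def sign(user: str, ts: int, key: bytes) -> str:
    """Signature the gateway would attach: HMAC-SHA256 over 'user\\nts'."""
    return hmac.new(key, f"{user}\n{ts}".encode(), hashlib.sha256).hexdigest()


def verified_user(headers: dict, key: bytes, max_age: int = 60, now=None):
    """Return the user id if the signature is valid and fresh, else None."""
    user = headers.get(USER_HEADER)
    sig = headers.get(SIG_HEADER)
    ts_raw = headers.get(TS_HEADER)
    if not (user and sig and ts_raw):
        return None
    try:
        ts = int(ts_raw)
    except ValueError:
        return None
    now = int(time.time()) if now is None else now
    if abs(now - ts) > max_age:
        return None                           # stale or replayed header
    expected = sign(user, ts, key)
    if not hmac.compare_digest(expected.encode(), sig.encode()):
        return None
    return user
```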
Appendix - options at a glance

| Option | Writer | Best for | Skeleton |
|---|---|---|---|
| Embedded | Single process | Prototypes, notebooks, single-user agents | pip + env vars |
| Docker Compose | Single service | On-prem validation, first production step | Compose + pgvector |
| Helm | Single service | Customers already on Kubernetes | Chart + values |
| Sidecar | Single service | Per-agent private memory | Pod spec |
| Snapshot replicas | One writer, many readers | Heavy read traffic, static knowledge | S3 + reader deployment |
| Queue | One worker | Bursty or event-driven ingestion | SQS + worker |
| Lambda read-only | Offline writer | HTTP memory API, scale-to-zero | Lambda image |
| Lambda + EFS | Single writer plus queue | Serverless writable memory | EFS + Lambda |
| Externalized | External graph-backed writes | Sustained multi-agent writes | RDS + Neo4j/FalkorDB adapter + vector DB |

For provider-specific configuration, see Graph Stores, Vector Stores, and Relational Databases.