Setup Configuration - Cognee Documentation

Configure Cognee to use your preferred LLM, embedding engine, relational database, vector store, and graph store via environment variables in a local .env file. This section provides beginner-friendly guides for setting up different backends, with detailed technical information available in expandable sections.

What You Can Configure

Cognee uses a flexible architecture that lets you choose the best tools for your needs. We recommend starting with the defaults to get familiar with Cognee, then customizing each component as needed:

LLM Providers — Choose from OpenAI, Azure OpenAI, Google Gemini, Anthropic, Ollama, or custom providers (like vLLM) for text generation and reasoning tasks
Structured Output Backends — Configure LiteLLM + Instructor or BAML for reliable data extraction from LLM responses
Embedding Providers — Select from OpenAI, Azure OpenAI, Google Gemini, Mistral, Ollama, Fastembed, or custom embedding services to create vector representations for semantic search
Relational Databases — Use SQLite for local development or Postgres for production to store metadata, documents, and system state
Vector Stores — Store embeddings in built-in backends such as LanceDB, PGVector, ChromaDB, or Neptune Analytics, or use community adapters such as Qdrant, Redis, and FalkorDB
Graph Stores — Build knowledge graphs with Kuzu, Kuzu-remote, Neo4j, Neptune, Neptune Analytics, or Memgraph to manage relationships and reasoning
Dataset Separation & Access Control — Configure dataset-level permissions and isolation
Sessions & Caching — Enable conversational memory with Redis or filesystem cache adapters

Want to run Cognee without a cloud API key? See the Local Setup guide for step-by-step instructions using Ollama and Fastembed.

How `.env` Is Loaded

Cognee loads .env values when the Python package is imported. Keep the file in your project root, or in the directory from which you run Python, so it is available before Cognee creates its runtime configuration objects.

Cognee loads .env with overwrite behavior enabled. If the same key is set in both your shell and .env, the value from .env is the one Cognee uses after import.
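The overwrite behavior can be pictured with a small stand-in loader. This is a minimal sketch, not Cognee's actual loading code: `parse_env_lines` and `apply_env_file` are hypothetical helpers, and a plain dictionary stands in for the process environment.

```python
def parse_env_lines(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def apply_env_file(environ: dict[str, str], text: str, overwrite: bool = True) -> dict[str, str]:
    """Return a copy of `environ` with the .env values applied.

    With overwrite=True (the behavior described above), a key from the
    .env text replaces a value that was already set in the environment.
    """
    merged = dict(environ)
    for key, value in parse_env_lines(text).items():
        if overwrite or key not in merged:
            merged[key] = value
    return merged


shell_env = {"LLM_MODEL": "openai/gpt-4o-mini"}   # value exported in the shell
dotenv_text = 'LLM_MODEL="openai/gpt-5-mini"\n'   # contents of .env

after_import = apply_env_file(shell_env, dotenv_text)
print(after_import["LLM_MODEL"])
```

With `overwrite=False` the shell value would survive instead, which is the opposite of what this guide describes for Cognee.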

Configuration Precedence

Priority	Source	When to use
1	Runtime configuration methods, such as `cognee.config.set("llm_model", "...")`	Temporary changes inside one Python process
2	Values in `.env`	Persistent local development configuration
3	Shell, deployment, or `os.environ` variables	CI, containers, hosted deployments, secrets managers, and tests
4	Cognee defaults	Local defaults when nothing is configured

Runtime configuration methods update Cognee’s in-memory config objects and stay active for the duration of the current Python process, or until you call another setter. They do not write changes back to .env.
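The precedence table can be read as a lookup that walks the sources from highest to lowest priority. The sketch below is illustrative only: `resolve_setting` is a hypothetical function, the four dictionaries stand in for the real sources, and a single key namespace is used for simplicity even though runtime setters take lowercase keys such as `llm_model`.

```python
from typing import Optional


def resolve_setting(
    key: str,
    runtime: dict[str, str],
    dotenv: dict[str, str],
    shell: dict[str, str],
    defaults: dict[str, str],
) -> Optional[str]:
    """Return the first value found, checking sources in priority order."""
    for source in (runtime, dotenv, shell, defaults):
        if key in source:
            return source[key]
    return None


defaults = {"LLM_MODEL": "openai/gpt-4o-mini", "LOG_LEVEL": "INFO", "TELEMETRY_DISABLED": "false"}
shell = {"LLM_MODEL": "openai/gpt-4o-mini", "LOG_LEVEL": "ERROR"}
dotenv = {"LLM_MODEL": "openai/gpt-5-mini"}
runtime: dict[str, str] = {}

from_dotenv = resolve_setting("LLM_MODEL", runtime, dotenv, shell, defaults)
from_shell = resolve_setting("LOG_LEVEL", runtime, dotenv, shell, defaults)
from_default = resolve_setting("TELEMETRY_DISABLED", runtime, dotenv, shell, defaults)

runtime["LLM_MODEL"] = "ollama/llama3.2"  # stands in for a runtime setter call
from_runtime = resolve_setting("LLM_MODEL", runtime, dotenv, shell, defaults)

print(from_dotenv, from_shell, from_default, from_runtime)
```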

Using os.environ

Setting os.environ["KEY"] = "value" changes the current Python process environment. Use it for Cognee only before importing Cognee, and mainly for process or deployment settings:

import os

os.environ["LOG_LEVEL"] = "ERROR"
os.environ["COGNEE_LOG_FILE"] = "false"

import cognee

After import cognee, do not rely on os.environ to change Cognee behavior. Some code paths read environment variables lazily, but others read them during import, application startup, or cached config creation. Post-import os.environ changes are therefore inconsistent.

If the same key also exists in .env, Cognee’s import-time .env loading overwrites the earlier os.environ value:

import os

os.environ["LLM_MODEL"] = "openai/gpt-4o-mini"

import cognee

# .env
LLM_MODEL="openai/gpt-5-mini"

In this case, Cognee uses openai/gpt-5-mini after import. Use os.environ before importing Cognee only for keys that are not also defined in .env.

Use .env, shell variables, deployment variables, or pre-import os.environ for settings such as:

Area	Environment variables
Auth and access control	`ENABLE_BACKEND_ACCESS_CONTROL`, `REQUIRE_AUTHENTICATION`, `FASTAPI_USERS_JWT_SECRET`, `JWT_LIFETIME_SECONDS`, `HASH_API_KEY`, `ALLOW_HTTP_REQUESTS`, `ALLOW_CYPHER_QUERY`, `ACCEPT_LOCAL_FILE_PATH`
Logging	`LOG_LEVEL`, `COGNEE_LOG_FILE`, `COGNEE_LOGS_DIR`, `COGNEE_LOG_MAX_BYTES`, `COGNEE_LOG_BACKUP_COUNT`, `COGNEE_LOG_SEARCH_HISTORY`
Cache and sessions	`CACHING`, `CACHE_BACKEND`, `CACHE_HOST`, `CACHE_PORT`, `CACHE_USERNAME`, `CACHE_PASSWORD`, `SESSION_TTL_SECONDS`
Storage and cloud credentials	`STORAGE_BACKEND`, `STORAGE_BUCKET_NAME`, `AWS_REGION`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_ENDPOINT_URL`, `COGNEE_CLOUD_API_URL`, `COGNEE_CLOUD_AUTH_TOKEN`
Web/API/telemetry	`HTTP_API_HOST`, `HTTP_API_PORT`, `CORS_ALLOWED_ORIGINS`, `TAVILY_API_KEY`, `WEB_SCRAPER_TIMEOUT`, `TELEMETRY_DISABLED`, `ENV`
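Why post-import os.environ changes behave inconsistently can be seen in a generic Python sketch. It does not use Cognee at all: `get_cached_settings` and `get_lazy_log_level` are hypothetical functions that contrast a value read once and cached with a value read on every call.

```python
import os
from functools import lru_cache

os.environ["LOG_LEVEL"] = "ERROR"  # set before the first read


@lru_cache(maxsize=1)
def get_cached_settings() -> dict[str, str]:
    """Read the environment once; later calls return the cached result."""
    return {"log_level": os.environ.get("LOG_LEVEL", "INFO")}


def get_lazy_log_level() -> str:
    """Read the environment on every call."""
    return os.environ.get("LOG_LEVEL", "INFO")


first = get_cached_settings()["log_level"]   # the cache is filled here

os.environ["LOG_LEVEL"] = "DEBUG"            # simulated post-import change

cached = get_cached_settings()["log_level"]  # still the value read earlier
lazy = get_lazy_log_level()                  # sees the new value

print(first, cached, lazy)
```

Two readers of the same variable now disagree, which is the kind of mismatch that setting values before import avoids.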

If you need to change supported runtime settings after import, use cognee.config.set(...) because it updates Cognee’s in-memory runtime config directly:

import cognee

cognee.config.set("llm_model", "openai/gpt-5-mini")

What Can Be Overwritten at Runtime

Use cognee.config.set(...) for runtime-safe Cognee settings: values that can be changed inside the current Python process without reinitializing the whole application. This mainly covers LLMs, embeddings, graph databases, vector databases, chunking, model overrides, and data/system root directories. For the full method list and the exact internal key names accepted by bulk setters, see the Python API config reference.

import cognee

cognee.config.set("llm_model", "openai/gpt-5-mini")
cognee.config.set("embedding_provider", "fastembed")
cognee.config.set("vector_db_provider", "lancedb")
cognee.config.set("vector_db_url", "./.cognee_system/databases/cognee.lancedb")

cognee.config.set(key, value) supports these generic keys:

Area	Supported keys
LLM	`llm_provider`, `llm_model`, `llm_api_key`, `llm_endpoint`
Embeddings	`embedding_provider`, `embedding_model`, `embedding_dimensions`, `embedding_endpoint`, `embedding_api_key`, `embedding_api_version`, `embedding_max_completion_tokens`, `embedding_batch_size`, `huggingface_tokenizer`
Graph database	`graph_database_provider`, `graph_database_subprocess_enabled`, `kuzu_num_threads`, `kuzu_buffer_pool_size`, `kuzu_max_db_size`
Vector database	`vector_db_provider`, `vector_db_subprocess_enabled`, `vector_db_url`, `vector_db_key`
Chunking	`chunk_size`, `chunk_overlap`, `chunk_strategy`, `chunk_engine`
Models	`classification_model`, `summarization_model`, `graph_model`
Storage paths	`system_root_directory`, `data_root_directory`

cognee.config.set(...) can replace .env or os.environ only for the supported runtime config keys above. It does not replace process-level environment variables.

Keep these in .env, shell/deployment variables, or pre-import os.environ: ENABLE_BACKEND_ACCESS_CONTROL, REQUIRE_AUTHENTICATION, CACHING, CACHE_BACKEND, LOG_LEVEL, COGNEE_LOG_FILE, STORAGE_BACKEND, TAVILY_API_KEY, TELEMETRY_DISABLED, HTTP_API_HOST, HTTP_API_PORT, and cloud or AWS credentials.

cognee.config.set(key, value) is not a free-form setter. Unsupported keys raise an error instead of silently creating new settings.
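Because unsupported keys raise an error, it can help to screen a batch of overrides before applying them. The following is a minimal sketch under stated assumptions: `SUPPORTED_KEYS` is copied by hand from the table above and may drift from the installed version, and `split_overrides` is a hypothetical helper that does not call Cognee itself.

```python
# Keys copied from the "Supported keys" table in this guide (may change between releases).
SUPPORTED_KEYS = {
    "llm_provider", "llm_model", "llm_api_key", "llm_endpoint",
    "embedding_provider", "embedding_model", "embedding_dimensions",
    "embedding_endpoint", "embedding_api_key", "embedding_api_version",
    "embedding_max_completion_tokens", "embedding_batch_size", "huggingface_tokenizer",
    "graph_database_provider", "graph_database_subprocess_enabled",
    "kuzu_num_threads", "kuzu_buffer_pool_size", "kuzu_max_db_size",
    "vector_db_provider", "vector_db_subprocess_enabled", "vector_db_url", "vector_db_key",
    "chunk_size", "chunk_overlap", "chunk_strategy", "chunk_engine",
    "classification_model", "summarization_model", "graph_model",
    "system_root_directory", "data_root_directory",
}


def split_overrides(overrides: dict[str, object]) -> tuple[dict[str, object], list[str]]:
    """Separate runtime-safe keys from keys that belong in .env or the shell."""
    supported = {k: v for k, v in overrides.items() if k in SUPPORTED_KEYS}
    unsupported = sorted(k for k in overrides if k not in SUPPORTED_KEYS)
    return supported, unsupported


supported, unsupported = split_overrides(
    {"llm_model": "openai/gpt-5-mini", "chunk_size": 1024, "LOG_LEVEL": "DEBUG"}
)
print(supported)
print(unsupported)

# Each entry in `supported` could then be passed to cognee.config.set(key, value);
# entries in `unsupported` need to be set before import instead.
```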

When to Restart

Restart your Python process, server, notebook kernel, or container after editing .env if Cognee has already been imported. Runtime setters are useful for short-lived overrides, but .env changes are safest when applied before import.

When changing storage backends, database providers, embedding dimensions, or other settings that affect persisted data, review the pruning warning in the Configuration Workflow section before running ingestion again.

Default Backends and When Connections Are Established

With a plain pip install cognee (no extras), Cognee uses three bundled, file-based backends. None of them require a separate server, and no extra dependencies are needed:

Role	Default provider	Where data lives
Relational (metadata, documents, state)	SQLite (`DB_PROVIDER=sqlite`, database `cognee_db`)	`<SYSTEM_ROOT_DIRECTORY>/databases/cognee_db`
Vector (embeddings, semantic search)	LanceDB (`VECTOR_DB_PROVIDER=lancedb`)	`<SYSTEM_ROOT_DIRECTORY>/databases/cognee.lancedb`
Graph (entities, relationships)	Ladybug (Kuzu-compatible) (`GRAPH_DATABASE_PROVIDER=ladybug`)	`<SYSTEM_ROOT_DIRECTORY>/databases/cognee_graph_ladybug`

Extras such as cognee[postgres], cognee[neo4j], cognee[chromadb], or cognee[neptune] are only required when you switch a backend to one of those providers. The defaults above work without any of them.

Connections are not opened at import. import cognee only loads .env and builds in-memory configuration objects; it does not connect to any database. Each backend engine is created lazily, the first time an operation actually needs it (for example during add(), cognify(), or search()), and is then cached and reused for the rest of the process. For the file-based defaults, the database files are created automatically under SYSTEM_ROOT_DIRECTORY on first use, so there is no startup connection step to configure.
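The create-on-first-use pattern described above can be sketched in plain Python. This is an illustration rather than Cognee's engine code: `get_engine` is a hypothetical factory that returns a dictionary instead of a real connection, and the paths come from the defaults table in this guide.

```python
from functools import lru_cache
from pathlib import Path

# Default SYSTEM_ROOT_DIRECTORY and database names, taken from the tables in this guide.
SYSTEM_ROOT_DIRECTORY = Path(".cognee_system")
DEFAULT_DATABASES = {
    "relational": "cognee_db",
    "vector": "cognee.lancedb",
    "graph": "cognee_graph_ladybug",
}

created_roles: list[str] = []  # records when an engine is actually built


def default_database_path(role: str) -> Path:
    """Build <SYSTEM_ROOT_DIRECTORY>/databases/<name> for a backend role."""
    return SYSTEM_ROOT_DIRECTORY / "databases" / DEFAULT_DATABASES[role]


@lru_cache(maxsize=None)
def get_engine(role: str) -> dict:
    """Hypothetical engine factory: built on first use, then cached."""
    created_roles.append(role)
    return {"role": role, "path": default_database_path(role)}


# Nothing has been built yet, which mirrors "no connection at import".
created_before_use = list(created_roles)

first = get_engine("vector")   # first use builds the engine
second = get_engine("vector")  # later calls reuse the cached object

print(created_before_use, created_roles, first is second)
print(first["path"])
```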

Environment Variable Quick Reference

The tables below list the most commonly used configuration variables. For full details on each group, follow the links to the dedicated guides.

Only a small number of variables use the COGNEE_ prefix, mainly logging, tracing, and Cognee Cloud settings such as COGNEE_LOGS_DIR, COGNEE_LOG_FILE, COGNEE_TRACING_ENABLED, COGNEE_CLOUD_API_URL, and COGNEE_CLOUD_AUTH_TOKEN. All other configuration keys (LLM, embedding, database, etc.) are used without any prefix.

LLM

Variable	Default	Description
`LLM_PROVIDER`	`openai`	Provider: `openai`, `azure`, `gemini`, `anthropic`, `ollama`, `mistral`, `bedrock`, `custom`
`LLM_MODEL`	`openai/gpt-4o-mini`	Model in `provider/model-name` format
`LLM_API_KEY`	—	API key for the LLM provider
`LLM_ENDPOINT`	—	Custom endpoint URL (required for Ollama, vLLM, etc.)
`LLM_API_VERSION`	—	API version (required for Azure)
`LLM_TEMPERATURE`	`0.0`	Response temperature (0.0–2.0)

Embeddings

Variable	Default	Description
`EMBEDDING_PROVIDER`	`openai`	Provider: `openai`, `ollama`, `fastembed`, `gemini`, `mistral`, `bedrock`, `custom`
`EMBEDDING_MODEL`	`openai/text-embedding-3-large`	Model in `provider/model-name` format
`EMBEDDING_DIMENSIONS`	`3072`	Vector dimension size (must match your vector store)
`EMBEDDING_API_KEY`	—	API key (falls back to `LLM_API_KEY` if unset)
`EMBEDDING_ENDPOINT`	—	Custom endpoint URL (required for Ollama, etc.)
`HUGGINGFACE_TOKENIZER`	—	HuggingFace Hub model ID for token counting with Ollama (e.g. `nomic-ai/nomic-embed-text-v1.5`)
`TOKENIZERS_PARALLELISM`	—	Optional environment variable used by Hugging Face tokenizers. If Cognee loads a Hugging Face tokenizer, setting this to `false` can suppress the “tokenizers parallelism” warning that may appear in forked or multi-process environments.

Databases

Variable	Default	Description
`DB_PROVIDER`	`sqlite`	Relational DB: `sqlite`, `postgres`
`DB_HOST` / `DB_PORT` / `DB_USERNAME` / `DB_PASSWORD`	—	Postgres connection details
`VECTOR_DB_PROVIDER`	`lancedb`	Vector store provider. Built-in options include `lancedb`, `pgvector`, `chromadb`, and `neptune_analytics`; community adapters add providers such as `qdrant`, `redis`, and `falkordb`.
`VECTOR_DB_URL`	—	Vector store connection URL
`GRAPH_DATABASE_PROVIDER`	`ladybug`	Graph store: `ladybug`, `ladybug-remote`, `kuzu`, `kuzu-remote`, `neo4j`, `neptune`
`GRAPH_DATABASE_URL`	—	Graph store connection URL
`GRAPH_DATABASE_USERNAME` / `GRAPH_DATABASE_PASSWORD`	—	Graph store credentials

Storage & Logging

Variable	Default	Description
`STORAGE_BACKEND`	`local`	Storage backend: `local`, `s3`
`DATA_ROOT_DIRECTORY`	`.data_storage`	Root directory for data files
`SYSTEM_ROOT_DIRECTORY`	`.cognee_system`	Root directory for system files
`COGNEE_LOGS_DIR`	`{package}/logs`	Override the logs directory path
`LOG_LEVEL`	`INFO`	Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`
`TELEMETRY_DISABLED`	`false`	Set `true` to disable anonymous telemetry

Sessions & Caching

Cognee uses a cache backend to store session history (Q&A turns) so that searches with the same session_id can include prior interactions as conversational context. See Sessions and Caching for the full guide.

Variable	Default	Description
`CACHING`	`true`	Enable session caching. Set to `false` to run searches without conversational memory.
`CACHE_BACKEND`	`fs`	Cache backend: `fs` (local disk via diskcache), `redis` (shared, multi-process), or `tapes` (local cache plus Tapes mirroring).
`CACHE_HOST`	`localhost`	Redis hostname (used when `CACHE_BACKEND=redis`).
`CACHE_PORT`	`6379`	Redis port.
`CACHE_USERNAME`	—	Optional Redis username.
`CACHE_PASSWORD`	—	Optional Redis password.
`SESSION_TTL_SECONDS`	`604800`	Expiry for cached session entries (7 days). Set to `0` to disable expiry.

Use fs for local development or single-process setups. Use redis for production, distributed deployments, or when multiple processes need to share session state. Use tapes when you want filesystem-backed sessions plus mirroring of new Q&A turns to a running Tapes ingest service.
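The defaults in the table above can be collected into one settings object. The sketch below assumes a simple parsing scheme (the string `true` enables caching, `0` means no expiry); `load_cache_settings` is a hypothetical helper, and how Cognee itself parses these values is not shown in this guide.

```python
from typing import Optional

VALID_BACKENDS = {"fs", "redis", "tapes"}


def load_cache_settings(env: dict[str, str]) -> dict:
    """Apply the documented defaults to a mapping of environment values."""
    backend = env.get("CACHE_BACKEND", "fs")
    if backend not in VALID_BACKENDS:
        raise ValueError(f"Unknown CACHE_BACKEND: {backend!r}")

    ttl = int(env.get("SESSION_TTL_SECONDS", "604800"))  # 7 days by default
    ttl_seconds: Optional[int] = ttl if ttl > 0 else None  # 0 disables expiry

    settings = {
        "caching": env.get("CACHING", "true").strip().lower() == "true",
        "backend": backend,
        "ttl_seconds": ttl_seconds,
    }
    if backend == "redis":
        # Host and port only matter for the Redis backend.
        settings["host"] = env.get("CACHE_HOST", "localhost")
        settings["port"] = int(env.get("CACHE_PORT", "6379"))
    return settings


local = load_cache_settings({})
shared = load_cache_settings({"CACHE_BACKEND": "redis", "SESSION_TTL_SECONDS": "0"})
print(local)
print(shared)
```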

Debug Mode

To enable verbose logging in a self-hosted Cognee instance, set LOG_LEVEL in your .env:

LOG_LEVEL="DEBUG"

Verbose logging covers pipeline execution, LLM calls, database queries, and graph operations—useful when troubleshooting data processing or provider configuration.

Docker Environment Variables

Use the same variable names as in your .env; pass them with docker run -e or load them from a file with --env-file.

Examples

docker run \
  -e LLM_PROVIDER=ollama \
  -e LLM_MODEL=ollama/llama3.2 \
  -e LLM_ENDPOINT=http://host.docker.internal:11434 \
  -e EMBEDDING_PROVIDER=ollama \
  -e EMBEDDING_MODEL=nomic-embed-text:latest \
  -e EMBEDDING_ENDPOINT=http://host.docker.internal:11434/api/embed \
  -e EMBEDDING_DIMENSIONS=768 \
  -e HUGGINGFACE_TOKENIZER=nomic-ai/nomic-embed-text-v1.5 \
  cognee/cognee:main

Or using an env file:

docker run --env-file .env cognee/cognee:main
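When the variables already live in a Python dictionary, for example in a deployment script, the same `docker run -e` command can be assembled programmatically. This is a small sketch: `docker_run_command` is a hypothetical helper, and it only builds the argument list without running Docker.

```python
import shlex


def docker_run_command(image: str, env: dict[str, str]) -> list[str]:
    """Build a `docker run` argument list with one -e flag per variable."""
    command = ["docker", "run"]
    for key, value in env.items():
        command += ["-e", f"{key}={value}"]
    command.append(image)
    return command


command = docker_run_command(
    "cognee/cognee:main",
    {
        "LLM_PROVIDER": "ollama",
        "LLM_MODEL": "ollama/llama3.2",
        "LLM_ENDPOINT": "http://host.docker.internal:11434",
    },
)

# shlex.join quotes arguments so the line can be pasted into a shell.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command)` avoids shell quoting issues altogether.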

Observability & Telemetry

Cognee includes built-in telemetry to help you monitor and debug your knowledge graph operations. You can control telemetry behavior with environment variables:

TELEMETRY_DISABLED (boolean, optional): Set to true to disable all telemetry collection (default: false)

When telemetry is enabled, Cognee automatically collects:

Search query performance metrics
Processing pipeline execution times
Error rates and debugging information
System resource usage

Telemetry data helps improve Cognee’s performance and reliability. It’s collected anonymously and doesn’t include your actual data content.

Configuration Workflow

1. Install Cognee with all optional dependencies:
   - Local setup: uv sync --all-extras
   - Library: pip install "cognee[all]"
2. Create a .env file in your project root, if you haven’t already (see Installation for details).
3. Choose your preferred providers and follow the configuration instructions from the guides below.

Configuration Changes: If you’ve already run Cognee with default settings and are now changing your configuration (e.g., switching from SQLite to Postgres, or changing vector stores), run the pruning operations before the next cognify() run to keep stored data consistent.

LLM/Embedding Configuration: If you configure only LLM or only embeddings, the other defaults to OpenAI. Ensure you have a working OpenAI API key, or configure both LLM and embeddings to avoid unexpected defaults.
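A quick pre-flight check can catch the half-configured case described above. The sketch below is an assumption-laden illustration: `find_implicit_openai_defaults` is a hypothetical helper that only looks at whether `LLM_PROVIDER` and `EMBEDDING_PROVIDER` are present in a mapping of settings.

```python
def find_implicit_openai_defaults(env: dict[str, str]) -> list[str]:
    """Report which side silently falls back to OpenAI when only one is configured."""
    notes = []
    llm_set = "LLM_PROVIDER" in env
    embedding_set = "EMBEDDING_PROVIDER" in env
    if llm_set and not embedding_set:
        notes.append("EMBEDDING_PROVIDER is unset: embeddings fall back to OpenAI")
    if embedding_set and not llm_set:
        notes.append("LLM_PROVIDER is unset: the LLM falls back to OpenAI")
    return notes


only_llm = {"LLM_PROVIDER": "ollama", "LLM_MODEL": "ollama/llama3.2"}
both = {**only_llm, "EMBEDDING_PROVIDER": "ollama"}

print(find_implicit_openai_defaults(only_llm))
print(find_implicit_openai_defaults(both))
```

In practice the mapping could be `dict(os.environ)` read after the .env file has been loaded.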

LLM Providers

Configure OpenAI, Azure, Gemini, Anthropic, Ollama, or custom LLM providers (like vLLM)

Structured Output Backends

Configure LiteLLM + Instructor or BAML for reliable data extraction

Embedding Providers

Set up OpenAI, Mistral, Ollama, Fastembed, or custom embedding services

Relational Databases

Choose between SQLite for local development or Postgres for production

Vector Stores

Configure LanceDB, PGVector, Qdrant, Redis, ChromaDB, FalkorDB, or Neptune Analytics

Graph Stores

Set up Kuzu, Neo4j, or Neptune for knowledge graph storage
