# Setup Configuration

Configure Cognee to use your preferred LLM, embedding engine, relational database, vector store, and graph store via environment variables in a local `.env` file.

This section provides beginner-friendly guides for setting up different backends, with detailed technical information available in expandable sections.

## What You Can Configure

Cognee uses a flexible architecture that lets you choose the best tools for your needs. We recommend starting with the defaults to get familiar with Cognee, then customizing each component as needed:

* **[LLM Providers](./llm-providers)** — Choose from OpenAI, Azure OpenAI, Google Gemini, Anthropic, Ollama, or custom providers (like vLLM) for text generation and reasoning tasks
* **[Structured Output Backends](./structured-output-backends)** — Configure LiteLLM + Instructor or BAML for reliable data extraction from LLM responses
* **[Embedding Providers](./embedding-providers)** — Select from OpenAI, Azure OpenAI, Google Gemini, Mistral, Ollama, Fastembed, or custom embedding services to create vector representations for semantic search
* **[Relational Databases](./relational-databases)** — Use SQLite for local development or Postgres for production to store metadata, documents, and system state
* **[Vector Stores](./vector-stores)** — Store embeddings in built-in backends such as LanceDB, PGVector, ChromaDB, or Neptune Analytics, or use community adapters such as Qdrant, Redis, and FalkorDB
* **[Graph Stores](./graph-stores)** — Build knowledge graphs with Kuzu, Kuzu-remote, Neo4j, Neptune, Neptune Analytics, or Memgraph to manage relationships and reasoning
* **[Dataset Separation & Access Control](./permissions)** — Configure dataset-level permissions and isolation
* **[Sessions & Caching](../core-concepts/sessions-and-caching)** — Enable conversational memory with Redis or filesystem cache adapters

<Info>
  Want to run Cognee without a cloud API key? See the [Local Setup guide](/guides/local-setup) for step-by-step instructions using Ollama and Fastembed.
</Info>

## How `.env` Is Loaded

Cognee loads `.env` values when the Python package is imported. Keep the file in your project root, or in the directory from which you run Python, so it is available before Cognee creates its runtime configuration objects.

<Note>
  Cognee loads `.env` with overwrite behavior enabled. If the same key is set in both your shell and `.env`, the value from `.env` is the one Cognee uses after import.
</Note>
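
As a mental model, the overwrite rule behaves like a layered lookup in which `.env` sits above the shell environment, which in turn sits above Cognee's defaults. The sketch below uses only the standard library; the `resolve` helper and the sample values are hypothetical and do not reproduce Cognee's actual loader.

```python
from collections import ChainMap

# Hypothetical sample layers; in practice these come from Cognee's defaults,
# your shell, and your .env file.
defaults = {"LOG_LEVEL": "INFO", "CACHE_BACKEND": "fs", "DB_PROVIDER": "sqlite"}
shell_env = {"LOG_LEVEL": "ERROR", "CACHE_BACKEND": "redis"}
dotenv_values = {"LOG_LEVEL": "DEBUG"}

# ChainMap searches its maps left to right, so the leftmost layer wins.
layers = ChainMap(dotenv_values, shell_env, defaults)


def resolve(key: str) -> str:
    """Return the value from the highest-priority layer that defines the key."""
    return layers[key]


print(resolve("LOG_LEVEL"))      # DEBUG: .env overrides the shell value
print(resolve("CACHE_BACKEND"))  # redis: only the shell sets this key
print(resolve("DB_PROVIDER"))    # sqlite: falls through to the default
```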

<AccordionGroup>
  <Accordion title="Configuration Precedence">
    | Priority | Source                                                                         | When to use                                                     |
    | -------- | ------------------------------------------------------------------------------ | --------------------------------------------------------------- |
    | 1        | Runtime configuration methods, such as `cognee.config.set("llm_model", "...")` | Temporary changes inside one Python process                     |
    | 2        | Values in `.env`                                                               | Persistent local development configuration                      |
    | 3        | Shell, deployment, or `os.environ` variables                                   | CI, containers, hosted deployments, secrets managers, and tests |
    | 4        | Cognee defaults                                                                | Local defaults when nothing is configured                       |

    Runtime configuration methods update Cognee's in-memory config objects and stay active for the duration of the current Python process, or until you call another setter. They do not write changes back to `.env`.
  </Accordion>

  <Accordion title="Using os.environ">
    Setting `os.environ["KEY"] = "value"` changes the environment of the current Python process. For Cognee, do this only before importing the package, and mainly for process or deployment settings:

    ```python theme={null}
    import os

    os.environ["LOG_LEVEL"] = "ERROR"
    os.environ["COGNEE_LOG_FILE"] = "false"

    import cognee
    ```

    After `import cognee`, do not rely on `os.environ` to change Cognee behavior. Some code paths read environment variables lazily, but others read them during import, application startup, or cached config creation, so changes made after import take effect inconsistently.

    If the same key also exists in `.env`, Cognee's import-time `.env` loading overwrites the earlier `os.environ` value:

    ```python theme={null}
    import os

    os.environ["LLM_MODEL"] = "openai/gpt-4o-mini"

    import cognee
    ```

    ```dotenv theme={null}
    # .env
    LLM_MODEL="openai/gpt-5-mini"
    ```

    In this case, Cognee uses `openai/gpt-5-mini` after import. Use `os.environ` before importing Cognee only for keys that are not also defined in `.env`.

    Use `.env`, shell variables, deployment variables, or pre-import `os.environ` for settings such as:

    | Area                          | Environment variables                                                                                                                                                                                                                  |
    | ----------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | Auth and access control       | `ENABLE_BACKEND_ACCESS_CONTROL`, `REQUIRE_AUTHENTICATION`, `FASTAPI_USERS_JWT_SECRET`, `JWT_LIFETIME_SECONDS`, `HASH_API_KEY`, `ALLOW_HTTP_REQUESTS`, `ALLOW_CYPHER_QUERY`, `ACCEPT_LOCAL_FILE_PATH`                                   |
    | Logging                       | `LOG_LEVEL`, `COGNEE_LOG_FILE`, `COGNEE_LOGS_DIR`, `COGNEE_LOG_MAX_BYTES`, `COGNEE_LOG_BACKUP_COUNT`, `COGNEE_LOG_SEARCH_HISTORY`                                                                                                      |
    | Cache and sessions            | `CACHING`, `AUTO_FEEDBACK`, `CACHE_BACKEND`, `CACHE_HOST`, `CACHE_PORT`, `CACHE_USERNAME`, `CACHE_PASSWORD`, `SESSION_TTL_SECONDS`                                                                                                     |
    | Storage and cloud credentials | `STORAGE_BACKEND`, `STORAGE_BUCKET_NAME`, `AWS_REGION`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_ENDPOINT_URL`, `COGNEE_SERVICE_URL`, `COGNEE_API_KEY` (legacy fallbacks: `COGNEE_CLOUD_API_URL`, `COGNEE_CLOUD_AUTH_TOKEN`) |
    | Web/API/telemetry             | `HTTP_API_HOST`, `HTTP_API_PORT`, `CORS_ALLOWED_ORIGINS`, `TAVILY_API_KEY`, `WEB_SCRAPER_TIMEOUT`, `TELEMETRY_DISABLED`, `ENV`                                                                                                         |

    If you need to change supported runtime settings after import, use `cognee.config.set(...)` because it updates Cognee's in-memory runtime config directly:

    ```python theme={null}
    import cognee

    cognee.config.set("llm_model", "openai/gpt-5-mini")
    ```
  </Accordion>

  <Accordion title="What Can Be Overwritten at Runtime">
    Use `cognee.config.set(...)` for runtime-safe Cognee settings: values that can be changed inside the current Python process without reinitializing the whole application. This mainly covers LLMs, embeddings, graph databases, vector databases, chunking, model overrides, and data/system root directories. For the full method list and the exact internal key names accepted by bulk setters, see the [Python API config reference](/python-api/config).

    ```python theme={null}
    import cognee

    cognee.config.set("llm_model", "openai/gpt-5-mini")
    cognee.config.set("embedding_provider", "fastembed")
    cognee.config.set("vector_db_provider", "lancedb")
    cognee.config.set("vector_db_url", "./.cognee_system/databases/cognee.lancedb")
    ```

    `cognee.config.set(key, value)` supports these generic keys:

    | Area            | Supported keys                                                                                                                                                                                                          |
    | --------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | LLM             | `llm_provider`, `llm_model`, `llm_api_key`, `llm_endpoint`                                                                                                                                                              |
    | Embeddings      | `embedding_provider`, `embedding_model`, `embedding_dimensions`, `embedding_endpoint`, `embedding_api_key`, `embedding_api_version`, `embedding_max_completion_tokens`, `embedding_batch_size`, `huggingface_tokenizer` |
    | Graph database  | `graph_database_provider`, `graph_database_subprocess_enabled`, `kuzu_num_threads`, `kuzu_buffer_pool_size`, `kuzu_max_db_size`                                                                                         |
    | Vector database | `vector_db_provider`, `vector_db_subprocess_enabled`, `vector_db_url`, `vector_db_key`                                                                                                                                  |
    | Chunking        | `chunk_size`, `chunk_overlap`, `chunk_strategy`, `chunk_engine`                                                                                                                                                         |
    | Models          | `classification_model`, `summarization_model`, `graph_model`                                                                                                                                                            |
    | Storage paths   | `system_root_directory`, `data_root_directory`                                                                                                                                                                          |

    `cognee.config.set(...)` can replace `.env` or `os.environ` only for the supported runtime config keys above. It does not affect settings that Cognee reads directly from process-level environment variables.

    Keep these in `.env`, shell/deployment variables, or pre-import `os.environ`: `ENABLE_BACKEND_ACCESS_CONTROL`, `REQUIRE_AUTHENTICATION`, `CACHING`, `CACHE_BACKEND`, `LOG_LEVEL`, `COGNEE_LOG_FILE`, `STORAGE_BACKEND`, `TAVILY_API_KEY`, `TELEMETRY_DISABLED`, `HTTP_API_HOST`, `HTTP_API_PORT`, and cloud or AWS credentials.

    <Warning>
      `cognee.config.set(key, value)` is not a free-form setter. Unsupported keys raise an error instead of silently creating new settings.
    </Warning>
  </Accordion>

  <Accordion title="When to Restart">
    Restart your Python process, server, notebook kernel, or container after editing `.env` if Cognee has already been imported. Runtime setters are useful for short-lived overrides, but `.env` changes are safest when applied before import.

    When changing storage backends, database providers, embedding dimensions, or other settings that affect persisted data, review the pruning warning in the Configuration Workflow section before running ingestion again.
  </Accordion>

  <Accordion title="Default backends and when connections are established">
    With a plain `pip install cognee` (no extras), Cognee uses three bundled, file-based backends. None of them require a separate server, and no extra dependencies are needed:

    | Role                                    | Default provider                                                                | Where data lives                                         |
    | --------------------------------------- | ------------------------------------------------------------------------------- | -------------------------------------------------------- |
    | Relational (metadata, documents, state) | [SQLite](./relational-databases) (`DB_PROVIDER=sqlite`, database `cognee_db`)   | `<SYSTEM_ROOT_DIRECTORY>/databases/cognee_db`            |
    | Vector (embeddings, semantic search)    | [LanceDB](./vector-stores) (`VECTOR_DB_PROVIDER=lancedb`)                       | `<SYSTEM_ROOT_DIRECTORY>/databases/cognee.lancedb`       |
    | Graph (entities, relationships)         | [Ladybug (Kuzu-compatible)](./graph-stores) (`GRAPH_DATABASE_PROVIDER=ladybug`) | `<SYSTEM_ROOT_DIRECTORY>/databases/cognee_graph_ladybug` |

    Extras such as `cognee[postgres]`, `cognee[neo4j]`, `cognee[chromadb]`, or `cognee[neptune]` are only required when you switch a backend to one of those providers. The defaults above work without any of them.

    **Connections are not opened at import.** `import cognee` only loads `.env` and builds in-memory configuration objects — it does not connect to any database. Each backend engine is created lazily, the first time an operation actually needs it (for example during `add()`, `cognify()`, or `search()`), and is then cached and reused for the rest of the process. For the file-based defaults, the database files are created automatically under `SYSTEM_ROOT_DIRECTORY` on first use, so there is no startup connection step to configure.
  </Accordion>

  <Accordion title="Backing up local data">
    With the default file-based backends, all of Cognee's persistent state lives in two directories on disk, so a backup is just a copy of those two trees:

    | Directory               | Default          | Contents                                                                                                                         |
    | ----------------------- | ---------------- | -------------------------------------------------------------------------------------------------------------------------------- |
    | `SYSTEM_ROOT_DIRECTORY` | `.cognee_system` | The relational (`databases/cognee_db`), vector (`databases/cognee.lancedb`), and graph (`databases/cognee_graph_ladybug`) stores |
    | `DATA_ROOT_DIRECTORY`   | `.data_storage`  | The raw ingested source files that `add()` copies into Cognee, plus filesystem session/cache data                                |

    Back up **both** directories together so the graph, vectors, relational metadata, and the source files they reference stay consistent with each other.

    **Stop writes before copying.** SQLite, LanceDB, and the Kuzu-compatible graph store are embedded databases that write directly to these files. Copying them while an `add()`, `cognify()`, `memify()`, or `delete()` operation is in progress can capture a half-written, corrupt snapshot. For a safe, consistent backup, make sure no Cognee process is actively ingesting or mutating data — stop your Cognee service (or wait for all pipelines to finish), then copy the directories:

    ```bash theme={null}
    # With the Cognee process stopped / idle
    cp -r .cognee_system .cognee_system.backup
    cp -r .data_storage .data_storage.backup
    ```

    Operations that only read from the stores are safe, but default graph-completion searches with session caching can write session/cache data. To get a fully consistent backup, keep Cognee idle while copying. To restore, stop Cognee and replace the two directories with your backed-up copies.

    If you have moved a backend off the file-based defaults — for example to [Postgres](./relational-databases), [PGVector](./vector-stores), or [Neo4j](./graph-stores) — back up that external database using its own tooling instead; only the file-based stores live under these directories.
  </Accordion>
</AccordionGroup>
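
The copy step from the "Backing up local data" accordion above can also be scripted. This is a minimal sketch that assumes the default directory names, resolved against a project root you pass in, and that no Cognee process is writing while it runs. `backup_cognee_dirs` is a hypothetical helper, not part of Cognee.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_cognee_dirs(project_root: Path, backup_root: Path) -> list[Path]:
    """Copy both Cognee directories into a new timestamped folder.

    Assumes the default names `.cognee_system` and `.data_storage`.
    Returns the destination paths that were actually created.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"cognee-backup-{stamp}"
    copied = []
    for name in (".cognee_system", ".data_storage"):
        source = project_root / name
        if not source.is_dir():
            continue  # this directory has not been created yet
        destination = target / name
        # copytree creates any missing parent directories of the destination
        shutil.copytree(source, destination)
        copied.append(destination)
    return copied
```

For example, `backup_cognee_dirs(Path.cwd(), Path.home() / "cognee-backups")` would copy whichever of the two directories exist into a fresh timestamped folder.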

## Environment Variable Quick Reference

The tables below list the most commonly used configuration variables. For full details on each group, follow the links to the dedicated guides.

<Note>
  Most configuration keys (LLM, embedding, database, etc.) are used without a `COGNEE_` prefix, but several Cognee-specific controls do use one, including logging, tracing, and cloud connection variables. The cloud-sync credentials `COGNEE_SERVICE_URL` and `COGNEE_API_KEY` are canonical; the older `COGNEE_CLOUD_API_URL` and `COGNEE_CLOUD_AUTH_TOKEN` names are still accepted as legacy fallbacks.
</Note>

<AccordionGroup>
  <Accordion title="LLM">
    | Variable          | Default             | Description                                                                                  |
    | ----------------- | ------------------- | -------------------------------------------------------------------------------------------- |
    | `LLM_PROVIDER`    | `openai`            | Provider: `openai`, `azure`, `gemini`, `anthropic`, `ollama`, `mistral`, `bedrock`, `custom` |
    | `LLM_MODEL`       | `openai/gpt-5-mini` | Model in `provider/model-name` format                                                        |
    | `LLM_API_KEY`     | —                   | API key for the LLM provider                                                                 |
    | `LLM_ENDPOINT`    | —                   | Custom endpoint URL (required for Ollama, vLLM, etc.)                                        |
    | `LLM_API_VERSION` | —                   | API version (required for Azure)                                                             |
    | `LLM_TEMPERATURE` | `0.0`               | Response temperature (0.0–2.0)                                                               |
  </Accordion>

  <Accordion title="Embeddings">
    | Variable                 | Default                         | Description                                                                                                                                                                                                                                 |
    | ------------------------ | ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | `EMBEDDING_PROVIDER`     | `openai`                        | Provider: `openai`, `ollama`, `fastembed`, `gemini`, `mistral`, `bedrock`, `custom`                                                                                                                                                         |
    | `EMBEDDING_MODEL`        | `openai/text-embedding-3-large` | Model in `provider/model-name` format                                                                                                                                                                                                       |
    | `EMBEDDING_DIMENSIONS`   | `3072`                          | Vector dimension size (must match your vector store)                                                                                                                                                                                        |
    | `EMBEDDING_API_KEY`      | —                               | API key (falls back to `LLM_API_KEY` if unset)                                                                                                                                                                                              |
    | `EMBEDDING_ENDPOINT`     | —                               | Custom endpoint URL (required for Ollama, etc.)                                                                                                                                                                                             |
    | `HUGGINGFACE_TOKENIZER`  | —                               | HuggingFace Hub model ID for token counting with Ollama (e.g. `nomic-ai/nomic-embed-text-v1.5`)                                                                                                                                             |
    | `TOKENIZERS_PARALLELISM` | —                               | Optional environment variable used by Hugging Face tokenizers. If Cognee loads a Hugging Face tokenizer, setting this to `false` can suppress the "tokenizers parallelism" warning that may appear in forked or multi-process environments. |
  </Accordion>

  <Accordion title="Databases">
    | Variable                                              | Default   | Description                                                                                                                                                                                                                                                    |
    | ----------------------------------------------------- | --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | `DB_PROVIDER`                                         | `sqlite`  | Relational DB: `sqlite`, `postgres`                                                                                                                                                                                                                            |
    | `DB_HOST` / `DB_PORT` / `DB_USERNAME` / `DB_PASSWORD` | —         | Postgres connection details                                                                                                                                                                                                                                    |
    | `POOL_ARGS`                                           | —         | JSON SQLAlchemy connection-pool args for the relational Postgres engine and Postgres graph adapter. Must be a JSON object.                                                                                                                                     |
    | `VECTOR_DB_PROVIDER`                                  | `lancedb` | Vector store provider. Built-in options include `lancedb`, `pgvector`, `chromadb`, and `neptune_analytics`; community adapters add providers such as `qdrant`, `redis`, and `falkordb`.                                                                        |
    | `VECTOR_DB_URL`                                       | —         | Vector store connection URL                                                                                                                                                                                                                                    |
    | `VECTOR_POOL_ARGS`                                    | —         | JSON connection-pool args for per-dataset PGVector engines in multi-user mode. Must be a JSON object; invalid JSON raises a configuration error. When unset, PGVector per-dataset engines use `pool_size=2` and `max_overflow=2` under backend access control. |
    | `GRAPH_DATABASE_PROVIDER`                             | `ladybug` | Graph store: `ladybug`, `ladybug-remote`, `kuzu`, `kuzu-remote`, `neo4j`, `neptune`                                                                                                                                                                            |
    | `GRAPH_DATABASE_URL`                                  | —         | Graph store connection URL                                                                                                                                                                                                                                     |
    | `GRAPH_DATABASE_USERNAME` / `GRAPH_DATABASE_PASSWORD` | —         | Graph store credentials                                                                                                                                                                                                                                        |
  </Accordion>

  <Accordion title="Storage & Logging">
    | Variable                | Default          | Description                                        |
    | ----------------------- | ---------------- | -------------------------------------------------- |
    | `STORAGE_BACKEND`       | `local`          | Storage backend: `local`, `s3`                     |
    | `DATA_ROOT_DIRECTORY`   | `.data_storage`  | Root directory for data files                      |
    | `SYSTEM_ROOT_DIRECTORY` | `.cognee_system` | Root directory for system files                    |
    | `COGNEE_LOGS_DIR`       | `{package}/logs` | Override the logs directory path                   |
    | `LOG_LEVEL`             | `INFO`           | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR` |
    | `TELEMETRY_DISABLED`    | `false`          | Set `true` to disable anonymous telemetry          |
  </Accordion>

  <Accordion title="Sessions & Caching">
    Cognee uses a cache backend to store session history (Q\&A turns) so that searches with the same `session_id` can include prior interactions as conversational context. See [Sessions and Caching](/core-concepts/sessions-and-caching) for the full guide.

    | Variable              | Default     | Description                                                                                                                                                                       |
    | --------------------- | ----------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | `CACHING`             | `true`      | Enable session caching. Set to `false` to run searches without conversational memory.                                                                                             |
    | `AUTO_FEEDBACK`       | `true`      | Enable automatic session-context guidance for session-capable completion searches. Set to `false` to disable the extra turn-analysis LLM call and use plain conversation history. |
    | `CACHE_BACKEND`       | `fs`        | Cache backend: `fs` (local disk via diskcache), `redis` (shared, multi-process), or `tapes` (local cache plus Tapes mirroring).                                                   |
    | `CACHE_HOST`          | `localhost` | Redis hostname (used when `CACHE_BACKEND=redis`).                                                                                                                                 |
    | `CACHE_PORT`          | `6379`      | Redis port.                                                                                                                                                                       |
    | `CACHE_USERNAME`      | —           | Optional Redis username.                                                                                                                                                          |
    | `CACHE_PASSWORD`      | —           | Optional Redis password.                                                                                                                                                          |
    | `SESSION_TTL_SECONDS` | `604800`    | Expiry for cached session entries (7 days). Set to `0` to disable expiry.                                                                                                         |

    Use `fs` for local development or single-process setups. Use `redis` for production, distributed deployments, or when multiple processes need to share session state. Use `tapes` when you want filesystem-backed sessions plus mirroring of new Q\&A turns to a running Tapes ingest service.
  </Accordion>

  <Accordion title="Debug Mode">
    To enable verbose logging in a self-hosted Cognee instance, set `LOG_LEVEL` in your `.env`:

    ```dotenv theme={null}
    LOG_LEVEL="DEBUG"
    ```

    Verbose logging covers pipeline execution, LLM calls, database queries, and graph operations—useful when troubleshooting data processing or provider configuration.
  </Accordion>
</AccordionGroup>
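
`POOL_ARGS` and `VECTOR_POOL_ARGS` in the Databases table above must each hold a JSON object. One way to check a value before putting it in `.env` is sketched below; `parse_pool_args` is a hypothetical helper and does not mirror Cognee's own validation or error messages.

```python
import json


def parse_pool_args(raw: str) -> dict:
    """Parse a POOL_ARGS-style string and insist on a JSON object."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as error:
        raise ValueError(f"not valid JSON: {error}") from error
    if not isinstance(value, dict):
        # Lists, numbers, and strings are valid JSON but not objects.
        raise ValueError('must be a JSON object such as {"pool_size": 5}')
    return value


print(parse_pool_args('{"pool_size": 5, "max_overflow": 10}'))
# {'pool_size': 5, 'max_overflow': 10}
```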

## Docker Environment Variables

Use the same variable names as in your `.env`; pass them with `docker run -e` or load them from a file with `--env-file`.

<AccordionGroup>
  <Accordion title="Examples">
    ```bash theme={null}
    docker run \
      -e LLM_PROVIDER=ollama \
      -e LLM_MODEL=ollama/llama3.2 \
      -e LLM_ENDPOINT=http://host.docker.internal:11434 \
      -e EMBEDDING_PROVIDER=ollama \
      -e EMBEDDING_MODEL=nomic-embed-text:latest \
      -e EMBEDDING_ENDPOINT=http://host.docker.internal:11434/api/embed \
      -e EMBEDDING_DIMENSIONS=768 \
      -e HUGGINGFACE_TOKENIZER=nomic-ai/nomic-embed-text-v1.5 \
      cognee/cognee:main
    ```

    Or using an env file:

    ```bash theme={null}
    docker run --env-file .env cognee/cognee:main
    ```
  </Accordion>
</AccordionGroup>
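
One detail to watch when reusing a local `.env` with `--env-file`: Docker reads each `KEY=VALUE` line literally, so quotation marks around a value become part of the value. The sketch below writes an unquoted env file from a dictionary. `write_docker_env_file` and the `docker.env` file name are hypothetical, and the sample values are taken from the `docker run` example above.

```python
from pathlib import Path

# Sample values taken from the `docker run` example above.
settings = {
    "LLM_PROVIDER": "ollama",
    "LLM_MODEL": "ollama/llama3.2",
    "LLM_ENDPOINT": "http://host.docker.internal:11434",
    "EMBEDDING_PROVIDER": "ollama",
    "EMBEDDING_DIMENSIONS": "768",
}


def write_docker_env_file(path: Path, values: dict[str, str]) -> None:
    """Write one unquoted KEY=VALUE line per setting."""
    for key, value in values.items():
        if "\n" in value:
            raise ValueError(f"{key} contains a newline, which an env file cannot hold")
    lines = [f"{key}={value}" for key, value in values.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

After `write_docker_env_file(Path("docker.env"), settings)`, the container can be started with `docker run --env-file docker.env cognee/cognee:main`.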

## Observability & Telemetry

Cognee includes built-in telemetry to help you monitor and debug your knowledge graph operations. You can control telemetry behavior with an environment variable:

* **`TELEMETRY_DISABLED`** (boolean, optional): Set to `true` to disable all telemetry collection (default: `false`)

When telemetry is enabled, Cognee automatically collects:

* Search query performance metrics
* Processing pipeline execution times
* Error rates and debugging information
* System resource usage

<Info>
  Telemetry data helps improve Cognee's performance and reliability. It's collected anonymously and doesn't include your actual data content.
</Info>

## Configuration Workflow

1. Install Cognee with all optional dependencies:
   * **Local setup**: `uv sync --all-extras`
   * **Library**: `pip install "cognee[all]"`
2. Create a `.env` file in your project root (if you haven't already) — see [Installation](/getting-started/installation) for details
3. Choose your preferred providers and follow the configuration instructions from the guides below

<Warning>
  **Configuration Changes**: If you have already run Cognee with the default settings and are now changing your configuration (for example, switching from SQLite to Postgres, or changing vector stores), run the pruning operations before the next `cognify()` call so that stored data stays consistent with the new configuration.
</Warning>
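
A minimal sketch of that pruning step, assuming the `cognee.prune.prune_data()` and `cognee.prune.prune_system(metadata=True)` coroutines exposed by recent Cognee releases (check the API reference for your installed version). The wrapper name `reset_before_reconfigure` is hypothetical. Pruning is destructive, so back up anything you want to keep first.

```python
async def reset_before_reconfigure() -> None:
    """Clear ingested data and system state before switching backends."""
    # Imported lazily so that `.env` is read when the function runs.
    import cognee

    # Assumed API: removes the raw ingested data.
    await cognee.prune.prune_data()
    # Assumed API: removes graph, vector, and relational metadata.
    await cognee.prune.prune_system(metadata=True)
```

Run it once with `asyncio.run(reset_before_reconfigure())`, then ingest your data again under the new configuration.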

<Warning>
  **LLM/Embedding Configuration**: If you configure only the LLM or only the embedding provider, the other one still defaults to OpenAI. Either make sure you have a working OpenAI API key, or configure both the LLM and the embeddings to avoid an unexpected default.
</Warning>
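
A quick pre-flight check for this pairing can be sketched as follows. It assumes, as a simplification, that a component counts as configured when its `*_PROVIDER` variable is set. `find_openai_fallbacks` is a hypothetical helper, not a Cognee API.

```python
def find_openai_fallbacks(env: dict[str, str]) -> list[str]:
    """List the components that would fall back to the OpenAI default."""
    fallbacks = []
    for component in ("LLM", "EMBEDDING"):
        # A missing or empty provider variable means the default applies.
        if not env.get(f"{component}_PROVIDER"):
            fallbacks.append(component)
    return fallbacks


sample = {"LLM_PROVIDER": "ollama", "LLM_MODEL": "ollama/llama3.2"}
print(find_openai_fallbacks(sample))  # ['EMBEDDING']
```

Passing `dict(os.environ)` after your `.env` has been loaded would run the same check against the real environment.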

<Columns cols={3}>
  <Card title="LLM Providers" icon="brain" href="/setup-configuration/llm-providers">
    Configure OpenAI, Azure, Gemini, Anthropic, Ollama, or custom LLM providers (like vLLM)
  </Card>

  <Card title="Structured Output Backends" icon="code" href="/setup-configuration/structured-output-backends">
    Configure LiteLLM + Instructor or BAML for reliable data extraction
  </Card>

  <Card title="Embedding Providers" icon="layers" href="/setup-configuration/embedding-providers">
    Set up OpenAI, Mistral, Ollama, Fastembed, or custom embedding services
  </Card>
</Columns>

<Columns cols={3}>
  <Card title="Relational Databases" icon="database" href="/setup-configuration/relational-databases">
    Choose between SQLite for local development or Postgres for production
  </Card>

  <Card title="Vector Stores" icon="database" href="/setup-configuration/vector-stores">
    Configure LanceDB, PGVector, Qdrant, Redis, ChromaDB, FalkorDB, or Neptune Analytics
  </Card>

  <Card title="Graph Stores" icon="network" href="/setup-configuration/graph-stores">
    Set up Ladybug (the Kuzu-compatible default), Kuzu, Neo4j, or Neptune for knowledge graph storage
  </Card>
</Columns>
