Configuration

The default stores come with a plain pip install cognee; support for other databases is installed through the optional dependencies. For example, if you choose to run cognee with neo4j, pgvector, and postgres, this would look like pip install cognee[neo4j,postgres]. You can find all the optional dependencies here. Once you have made sure that all necessary dependencies are installed, you can configure your database variables.

🚀 Configure Vector and Graph Stores

You can configure the vector and graph stores through environment variables in your .env file or programmatically; under the hood, cognee uses Pydantic Settings. There is a global configuration object (cognee.config) as well as individual configurations at the pipeline and data store levels. Check the available configuration options:
from cognee.infrastructure.databases.vector import get_vectordb_config
from cognee.infrastructure.databases.graph.config import get_graph_config
from cognee.infrastructure.databases.relational import get_relational_config
from cognee.infrastructure.llm.config import get_llm_config
print(get_vectordb_config().to_dict())
print(get_graph_config().to_dict())
print(get_relational_config().to_dict())
print(get_llm_config().to_dict())
Set the environment variables in your .env file, and Pydantic will pick them up:
VECTOR_DB_PROVIDER="lancedb"
Otherwise, you can set the configuration yourself:
cognee.config.set_llm_provider('ollama')
Make sure to keep your API keys secure and never commit them to version control.
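To make the .env mechanism concrete, here is a purely illustrative sketch of the rule pydantic-settings applies when it reads the file: each KEY=VALUE line becomes a setting, with surrounding quotes stripped. You never need to write a parser like this yourself; cognee does it for you.

```python
import os

# Illustrative only: mimics how KEY=VALUE lines in a .env file end up as
# environment variables that Pydantic Settings then reads.
def load_env(text: str) -> None:
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ[key.strip()] = value.strip().strip("'\"")

load_env('VECTOR_DB_PROVIDER="lancedb"\nDB_PROVIDER=sqlite')
print(os.environ["VECTOR_DB_PROVIDER"])  # lancedb
```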

Settings for database engines

Relational Engines

The default SQLite store needs no extra configuration (this setting can be omitted entirely):
DB_PROVIDER=sqlite
A Postgres instance requires a bit more. Add the following to your .env file:
DB_PROVIDER=postgres
DB_HOST=127.0.0.1
DB_PORT=5432
DB_USERNAME=cognee
DB_PASSWORD=cognee
DB_NAME=cognee_db
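The variables above describe an ordinary Postgres connection. As a quick sanity check, this hypothetical helper (not part of cognee) assembles the DSN those values imply, falling back to the example defaults:

```python
import os

# Hypothetical helper: build the Postgres DSN described by the DB_* variables,
# using the example .env values as defaults.
def postgres_dsn() -> str:
    return (
        "postgresql://"
        f"{os.getenv('DB_USERNAME', 'cognee')}:{os.getenv('DB_PASSWORD', 'cognee')}"
        f"@{os.getenv('DB_HOST', '127.0.0.1')}:{os.getenv('DB_PORT', '5432')}"
        f"/{os.getenv('DB_NAME', 'cognee_db')}"
    )

print(postgres_dsn())
```

You can paste the resulting DSN into psql or any Postgres client to verify that the credentials work before starting cognee.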

Vector Engines

LanceDB

LanceDB, a file-based vector store, is the default vector database. To use it, add the following to your .env file (this setting can be omitted, since it is the default):
VECTOR_DB_PROVIDER="lancedb"

PGVector

If you are already using Postgres, PGVector is a natural choice.
VECTOR_DB_PROVIDER="pgvector"
If you’re using an AWS-hosted Postgres, you need to run CREATE EXTENSION vector; before first use. See the official AWS docs.
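As a sketch of that one-off setup step, assuming you have an open DB-API connection (for example from psycopg2.connect(...)):

```python
# One-off setup statement that enables pgvector; safe to re-run.
ENABLE_PGVECTOR = "CREATE EXTENSION IF NOT EXISTS vector;"

def enable_pgvector(conn) -> None:
    # conn: an open DB-API connection to your Postgres instance
    with conn.cursor() as cur:
        cur.execute(ENABLE_PGVECTOR)
    conn.commit()
```

The IF NOT EXISTS clause makes the statement idempotent, so running it on an instance that already has the extension is harmless.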

Qdrant

VECTOR_DB_PROVIDER="qdrant"
VECTOR_DB_URL=https://url-to-your-qdrant-cloud-instance.cloud.qdrant.io:6333
VECTOR_DB_KEY=your-qdrant-api-key

Weaviate

VECTOR_DB_PROVIDER="weaviate"
VECTOR_DB_URL=https://url-to-your-weaviate-cloud-instance.weaviate.cloud
VECTOR_DB_KEY=your-weaviate-api-key

Embedding Engines

The embedding engine embeds data before it is stored in the vector database. Cognee supports different embedding providers through dedicated adapters. To configure one, add the following to your .env file. Example with Mistral:
EMBEDDING_PROVIDER=mistral
EMBEDDING_MODEL=mistral/mistral-embed
EMBEDDING_API_KEY=mistral-api-key
EMBEDDING_MAX_TOKENS=8000
EMBEDDING_DIMENSIONS=1024
Example with Azure OpenAI Embeddings:
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=azure/text-embedding-3-large
EMBEDDING_ENDPOINT=https://azure-project.openai.azure.com/openai/deployments/text-embedding-3-large
EMBEDDING_API_KEY=azure-openai-api-key
EMBEDDING_API_VERSION=2024-12-01-preview
Example with Fastembed. Fastembed is used for the codegraph pipeline because OpenAI models usually hit rate limits when ingesting a codebase:
EMBEDDING_PROVIDER=fastembed
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
EMBEDDING_DIMENSIONS=384
EMBEDDING_MAX_TOKENS=256
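A common misconfiguration is an EMBEDDING_DIMENSIONS value that does not match what the model actually produces, which makes vector inserts fail later. This small sketch checks the setting against a lookup table for the example models above (the text-embedding-3-large value is its default output dimension; verify it for your deployment):

```python
# Known output dimensions for the example models in this section; extend
# the table for whichever model you configure.
KNOWN_DIMENSIONS = {
    "mistral/mistral-embed": 1024,
    "azure/text-embedding-3-large": 3072,
    "sentence-transformers/all-MiniLM-L6-v2": 384,
}

def dimensions_match(model: str, configured: int) -> bool:
    # Unknown models pass the check; known models must match exactly.
    expected = KNOWN_DIMENSIONS.get(model)
    return expected is None or expected == configured
```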

Graph Engines

Kuzu

Kuzu is the default graph database. To use it, add the following to your .env file:
GRAPH_DATABASE_PROVIDER="kuzu"
Its persistence is slightly different: it writes its database files under the .cognee_system/databases/ directory.

Neo4j

Neo4j needs a bit more setup. A default configuration for a locally hosted Neo4j instance looks like this:
GRAPH_DATABASE_PROVIDER="neo4j"
GRAPH_DATABASE_URL=bolt://localhost:7687
GRAPH_DATABASE_USERNAME=neo4j
GRAPH_DATABASE_PASSWORD=pleaseletmein
If you installed Neo4j manually, don’t forget to install the apoc and graph-data-science plugins.
For a cloud instance, add the following to your .env file:
GRAPH_DATABASE_PROVIDER=neo4j
GRAPH_DATABASE_USERNAME=neo4j
GRAPH_DATABASE_URL=neo4j+ssc://12345678.databases.neo4j.io
GRAPH_DATABASE_PASSWORD=neo4j-api-key
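A quick way to catch a malformed GRAPH_DATABASE_URL before the driver does is to check its URI scheme; the schemes below are the ones the official Neo4j drivers accept:

```python
from urllib.parse import urlparse

# URI schemes accepted by the official Neo4j drivers: plain, TLS (+s), and
# TLS with self-signed certificates (+ssc), as used by Aura cloud instances.
NEO4J_SCHEMES = {"bolt", "bolt+s", "bolt+ssc", "neo4j", "neo4j+s", "neo4j+ssc"}

def is_neo4j_url(url: str) -> bool:
    return urlparse(url).scheme in NEO4J_SCHEMES
```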

NetworkX

NetworkX is a file-based graph store. To use it, add the following to your .env file:
GRAPH_DATABASE_PROVIDER="networkx"
It persists the graph in the .cognee_system/databases/cognee_graph.pkl file.

Hybrid Engines

FalkorDB (Graph + Vector)

FalkorDB can be used as both the vector store and the graph database. Add the following to your .env file:
VECTOR_DB_PROVIDER=falkordb
VECTOR_DB_URL=localhost
VECTOR_DB_PORT=6379
GRAPH_DATABASE_PROVIDER=falkordb
GRAPH_DATABASE_URL=localhost
GRAPH_DATABASE_PORT=6379

Amazon Neptune Analytics (Graph + Vector)

Amazon Neptune Analytics can be used as a hybrid backend providing both graph storage/traversals and vector search. Add the following to your .env file:
GRAPH_DATABASE_PROVIDER="neptune_analytics"
GRAPH_DATABASE_URL=neptune-graph://g-your-graph
VECTOR_DB_PROVIDER="neptune_analytics"
VECTOR_DB_URL=neptune-graph://g-your-graph

AWS_DEFAULT_REGION=us-east-1
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
AWS_SESSION_TOKEN=your-session-token

Ensure that the Neptune Analytics vector dimension matches your embedding model (for example, 1536 for OpenAI text-embedding-3-small).