Python 3.9 – 3.12 is required to run Cognee.
Setup Notes
Environment Configuration
- We recommend creating a `.env` file in your project root
- Cognee supports many configuration options, and a `.env` file keeps them organized
API Keys & Models
You have two main options for configuring LLM and embedding providers:

Option 1: OpenAI (Simplest)
- Single API key handles both LLM and embeddings
- Uses gpt-4o-mini for LLM and text-embedding-3-small for embeddings by default
- Works out of the box with minimal configuration

Option 2: Other Providers
- Configure both LLM and embedding providers separately
- Supports Gemini, Anthropic, Ollama, and more
- Requires setting both `LLM_*` and `EMBEDDING_*` variables
By default, Cognee uses OpenAI for both LLMs and embeddings. If you change the LLM provider but don’t configure embeddings, it will still default to OpenAI.
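For reference, a `.env` sketch for each option (the variable names follow the `LLM_*`/`EMBEDDING_*` scheme mentioned above; the provider and model values are illustrative, and you should keep only one option in your actual file):

```
# Option 1: OpenAI handles both LLM and embeddings
LLM_API_KEY="sk-your-openai-key"

# Option 2: configure LLM and embeddings separately (example values)
LLM_PROVIDER="anthropic"
LLM_MODEL="claude-3-5-sonnet-20241022"
LLM_API_KEY="your-anthropic-key"
EMBEDDING_PROVIDER="openai"
EMBEDDING_MODEL="text-embedding-3-small"
EMBEDDING_API_KEY="sk-your-openai-key"
```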
Virtual Environment
- We recommend using uv for virtual environment management
- Create and activate a virtual environment before installing Cognee
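A minimal sketch of those commands, assuming uv is installed (with the stdlib `venv` module as a fallback):

```shell
# Create the environment with uv; fall back to the stdlib venv module
# if uv is not installed (pip install uv adds it)
uv venv .venv 2>/dev/null || python3 -m venv .venv

# Activate it (Linux/macOS; Windows activation is covered below)
source .venv/bin/activate
```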
Windows Setup

On Windows the setup steps differ slightly from Linux/macOS.

Virtual environment activation

Use PowerShell or Command Prompt instead of `source` to activate the virtual environment. If you see an execution-policy error in PowerShell, first allow script execution for the current user only.

Creating the `.env` file

Copy the template from the project root, then open it in any text editor (Notepad, VS Code, etc.). The `.env` file must be saved in the project root — the same directory from which you run Python. Cognee calls `load_dotenv()` at import time and searches upward from the working directory.

Path values on Windows

When setting DATA_ROOT_DIRECTORY or SYSTEM_ROOT_DIRECTORY in your `.env` file, use forward slashes or double backslashes — single backslashes are not valid in `.env` values. The `~` home-directory prefix also works and is cross-platform.

Setting env vars without a `.env` file (optional)

If you prefer to set variables directly in your shell session instead of using a file, PowerShell uses `$env:NAME = "value"` and Command Prompt uses `set NAME=value`; both apply to the current session only.

Line endings

Python-dotenv handles both Windows (CRLF) and Unix (LF) line endings automatically, so line endings are not a concern.
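The Windows-specific steps above can be sketched as follows (a PowerShell sketch; the template filename and variable values are assumptions, so adapt them to your checkout):

```powershell
# Allow script execution for the current user (fixes the execution-policy error)
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

# Activate the virtual environment
.\.venv\Scripts\Activate.ps1        # PowerShell
# .venv\Scripts\activate.bat        # equivalent in Command Prompt (CMD)

# Copy the .env template, then edit it in any editor
Copy-Item .env.template .env

# Or set variables for the current session only, without a .env file
$env:LLM_API_KEY = "sk-your-key"
$env:DATA_ROOT_DIRECTORY = "C:/cognee-data"   # forward slashes avoid escaping issues
```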
Optional
Database
- A running PostgreSQL server is required only if you plan to use PostgreSQL as your relational database (requires the `postgres` extra)
Setup
- OpenAI (Recommended)
- Other Providers (Gemini, Anthropic, etc.)
Environment: Add your OpenAI API key to your `.env` file. This single API key handles both LLM and embeddings; gpt-4o-mini is used for the LLM and text-embedding-3-small for embeddings by default.

Installation: Install Cognee with the default package (`pip install cognee`).

What this gives you: Cognee installed with default local databases (SQLite, LanceDB, Kuzu) — no external servers required.
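As a minimal sketch of the environment step (assuming `LLM_API_KEY` is the variable Cognee reads for the OpenAI key; replace the placeholder with your real key):

```shell
# Append the OpenAI key to the .env file in the project root
echo 'LLM_API_KEY="sk-your-openai-key"' >> .env
```

After that, `pip install cognee` completes the default setup.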
Extras and Common Installation Combinations
Cognee’s base installation (pip install cognee) includes everything needed to run with OpenAI and the default local databases (SQLite, LanceDB, Kuzu). Optional extras unlock additional providers, integrations, and features.
Install one or more extras by listing them in brackets, for example `pip install "cognee[postgres,neo4j]"` or, with uv, `uv pip install "cognee[postgres,neo4j]"`.
Common installation combinations
If you already know the stack you want, these combinations cover the most common setups:
| Use case | Install |
|---|---|
| PostgreSQL as the database backend | uv pip install "cognee[postgres]" |
| Neo4j graph store + AWS S3 storage | uv pip install "cognee[neo4j,aws]" |
| Distributed execution on Modal | uv pip install "cognee[distributed]" |
| Code graph analysis | uv pip install "cognee[codegraph]" |
| Full monitoring (Sentry + Langfuse + OpenTelemetry) | uv pip install "cognee[monitoring]" |
| Web scraping + extended document formats | uv pip install "cognee[scraping,docs]" |
| BAML structured output backend | uv pip install "cognee[baml]" |
| Anthropic Claude models | uv pip install "cognee[anthropic]" |
LLM & Embedding Providers
These extras install provider SDKs. You still need to set the corresponding environment variables. See LLM Providers and Embedding Providers.
| Extra | Packages installed | When to use |
|---|---|---|
| anthropic | anthropic>=0.27 | Use Claude models (claude-3-5-sonnet, etc.) |
| groq | groq>=0.8.0,<1.0.0 | Use Groq-hosted inference |
| mistral | mistral-common, mistralai | Use Mistral AI models |
| huggingface | transformers>=4.46.3,<5 | Use HuggingFace models for LLM or embeddings |
| ollama | transformers>=4.46.3,<5 | Use Ollama for local model serving |
| llama-cpp | llama-cpp-python[server]>=0.3.0 | Run GGUF models locally via llama.cpp |
| azure | azure-identity>=1.15.0,<2 | Azure OpenAI or other Azure-hosted models |
| fastembed | fastembed<=0.6.0, onnxruntime | Fast local embeddings without a GPU |
There is no separate `gemini` extra. Gemini is supported through litellm, which is already part of the base installation.
Vector & Graph Stores
| Extra | Packages installed | When to use |
|---|---|---|
| postgres | psycopg2, pgvector, asyncpg | Use PostgreSQL as relational DB and pgvector as vector store |
| postgres-binary | psycopg2-binary, pgvector, asyncpg | Same as postgres but uses pre-compiled binary wheels |
| neo4j | neo4j>=5.28.0,<6 | Use Neo4j as the graph store |
| neptune | langchain_aws>=0.2.22 | Use Amazon Neptune as the graph store |
| chromadb | chromadb>=0.6,<0.7, pypika | Use ChromaDB as the vector store |
| graphiti | graphiti-core>=0.7.0,<0.8 | Use Graphiti for temporal knowledge graphs |
Data Ingestion & Processing
| Extra | Packages installed | When to use |
|---|---|---|
| docs | unstructured (with csv, doc, docx, epub, md, ppt, pptx, xlsx, pdf, and more), lxml | Parse Office documents, PDFs via unstructured, and other rich formats beyond the built-in PyPDF support |
| docling | docling>=2.54, transformers>=4.55 | Use Docling for advanced document parsing |
| scraping | tavily-python, beautifulsoup4, playwright, lxml, protego, APScheduler | Web scraping, URL ingestion, and scheduled crawling |
| codegraph | fastembed, transformers, tree-sitter, tree-sitter-python | Build code graphs from Python repositories |
| langchain | langsmith, langchain_text_splitters, langchain-core | Use LangChain text splitters or LangSmith tracing |
| llama-index | llama-index-core>=0.13.0,<0.14 | Use LlamaIndex data loaders and connectors |
| dlt | dlt[sqlalchemy]>=1.9.0,<2 | Ingest data via DLT pipelines |
Infrastructure & Storage
| Extra | Packages installed | When to use |
|---|---|---|
| distributed | modal>=1.0.5,<2.0.0 | Run cognee pipelines on Modal for distributed/serverless execution |
| redis | redis>=5.0.3,<6.0.0 | Use Redis for caching instead of the default in-memory/disk cache |
| aws | s3fs[boto3]==2025.3.2 | Use Amazon S3 for file storage |
| baml | baml-py==0.206.0 | Use BAML as a structured output backend |
Observability & Monitoring
| Extra | Packages installed | When to use |
|---|---|---|
| tracing | opentelemetry-api, opentelemetry-sdk, OTLP exporters (gRPC + HTTP) | Export traces via OpenTelemetry to any compatible backend |
| monitoring | Everything in tracing plus sentry-sdk[fastapi], langfuse | Full monitoring stack: Sentry for errors, Langfuse for LLM observability, OpenTelemetry for traces |
| posthog | posthog>=3.5.0,<4 | Send usage analytics to PostHog |
Evaluation
| Extra | Packages installed | When to use |
|---|---|---|
| deepeval | deepeval>=3.0.1,<4 | Run LLM evaluation benchmarks with DeepEval |
| evals | plotly, gdown, pandas, matplotlib, scikit-learn | Internal evaluation tooling with plotting and metrics |
Development & Tooling
| Extra | Packages installed | When to use |
|---|---|---|
| notebook | notebook>=7.1.0,<8 | Run Jupyter notebooks |
| dev | pytest, mypy, ruff, pre-commit, mkdocs, and more | Full development environment for contributing to cognee |
| debug | debugpy>=1.8.9,<2.0.0 | Attach a remote debugger (e.g. VS Code) to a running cognee process |
Missing dependency errors (ImportError)
If you encounter an `ImportError` when using a Cognee feature, it usually means a required extra has not been installed.

| ImportError mentions | Install |
|---|---|
| neo4j | cognee[neo4j] |
| modal | cognee[distributed] |
| playwright, tavily, beautifulsoup4 | cognee[scraping] |
| unstructured | cognee[docs] |
| docling | cognee[docling] |
| fastembed | cognee[fastembed] or cognee[codegraph] |
| tree_sitter | cognee[codegraph] |
| psycopg2, asyncpg, pgvector | cognee[postgres] or cognee[postgres-binary] |
| redis | cognee[redis] |
| s3fs, boto3 | cognee[aws] |
| baml | cognee[baml] |
| anthropic | cognee[anthropic] |
| groq | cognee[groq] |
| mistralai | cognee[mistral] |
| llama_cpp | cognee[llama-cpp] |
| opentelemetry | cognee[tracing] or cognee[monitoring] |
| sentry_sdk, langfuse | cognee[monitoring] |
| graphiti | cognee[graphiti] |
| chromadb | cognee[chromadb] |
| deepeval | cognee[deepeval] |
| dlt | cognee[dlt] |
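If you want to check several optional backends at once before hitting an `ImportError` at runtime, a quick shell sketch (the module names are illustrative examples from the table above):

```shell
# Report which optional modules are importable in the current environment
for mod in neo4j chromadb redis; do
  if python3 -c "import $mod" 2>/dev/null; then
    echo "$mod installed"
  else
    echo "$mod missing (install the matching cognee extra)"
  fi
done > extras_report.txt
cat extras_report.txt
```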
Next Steps
Run Your First Example
Quickstart Tutorial: Get started with Cognee by running your first knowledge graph example.
Explore Advanced Features
Core Concepts: Dive deeper into Cognee’s powerful features and capabilities.