New to configuration? See the Setup Configuration Overview for the complete workflow: install extras → create .env → choose providers → handle pruning.

Supported Providers
Cognee supports multiple LLM providers:
- OpenAI — GPT models via OpenAI API (default)
- Azure OpenAI — GPT models via Azure OpenAI Service
- Google Gemini — Gemini models via Google AI
- Anthropic — Claude models via Anthropic API
- AWS Bedrock — Models available via AWS Bedrock
- Groq — Fast inference via Groq API (via LiteLLM)
- Ollama — Local models via Ollama
- LM Studio — Local models via LM Studio
- HuggingFace — Models via HuggingFace Inference API or Inference Endpoints
- llama.cpp — Local models via llama-cpp-python (in-process or server mode)
- Custom — OpenAI-compatible endpoints (like vLLM, OpenRouter, DeepInfra, company-internal)
Configuration
Environment Variables
Set these environment variables in your .env file:

- LLM_PROVIDER — The provider to use (openai, gemini, anthropic, ollama, custom)
- LLM_MODEL — The specific model to use
- LLM_API_KEY — Your API key for the provider
- LLM_ENDPOINT — Custom endpoint URL (for Azure, Ollama, or custom providers)
- LLM_API_VERSION — API version (for Azure OpenAI)
- LLM_TEMPERATURE — Sampling temperature for generation (default: 0.0)
- LLM_MAX_TOKENS — Maximum tokens per request (optional)
- LLM_INSTRUCTOR_MODE — Structured-output mode override for Instructor-backed LLM calls (optional)
Why do model names have a prefix like gemini/ or openrouter/?

Cognee routes all LLM requests through LiteLLM, which uses provider prefixes to identify the correct API endpoint. For example, Google lists their model as gemini-2.0-flash, but in Cognee you must write gemini/gemini-2.0-flash. This prefix tells LiteLLM to use the Gemini API. The same applies to custom providers — openrouter/, hosted_vllm/, lm_studio/, etc. See each provider section below for the correct format.

Provider Setup Guides
OpenAI (Default)
OpenAI is the default provider and works out of the box with minimal configuration.
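A minimal .env for OpenAI might look like the sketch below (the model name and key are placeholders):

```bash
# OpenAI (default provider); values shown are placeholders
LLM_PROVIDER="openai"
LLM_MODEL="gpt-4o-mini"
LLM_API_KEY="sk-..."
```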
Azure OpenAI
Use Azure OpenAI Service with your own deployment.
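A sketch of an Azure configuration, assuming Azure OpenAI is configured under the openai provider with LiteLLM's azure/ deployment prefix; the deployment name, endpoint, and API version are placeholders:

```bash
# Azure OpenAI; assumption: uses the openai provider with the azure/<deployment-name> prefix
LLM_PROVIDER="openai"
LLM_MODEL="azure/my-gpt-4o-deployment"
LLM_ENDPOINT="https://my-resource.openai.azure.com/"
LLM_API_VERSION="2024-02-01"
LLM_API_KEY="..."
```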
Google Gemini
Use Google’s Gemini models for text generation.
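A minimal sketch using the gemini/ prefix described above (the key is a placeholder):

```bash
# Google Gemini; note the gemini/ prefix required by LiteLLM
LLM_PROVIDER="gemini"
LLM_MODEL="gemini/gemini-2.0-flash"
LLM_API_KEY="..."
```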
Anthropic
Use Anthropic’s Claude models for reasoning tasks.
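A minimal sketch; the model name is an example, so check Anthropic's model list for current identifiers:

```bash
# Anthropic Claude; model name is illustrative
LLM_PROVIDER="anthropic"
LLM_MODEL="claude-3-5-sonnet-20241022"
LLM_API_KEY="sk-ant-..."
```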
Groq
Groq provides fast inference for open models. Cognee routes Groq requests through LiteLLM using the groq/ model prefix.

Installation: Install the Groq dependency (see the sketch after the model list).

Popular Groq models (use with the groq/ prefix):
- groq/llama-3.3-70b-versatile
- groq/llama3-8b-8192
- groq/mixtral-8x7b-32768
- groq/gemma2-9b-it
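A sketch of a Groq setup; the exact dependency package and the provider value are assumptions, so check the installation docs if they differ for your Cognee version:

```bash
# Install the Groq dependency (assumption: the groq SDK package)
pip install groq

# .env; assumption: Groq is configured through the custom provider
LLM_PROVIDER="custom"
LLM_MODEL="groq/llama-3.3-70b-versatile"
LLM_API_KEY="gsk_..."
# No LLM_ENDPOINT needed: LiteLLM resolves it from the groq/ prefix
```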
No endpoint needed: The LLM_ENDPOINT variable is not required for Groq — LiteLLM resolves the Groq API endpoint automatically from the groq/ prefix.
AWS Bedrock
Use models available on AWS Bedrock for various tasks. For Bedrock specifically, you will also need to specify some information regarding AWS.

There are multiple ways of connecting to Bedrock models:
- Using an API key and region. Simply generate your key on AWS and put it in the LLM_API_KEY env variable.
- Using AWS credentials. You can specify only AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, with no need for LLM_API_KEY. In this case, if you are using temporary credentials (e.g. an AWS_ACCESS_KEY_ID starting with ASIA...), you must also specify AWS_SESSION_TOKEN.
- Using AWS profiles. Create a file such as ~/.aws/credentials and store your credentials inside it.
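A hypothetical .env sketch for the API-key approach; the provider value, the region-prefixed model ID, and the AWS variable names are assumptions that depend on your region and Cognee version:

```bash
# AWS Bedrock; values and variable names are illustrative assumptions
LLM_PROVIDER="bedrock"
LLM_MODEL="us.anthropic.claude-3-5-sonnet-20240620-v1:0"   # region-prefixed model ID (us/eu/...)
LLM_API_KEY="..."
# Or use AWS credentials instead of an API key:
# AWS_ACCESS_KEY_ID="AKIA..."
# AWS_SECRET_ACCESS_KEY="..."
# AWS_SESSION_TOKEN="..."   # only needed for temporary (ASIA...) credentials
```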
Model Name
The name of the model might differ based on the region (the name begins with eu for Europe, us for the USA, etc.).
Ollama (Local)
Run models locally with Ollama for privacy and cost control.
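A minimal sketch, assuming Ollama's default local endpoint; adjust the model name to one you have pulled:

```bash
# Ollama (local); endpoint and model name are illustrative
LLM_PROVIDER="ollama"
LLM_MODEL="llama3.1"
LLM_ENDPOINT="http://localhost:11434/v1"   # assumption: default Ollama server with OpenAI-compatible path
LLM_API_KEY="ollama"                       # placeholder; Ollama does not validate it
```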
LLM_API_KEY="ollama" is a placeholder required by the client library — Ollama itself does not validate it.

Installation: Install Ollama from ollama.ai and pull your desired model.

Zero-API-key setup: To avoid falling back to OpenAI for embeddings, you must also configure the embedding provider to use a local backend. See the Local Setup guide for a complete .env example using Ollama or Fastembed for both LLM and embeddings.

Known Issues
- NoDataError with mixed providers: Using Ollama as the LLM and OpenAI as the embedding provider may fail with NoDataError. Workaround: configure both LLM and embeddings to use the same local provider (see the local setup guide above).
- Audio transcription is not supported: AudioLoader relies on a Whisper-compatible transcription endpoint. Cognee’s Ollama adapter does not provide one, so audio ingestion will fail when LLM_PROVIDER="ollama".
HuggingFace
Use models from HuggingFace via the HuggingFace Inference API (serverless) or dedicated Inference Endpoints.

Installation: Install the HuggingFace extra to enable the HuggingFace tokenizer used for chunking.

Two connection modes are available:
- Serverless — the shared HuggingFace Inference API
- Dedicated Endpoint — a dedicated Inference Endpoint you deploy yourself
Model names: Use the full HuggingFace model repo ID after the huggingface/ prefix (e.g., huggingface/mistralai/Mixtral-8x7B-Instruct-v0.1). Not all models on HuggingFace support the text generation inference API — check the model card for compatibility. The model is routed through LiteLLM.
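A serverless sketch; the provider value is an assumption (adjust if your Cognee version exposes a dedicated HuggingFace provider), and the token is a placeholder:

```bash
# HuggingFace Inference API (serverless); provider value is an assumption
LLM_PROVIDER="custom"
LLM_MODEL="huggingface/mistralai/Mixtral-8x7B-Instruct-v0.1"
LLM_API_KEY="hf_..."
# For a dedicated Inference Endpoint, also point LLM_ENDPOINT at your endpoint URL
```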
LM Studio (Local)
Run models locally with LM Studio for privacy and cost control.

Installation: Install LM Studio from lmstudio.ai and download your desired model from LM Studio’s interface.
Load your model, start the LM Studio server, and Cognee will be able to connect to it.
Set up instructor mode: LLM_INSTRUCTOR_MODE controls how Cognee asks the model for structured output. LM Studio models often work best with json_schema_mode. For more detail, see LLM Instructor Modes below and Structured Output Backends.
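A sketch assuming LM Studio's default local server address; the model name is whatever you have loaded in LM Studio:

```bash
# LM Studio (local); model name and port are illustrative
LLM_PROVIDER="custom"
LLM_MODEL="lm_studio/qwen2.5-7b-instruct"   # hypothetical; use the model loaded in LM Studio
LLM_ENDPOINT="http://localhost:1234/v1"     # LM Studio's default local server address
LLM_API_KEY="lm-studio"                     # placeholder; LM Studio does not check it
LLM_INSTRUCTOR_MODE="json_schema_mode"
```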
llama.cpp (Local)
Run models locally using llama-cpp-python for full offline inference.

Cognee supports two setup modes:
- Local mode — Load a .gguf model directly in-process
- Server mode — Connect to a running llama-cpp-python server over HTTP
Choosing a mode: Use local mode for the simplest setup with no separate server process. Use server mode if you want to share one model across multiple processes or run the model on another machine.
Local Mode (In-Process)
Load a GGUF model file directly. No server setup required.
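A hypothetical sketch; the provider value and the use of LLM_MODEL as a file path are assumptions, while LLAMA_CPP_N_GPU_LAYERS is the documented GPU setting:

```bash
# llama.cpp local mode; provider name and model-path convention are assumptions
LLM_PROVIDER="llama_cpp"                                # hypothetical provider name
LLM_MODEL="/models/llama-3.1-8b-instruct-q4_k_m.gguf"   # path to a local .gguf file
LLAMA_CPP_N_GPU_LAYERS="0"                              # 0 = CPU only, -1 = offload all layers to GPU
```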
GPU acceleration: Set LLAMA_CPP_N_GPU_LAYERS=-1 to offload all layers to GPU, or set a positive integer to offload a specific number of layers. Leave it at 0 for CPU-only inference.
Server Mode (OpenAI-Compatible)
Connect to a running llama-cpp-python server. Start the server separately, then configure Cognee to connect to it:
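A sketch under the assumption that the server is treated as a generic OpenAI-compatible endpoint; the model path, model label, and port are placeholders:

```bash
# Start a llama-cpp-python server (path and port are illustrative)
python -m llama_cpp.server --model /models/llama-3.1-8b-instruct-q4_k_m.gguf --port 8000

# .env; assumption: connect through the custom provider
LLM_PROVIDER="custom"
LLM_MODEL="openai/llama-3.1-8b-instruct"   # hypothetical label; the openai/ prefix selects the generic adapter
LLM_ENDPOINT="http://localhost:8000/v1"
LLM_API_KEY="."                            # no auth needed for a local server
```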
Custom Providers
Use OpenAI-compatible endpoints like OpenRouter or other services. See Fallback Provider in Advanced Options for full details.

Custom Provider Prefixes: When using LLM_PROVIDER="custom", you must include the correct provider prefix in your model name. Cognee forwards requests to LiteLLM, which uses these prefixes to route requests correctly.

Common prefixes include:
- hosted_vllm/ — vLLM servers
- openrouter/ — OpenRouter
- lm_studio/ — LM Studio
- openai/ — OpenAI-compatible APIs
DeepSeek
Use DeepSeek’s models for reasoning and chat via their OpenAI-compatible API. Get your API key from platform.deepseek.com. The deepseek/ prefix tells LiteLLM to route to the DeepSeek API.

Popular DeepSeek models (use with the deepseek/ prefix):
- deepseek/deepseek-chat — DeepSeek-V3 (general chat and instruction following)
- deepseek/deepseek-reasoner — DeepSeek-R1 (chain-of-thought reasoning)
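A sketch; the provider value is an assumption, and the endpoint may not be needed since the deepseek/ prefix already routes through LiteLLM:

```bash
# DeepSeek; provider value and endpoint are assumptions
LLM_PROVIDER="custom"
LLM_MODEL="deepseek/deepseek-chat"
LLM_API_KEY="sk-..."
# LLM_ENDPOINT="https://api.deepseek.com"   # possibly unnecessary; LiteLLM can resolve it from the prefix
```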
Structured output: DeepSeek’s API is OpenAI-compatible, so the default json_mode for custom providers works well. If you encounter issues with structured output, try setting LLM_INSTRUCTOR_MODE="tool_call".
Kimi (Moonshot AI)
Use Moonshot AI’s Kimi models via their OpenAI-compatible API. Get your API key from platform.moonshot.cn. The moonshot/ prefix tells LiteLLM to route to the Moonshot AI API.

Available Kimi models (use with the moonshot/ prefix):
- moonshot/moonshot-v1-8k — 8k context window
- moonshot/moonshot-v1-32k — 32k context window
- moonshot/moonshot-v1-128k — 128k context window (for long documents)
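A sketch along the same lines; the provider value and endpoint are assumptions:

```bash
# Kimi (Moonshot AI); provider value and endpoint are assumptions
LLM_PROVIDER="custom"
LLM_MODEL="moonshot/moonshot-v1-32k"
LLM_API_KEY="sk-..."
# LLM_ENDPOINT="https://api.moonshot.cn/v1"   # possibly unnecessary; LiteLLM can resolve it from the prefix
```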
OpenRouter
Use OpenRouter to access hundreds of models from a single API endpoint. Get your API key from openrouter.ai/keys. Browse all available models at openrouter.ai/models — prefix the model slug with openrouter/.

Example models (use with the openrouter/ prefix):
- openrouter/deepseek/deepseek-r1 — DeepSeek R1 via OpenRouter
- openrouter/google/gemini-2.0-flash-lite-preview-02-05:free — Free Gemini tier
- openrouter/openai/gpt-4o-mini — GPT-4o Mini via OpenRouter
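A sketch; the provider value is an assumption and the key is a placeholder:

```bash
# OpenRouter; provider value is an assumption
LLM_PROVIDER="custom"
LLM_MODEL="openrouter/openai/gpt-4o-mini"
LLM_ENDPOINT="https://openrouter.ai/api/v1"   # OpenRouter's OpenAI-compatible base URL
LLM_API_KEY="sk-or-..."
```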
DeepInfra
Use DeepInfra to access open-source models via their OpenAI-compatible API. Find your model name in the DeepInfra model catalog. The deepinfra/ prefix tells LiteLLM to route to DeepInfra.
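A sketch; the model name, provider value, and endpoint are assumptions, so check the DeepInfra catalog and docs:

```bash
# DeepInfra; model name, provider value, and endpoint are assumptions
LLM_PROVIDER="custom"
LLM_MODEL="deepinfra/meta-llama/Meta-Llama-3.1-70B-Instruct"
LLM_API_KEY="..."
# LLM_ENDPOINT="https://api.deepinfra.com/v1/openai"   # possibly unnecessary; LiteLLM can resolve it from the prefix
```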
Company-Internal / Self-Hosted Endpoints
Any internal LLM server that exposes an OpenAI-compatible REST API (e.g., a corporate vLLM deployment, internal TGI server, or private OpenRouter proxy) can be used with the custom provider.

The model prefix you use (openai/, hosted_vllm/, etc.) determines which LiteLLM adapter handles the request. For most OpenAI-compatible servers, openai/ works best. Set LLM_API_KEY to whatever bearer token your server requires (use . if no auth is needed).
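A sketch with a hypothetical internal host and model label:

```bash
# Company-internal OpenAI-compatible server; host and model label are hypothetical
LLM_PROVIDER="custom"
LLM_MODEL="openai/internal-llm"                      # openai/ prefix selects the generic adapter
LLM_ENDPOINT="https://llm.internal.example.com/v1"
LLM_API_KEY="your-bearer-token"                      # use "." if the server requires no auth
```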
vLLM
Use vLLM for high-performance model serving with an OpenAI-compatible API.

Example with Gemma (to find the correct model name, see their documentation):
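A sketch, assuming vLLM is installed locally and serves the model on its default port; the model name and port are illustrative:

```bash
# Start a vLLM server with a Gemma model (name and port are illustrative)
vllm serve google/gemma-2-9b-it --port 8000

# .env; assumption: connect through the custom provider with the hosted_vllm/ prefix
LLM_PROVIDER="custom"
LLM_MODEL="hosted_vllm/google/gemma-2-9b-it"
LLM_ENDPOINT="http://localhost:8000/v1"
LLM_API_KEY="."
```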
Advanced Options
LLM Instructor Modes
When using the Instructor structured-output framework (the default), Cognee instructs the model to return structured data in a specific way. The LLM_INSTRUCTOR_MODE environment variable controls which strategy is used. Each provider has a built-in default that matches its API capabilities. Override it only when the default doesn’t work for your specific model.

Available modes:

| Mode | Description | When to use |
|---|---|---|
| json_schema_mode | Passes the full JSON Schema of the expected output in the request and enforces strict schema compliance. | OpenAI models that support the response_format / structured-output feature (e.g. GPT-4o). Also works well with Bedrock and some local models. |
| json_mode | Instructs the model to return any valid JSON object. Instructor then validates and coerces it to the target schema. | Gemini, Ollama, Generic/Custom endpoints, and any model that supports response_format: json_object but not strict schema enforcement. |
| anthropic_tools | Uses Anthropic’s native tool-calling API to extract structured data. | Anthropic Claude models only. Leverages first-class tool-use support for reliable extraction. |
| mistral_tools | Uses Mistral’s native tool-calling API to extract structured data. | Mistral models only. Mirrors the OpenAI function-calling interface provided by Mistral. |
| tool_call | Uses the generic OpenAI-style function/tool-calling API to define the schema as a callable tool. | OpenAI-compatible APIs that support function calling but not strict JSON schema output. |
| md_json | Asks the model to return JSON wrapped in a Markdown code block. Instructor extracts the block and validates it. | Models that reliably format code blocks but may not support json_mode (e.g. some self-hosted models). |

Per-provider defaults (from source code):

| Provider (LLM_PROVIDER) | Default mode |
|---|---|
| openai (and Azure OpenAI) | json_schema_mode |
| anthropic | anthropic_tools |
| gemini | json_mode |
| bedrock | json_schema_mode |
| mistral | mistral_tools |
| ollama | json_mode |
| custom (generic OpenAI-compatible) | json_mode |

Override the default only when the model you are using requires a different mode. For example, LM Studio models typically need json_schema_mode even though the custom provider defaults to json_mode.

Example — override the mode:
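For example, in your .env:

```bash
# Force a specific Instructor mode, e.g. for an LM Studio model served via the custom provider
LLM_INSTRUCTOR_MODE="json_schema_mode"
```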
Temperature
Control the randomness of LLM responses with the LLM_TEMPERATURE environment variable.

| Variable | Default | Description |
|---|---|---|
| LLM_TEMPERATURE | 0.0 | Sampling temperature. 0.0 = deterministic / focused output. Higher values (e.g. 0.7–1.0) produce more varied, creative responses. |

When to adjust: Cognee’s default of 0.0 is recommended for knowledge-graph extraction because it produces consistent, structured output. Raise the temperature only if you need more variety in generated text (e.g. conversational responses or creative summarisation).
Rate Limiting
Control client-side throttling for LLM calls to manage API usage and costs.

Defaults (when rate limiting is enabled):

| Variable | Default | Meaning |
|---|---|---|
| LLM_RATE_LIMIT_ENABLED | false | Off by default — opt-in |
| LLM_RATE_LIMIT_REQUESTS | 60 | Max requests per interval |
| LLM_RATE_LIMIT_INTERVAL | 60 | Interval in seconds |

The defaults (60 requests / 60 seconds) allow 1 request/second on average. Adjust both values to match your provider’s tier limit.

How it works:
- Client-side limiter: Cognee paces outbound LLM calls before they reach the provider
- Moving window: Spreads allowance across the time window for smoother throughput
- Per-process scope: In-memory limits don’t share across multiple processes/containers
- Auto-applied: Works with all providers (OpenAI, Gemini, Anthropic, Ollama, Custom)
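To opt in with the documented defaults (60 requests per 60-second interval), enabling the flag is enough:

```bash
# Enable client-side rate limiting; LLM_RATE_LIMIT_REQUESTS and LLM_RATE_LIMIT_INTERVAL default to 60 / 60
LLM_RATE_LIMIT_ENABLED="true"
```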
Set LLM_RATE_LIMIT_REQUESTS to your provider’s RPM (requests per minute) limit, and LLM_RATE_LIMIT_INTERVAL to 60. To leave headroom, use ~80–90% of the advertised limit. Check your provider’s dashboard for your current tier limits.

Each cognify() call issues multiple LLM requests (entity extraction, summarization, etc.) per document chunk — plan for several requests per chunk, not one.

Example configurations for common provider tiers

These examples target chat/completions-style LLM endpoints, such as OpenAI models like gpt-4o-mini.
OpenAI - Tier 1
OpenAI - Tier 2
Anthropic - Tier 1
Google Gemini - Free Tier
Conservative Default
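A conservative configuration throttles well below common tier limits; the numbers below are illustrative assumptions, not provider-published limits:

```bash
# Conservative default; values are illustrative, not provider limits
LLM_RATE_LIMIT_ENABLED="true"
LLM_RATE_LIMIT_REQUESTS="30"
LLM_RATE_LIMIT_INTERVAL="60"
```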
Always verify your exact tier limits in your provider’s dashboard — limits vary by model, tier, and region. The examples above are approximations for common tiers and may change.
Fallback Provider
Cognee supports a primary-plus-fallback model configuration that automatically retries a failed request against a secondary provider. This is useful when your primary provider may reject certain content, and you want a fallback to handle those cases gracefully.

When the fallback triggers

The fallback is invoked only on content policy violations from the primary provider:
- ContentFilterFinishReasonError — the provider’s output filter blocked the response
- ContentPolicyViolationError — the request was rejected for policy reasons
- InstructorRetryException containing “content management policy”

The fallback chain is used only when LLM_PROVIDER is set to openai or custom. Other providers (Anthropic, Gemini, Mistral, Bedrock, Ollama) do not currently support the fallback chain.

Configuration

Set these three variables alongside your primary LLM configuration. For LLM_PROVIDER="custom", all three fallback variables (FALLBACK_MODEL, FALLBACK_ENDPOINT, FALLBACK_API_KEY) must be set. If any is missing, Cognee raises a ContentPolicyFilterError instead of falling back. For LLM_PROVIDER="openai", only FALLBACK_MODEL and FALLBACK_API_KEY are required. FALLBACK_ENDPOINT is accepted but currently unused for the OpenAI adapter.

Variable reference

| Variable | Description |
|---|---|
| FALLBACK_MODEL | Model identifier for the fallback provider (use LiteLLM prefix format, e.g. openrouter/openai/gpt-4o-mini) |
| FALLBACK_ENDPOINT | Base URL for the fallback provider’s API (required for custom, optional for openai) |
| FALLBACK_API_KEY | API key for the fallback provider |
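For example, a sketch using OpenRouter as the fallback target; the endpoint and key are placeholders:

```bash
# Fallback provider; illustrative values using OpenRouter as the fallback
FALLBACK_MODEL="openrouter/openai/gpt-4o-mini"
FALLBACK_ENDPOINT="https://openrouter.ai/api/v1"   # required when LLM_PROVIDER="custom"
FALLBACK_API_KEY="sk-or-..."
```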
Notes
- If EMBEDDING_API_KEY is not set, Cognee falls back to LLM_API_KEY for embeddings
- Rate limiting helps manage API usage and costs
- Structured output frameworks ensure consistent data extraction from LLM responses
Related pages:
- Embedding Providers — Configure embedding providers for semantic search
- Overview — Return to setup configuration overview
- Relational Databases — Set up SQLite or Postgres for metadata storage