New to configuration? See the Setup Configuration Overview for the complete workflow: install extras → create .env → choose providers → handle pruning.

Supported Providers
Cognee supports multiple LLM providers:

- OpenAI — GPT models via OpenAI API (default)
- Azure OpenAI — GPT models via Azure OpenAI Service
- Google Gemini — Gemini models via Google AI
- Anthropic — Claude models via Anthropic API
- AWS Bedrock — Models available via AWS Bedrock
- Ollama — Local models via Ollama
- LM Studio — Local models via LM Studio
- Custom — OpenAI-compatible endpoints (like vLLM)
Configuration
Environment Variables
Set these environment variables in your .env file:

- LLM_PROVIDER — The provider to use (openai, gemini, anthropic, ollama, custom)
- LLM_MODEL — The specific model to use
- LLM_API_KEY — Your API key for the provider
- LLM_ENDPOINT — Custom endpoint URL (for Azure, Ollama, or custom providers)
- LLM_API_VERSION — API version (for Azure OpenAI)
- LLM_MAX_TOKENS — Maximum tokens per request (optional)
Why do model names have a prefix like gemini/ or openrouter/?

Cognee routes all LLM requests through LiteLLM, which uses provider prefixes to identify the correct API endpoint. For example, Google lists their model as gemini-2.0-flash, but in Cognee you must write gemini/gemini-2.0-flash. This prefix tells LiteLLM to use the Gemini API. The same applies to custom providers — openrouter/, hosted_vllm/, lm_studio/, etc. See each provider section below for the correct format.

Provider Setup Guides
OpenAI (Default)
OpenAI is the default provider and works out of the box with minimal configuration.
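A minimal .env sketch for the default OpenAI setup (the model name is illustrative; substitute any OpenAI model your account can access):

```shell
LLM_PROVIDER="openai"
LLM_MODEL="gpt-4o-mini"   # example model; any OpenAI model you have access to
LLM_API_KEY="sk-..."      # your OpenAI API key
```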
Azure OpenAI
Use Azure OpenAI Service with your own deployment.
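A sketch of an Azure OpenAI .env, assuming you have already created a deployment in the Azure portal. The endpoint, deployment name, and API version below are placeholders, and the azure/ prefix follows LiteLLM's prefix convention:

```shell
LLM_PROVIDER="openai"                                # assumption: Azure uses the OpenAI-style adapter
LLM_MODEL="azure/my-gpt4o-deployment"                # placeholder: azure/<your-deployment-name>
LLM_ENDPOINT="https://my-resource.openai.azure.com"  # placeholder: your Azure OpenAI resource endpoint
LLM_API_KEY="..."                                    # your Azure OpenAI key
LLM_API_VERSION="2024-02-01"                         # placeholder; check your Azure resource's API version
```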
Google Gemini
Use Google’s Gemini models for text generation.
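Using the prefix rule described above, a Gemini .env might look like:

```shell
LLM_PROVIDER="gemini"
LLM_MODEL="gemini/gemini-2.0-flash"   # note the gemini/ prefix required by LiteLLM
LLM_API_KEY="..."                     # your Google AI API key
```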
Anthropic
Use Anthropic’s Claude models for reasoning tasks.
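An Anthropic .env sketch; the model id is illustrative, and depending on your version the model string may also need a LiteLLM anthropic/ prefix:

```shell
LLM_PROVIDER="anthropic"
LLM_MODEL="claude-3-5-sonnet-20241022"   # example model id; check Anthropic's docs for current models
LLM_API_KEY="sk-ant-..."                 # your Anthropic API key
```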
AWS Bedrock
Use models available on AWS Bedrock for various tasks. For Bedrock specifically, you will also need to provide some AWS-specific information. There are multiple ways of connecting to Bedrock models:

- Using an API key and region. Simply generate your key on AWS and put it in the LLM_API_KEY env variable.
- Using AWS credentials. You can specify only AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY; there is no need for LLM_API_KEY. In this case, if you are using temporary credentials (e.g. an AWS_ACCESS_KEY_ID starting with ASIA...), you must also specify AWS_SESSION_TOKEN.
- Using AWS profiles. Create a file such as ~/.aws/credentials and store your credentials inside it.
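The first two connection options above translate to .env entries like the following. All values are placeholders, and the region variable name is an assumption based on standard AWS conventions:

```shell
# Option 1: API key and region
LLM_API_KEY="..."            # Bedrock API key generated on AWS
AWS_REGION="eu-central-1"    # placeholder region; variable name assumed

# Option 2: AWS credentials (no LLM_API_KEY needed)
AWS_ACCESS_KEY_ID="AKIA..."
AWS_SECRET_ACCESS_KEY="..."
# AWS_SESSION_TOKEN="..."    # required only for temporary credentials (key id starting with ASIA)
```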
Model Name
The name of the model may differ based on the region (the name begins with eu for Europe, us for the USA, etc.).
Ollama (Local)
Run models locally with Ollama for privacy and cost control.
LLM_API_KEY="ollama" is a placeholder required by the client library — Ollama itself does not validate it.

Installation: Install Ollama from ollama.ai and pull your desired model.

Zero-API-key setup: To avoid falling back to OpenAI for embeddings, you must also configure the embedding provider to use a local backend. See the Local Setup (No API Key) section of the configuration overview for a complete .env example using Ollama or Fastembed for both LLM and embeddings.

Known Issues

- NoDataError with mixed providers: Using Ollama as the LLM provider and OpenAI as the embedding provider may fail with NoDataError. Workaround: configure both LLM and embeddings to use the same local provider (see the local setup guide above).
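Pulling a model and pointing Cognee at the local Ollama server might look like this. The model name is illustrative, and the endpoint is an assumption based on Ollama's default port (11434) and its OpenAI-compatible API:

```shell
# Pull a model locally (example model)
ollama pull llama3.1

# .env
LLM_PROVIDER="ollama"
LLM_MODEL="llama3.1"                      # the model you pulled
LLM_ENDPOINT="http://localhost:11434/v1"  # assumption: Ollama's OpenAI-compatible endpoint
LLM_API_KEY="ollama"                      # placeholder; not validated by Ollama
```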
LM Studio (Local)
Run models locally with LM Studio for privacy and cost control.

Installation: Install LM Studio from lmstudio.ai and download your desired model from LM Studio's interface. Load your model, start the LM Studio server, and Cognee will be able to connect to it.
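With the LM Studio server running, a .env sketch might look like this. The model name is a placeholder, port 1234 is LM Studio's default server port, and the provider value is an assumption based on the custom-provider prefix convention described later on this page:

```shell
LLM_PROVIDER="custom"                    # assumption: reached as an OpenAI-compatible endpoint
LLM_MODEL="lm_studio/my-local-model"     # placeholder: lm_studio/ prefix + your loaded model's name
LLM_ENDPOINT="http://localhost:1234/v1"  # LM Studio's default local server address
LLM_API_KEY="lm-studio"                  # placeholder; LM Studio does not validate it
```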
Set up instructor mode
The LLM_INSTRUCTOR_MODE env variable controls the LiteLLM instructor mode, i.e. the model's response type. The appropriate mode may vary depending on the model, and you may need to change it accordingly.
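Setting the instructor mode is a one-line .env entry; json_mode is one valid value (other modes exist, e.g. tool-calling modes, depending on the model):

```shell
LLM_INSTRUCTOR_MODE="json_mode"   # ask Instructor to use JSON-mode structured output
```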
Custom Providers
Use OpenAI-compatible endpoints like OpenRouter or other services. See Fallback Provider in Advanced Options for full details.

Custom Provider Prefixes: When using LLM_PROVIDER="custom", you must include the correct provider prefix in your model name. Cognee forwards requests to LiteLLM, which uses these prefixes to route requests correctly. Common prefixes include:

- hosted_vllm/ — vLLM servers
- openrouter/ — OpenRouter
- lm_studio/ — LM Studio
- openai/ — OpenAI-compatible APIs
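Putting the prefix rule together, an OpenRouter .env sketch (the model id is illustrative):

```shell
LLM_PROVIDER="custom"
LLM_MODEL="openrouter/openai/gpt-4o-mini"    # openrouter/ prefix + the model id as OpenRouter lists it
LLM_ENDPOINT="https://openrouter.ai/api/v1"  # OpenRouter's OpenAI-compatible base URL
LLM_API_KEY="sk-or-..."                      # your OpenRouter API key
```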
vLLM
Use vLLM for high-performance model serving with an OpenAI-compatible API. To find the correct model name, see the vLLM documentation.
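A hedged sketch of serving Gemma with vLLM and connecting Cognee to it, assuming a vLLM server on localhost:8000 (the default for vllm serve) and an illustrative model id:

```shell
# Serve a model with vLLM (example model id)
vllm serve google/gemma-2-9b-it

# .env
LLM_PROVIDER="custom"
LLM_MODEL="hosted_vllm/google/gemma-2-9b-it"  # hosted_vllm/ prefix + the model id you served
LLM_ENDPOINT="http://localhost:8000/v1"       # vLLM's OpenAI-compatible endpoint
LLM_API_KEY="dummy"                           # placeholder; vLLM requires no real key by default
```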
Advanced Options
Rate Limiting
Control client-side throttling for LLM calls to manage API usage and costs.

How it works:
- Client-side limiter: Cognee paces outbound LLM calls before they reach the provider
- Moving window: Spreads allowance across the time window for smoother throughput
- Per-process scope: In-memory limits don’t share across multiple processes/containers
- Auto-applied: Works with all providers (OpenAI, Gemini, Anthropic, Ollama, Custom)
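A configuration sketch in .env form. The variable names below are assumptions based on Cognee's rate-limiting settings; check the current configuration reference for the exact names:

```shell
LLM_RATE_LIMIT_ENABLED="true"   # assumed name: turn the client-side limiter on
LLM_RATE_LIMIT_REQUESTS="60"    # assumed name: allowed requests per interval
LLM_RATE_LIMIT_INTERVAL="60"    # assumed name: window length in seconds
```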
60 requests per 60 seconds ≈ 1 request/second average rate.
Fallback Provider
Cognee supports a primary-plus-fallback model configuration that automatically retries a failed request against a secondary provider. This is useful when your primary provider may reject certain content, and you want a fallback to handle those cases gracefully.

When the fallback triggers

The fallback is invoked only on content policy violations from the primary provider:

- ContentFilterFinishReasonError — the provider's output filter blocked the response
- ContentPolicyViolationError — the request was rejected for policy reasons
- InstructorRetryException containing "content management policy"

The fallback chain is only used when LLM_PROVIDER is set to openai or custom. Other providers (Anthropic, Gemini, Mistral, Bedrock, Ollama) do not currently support the fallback chain.

Configuration

Set these three variables alongside your primary LLM configuration. For LLM_PROVIDER="custom", all three fallback variables (FALLBACK_MODEL, FALLBACK_ENDPOINT, FALLBACK_API_KEY) must be set; if any is missing, Cognee raises a ContentPolicyFilterError instead of falling back. For LLM_PROVIDER="openai", only FALLBACK_MODEL and FALLBACK_API_KEY are required; FALLBACK_ENDPOINT is accepted but currently unused for the OpenAI adapter.

Variable reference

| Variable | Description |
|---|---|
| FALLBACK_MODEL | Model identifier for the fallback provider (use LiteLLM prefix format, e.g. openrouter/openai/gpt-4o-mini) |
| FALLBACK_ENDPOINT | Base URL for the fallback provider’s API (required for custom, optional for openai) |
| FALLBACK_API_KEY | API key for the fallback provider |
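Combining the variables above, a custom-provider fallback .env sketch (endpoints and model ids are placeholders):

```shell
# Primary provider
LLM_PROVIDER="custom"
LLM_MODEL="hosted_vllm/my-model"               # placeholder primary model
LLM_ENDPOINT="http://localhost:8000/v1"
LLM_API_KEY="dummy"

# Fallback (all three required when LLM_PROVIDER="custom")
FALLBACK_MODEL="openrouter/openai/gpt-4o-mini"
FALLBACK_ENDPOINT="https://openrouter.ai/api/v1"
FALLBACK_API_KEY="sk-or-..."
```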
Notes
- If EMBEDDING_API_KEY is not set, Cognee falls back to LLM_API_KEY for embeddings
- Rate limiting helps manage API usage and costs
- Structured output frameworks ensure consistent data extraction from LLM responses
- If you are using Instructor as the structured output framework, you can control the response type of the LLM through the LLM_INSTRUCTOR_MODE env variable, which sets the corresponding instructor mode (e.g. json_mode for JSON output)
Related pages

- Embedding Providers — Configure embedding providers for semantic search
- Overview — Return to the setup configuration overview
- Relational Databases — Set up SQLite or Postgres for metadata storage