New to configuration? See the Setup Configuration Overview for the complete workflow: install extras → create .env → choose providers → handle pruning.

Supported Providers

Cognee supports multiple LLM providers:

- OpenAI — GPT models via OpenAI API (default)
- Azure OpenAI — GPT models via Azure OpenAI Service
- Google Gemini — Gemini models via Google AI
- Anthropic — Claude models via Anthropic API
- Ollama — Local models via Ollama
- Custom — OpenAI-compatible endpoints
LLM/Embedding Configuration: If you configure only LLM or only embeddings, the other defaults to OpenAI. Ensure you have a working OpenAI API key, or configure both LLM and embeddings to avoid unexpected defaults.
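To avoid the silent OpenAI fallback, you can configure both sides explicitly. A minimal sketch follows; note that the EMBEDDING_PROVIDER and EMBEDDING_MODEL variable names are assumptions (only EMBEDDING_API_KEY is documented in the notes below), so verify them against your Cognee version:

```bash
# .env — configure LLM and embeddings together (sketch; embedding variable
# names other than EMBEDDING_API_KEY are assumptions)
LLM_PROVIDER="openai"
LLM_MODEL="gpt-4o-mini"
LLM_API_KEY="sk-..."

EMBEDDING_PROVIDER="openai"                 # assumed variable name
EMBEDDING_MODEL="text-embedding-3-small"    # assumed variable name
EMBEDDING_API_KEY="sk-..."                  # falls back to LLM_API_KEY if unset
```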
Configuration
Environment Variables
Set these environment variables in your .env file:

- LLM_PROVIDER — The provider to use (openai, gemini, anthropic, ollama, custom)
- LLM_MODEL — The specific model to use
- LLM_API_KEY — Your API key for the provider
- LLM_ENDPOINT — Custom endpoint URL (for Azure, Ollama, or custom providers)
- LLM_API_VERSION — API version (for Azure OpenAI)
- LLM_MAX_TOKENS — Maximum tokens per request (optional)
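For reference, a generic .env sketch with placeholder values; swap in your own provider, model, and key:

```bash
# .env — generic LLM configuration (placeholder values)
LLM_PROVIDER="openai"          # openai | gemini | anthropic | ollama | custom
LLM_MODEL="gpt-4o-mini"
LLM_API_KEY="your-api-key"
# LLM_ENDPOINT=""              # only for Azure, Ollama, or custom providers
# LLM_API_VERSION=""           # only for Azure OpenAI
# LLM_MAX_TOKENS="16384"       # optional
```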
Provider Setup Guides
OpenAI (Default)
OpenAI is the default provider and works out of the box with minimal configuration.
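A minimal sketch (the model name is only an example):

```bash
# .env — OpenAI (default provider)
LLM_PROVIDER="openai"
LLM_MODEL="gpt-4o-mini"
LLM_API_KEY="sk-..."
```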
Azure OpenAI
Use Azure OpenAI Service with your own deployment.
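A plausible sketch; the azure/ model prefix and the endpoint shape follow LiteLLM conventions and are assumptions here, so check them against your deployment and the Cognee docs:

```bash
# .env — Azure OpenAI (sketch; model prefix and endpoint form are assumptions)
LLM_PROVIDER="openai"
LLM_MODEL="azure/gpt-4o-mini"                         # assumed LiteLLM-style deployment reference
LLM_ENDPOINT="https://<your-resource>.openai.azure.com/"
LLM_API_KEY="your-azure-api-key"
LLM_API_VERSION="2024-12-01-preview"                  # example version, use your deployment's
```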
Google Gemini
Use Google’s Gemini models for text generation.
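A minimal sketch (the model name is an example; use whichever Gemini model you have access to):

```bash
# .env — Google Gemini
LLM_PROVIDER="gemini"
LLM_MODEL="gemini-1.5-flash"          # example model name
LLM_API_KEY="your-google-ai-api-key"
```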
Anthropic
Use Anthropic’s Claude models for reasoning tasks.
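A minimal sketch (the model name is an example):

```bash
# .env — Anthropic
LLM_PROVIDER="anthropic"
LLM_MODEL="claude-3-5-sonnet-20241022"   # example model name
LLM_API_KEY="your-anthropic-api-key"
```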
Ollama (Local)
Run models locally with Ollama for privacy and cost control.

Installation: Install Ollama from ollama.ai and pull your desired model:
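For example (llama3.1 is just a placeholder; pull whichever model you plan to use):

```bash
ollama pull llama3.1
```

Then point Cognee at the local Ollama server. In this sketch, the /v1 OpenAI-compatible endpoint path is an assumption; adjust it if your setup differs:

```bash
# .env — Ollama (local)
LLM_PROVIDER="ollama"
LLM_MODEL="llama3.1"
LLM_ENDPOINT="http://localhost:11434/v1"   # assumed OpenAI-compatible endpoint path
LLM_API_KEY="ollama"                        # placeholder; local Ollama does not validate keys
```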
Known Issues
- Requires HUGGINGFACE_TOKENIZER: Ollama currently needs this env var set even when used only as the LLM. A fix is in progress.
- NoDataError with mixed providers: Using Ollama as the LLM and OpenAI as the embedding provider may fail with NoDataError. Workaround: use the same provider for both (see the sketch below).
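A hedged sketch of the all-Ollama workaround. The EMBEDDING_* variable names and the nomic-embed-text model are assumptions, and HUGGINGFACE_TOKENIZER should be set to the Hugging Face tokenizer id matching your model:

```bash
# .env — all-local workaround sketch (variable names other than LLM_* and
# HUGGINGFACE_TOKENIZER are assumptions; verify against your Cognee version)
LLM_PROVIDER="ollama"
LLM_MODEL="llama3.1"
LLM_ENDPOINT="http://localhost:11434/v1"

EMBEDDING_PROVIDER="ollama"                      # assumed variable name
EMBEDDING_MODEL="nomic-embed-text"               # example local embedding model
EMBEDDING_ENDPOINT="http://localhost:11434/v1"   # assumed variable name

HUGGINGFACE_TOKENIZER="<hf-tokenizer-id-for-your-model>"   # currently required even when Ollama is only the LLM
```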
Custom Providers
Use OpenAI-compatible endpoints like OpenRouter or other services.
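A sketch using OpenRouter as the OpenAI-compatible endpoint; the model id format varies by service and is an assumption here:

```bash
# .env — custom OpenAI-compatible endpoint (OpenRouter shown as an example)
LLM_PROVIDER="custom"
LLM_MODEL="openrouter/anthropic/claude-3.5-sonnet"   # example model id, format depends on the service
LLM_ENDPOINT="https://openrouter.ai/api/v1"
LLM_API_KEY="your-openrouter-api-key"
```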
Advanced Options
Rate Limiting
Control client-side throttling for LLM calls to manage API usage and costs. Rate limiting is configured in .env; a sketch follows the list below.

How it works:
- Client-side limiter: Cognee paces outbound LLM calls before they reach the provider
- Moving window: Spreads allowance across the time window for smoother throughput
- Per-process scope: In-memory limits don’t share across multiple processes/containers
- Auto-applied: Works with all providers (OpenAI, Gemini, Anthropic, Ollama, Custom)
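A sketch of the .env configuration; the variable names below are assumptions based on the behavior described here, so confirm the exact names for your Cognee version:

```bash
# .env — client-side LLM rate limiting (assumed variable names)
LLM_RATE_LIMIT_ENABLED="true"
LLM_RATE_LIMIT_REQUESTS="60"    # allowed requests per interval
LLM_RATE_LIMIT_INTERVAL="60"    # interval length in seconds
```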
60 requests per 60 seconds ≈ 1 request/second average rate.

Notes
- If EMBEDDING_API_KEY is not set, Cognee falls back to LLM_API_KEY for embeddings
- Rate limiting helps manage API usage and costs
- Structured output frameworks ensure consistent data extraction from LLM responses