LLM (Large Language Model) providers handle text generation, reasoning, and structured output tasks in Cognee. You can choose from cloud providers like OpenAI and Anthropic, or run models locally with Ollama.
New to configuration? See the Setup Configuration Overview for the complete workflow: install extras → create .env → choose providers → handle pruning.

Supported Providers

Cognee supports multiple LLM providers:
  • OpenAI — GPT models via OpenAI API (default)
  • Azure OpenAI — GPT models via Azure OpenAI Service
  • Google Gemini — Gemini models via Google AI
  • Anthropic — Claude models via Anthropic API
  • Ollama — Local models via Ollama
  • Custom — OpenAI-compatible endpoints
LLM/Embedding Configuration: If you configure only LLM or only embeddings, the other defaults to OpenAI. Ensure you have a working OpenAI API key, or configure both LLM and embeddings to avoid unexpected defaults.

Configuration

Set these environment variables in your .env file:
  • LLM_PROVIDER — The provider to use (openai, gemini, anthropic, ollama, custom)
  • LLM_MODEL — The specific model to use
  • LLM_API_KEY — Your API key for the provider
  • LLM_ENDPOINT — Custom endpoint URL (for Azure, Ollama, or custom providers)
  • LLM_API_VERSION — API version (for Azure OpenAI)
  • LLM_MAX_TOKENS — Maximum tokens per request (optional)
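Cognee reads these variables from the process environment at startup. If you want to fail fast on a missing key, a minimal pre-flight check might look like the sketch below (it assumes your settings live in a local .env file and that python-dotenv is installed):

# Sketch: pre-flight check that the LLM variables from your .env are visible.
# Assumes python-dotenv is installed (pip install python-dotenv).
import os
from dotenv import load_dotenv

load_dotenv()  # read .env from the current working directory

required = ["LLM_PROVIDER", "LLM_MODEL", "LLM_API_KEY"]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise SystemExit(f"Missing LLM settings: {', '.join(missing)}")

# Embeddings default to OpenAI (or fall back to LLM_API_KEY) unless
# EMBEDDING_* variables are set — see the notes in this guide.
if not os.getenv("EMBEDDING_API_KEY"):
    print("EMBEDDING_API_KEY not set; embeddings will use the defaults described below.")

print(f"Using {os.getenv('LLM_PROVIDER')} / {os.getenv('LLM_MODEL')}")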

Provider Setup Guides

OpenAI is the default provider and works out of the box with minimal configuration.
LLM_PROVIDER="openai"
LLM_MODEL="gpt-4o-mini"
LLM_API_KEY="sk-..."
# Optional overrides
# LLM_ENDPOINT=https://api.openai.com/v1
# LLM_API_VERSION=
# LLM_MAX_TOKENS=16384
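With the key in place, a minimal end-to-end run looks roughly like the sketch below, based on Cognee's top-level async add/cognify/search API; the exact search signature can vary between versions.

# Sketch: minimal pipeline with the default OpenAI provider configured above.
# Assumes cognee is installed and the .env variables are loaded into the environment.
import asyncio
import cognee

async def main():
    await cognee.add("Cognee turns documents into a queryable knowledge graph.")
    await cognee.cognify()  # runs LLM extraction with the configured LLM_MODEL
    results = await cognee.search("What does Cognee do?")
    for result in results:
        print(result)

asyncio.run(main())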
Use Azure OpenAI Service with your own deployment.
LLM_PROVIDER="openai"
LLM_MODEL="azure/gpt-4o-mini"
LLM_ENDPOINT="https://<your-resource>.openai.azure.com/openai/deployments/gpt-4o-mini"
LLM_API_KEY="az-..."
LLM_API_VERSION="2024-12-01-preview"
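The endpoint must point at your specific deployment. As an illustration (with placeholder resource and deployment names), the URL is assembled like this:

# Sketch: assembling the Azure OpenAI endpoint for LLM_ENDPOINT.
# "my-resource" and "gpt-4o-mini" are placeholders for your own resource and deployment names.
resource = "my-resource"      # Azure OpenAI resource name
deployment = "gpt-4o-mini"    # deployment name created in the Azure portal
endpoint = f"https://{resource}.openai.azure.com/openai/deployments/{deployment}"
print(endpoint)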
Use Google’s Gemini models for text generation.
LLM_PROVIDER="gemini"
LLM_MODEL="gemini/gemini-2.0-flash"
LLM_API_KEY="AIza..."
# Optional
# LLM_ENDPOINT=https://generativelanguage.googleapis.com/
# LLM_API_VERSION=v1beta
Use Anthropic’s Claude models for reasoning tasks.
LLM_PROVIDER="anthropic"
LLM_MODEL="claude-3-5-sonnet-20241022"
LLM_API_KEY="sk-ant-..."
Run models locally with Ollama for privacy and cost control.
LLM_PROVIDER="ollama"
LLM_MODEL="llama3.1:8b"
LLM_ENDPOINT="http://localhost:11434/v1"
LLM_API_KEY="ollama"
Installation: Install Ollama from ollama.ai and pull your desired model:
ollama pull llama3.1:8b
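Before pointing Cognee at Ollama you can confirm that the server is running and the model has been pulled. A small connectivity check, assuming the default local port and that the requests package is installed:

# Sketch: verify the local Ollama server is reachable and the model is pulled.
import requests

resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print("Available models:", models)
if not any(name.startswith("llama3.1") for name in models):
    print("llama3.1 not found; run: ollama pull llama3.1:8b")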

Known Issues

  • Requires HUGGINGFACE_TOKENIZER: Ollama currently requires this environment variable to be set even when it is used only as the LLM provider. A fix is in progress.
  • NoDataError with mixed providers: Using Ollama as the LLM provider and OpenAI as the embedding provider may fail with NoDataError. Workaround: use the same provider for both.
Use OpenAI-compatible endpoints such as OpenRouter or other compatible services.
LLM_PROVIDER="custom"
LLM_MODEL="openrouter/google/gemini-2.0-flash-lite-preview-02-05:free"
LLM_ENDPOINT="https://openrouter.ai/api/v1"
LLM_API_KEY="or-..."
# Optional fallback chain
# FALLBACK_MODEL=
# FALLBACK_ENDPOINT=
# FALLBACK_API_KEY=
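Most OpenAI-compatible services expose a model-listing route you can call to verify the endpoint and key before wiring them into Cognee. A sketch using requests, assuming the service implements the standard GET /models route:

# Sketch: list models from an OpenAI-compatible endpoint to verify credentials.
import os
import requests

endpoint = os.getenv("LLM_ENDPOINT", "https://openrouter.ai/api/v1")
headers = {"Authorization": f"Bearer {os.getenv('LLM_API_KEY', '')}"}

resp = requests.get(f"{endpoint.rstrip('/')}/models", headers=headers, timeout=10)
resp.raise_for_status()
print(f"{len(resp.json().get('data', []))} models available at {endpoint}")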

Advanced Options

Control client-side throttling for LLM calls to manage API usage and costs.
Configuration (in .env):
LLM_RATE_LIMIT_ENABLED="true"
LLM_RATE_LIMIT_REQUESTS="60"
LLM_RATE_LIMIT_INTERVAL="60"
How it works:
  • Client-side limiter: Cognee paces outbound LLM calls before they reach the provider
  • Moving window: Spreads allowance across the time window for smoother throughput
  • Per-process scope: In-memory limits don’t share across multiple processes/containers
  • Auto-applied: Works with all providers (OpenAI, Gemini, Anthropic, Ollama, Custom)
Example: 60 requests per 60 seconds ≈ 1 request/second average rate.
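Conceptually the limiter behaves like the sketch below: a moving window of recent call timestamps that is pruned before each outbound request. This is an illustration of the idea, not Cognee's internal implementation.

# Sketch: a moving-window limiter illustrating the behaviour described above
# (60 requests per 60 seconds, per process).
import time
from collections import deque

class MovingWindowLimiter:
    def __init__(self, max_requests=60, interval_seconds=60):
        self.max_requests = max_requests
        self.interval = interval_seconds
        self.calls = deque()  # timestamps of recent calls (in-memory, per-process)

    def acquire(self):
        now = time.monotonic()
        # drop timestamps that have left the window
        while self.calls and now - self.calls[0] >= self.interval:
            self.calls.popleft()
        if len(self.calls) >= self.max_requests:
            wait = self.interval - (now - self.calls[0])
            if wait > 0:
                time.sleep(wait)
            self.calls.popleft()
        self.calls.append(time.monotonic())

limiter = MovingWindowLimiter(max_requests=60, interval_seconds=60)
limiter.acquire()  # call before each outbound LLM request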

Notes

  • If EMBEDDING_API_KEY is not set, Cognee falls back to LLM_API_KEY for embeddings
  • Rate limiting helps manage API usage and costs
  • Structured output frameworks ensure consistent data extraction from LLM responses