Custom providers give you access to specialized models, self-hosted solutions, and emerging AI services while maintaining Cognee’s full functionality.

Overview

Custom providers enable you to use any LLM service that follows the OpenAI API format, including:
  • Cloud-hosted models from specialized providers
  • Self-hosted inference servers like vLLM
  • Enterprise model endpoints with custom configurations
  • Emerging AI services with OpenAI-compatible APIs

Configuration Methods

Method 1: Environment Variables

The simplest way to configure custom providers is through environment variables:
# Basic custom provider setup
LLM_PROVIDER="custom"
LLM_MODEL="your-model-name"
LLM_ENDPOINT="https://your-api-endpoint.com/v1"
LLM_API_KEY="your_api_key"
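Cognee picks these values up from the process environment, so they need to be set before it initializes. If you keep them in a .env file, one common pattern is to load it with python-dotenv; a minimal sketch (python-dotenv is an extra dependency, not something Cognee mandates):
# Load the variables above from a local .env file before using Cognee.
# Assumes `pip install python-dotenv`.
from dotenv import load_dotenv

load_dotenv()  # populates LLM_PROVIDER, LLM_MODEL, LLM_ENDPOINT, LLM_API_KEY

import cognee  # imported after the environment is in place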

Method 2: Python Configuration

For more control, configure custom providers programmatically:
import os
import cognee

# Set custom provider configuration
os.environ["LLM_PROVIDER"] = "custom"
os.environ["LLM_MODEL"] = "your-model-name"
os.environ["LLM_ENDPOINT"] = "https://your-api-endpoint.com/v1"
os.environ["LLM_API_KEY"] = "your_api_key"

# Alternative: Use configuration object
config = {
    "llm": {
        "provider": "custom",
        "model": "your-model-name",
        "endpoint": "https://your-api-endpoint.com/v1",
        "api_key": "your_api_key"
    }
}

cognee.configure(config)
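Before running Cognee against a new endpoint, it is worth checking that the endpoint really speaks the OpenAI API format. A minimal smoke test using the requests library; the values are the placeholders from the examples above, and servers that expect a bare model name may need any routing prefix (such as openai/) stripped:
import os
import requests

endpoint = os.environ["LLM_ENDPOINT"]  # e.g. https://your-api-endpoint.com/v1
api_key = os.environ["LLM_API_KEY"]
model = os.environ["LLM_MODEL"]

# OpenAI-compatible servers expose POST <endpoint>/chat/completions.
response = requests.post(
    f"{endpoint}/chat/completions",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": model,
        "messages": [{"role": "user", "content": "Reply with OK."}],
        "max_tokens": 5,
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])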

Supported Custom Providers

Anyscale

Anyscale provides managed inference for open-source models with enterprise-grade reliability.
LLM_PROVIDER="custom"
LLM_MODEL="anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1"
LLM_ENDPOINT="https://api.endpoints.anyscale.com/v1"
LLM_API_KEY="your_api_key"
Features:
  • Models: Mixtral, Llama, CodeLlama, and other open-source models
  • Performance: Optimized inference with enterprise SLAs
  • Scalability: Auto-scaling based on demand
  • Cost: Pay-per-token pricing

vLLM

vLLM is a high-performance inference framework for large language models that you can self-host.
LLM_PROVIDER="openai"
LLM_MODEL="openai/<model name>" # Must start with openai/
LLM_ENDPOINT="https://vllm-host/v1" # Must end with /v1
LLM_API_KEY="<key>"
Features:
  • Performance: PagedAttention for efficient memory usage
  • Models: Support for Llama, Mistral, CodeLlama, and more
  • Deployment: Docker, Kubernetes, or standalone server
  • Cost: No API costs, only infrastructure costs
Important Notes:
  • Must use LLM_PROVIDER="openai" (not "custom")
  • Model name must start with openai/
  • Endpoint must end with /v1
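These conventions come from LiteLLM, which Cognee uses to route provider-prefixed model names (see Additional Resources). You can verify your endpoint and model values independently of Cognee by calling LiteLLM directly; a minimal sketch with a hypothetical model name:
import litellm

# The openai/ prefix tells LiteLLM to treat the server as OpenAI-compatible;
# vLLM serves its OpenAI-compatible API under /v1.
response = litellm.completion(
    model="openai/mistralai/Mistral-7B-Instruct-v0.2",  # hypothetical model
    api_base="https://vllm-host/v1",
    api_key="<key>",
    messages=[{"role": "user", "content": "Reply with OK."}],
)
print(response.choices[0].message.content)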

Deep Infra

Deep Infra provides access to a wide range of open-source models through a unified API.
LLM_PROVIDER="custom"
LLM_API_KEY="your_api_key"
LLM_MODEL="deepinfra/meta-llama/Meta-Llama-3-8B-Instruct"
LLM_ENDPOINT="https://api.deepinfra.com/v1/openai"
Features:
  • Models: Meta Llama, Mistral, CodeLlama, and many more
  • Pricing: Competitive pay-per-token rates
  • Performance: Optimized inference infrastructure
  • Availability: High uptime and reliability
Limitations:
  • Currently does not support embedding models with Cognee
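A common workaround is to keep completions on Deep Infra while pointing embeddings at a provider that Cognee does support. The EMBEDDING_* variable names below mirror the LLM_* settings and are an assumption to verify against Cognee's configuration reference:
import os

# Completions stay on Deep Infra (same values as above).
os.environ["LLM_PROVIDER"] = "custom"
os.environ["LLM_MODEL"] = "deepinfra/meta-llama/Meta-Llama-3-8B-Instruct"
os.environ["LLM_ENDPOINT"] = "https://api.deepinfra.com/v1/openai"
os.environ["LLM_API_KEY"] = "your_deepinfra_key"

# Embeddings go to a supported provider instead. Variable names assumed;
# check Cognee's configuration reference for your version.
os.environ["EMBEDDING_PROVIDER"] = "openai"
os.environ["EMBEDDING_MODEL"] = "text-embedding-3-small"
os.environ["EMBEDDING_API_KEY"] = "your_openai_key"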

OpenRouter

OpenRouter provides unified access to multiple AI models through a single API, including models from Google, Anthropic, and others.
LLM_PROVIDER="custom"
LLM_API_KEY="your_api_key"
LLM_MODEL="openrouter/google/gemini-2.0-flash-thinking-exp-1219:free"
LLM_ENDPOINT="https://openrouter.ai/api/v1"
Features:
  • Model Variety: Access to Google Gemini, Anthropic Claude, and many others
  • Unified API: Single endpoint for multiple providers
  • Cost Optimization: Compare prices across providers
  • Flexibility: Switch between models easily
Limitations:
  • Currently does not support embedding models with Cognee (the embedding workaround shown for Deep Infra applies here as well)

Best Practices

Provider Selection

Production Use

Reliability First
  • Choose providers with high uptime guarantees
  • Implement fallback configurations (see the sketch after this list)
  • Monitor API response times and error rates
  • Use enterprise plans for critical applications
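A fallback configuration can be as simple as probing the primary endpoint before Cognee initializes and swapping in a secondary provider when the probe fails. A minimal sketch; the health check and both value sets are illustrative, not part of Cognee's API:
import os
import requests

PRIMARY = {
    "LLM_PROVIDER": "custom",
    "LLM_MODEL": "your-model-name",
    "LLM_ENDPOINT": "https://primary-endpoint.com/v1",
    "LLM_API_KEY": "primary_key",
}
FALLBACK = {
    "LLM_PROVIDER": "custom",
    "LLM_MODEL": "openrouter/google/gemini-2.0-flash-thinking-exp-1219:free",
    "LLM_ENDPOINT": "https://openrouter.ai/api/v1",
    "LLM_API_KEY": "fallback_key",
}

def healthy(cfg: dict) -> bool:
    """Best-effort probe: OpenAI-compatible servers usually expose GET /models."""
    try:
        response = requests.get(
            f"{cfg['LLM_ENDPOINT']}/models",
            headers={"Authorization": f"Bearer {cfg['LLM_API_KEY']}"},
            timeout=5,
        )
        return response.ok
    except requests.RequestException:
        return False

# Apply whichever configuration is reachable, before Cognee reads its settings.
os.environ.update(PRIMARY if healthy(PRIMARY) else FALLBACK)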

Development & Testing

Cost Optimization
  • Use local models such as Ollama for development (see the sketch after this list)
  • Leverage free tiers for testing
  • Consider pay-per-token providers for low-volume use
  • Use smaller models for initial development
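For local development, any OpenAI-compatible local server slots into the same configuration. A minimal sketch using Ollama, which serves an OpenAI-compatible API at http://localhost:11434/v1; the routing conventions follow the vLLM notes above and are an assumption to verify for your Cognee version:
import os

# Development setup: a local Ollama server, OpenAI-compatible under /v1.
os.environ["LLM_PROVIDER"] = "openai"
os.environ["LLM_MODEL"] = "openai/llama3"  # routing prefix, as with vLLM
os.environ["LLM_ENDPOINT"] = "http://localhost:11434/v1"
os.environ["LLM_API_KEY"] = "ollama"  # Ollama ignores the key, but one must be set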

Additional Resources

  • LiteLLM Documentation: Comprehensive guide to supported providers
  • OpenAI API Reference: Standard API format reference
  • Provider-Specific Docs: Check your chosen provider’s documentation for model names and endpoints
  • Cognee Discord: Community support and provider recommendations