vLLM is a high-performance inference framework for large language models that you can self-host.
LLM_PROVIDER="openai"
LLM_MODEL="openai/<model name>"      # Must start with openai/
LLM_ENDPOINT="https://vllm-host/v1"  # Must end with /v1
LLM_API_KEY="<key>"
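This configuration assumes a vLLM server already running with its OpenAI-compatible API. As a rough sketch (the model name, host, and key below are placeholders, not values from this guide), you could start the server and check that the /v1 route answers like this:

# Start vLLM's OpenAI-compatible server (placeholder model and key)
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --api-key "<key>"

# Verify the endpoint; the served model name is typically what follows
# the openai/ prefix in LLM_MODEL
curl https://vllm-host/v1/models -H "Authorization: Bearer <key>"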
Features:
Performance: PagedAttention for efficient memory usage
Models: Support for Llama, Mistral, CodeLlama, and more
Deployment: Docker, Kubernetes, or standalone server (see the Docker sketch below)
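For the Docker route, a minimal sketch using the official vllm/vllm-openai image might look like the following; the model, GPU flags, and cache mount are assumptions to adapt to your environment:

# Run the OpenAI-compatible server in Docker (placeholder model),
# caching downloaded weights on the host
docker run --gpus all --ipc=host \
  -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.1-8B-Instruct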