LiteLLM Integration

Cognee integrates with LiteLLM to provide a unified interface for working with multiple Large Language Model (LLM) providers, so you can switch providers without changing application code. Under the hood, the integration combines the Instructor library with LiteLLM to produce type-safe, validated responses from 100+ LLM providers, with built-in rate limiting and retry mechanisms.

Key Features

  • 100+ Providers: Universal access to OpenAI, Anthropic, Google, Ollama, and 100+ other LLM providers through a single interface.
  • Type-Safe Outputs: Structured responses generated as validated, type-safe Pydantic models via the Instructor integration.
  • Rate Limiting: Built-in protection through automatic rate limiting, exponential backoff, and retries that prevent API quota exhaustion.
  • Fallback Support: High reliability through automatic fallback to backup configurations.

Architecture

The LiteLLM integration is built around a structured output framework that provides robust LLM operations:
1. LLMGateway: Central interface for all LLM operations, with provider abstraction.
2. Provider Adapters: Dedicated adapters for each LLM provider with optimized configurations.
3. Rate Limiting: Built-in rate limiting and retry mechanisms with exponential backoff.
4. Embedding Engine: LiteLLM-powered embedding functionality for semantic search.

Supported Providers

Cognee supports the following LLM providers through LiteLLM (selected via LLM_PROVIDER):

  • openai: OpenAI models (e.g., gpt-4o-mini)
  • anthropic: Anthropic Claude models (e.g., claude-3-5-sonnet-20241022)
  • gemini: Google Gemini models (e.g., gemini-pro)
  • ollama: Locally hosted models served by Ollama
  • custom: Any OpenAI-compatible custom endpoint

Configuration

Environment Variables

# Core LLM Configuration
LLM_PROVIDER=openai          # Provider: openai, anthropic, gemini, ollama, custom
LLM_MODEL=gpt-4o-mini        # Model name
LLM_API_KEY=your_api_key     # API key for the provider
LLM_ENDPOINT=                # Custom endpoint (optional)
LLM_API_VERSION=             # API version (optional)
LLM_TEMPERATURE=0.0          # Temperature setting
LLM_MAX_TOKENS=16384         # Maximum tokens
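
For example, a local Ollama configuration might look like the following; the model name and endpoint are illustrative and depend on your Ollama setup and Cognee version:

LLM_PROVIDER=ollama
LLM_MODEL=llama3.1                       # Any model pulled into your Ollama instance (example value)
LLM_API_KEY=ollama                       # Placeholder; Ollama typically does not require a real key
LLM_ENDPOINT=http://localhost:11434/v1   # Default local Ollama address (adjust as needed)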

Programmatic Configuration

Get Configuration
from cognee.infrastructure.llm import get_llm_config

# Get current configuration
config = get_llm_config()

# Access configuration values
print(f"Provider: {config.llm_provider}")
print(f"Model: {config.llm_model}")
print(f"Max Tokens: {config.llm_max_tokens}")
print(f"Temperature: {config.llm_temperature}")
print(f"Rate Limiting: {config.llm_rate_limit_enabled}")

Core Features

1. Structured Output Generation

Generate type-safe, validated responses using Pydantic models:
from cognee.infrastructure.llm.LLMGateway import LLMGateway
from pydantic import BaseModel

class EntityModel(BaseModel):
    name: str
    type: str
    confidence: float

# Generate structured output
result = await LLMGateway.acreate_structured_output(
    text_input="Extract entities from this text: Apple Inc. was founded by Steve Jobs.",
    system_prompt="You are an entity extraction expert. Extract named entities with confidence scores.",
    response_model=EntityModel
)

print(f"Entity: {result.name}")
print(f"Type: {result.type}")
print(f"Confidence: {result.confidence}")

2. Rate Limiting & Retry Logic

Built-in protection against API limits:
Configure Rate Limits
import os

from cognee.infrastructure.llm.LLMGateway import LLMGateway

# Enable rate limiting
os.environ["LLM_RATE_LIMIT_ENABLED"] = "true"
os.environ["LLM_RATE_LIMIT_REQUESTS"] = "60"
os.environ["LLM_RATE_LIMIT_INTERVAL"] = "60"

# Rate limiting is automatically applied to all LLM calls
result = await LLMGateway.acreate_structured_output(
    text_input="Your input",
    system_prompt="Your prompt",
    response_model=YourModel
)
# Automatically respects rate limits
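
The built-in mechanism already retries with exponential backoff. If you keep rate limiting disabled and want similar protection at the application level, a minimal sketch looks like this; the retry count, delays, and broad exception handling are assumptions, not Cognee defaults:

import asyncio

from cognee.infrastructure.llm.LLMGateway import LLMGateway

async def call_with_backoff(text_input, system_prompt, response_model,
                            retries: int = 5, base_delay: float = 1.0):
    """Application-level retry with exponential backoff (illustrative only)."""
    for attempt in range(retries):
        try:
            return await LLMGateway.acreate_structured_output(
                text_input=text_input,
                system_prompt=system_prompt,
                response_model=response_model,
            )
        except Exception:
            if attempt == retries - 1:
                raise
            # Wait 1s, 2s, 4s, ... before retrying
            await asyncio.sleep(base_delay * (2 ** attempt))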

3. Model Token Management

Automatic token limit detection and management:
from cognee.infrastructure.llm.utils import get_model_max_tokens

# Get max tokens for different models
gpt4_tokens = get_model_max_tokens("gpt-4o-mini")
claude_tokens = get_model_max_tokens("claude-3-5-sonnet-20241022")
gemini_tokens = get_model_max_tokens("gemini-pro")

print(f"GPT-4o-mini max tokens: {gpt4_tokens}")
print(f"Claude-3.5-Sonnet max tokens: {claude_tokens}")
print(f"Gemini Pro max tokens: {gemini_tokens}")

4. Embedding Engine Integration

LiteLLM powers the embedding functionality for semantic search:
from cognee.infrastructure.databases.vector.embeddings.LiteLLMEmbeddingEngine import LiteLLMEmbeddingEngine

# Initialize embedding engine
embedding_engine = LiteLLMEmbeddingEngine(
    model="openai/text-embedding-3-large",
    provider="openai",
    dimensions=3072
)

# Generate embeddings
texts = ["Sample text to embed", "Another text for embedding"]
embeddings = await embedding_engine.embed_text(texts)

print(f"Generated {len(embeddings)} embeddings")
print(f"Embedding dimensions: {len(embeddings[0])}")

Advanced Features

Fallback Configuration

Usage Examples

Structured Output Generation
from cognee.infrastructure.llm.LLMGateway import LLMGateway
from pydantic import BaseModel

class AnalysisModel(BaseModel):
    summary: str
    key_points: list[str]
    sentiment: str
    confidence: float

# Generate structured analysis
response = await LLMGateway.acreate_structured_output(
    text_input="Analyze this document: AI is revolutionizing healthcare...",
    system_prompt="You are a document analyzer. Provide structured analysis.",
    response_model=AnalysisModel
)

print(f"Summary: {response.summary}")
print(f"Key Points: {response.key_points}")
print(f"Sentiment: {response.sentiment}")
print(f"Confidence: {response.confidence}")

Advanced Integration

Audio Transcription

Performance & Monitoring

Performance Optimization

Efficient Processing
import asyncio

from cognee.infrastructure.llm.LLMGateway import LLMGateway
from cognee.infrastructure.llm.utils import get_model_max_tokens

# Token optimization: check the model's context limit before batching
max_tokens = get_model_max_tokens("gpt-4o-mini")

# Concurrent requests
tasks = [
    LLMGateway.acreate_structured_output(
        text_input=content,
        system_prompt="Your prompt",
        response_model=YourModel,
    )
    for content in content_list
]
results = await asyncio.gather(*tasks)
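
When fanning out many concurrent requests, it can also help to cap concurrency on top of rate limiting. A minimal sketch with asyncio.Semaphore; the limit of 10 is an arbitrary example:

import asyncio

from cognee.infrastructure.llm.LLMGateway import LLMGateway

semaphore = asyncio.Semaphore(10)  # At most 10 requests in flight

async def bounded_call(content: str):
    async with semaphore:
        return await LLMGateway.acreate_structured_output(
            text_input=content,
            system_prompt="Your prompt",
            response_model=YourModel,
        )

results = await asyncio.gather(*(bounded_call(c) for c in content_list))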

Observability

Monitoring Integration
# Langfuse integration (automatic)
# Structured logging
# Performance metrics
# Request/response tracking
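
If you use the Langfuse integration, it is typically configured through Langfuse's standard environment variables; the names below follow Langfuse's own conventions, so confirm them against your Cognee version:

LANGFUSE_PUBLIC_KEY=your_public_key
LANGFUSE_SECRET_KEY=your_secret_key
LANGFUSE_HOST=https://cloud.langfuse.com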

Dependencies

The LiteLLM integration requires specific package versions:
litellm = ">=1.71.0, <2.0.0"
instructor = ">=1.9.1, <2.0.0"
limits = ">=4.4.1, <5"
These dependencies are automatically installed when you install Cognee with LiteLLM support.

Error Handling

Automatic Error Detection

Best Practices

Configuration

Setup Guidelines
  • Always use structured outputs for type safety
  • Configure rate limits to match API quotas
  • Set up fallback configurations for reliability
  • Test connections before production deployment

Optimization

Performance Tips
  • Use appropriate models for your use case (cost vs. performance)
  • Monitor token usage to optimize costs
  • Use async operations for concurrent processing
  • Enable rate limiting to prevent quota exhaustion

Troubleshooting

Common Issues

Next Steps

Framework Status: LiteLLM is fully supported for multi-provider access and cost optimization. For new projects requiring advanced structured outputs, consider BAML for better type safety and prompt management.