Embedding providers in Cognee convert text into high-dimensional vectors that capture semantic meaning, enabling powerful similarity search and knowledge graph enhancement.

Key Features

Multiple Providers

Flexible Options: Support for OpenAI, local models, self-hosted solutions, and custom endpoints.

Unified Interface

Consistent API: A clean, protocol-based architecture ensures consistency across all providers.

Automatic Optimization

Smart Processing: Built-in batching, rate limiting, and context window management.

Production Ready

Enterprise Features: Rate limiting, retry mechanisms, fallbacks, and error handling.

Architecture

The embedding system is built around a clean, protocol-based architecture:
1. EmbeddingEngine Protocol

Unified Interface: Defines the interface that all embedding providers must implement for consistency.

2. Provider-Specific Engines

Concrete Implementations: Dedicated implementations for different embedding providers with optimized configurations.

3. Configuration Management

Centralized Setup: Environment variable support with validation and fallback configurations.

4. Rate Limiting & Retries

Reliability Features: Built-in rate limiting and retry mechanisms to handle API quotas and failures.

Embedding Engine Interface

All embedding providers implement the EmbeddingEngine protocol for consistency:
from typing import Protocol

class EmbeddingEngine(Protocol):
    async def embed_text(self, text: list[str]) -> list[list[float]]:
        """Embed text strings into vector representations."""
        ...

    def get_vector_size(self) -> int:
        """Return the dimensionality of embedding vectors."""
        ...
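As a concrete illustration, here is a minimal in-memory engine that satisfies the protocol. It is a hypothetical mock for testing (its vectors are hash-derived and carry no semantic meaning), not one of Cognee's real providers:

```python
import asyncio
import hashlib


class MockEmbeddingEngine:
    """Hypothetical engine satisfying the EmbeddingEngine protocol.

    Vectors are derived from a hash of the input, so results are
    deterministic -- useful for tests, useless for real similarity search.
    """

    def __init__(self, vector_size: int = 8):
        self._vector_size = vector_size

    async def embed_text(self, text: list[str]) -> list[list[float]]:
        return [self._embed_one(t) for t in text]

    def get_vector_size(self) -> int:
        return self._vector_size

    def _embed_one(self, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode()).digest()
        # Map the first vector_size bytes into floats in [0, 1).
        return [b / 256 for b in digest[: self._vector_size]]


engine = MockEmbeddingEngine()
vectors = asyncio.run(engine.embed_text(["hello", "world"]))
print(len(vectors), engine.get_vector_size())  # → 2 8
```

Because the protocol is structural, any class with these two methods is accepted wherever an `EmbeddingEngine` is expected, with no inheritance required.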

Supported Providers

Cognee supports three main categories of embedding providers:

1. LiteLLM-Based Providers (Default)

Cloud Providers

2. FastEmbed (Local/Offline)

Local Models

3. Ollama (Self-Hosted)

Self-Hosted Models
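Dispatching between the three categories can be sketched as a small factory keyed on the `EMBEDDING_PROVIDER` value. The engine class names below are illustrative placeholders, not Cognee's actual implementations:

```python
# Illustrative placeholder engines; Cognee's real classes may differ.
class LiteLLMEmbeddingEngine: ...
class FastembedEmbeddingEngine: ...
class OllamaEmbeddingEngine: ...

_ENGINES = {
    "openai": LiteLLMEmbeddingEngine,   # cloud providers routed via LiteLLM
    "fastembed": FastembedEmbeddingEngine,  # local/offline models
    "ollama": OllamaEmbeddingEngine,    # self-hosted endpoint
}


def get_embedding_engine(provider: str):
    """Resolve a provider name (e.g. from EMBEDDING_PROVIDER) to an engine class."""
    try:
        return _ENGINES[provider]
    except KeyError:
        raise ValueError(f"Unsupported embedding provider: {provider!r}")
```

Failing fast on an unknown provider name surfaces configuration typos at startup rather than at the first embedding call.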

Configuration

Complete Configuration
# Primary Embedding Configuration
EMBEDDING_PROVIDER=openai                    # Provider: openai, fastembed, ollama
EMBEDDING_MODEL=openai/text-embedding-3-large  # Model with provider prefix
EMBEDDING_DIMENSIONS=3072                    # Vector dimensions
EMBEDDING_MAX_TOKENS=8191                    # Maximum tokens per request
EMBEDDING_API_KEY=your_api_key              # API key (if required)
EMBEDDING_ENDPOINT=                          # Custom endpoint (optional)
EMBEDDING_API_VERSION=                       # API version (optional)

# HuggingFace Tokenizer (for Ollama)
HUGGINGFACE_TOKENIZER=Salesforce/SFR-Embedding-Mistral

# Rate Limiting
EMBEDDING_RATE_LIMIT_ENABLED=false          # Enable rate limiting
EMBEDDING_RATE_LIMIT_REQUESTS=60            # Requests per interval
EMBEDDING_RATE_LIMIT_INTERVAL=60            # Interval in seconds

# Development/Testing
MOCK_EMBEDDING=false                         # Use mock embeddings for testing
DISABLE_RETRIES=false                        # Disable retries for testing
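The variables above could be loaded into a typed configuration object along these lines. `load_embedding_config` and its parsing details are a sketch (the defaults mirror the documented values), not Cognee's actual loader:

```python
import os
from dataclasses import dataclass


@dataclass
class EmbeddingConfig:
    provider: str = "openai"
    model: str = "openai/text-embedding-3-large"
    dimensions: int = 3072
    max_tokens: int = 8191
    rate_limit_enabled: bool = False
    rate_limit_requests: int = 60
    rate_limit_interval: int = 60


def load_embedding_config(env=os.environ) -> EmbeddingConfig:
    """Hypothetical loader: read EMBEDDING_* variables, fall back to defaults."""

    def _bool(name: str, default: bool) -> bool:
        return env.get(name, str(default)).strip().lower() in ("1", "true", "yes")

    return EmbeddingConfig(
        provider=env.get("EMBEDDING_PROVIDER", "openai"),
        model=env.get("EMBEDDING_MODEL", "openai/text-embedding-3-large"),
        dimensions=int(env.get("EMBEDDING_DIMENSIONS", "3072")),
        max_tokens=int(env.get("EMBEDDING_MAX_TOKENS", "8191")),
        rate_limit_enabled=_bool("EMBEDDING_RATE_LIMIT_ENABLED", False),
        rate_limit_requests=int(env.get("EMBEDDING_RATE_LIMIT_REQUESTS", "60")),
        rate_limit_interval=int(env.get("EMBEDDING_RATE_LIMIT_INTERVAL", "60")),
    )
```

Accepting an `env` mapping instead of reading `os.environ` directly keeps the loader easy to unit-test.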

Advanced Features

Rate Limiting & Retry Logic
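These two mechanisms can be sketched as exponential backoff around an async call plus a sliding-window limiter. The policy below (attempt counts, delays, jitter) is illustrative and not Cognee's exact implementation:

```python
import asyncio
import random
import time


async def with_retries(coro_factory, max_attempts: int = 3, base_delay: float = 0.5):
    """Retry an async call with exponential backoff and jitter (illustrative policy)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await coro_factory()
        except Exception:
            if attempt == max_attempts:
                raise
            # Back off 0.5s, 1s, 2s, ... plus jitter before retrying.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1) + random.random() * 0.1)


class RateLimiter:
    """Allow at most `requests` calls per `interval` seconds (simple sliding window)."""

    def __init__(self, requests: int = 60, interval: float = 60.0):
        self.requests, self.interval = requests, interval
        self._timestamps: list[float] = []

    async def acquire(self) -> None:
        now = time.monotonic()
        self._timestamps = [t for t in self._timestamps if now - t < self.interval]
        if len(self._timestamps) >= self.requests:
            # Wait until the oldest request falls out of the window.
            await asyncio.sleep(self.interval - (now - self._timestamps[0]))
        self._timestamps.append(time.monotonic())
```

A provider engine would call `await limiter.acquire()` before each batch and wrap the API call in `with_retries`, so quota errors and transient failures are absorbed rather than propagated to the pipeline.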

Next Steps