Embedding providers in Cognee convert text into high-dimensional vectors that capture semantic meaning, enabling powerful similarity search and knowledge graph enhancement.

Key Features

Multiple Providers

Flexible Options: Support for OpenAI, local models, self-hosted solutions, and custom endpoints.

Unified Interface

Consistent API: A clean, protocol-based architecture ensures consistency across all providers.

Automatic Optimization

Smart Processing: Built-in batching, rate limiting, and context window management.

Production Ready

Enterprise Features: Rate limiting, retry mechanisms, fallbacks, and error handling.

Architecture

The embedding system is built around a clean, protocol-based architecture:
1. EmbeddingEngine Protocol

Unified Interface: Defines the interface that all embedding providers must implement for consistency.

2. Provider-Specific Engines

Concrete Implementations: Dedicated implementations for different embedding providers with optimized configurations.

3. Configuration Management

Centralized Setup: Environment variable support with validation and fallback configurations.

4. Rate Limiting & Retries

Reliability Features: Built-in rate limiting and retry mechanisms to handle API quotas and failures.

Embedding Engine Interface

All embedding providers implement the EmbeddingEngine protocol for consistency:
from typing import Protocol

class EmbeddingEngine(Protocol):
    async def embed_text(self, text: list[str]) -> list[list[float]]:
        """Embed text strings into vector representations."""
        ...

    def get_vector_size(self) -> int:
        """Return the dimensionality of embedding vectors."""
        ...
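As a concrete illustration, here is a minimal in-memory engine that satisfies the protocol. It is a hypothetical mock for testing (its vectors are hash-derived and carry no semantic meaning), not one of Cognee's real providers:

```python
import asyncio
import hashlib


class MockEmbeddingEngine:
    """Hypothetical engine satisfying the EmbeddingEngine protocol.

    Vectors are derived from a hash of the input, so results are
    deterministic -- useful for tests, useless for real similarity search.
    """

    def __init__(self, vector_size: int = 8):
        self._vector_size = vector_size

    async def embed_text(self, text: list[str]) -> list[list[float]]:
        return [self._embed_one(t) for t in text]

    def get_vector_size(self) -> int:
        return self._vector_size

    def _embed_one(self, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode()).digest()
        # Map the first vector_size bytes into floats in [0, 1).
        return [b / 256 for b in digest[: self._vector_size]]


engine = MockEmbeddingEngine()
vectors = asyncio.run(engine.embed_text(["hello", "world"]))
print(len(vectors), engine.get_vector_size())  # → 2 8
```

Because the protocol is structural, any class with these two methods is accepted wherever an `EmbeddingEngine` is expected, with no inheritance required.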

Supported Providers

Cognee supports three main categories of embedding providers:

1. LiteLLM-Based Providers (Default)

Cloud Providers

2. FastEmbed (Local/Offline)

Local Models

3. Ollama (Self-Hosted)

Self-Hosted Models
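Dispatching between the three categories can be sketched as a small factory keyed on the `EMBEDDING_PROVIDER` value. The engine class names below are illustrative placeholders, not Cognee's actual implementations:

```python
# Illustrative placeholder engines; Cognee's real classes may differ.
class LiteLLMEmbeddingEngine: ...
class FastembedEmbeddingEngine: ...
class OllamaEmbeddingEngine: ...

_ENGINES = {
    "openai": LiteLLMEmbeddingEngine,   # cloud providers routed via LiteLLM
    "fastembed": FastembedEmbeddingEngine,  # local/offline models
    "ollama": OllamaEmbeddingEngine,    # self-hosted endpoint
}


def get_embedding_engine(provider: str):
    """Resolve a provider name (e.g. from EMBEDDING_PROVIDER) to an engine class."""
    try:
        return _ENGINES[provider]
    except KeyError:
        raise ValueError(f"Unsupported embedding provider: {provider!r}")
```

Failing fast on an unknown provider name surfaces configuration typos at startup rather than at the first embedding call.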

Configuration

Complete Configuration
# Primary Embedding Configuration
EMBEDDING_PROVIDER=openai                    # Provider: openai, fastembed, ollama
EMBEDDING_MODEL=openai/text-embedding-3-large  # Model with provider prefix
EMBEDDING_DIMENSIONS=3072                    # Vector dimensions
EMBEDDING_MAX_TOKENS=8191                    # Maximum tokens per request
EMBEDDING_API_KEY=your_api_key              # API key (if required)
EMBEDDING_ENDPOINT=                          # Custom endpoint (optional)
EMBEDDING_API_VERSION=                       # API version (optional)

# HuggingFace Tokenizer (for Ollama)
HUGGINGFACE_TOKENIZER=Salesforce/SFR-Embedding-Mistral

# Rate Limiting
EMBEDDING_RATE_LIMIT_ENABLED=false          # Enable rate limiting
EMBEDDING_RATE_LIMIT_REQUESTS=60            # Requests per interval
EMBEDDING_RATE_LIMIT_INTERVAL=60            # Interval in seconds

# Development/Testing
MOCK_EMBEDDING=false                         # Use mock embeddings for testing
DISABLE_RETRIES=false                        # Disable retries for testing
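The variables above could be loaded into a typed configuration object along these lines. `load_embedding_config` and its parsing details are a sketch (the defaults mirror the documented values), not Cognee's actual loader:

```python
import os
from dataclasses import dataclass


@dataclass
class EmbeddingConfig:
    provider: str = "openai"
    model: str = "openai/text-embedding-3-large"
    dimensions: int = 3072
    max_tokens: int = 8191
    rate_limit_enabled: bool = False
    rate_limit_requests: int = 60
    rate_limit_interval: int = 60


def load_embedding_config(env=os.environ) -> EmbeddingConfig:
    """Hypothetical loader: read EMBEDDING_* variables, fall back to defaults."""

    def _bool(name: str, default: bool) -> bool:
        return env.get(name, str(default)).strip().lower() in ("1", "true", "yes")

    return EmbeddingConfig(
        provider=env.get("EMBEDDING_PROVIDER", "openai"),
        model=env.get("EMBEDDING_MODEL", "openai/text-embedding-3-large"),
        dimensions=int(env.get("EMBEDDING_DIMENSIONS", "3072")),
        max_tokens=int(env.get("EMBEDDING_MAX_TOKENS", "8191")),
        rate_limit_enabled=_bool("EMBEDDING_RATE_LIMIT_ENABLED", False),
        rate_limit_requests=int(env.get("EMBEDDING_RATE_LIMIT_REQUESTS", "60")),
        rate_limit_interval=int(env.get("EMBEDDING_RATE_LIMIT_INTERVAL", "60")),
    )
```

Accepting an `env` mapping instead of reading `os.environ` directly keeps the loader easy to unit-test.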

Advanced Features

Rate Limiting & Retry Logic
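These two mechanisms can be sketched as exponential backoff around an async call plus a sliding-window limiter. The policy below (attempt counts, delays, jitter) is illustrative and not Cognee's exact implementation:

```python
import asyncio
import random
import time


async def with_retries(coro_factory, max_attempts: int = 3, base_delay: float = 0.5):
    """Retry an async call with exponential backoff and jitter (illustrative policy)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await coro_factory()
        except Exception:
            if attempt == max_attempts:
                raise
            # Back off 0.5s, 1s, 2s, ... plus jitter before retrying.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1) + random.random() * 0.1)


class RateLimiter:
    """Allow at most `requests` calls per `interval` seconds (simple sliding window)."""

    def __init__(self, requests: int = 60, interval: float = 60.0):
        self.requests, self.interval = requests, interval
        self._timestamps: list[float] = []

    async def acquire(self) -> None:
        now = time.monotonic()
        self._timestamps = [t for t in self._timestamps if now - t < self.interval]
        if len(self._timestamps) >= self.requests:
            # Wait until the oldest request falls out of the window.
            await asyncio.sleep(self.interval - (now - self._timestamps[0]))
        self._timestamps.append(time.monotonic())
```

A provider engine would call `await limiter.acquire()` before each batch and wrap the API call in `with_retries`, so quota errors and transient failures are absorbed rather than propagated to the pipeline.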

Next Steps