FastEmbed provides local, offline embedding capabilities for Cognee, letting you generate high-quality embeddings without API costs or external dependencies. Because it runs entirely on your own hardware, with no API keys and no internet connection required after the initial model download, it is well suited to privacy-sensitive applications, development environments, and cost-conscious deployments.

Key Benefits

Zero API Costs

Cost Effective: No per-request costs after the initial model download - perfect for high-volume applications.

Complete Privacy

Data Security: All processing happens locally, with no data sent to external services.

Offline Capability

No Internet Required: Generate embeddings without internet connectivity after the initial setup.

Fast Inference

Low Latency: Local processing provides lower latency than API calls for small to medium workloads.

Supported Models

FastEmbed supports several high-quality embedding models:

BGE Models (Recommended)

  • BAAI/bge-small-en-v1.5 - 384 dimensions, fastest and lightest
  • BAAI/bge-base-en-v1.5 - 768 dimensions, balanced speed and quality
  • BAAI/bge-large-en-v1.5 - 1024 dimensions, highest quality
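To see the exact model catalog available in your installed version, you can query the fastembed library directly. A minimal sketch (assumes the fastembed package, which this provider is built on, is importable):

from fastembed import TextEmbedding

# Print every embedding model this fastembed build supports, with its dimensionality
for model_info in TextEmbedding.list_supported_models():
    print(model_info["model"], model_info["dim"])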

Configuration

Simple Configuration

Set the embedding provider in your environment (for example, in a .env file):

# Configure FastEmbed
EMBEDDING_PROVIDER=fastembed
EMBEDDING_MODEL=BAAI/bge-large-en-v1.5
EMBEDDING_DIMENSIONS=1024

No API key is needed for local models. Basic usage:

import asyncio
import cognee

async def main():
    # Use FastEmbed embeddings
    await cognee.add("Your content here")
    await cognee.cognify()  # Uses local FastEmbed models

    # Search works the same way
    results = await cognee.search("your query")
    print(f"Found {len(results)} results using local embeddings")

asyncio.run(main())

Usage Examples

import asyncio
import os

import cognee

# Configure for complete privacy (set before any cognee calls)
os.environ["EMBEDDING_PROVIDER"] = "fastembed"
os.environ["EMBEDDING_MODEL"] = "BAAI/bge-large-en-v1.5"
os.environ["VECTOR_DB_PROVIDER"] = "lancedb"  # Local vector storage

async def main():
    # Process sensitive data locally
    sensitive_data = [
        "Confidential business information...",
        "Personal user data...",
        "Proprietary research findings..."
    ]

    await cognee.add(sensitive_data)
    await cognee.cognify()  # All processing happens locally

    # Search without external API calls
    results = await cognee.search("find confidential information")
    print("✅ Processed sensitive data without external API calls")

asyncio.run(main())

Performance Characteristics

Speed

Processing Speed
  • Small model: ~1000 texts/minute
  • Base model: ~500 texts/minute
  • Large model: ~200 texts/minute
Performance varies by hardware

Memory Usage

Resource Requirements
  • Small model: ~200MB RAM
  • Base model: ~500MB RAM
  • Large model: ~1GB RAM
Plus disk space for the downloaded model

Quality

Embedding Quality
  • Small: Good for development
  • Base: Production-ready quality
  • Large: Comparable to commercial APIs

Setup Time

Initial Setup
  • First run: 2-10 minutes (model download)
  • Subsequent runs: Instant startup
  • Storage: 100MB-2GB per model
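These figures are rough guidelines, so it is worth measuring throughput on your own hardware. A minimal sketch using the underlying fastembed library directly (the model choice and batch size are arbitrary examples):

import time

from fastembed import TextEmbedding

# First run downloads the model; later runs load it from the local cache
model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")

texts = [f"Benchmark sentence number {i}." for i in range(1000)]

start = time.perf_counter()
embeddings = list(model.embed(texts))  # embed() yields vectors lazily
elapsed = time.perf_counter() - start

print(f"~{len(embeddings) / elapsed * 60:.0f} texts/minute on this machine")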

Advanced Features

Model Caching

FastEmbed downloads each model once and reuses the locally cached copy on every subsequent run, which is why startup after the first run is instant and requires no network access.
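A minimal sketch of pointing the cache at a specific directory, using fastembed directly (the ./model_cache path is just an example; by default fastembed chooses its own cache location):

from fastembed import TextEmbedding

# Download the model into ./model_cache on the first run;
# later constructions with the same cache_dir reuse the files offline
model = TextEmbedding(
    model_name="BAAI/bge-large-en-v1.5",
    cache_dir="./model_cache",
)

vector = next(iter(model.embed(["warm up the cache"])))
print(len(vector))  # 1024 dimensions for bge-large-en-v1.5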

Comparison with OpenAI

Embedding Quality
Metric       FastEmbed (BGE-Large)   OpenAI (3-Large)   Notes
Dimensions   1024                    3072               OpenAI has higher dimensionality
Quality      Very Good               Excellent          OpenAI slightly better for complex tasks
Speed        Fast (local)            Medium (API)       FastEmbed faster for small batches
Cost         Free                    $0.13/1M tokens    FastEmbed eliminates API costs
Privacy      Complete                API-dependent      FastEmbed keeps data local
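If you later decide to switch to OpenAI embeddings, only the embedding configuration needs to change. The values below assume OpenAI's text-embedding-3-large model from the table above; exact provider naming and API key handling may vary by Cognee version:

# Switch to OpenAI embeddings instead of FastEmbed
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-large
EMBEDDING_DIMENSIONS=3072
# An OpenAI API key is also required for this provider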

Next Steps