FastEmbed provides local, offline embedding capabilities for Cognee, letting you generate high-quality embeddings without API costs or external dependencies. Because it runs entirely on your own hardware, with no API keys and no internet connection required after the initial model download, it is well suited to privacy-sensitive applications, development environments, and cost-conscious deployments.

Key Benefits

Zero API Costs

Cost Effective: No per-request costs after the initial model download - perfect for high-volume applications.

Complete Privacy

Data Security: All processing happens locally, with no data sent to external services.

Offline Capability

No Internet Required: Generate embeddings without internet connectivity after the initial setup.

Fast Inference

Low Latency: Local processing provides lower latency than API calls for small to medium workloads.

Supported Models

FastEmbed supports several high-quality embedding models:

BGE Models (Recommended)

  • BAAI/bge-small-en-v1.5 - 384 dimensions, fastest and lightest
  • BAAI/bge-base-en-v1.5 - 768 dimensions, balanced speed and quality
  • BAAI/bge-large-en-v1.5 - 1024 dimensions, highest quality
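To see the exact model catalog available in your installed version, you can query the fastembed library directly. A minimal sketch (assumes the fastembed package, which this provider is built on, is importable):

from fastembed import TextEmbedding

# Print every embedding model this fastembed build supports, with its dimensionality
for model_info in TextEmbedding.list_supported_models():
    print(model_info["model"], model_info["dim"])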

Configuration

Simple Configuration

Set the embedding provider in your environment (for example, in a .env file):

# Configure FastEmbed
EMBEDDING_PROVIDER=fastembed
EMBEDDING_MODEL=BAAI/bge-large-en-v1.5
EMBEDDING_DIMENSIONS=1024

No API key is needed for local models. Basic usage:

import asyncio
import cognee

async def main():
    # Use FastEmbed embeddings
    await cognee.add("Your content here")
    await cognee.cognify()  # Uses local FastEmbed models

    # Search works the same way
    results = await cognee.search("your query")
    print(f"Found {len(results)} results using local embeddings")

asyncio.run(main())

Usage Examples

import asyncio
import os

import cognee

# Configure for complete privacy (set before any cognee calls)
os.environ["EMBEDDING_PROVIDER"] = "fastembed"
os.environ["EMBEDDING_MODEL"] = "BAAI/bge-large-en-v1.5"
os.environ["VECTOR_DB_PROVIDER"] = "lancedb"  # Local vector storage

async def main():
    # Process sensitive data locally
    sensitive_data = [
        "Confidential business information...",
        "Personal user data...",
        "Proprietary research findings..."
    ]

    await cognee.add(sensitive_data)
    await cognee.cognify()  # All processing happens locally

    # Search without external API calls
    results = await cognee.search("find confidential information")
    print("✅ Processed sensitive data without external API calls")

asyncio.run(main())

Performance Characteristics

Speed

Processing Speed
  • Small model: ~1000 texts/minute
  • Base model: ~500 texts/minute
  • Large model: ~200 texts/minute
Performance varies by hardware

Memory Usage

Resource Requirements
  • Small model: ~200MB RAM
  • Base model: ~500MB RAM
  • Large model: ~1GB RAM
Plus disk space for the downloaded model

Quality

Embedding Quality
  • Small: Good for development
  • Base: Production-ready quality
  • Large: Comparable to commercial APIs

Setup Time

Initial Setup
  • First run: 2-10 minutes (model download)
  • Subsequent runs: Instant startup
  • Storage: 100MB-2GB per model
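These figures are rough guidelines, so it is worth measuring throughput on your own hardware. A minimal sketch using the underlying fastembed library directly (the model choice and batch size are arbitrary examples):

import time

from fastembed import TextEmbedding

# First run downloads the model; later runs load it from the local cache
model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")

texts = [f"Benchmark sentence number {i}." for i in range(1000)]

start = time.perf_counter()
embeddings = list(model.embed(texts))  # embed() yields vectors lazily
elapsed = time.perf_counter() - start

print(f"~{len(embeddings) / elapsed * 60:.0f} texts/minute on this machine")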

Advanced Features

Model Caching

FastEmbed downloads each model once and reuses the locally cached copy on every subsequent run, which is why startup after the first run is instant and requires no network access.
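A minimal sketch of pointing the cache at a specific directory, using fastembed directly (the ./model_cache path is just an example; by default fastembed chooses its own cache location):

from fastembed import TextEmbedding

# Download the model into ./model_cache on the first run;
# later constructions with the same cache_dir reuse the files offline
model = TextEmbedding(
    model_name="BAAI/bge-large-en-v1.5",
    cache_dir="./model_cache",
)

vector = next(iter(model.embed(["warm up the cache"])))
print(len(vector))  # 1024 dimensions for bge-large-en-v1.5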

Comparison with OpenAI

Embedding Quality
Metric       FastEmbed (BGE-Large)   OpenAI (3-Large)   Notes
Dimensions   1024                    3072               OpenAI has higher dimensionality
Quality      Very Good               Excellent          OpenAI slightly better for complex tasks
Speed        Fast (local)            Medium (API)       FastEmbed faster for small batches
Cost         Free                    $0.13/1M tokens    FastEmbed eliminates API costs
Privacy      Complete                API-dependent      FastEmbed keeps data local
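If you later decide to switch to OpenAI embeddings, only the embedding configuration needs to change. The values below assume OpenAI's text-embedding-3-large model from the table above; exact provider naming and API key handling may vary by Cognee version:

# Switch to OpenAI embeddings instead of FastEmbed
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-large
EMBEDDING_DIMENSIONS=3072
# An OpenAI API key is also required for this provider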

Next Steps