Ollama enables you to run embedding models locally on your own infrastructure, providing complete control over your embedding generation process. This is ideal for privacy-sensitive applications, custom models, and environments requiring full data sovereignty.
Ollama embeddings provide self-hosted embedding generation with configurable models, complete privacy, and no external dependencies after initial setup.

Key Benefits

Complete Control

Self-Hosted: Run embedding models on your own infrastructure with full control over the process.

Custom Models

Model Flexibility: Use any Ollama-compatible embedding model or create your own custom models.

Data Sovereignty

Maximum Privacy: All data processing happens on your infrastructure with no external API calls.

Configurable Endpoints

Flexible Deployment: Configure custom endpoints, ports, and deployment architectures.

Default Configuration

Ollama embeddings in Cognee use the following defaults:

Default Settings
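The exact values can differ between Cognee releases, but a typical local setup resolves to environment variables along these lines. The model and dimensions match the integration example further down; the endpoint is Ollama's standard local address, and the endpoint variable name shown here is an assumption to verify against your Cognee version's configuration reference:

# Typical defaults for a local Ollama setup (verify against your Cognee version)
EMBEDDING_PROVIDER=ollama
EMBEDDING_MODEL=avr/sfr-embedding-mistral:latest
EMBEDDING_DIMENSIONS=1024
# Ollama's default local address; variable name assumed, adjust if your Cognee version differs
EMBEDDING_ENDPOINT=http://localhost:11434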

Setup Guide

1. Install Ollama

Get Ollama Running: Install Ollama on your system and start the server.

2. Download Model

Pull Embedding Model: Download the embedding model you want to use.

3. Configure Cognee

Set Environment Variables: Configure Cognee to use your Ollama instance.

4. Test Integration

Verify Setup: Test that embeddings are working correctly. A condensed command sketch for steps 2-4 follows.
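
Condensed into shell commands, and using the same model as the integration example below, steps 2-4 look roughly like this:

# 2. Pull an embedding model
ollama pull avr/sfr-embedding-mistral:latest

# 3. Point Cognee at the local Ollama instance
export EMBEDDING_PROVIDER=ollama
export EMBEDDING_MODEL=avr/sfr-embedding-mistral:latest
export EMBEDDING_DIMENSIONS=1024

# 4. Confirm Ollama can generate an embedding before involving Cognee
curl http://localhost:11434/api/embeddings \
  -d '{"model": "avr/sfr-embedding-mistral:latest", "prompt": "test sentence"}'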

Installation & Setup

System Installation
# Install Ollama (macOS)
brew install ollama

# Install Ollama (Linux)
curl -fsSL https://ollama.ai/install.sh | sh

# Start Ollama server
ollama serve

Verify Installation
# Check Ollama is running
curl http://localhost:11434/api/tags

# Should return JSON with available models

Advanced Configuration

Custom Endpoints
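Ollama's own bind address and port are controlled with its OLLAMA_HOST environment variable, and Cognee is then pointed at that address. A minimal sketch, where the host name is a placeholder and the EMBEDDING_ENDPOINT variable name is an assumption to check against your Cognee configuration reference:

# Serve Ollama on a non-default interface and port
OLLAMA_HOST=0.0.0.0:11500 ollama serve

# Point Cognee at that instance (host name is a placeholder; variable name assumed)
export EMBEDDING_PROVIDER=ollama
export EMBEDDING_ENDPOINT=http://ollama.internal:11500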

Integration Examples

import asyncio
import os

import cognee

# Configure Ollama embeddings
os.environ["EMBEDDING_PROVIDER"] = "ollama"
os.environ["EMBEDDING_MODEL"] = "avr/sfr-embedding-mistral:latest"
os.environ["EMBEDDING_DIMENSIONS"] = "1024"


async def main():
    # Add content and build the knowledge graph using local embeddings
    await cognee.add("Ollama provides local embedding generation")
    await cognee.cognify()

    # Search; the query is embedded by the same local model
    results = await cognee.search("local embedding generation")
    print(f"Found {len(results)} results using Ollama embeddings")


asyncio.run(main())

Troubleshooting

Connection Issues
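Most connection problems come down to the server not running, the model not being pulled, or a host/port mismatch between Ollama and Cognee's configuration. These checks narrow it down quickly:

# Is the server up and reachable?
curl http://localhost:11434/api/tags

# Is the embedding model actually pulled?
ollama list

# If Ollama serves on a custom host or port (OLLAMA_HOST), make sure
# Cognee's configured endpoint matches it exactly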

Best Practices

Model Selection

Choose the Right Model
  • Development: Use smaller, faster models (example below)
  • Production: Balance embedding quality against latency and resource use
  • Specialized: Choose domain-specific models where they exist for your content
  • Resource-constrained: Prefer efficient, lower-footprint models
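
As a concrete example, a common pattern is to develop against a small model and switch to a larger one for production. The models below are widely used Ollama embedding models; treat the dimension values as assumptions and confirm them against the model cards of the versions you pull:

# Smaller, faster model for development (commonly 768 dimensions)
ollama pull nomic-embed-text
export EMBEDDING_MODEL=nomic-embed-text
export EMBEDDING_DIMENSIONS=768

# Larger model for production quality (used elsewhere on this page)
ollama pull avr/sfr-embedding-mistral:latest
export EMBEDDING_MODEL=avr/sfr-embedding-mistral:latest
export EMBEDDING_DIMENSIONS=1024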

Infrastructure

Deployment Tips
  • Run Ollama on dedicated hardware for consistency
  • Use SSD storage for better model loading
  • Monitor resource usage and scale accordingly
  • Set up health checks and monitoring (a minimal check is sketched below)
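
A health check can be as simple as probing the same API Cognee uses; the snippet below is a minimal sketch suitable for a cron job or container liveness probe:

#!/usr/bin/env sh
# Minimal Ollama liveness check: exit non-zero if the API does not answer within 5 seconds
curl --silent --fail --max-time 5 http://localhost:11434/api/tags > /dev/null \
  || { echo "Ollama is not responding on localhost:11434" >&2; exit 1; }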

Next Steps