
Set Up Local Models with Ollama

Difficulty: Easy

Overview

In this tutorial, you’ll learn how to run local models with Cognee using Ollama. Running models locally allows for private, offline reasoning, and removes the need for cloud-based APIs.

By the end, you will have:

  • Installed Ollama and downloaded a supported local model.
  • Configured Cognee to use your local model via the Ollama API.
  • Created a basic script to verify everything is working.
  • Queried the local model through a simple Cognee pipeline.

What You’ll Learn

  • Ollama Setup: Install and run the Ollama CLI.
  • Model Download: Pull a completion model and an embedding model (like phi4 or sfr-embedding-mistral:latest) to run locally.
  • Cognee Config: Set up Cognee to use your local Ollama endpoint.

Here is a quick-start guide:

Step 1: Go to Ollama.com

Go to Ollama’s official website: 👉 https://ollama.com

Click the Download button, choose your operating system, and download the Ollama installer.

Step 2: Install Ollama

Install Ollama on your local system using the downloaded installer, then verify the installation by running

ollama --version

in your terminal.
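Ollama usually starts its server automatically after installation. If it isn’t running in the background, you can start it manually (by default it listens on http://localhost:11434):

ollama serve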

Step 3: Choose and download local models

Ollama supports many models. You need a large language model to serve completions and an embedding model to create embeddings from your textual data.

Pull the LLM model of your choice:

ollama pull YOUR_MODEL

Pull the embedding model of your choice:

ollama pull YOUR_EMBEDDING_MODEL

To verify that your local models are available, use:

ollama list

DISCLAIMER: Models below 32B parameters sometimes fail to create the proper graph structure, so we suggest using models like ‘deepseek-r1:32b’ or ‘llama3.3’.
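For example, if you follow that suggestion and pair llama3.3 with the sfr-embedding-mistral:latest embedding model mentioned above, the commands would be:

ollama pull llama3.3

ollama pull sfr-embedding-mistral:latest

Keep in mind that models of this size need a machine with enough RAM/VRAM to hold them.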

Step 4: Configure cognee to use local models

Set the following Cognee environment variables in the .env file of your project:

LLM_API_KEY="ollama"
LLM_MODEL="YOUR_MODEL"
LLM_PROVIDER="ollama"
LLM_ENDPOINT="http://localhost:11434/v1"
EMBEDDING_PROVIDER="ollama"
EMBEDDING_MODEL="YOUR_EMBEDDING_MODEL"
EMBEDDING_ENDPOINT="http://localhost:11434/api/embeddings"
EMBEDDING_DIMENSIONS="DIMENSIONS_OF_YOUR_EMBEDDING_MODEL"
HUGGINGFACE_TOKENIZER="TOKENIZER_TO_YOUR_EMBEDDING_MODEL"

This tells Cognee to route requests to your locally running Ollama endpoints.
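For example, assuming you pulled llama3.3 and sfr-embedding-mistral:latest as shown above, a filled-in configuration might look like the following. The EMBEDDING_DIMENSIONS and HUGGINGFACE_TOKENIZER values here are assumptions based on SFR-Embedding-Mistral being a Mistral-7B-based model; verify them against the model card of the embedding model you actually use.

LLM_API_KEY="ollama"
LLM_MODEL="llama3.3"
LLM_PROVIDER="ollama"
LLM_ENDPOINT="http://localhost:11434/v1"
EMBEDDING_PROVIDER="ollama"
EMBEDDING_MODEL="sfr-embedding-mistral:latest"
EMBEDDING_ENDPOINT="http://localhost:11434/api/embeddings"
EMBEDDING_DIMENSIONS="4096"
HUGGINGFACE_TOKENIZER="Salesforce/SFR-Embedding-Mistral"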

Step 5: Test cognee with a simple script

Here’s an example script to test Cognee with your local models. You can find the full example script here.

import asyncio

import cognee
from cognee.shared.logging_utils import get_logger, ERROR
from cognee.api.v1.search import SearchType

# Prerequisites:
# 1. Copy `.env.template` and rename it to `.env`.
# 2. Fill in the Ollama settings from Step 4 (the LLM_* and EMBEDDING_* variables) in the `.env` file.


async def main():
    # Create a clean slate for cognee -- reset data and system state
    await cognee.prune.prune_data()
    await cognee.prune.prune_system(metadata=True)

    # cognee knowledge graph will be created based on this text
    text = """
    Natural language processing (NLP) is an interdisciplinary
    subfield of computer science and information retrieval.
    """

    # Add the text, and make it available for cognify
    await cognee.add(text)

    # Run cognify and build the knowledge graph using the added text
    await cognee.cognify()

    # Query cognee for insights on the added text
    query_text = "Tell me about NLP"
    search_results = await cognee.search(query_type=SearchType.INSIGHTS, query_text=query_text)

    for result_text in search_results:
        print(result_text)


if __name__ == "__main__":
    logger = get_logger(level=ERROR)

    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        loop.run_until_complete(main())
    finally:
        loop.run_until_complete(loop.shutdown_asyncgens())
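Save the script in your project root (next to the .env file) so Cognee picks up your configuration, then run it; the filename below is just an example:

python ollama_example.py

The first query can take a while, because Ollama loads the model into memory on first use. If no results come back, double-check with ollama list that the model names in your .env match the models you pulled.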

Summary

In this tutorial, you learned how to:

  • Install Ollama and run it as a local LLM server.
  • Pull and run your LLM and Embedding models.
  • Configure Cognee to use Ollama instead of cloud APIs.

Running models locally gives you full control over privacy, cost, and speed, whether you’re prototyping or scaling production use cases.

Join the Conversation!

Have questions or need more help? Join our community to connect with professionals, share insights, and get your questions answered!