
🚀 Getting Started with Local Models

Let’s get you set up with running models locally! You can either run the model on your own machine or use one of the providers that hosts it.

🛠️ Setting Up Ollama

First things first, head over to Ollama’s website and download their application.

📚 Choosing Your Models

We’ve tested various models, and here’s what we’ve found works best:

For Completion Tasks

You’ve got a few great options, depending on your hardware:

  • Phi-4 (14B parameters)

    • Perfect if you don’t have much memory
    • Great entry point for local models; install with `ollama pull phi4`
  • DeepSeek-R1 (32B parameters)

    • Our recommended choice for better performance
    • Install and try it out with: `ollama run deepseek-r1:32b`
  • Llama 3.3 (70B parameters, quantized)

    • The powerhouse option
    • Install with: `ollama pull llama3.3:70b-instruct-q3_K_M`

For Embeddings

For embeddings, we recommend using the SFR-Embedding-Mistral model:

ollama pull avr/sfr-embedding-mistral
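Once the model is pulled, you can request embeddings over Ollama's REST API on its default port (11434). Here's a minimal Python sketch using only the standard library; it assumes the standard `/api/embeddings` endpoint and that the response carries the vector in an `embedding` field:

```python
import json
import urllib.request

OLLAMA_EMBEDDINGS_URL = "http://localhost:11434/api/embeddings"  # Ollama's default port

def build_embedding_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_EMBEDDINGS_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def get_embedding(model: str, prompt: str) -> list[float]:
    """Send the request and return the embedding vector (requires a running Ollama server)."""
    with urllib.request.urlopen(build_embedding_request(model, prompt)) as resp:
        return json.loads(resp.read())["embedding"]
```

This sends the same JSON body as the `curl` example further down, so it's an easy place to start if you're calling Ollama from application code.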

🧪 Testing Your Setup

Want to make sure everything’s working? Here’s how:

  1. Test your completion model (using Phi-4 as an example):
ollama run phi4
  2. Test the embedding model:
ollama run avr/sfr-embedding-mistral
curl http://localhost:11434/api/embeddings -d '{ "model": "avr/sfr-embedding-mistral:<TAG>", "prompt": "Your prompt here" }'
  3. See what models you have installed:
ollama list
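If you'd rather check the setup programmatically than by eye, Ollama also exposes the installed-model list (what `ollama list` shows) over HTTP at `/api/tags`. A small Python sketch; the response shape assumed here is `{"models": [{"name": ...}]}`, so verify it against your installed Ollama version:

```python
import json
import urllib.request

def extract_model_names(tags: dict) -> list[str]:
    """Pull the model names out of an /api/tags response body."""
    return [model["name"] for model in tags.get("models", [])]

def installed_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Return the names of locally installed models (requires a running Ollama server)."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return extract_model_names(json.loads(resp.read()))

def missing_models(required: list[str]) -> list[str]:
    """Return any required models that are not yet installed."""
    have = set(installed_models())
    return [name for name in required if name not in have]
```

For example, `missing_models(["phi4:latest", "avr/sfr-embedding-mistral:latest"])` returns an empty list once both pulls above have completed.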

⚙️ Configuration

Use these environment variables in your .env file:

LLM_API_KEY="ollama"
LLM_MODEL="phi4:latest"
LLM_PROVIDER="ollama"
LLM_ENDPOINT="http://localhost:11434/v1"
EMBEDDING_PROVIDER="ollama"
EMBEDDING_MODEL="avr/sfr-embedding-mistral:latest"
EMBEDDING_ENDPOINT="http://localhost:11434/api/embeddings"
EMBEDDING_DIMENSIONS=4096
HUGGINGFACE_TOKENIZER="Salesforce/SFR-Embedding-Mistral"
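As a sanity check that these values wire together, here's a hedged Python sketch that parses a .env snippet like the one above and builds the embedding request the earlier `curl` command sends. The parser is illustrative only, not a full .env implementation (it ignores escapes, exports, and multi-line values):

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines from a simple .env snippet.
    Illustrative only: skips blanks and comments, strips quotes."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env

ENV_SNIPPET = '''
LLM_PROVIDER="ollama"
LLM_MODEL="phi4:latest"
LLM_ENDPOINT="http://localhost:11434/v1"
EMBEDDING_MODEL="avr/sfr-embedding-mistral:latest"
EMBEDDING_ENDPOINT="http://localhost:11434/api/embeddings"
EMBEDDING_DIMENSIONS=4096
'''

config = parse_env(ENV_SNIPPET)

# The embedding request body from the testing section, built from config:
embedding_request = {
    "model": config["EMBEDDING_MODEL"],
    "prompt": "Your prompt here",
}
```

Note that `LLM_ENDPOINT` points at Ollama's OpenAI-compatible `/v1` API, while `EMBEDDING_ENDPOINT` uses Ollama's native `/api/embeddings` route, so the two values differ on purpose.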