# 🚀 Getting Started with Local Models

Let's get you set up to run models locally! You'll need to run the model on your own machine or use one of the providers hosting it.
## 🛠️ Setting Up Ollama

First things first, head over to [Ollama's website](https://ollama.com) and download their application.
## 📚 Choosing Your Models

We've tested various models, and here's what we've found works best.

### For Completion Tasks

You've got a few great options, depending on your hardware:
- **Phi-4** (14B parameters)
  - Perfect if you don't have much memory
  - Great entry point for local models; install with:

    ```bash
    ollama pull phi4
    ```

- **DeepSeek-R1** (32B parameters)
  - Our recommended choice for better performance
  - Install and try it out with:

    ```bash
    ollama run deepseek-r1:32b
    ```

- **Llama 3.3** (70B parameters, quantized)
  - The powerhouse option
  - Install with:

    ```bash
    ollama pull llama3.3:70b-instruct-q3_K_M
    ```
### For Embeddings

For embeddings, we recommend the SFR-Embedding-Mistral model:

```bash
ollama pull avr/sfr-embedding-mistral
```
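The embedding model can also be exercised from code. Below is a minimal sketch using only the Python standard library, assuming Ollama is running on its default port 11434 and the model above has been pulled; the helper names (`build_embedding_request`, `embed`) are ours, not part of any SDK:

```python
import json
import urllib.request


def build_embedding_request(model, prompt):
    """Build the JSON payload Ollama's /api/embeddings endpoint expects."""
    return {"model": model, "prompt": prompt}


def embed(prompt,
          model="avr/sfr-embedding-mistral",
          endpoint="http://localhost:11434/api/embeddings"):
    """POST the prompt to a locally running Ollama server; return the vector."""
    payload = json.dumps(build_embedding_request(model, prompt)).encode()
    req = urllib.request.Request(
        endpoint, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]


# Example (requires a running Ollama server):
#   vec = embed("Your prompt here")
#   len(vec) should match the model's embedding dimension (4096 for this model)
```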
## 🧪 Testing Your Setup

Want to make sure everything's working? Here's how:

- Test your completion model (using Phi-4 as an example):

  ```bash
  ollama run phi4
  ```

- Test the embedding model by pulling it and querying the embeddings endpoint:

  ```bash
  ollama pull avr/sfr-embedding-mistral
  curl http://localhost:11434/api/embeddings -d '{
    "model": "avr/sfr-embedding-mistral:<TAG>",
    "prompt": "Your prompt here"
  }'
  ```

- See which models you have installed:

  ```bash
  ollama list
  ```
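The same "what's installed?" check can be scripted against Ollama's `/api/tags` endpoint, which backs `ollama list`. A sketch, again stdlib-only; the model list and helper names are our own illustration:

```python
import json
import urllib.request

# The models recommended above; adjust to whichever you actually pulled.
RECOMMENDED = ["phi4", "deepseek-r1:32b", "avr/sfr-embedding-mistral"]


def missing_models(installed, required):
    """Return entries in `required` with no installed tag matching them."""
    return [m for m in required
            if not any(name == m or name.startswith(m + ":")
                       for name in installed)]


def installed_model_names(endpoint="http://localhost:11434/api/tags"):
    """Ask a locally running Ollama server which models it has."""
    with urllib.request.urlopen(endpoint) as resp:
        return [m["name"] for m in json.load(resp)["models"]]


# Example (requires a running Ollama server):
#   missing = missing_models(installed_model_names(), RECOMMENDED)
#   an empty list means everything you need is installed
```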
## ⚙️ Configuration

Use these environment variables in your .env file:

```
LLM_API_KEY="ollama"
LLM_MODEL="phi4:latest"
LLM_PROVIDER="ollama"
LLM_ENDPOINT="http://localhost:11434/v1"
EMBEDDING_PROVIDER="ollama"
EMBEDDING_MODEL="avr/sfr-embedding-mistral:latest"
EMBEDDING_ENDPOINT="http://localhost:11434/api/embeddings"
EMBEDDING_DIMENSIONS=4096
HUGGINGFACE_TOKENIZER="Salesforce/SFR-Embedding-Mistral"
```
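Note that `LLM_ENDPOINT` points at Ollama's OpenAI-compatible `/v1` API, so any OpenAI-style client can talk to it. Here is a minimal stdlib-only sketch of how these variables map onto a chat request; `load_llm_config` and `chat` are hypothetical helpers for illustration, not part of any library:

```python
import json
import os
import urllib.request


def load_llm_config():
    """Read the .env-style variables, falling back to the values shown above."""
    return {
        "api_key": os.getenv("LLM_API_KEY", "ollama"),
        "model": os.getenv("LLM_MODEL", "phi4:latest"),
        "endpoint": os.getenv("LLM_ENDPOINT", "http://localhost:11434/v1"),
    }


def chat(prompt):
    """Send one user message through Ollama's OpenAI-compatible API."""
    cfg = load_llm_config()
    payload = json.dumps({
        "model": cfg["model"],
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        cfg["endpoint"] + "/chat/completions",
        data=payload,
        headers={
            "Content-Type": "application/json",
            # Ollama ignores the key, but OpenAI-style clients expect one.
            "Authorization": "Bearer " + cfg["api_key"],
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


# Example (requires a running Ollama server with phi4 pulled):
#   chat("Say hello in one word.")
```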