> ## Documentation Index
> Fetch the complete documentation index at: https://docs.cognee.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Embedding Providers

> Configure embedding providers for semantic search in Cognee

Embedding providers convert text into vector representations that enable semantic search. These vectors capture the meaning of text, allowing Cognee to find conceptually related content even when the wording is different.

<Info>
  **New to configuration?**

  See the [Setup Configuration Overview](./overview) for the complete workflow:

  install extras → create `.env` → choose providers → handle pruning.
</Info>

## Supported Providers

Cognee supports multiple embedding providers:

* **OpenAI** — Text embedding models via OpenAI API (default)
* **Azure OpenAI** — Text embedding models via Azure OpenAI Service
* **Google Gemini** — Embedding models via Google AI
* **Mistral** — Embedding models via Mistral AI
* **AWS Bedrock** — Embedding models via AWS Bedrock
* **Ollama** — Local embedding models via Ollama
* **LM Studio** — Local embedding models via LM Studio
* **Fastembed** — CPU-friendly local embeddings
* **HuggingFace** — Embedding models via HuggingFace Inference API or Inference Endpoints
* **vLLM** — Self-hosted embedding models via vLLM
* **OpenAI-Compatible** — Direct OpenAI SDK for llama.cpp, vLLM, TEI, and any `/v1/embeddings` server (bypasses LiteLLM)
* **Custom** — OpenAI-compatible embedding endpoints routed through LiteLLM (DeepInfra, company-internal)

<Warning>
  **LLM/Embedding Configuration**: If you configure only LLM or only embeddings, the other defaults to OpenAI. Ensure you have a working OpenAI API key, or configure both LLM and embeddings to avoid unexpected defaults.
</Warning>

## Configuration

<Accordion title="Environment Variables">
  Set these environment variables in your `.env` file:

  * `EMBEDDING_PROVIDER` — The provider to use (`openai`, `gemini`, `mistral`, `bedrock`, `ollama`, `fastembed`, `openai_compatible`, `custom`)
  * `EMBEDDING_MODEL` — The specific embedding model to use
  * `EMBEDDING_DIMENSIONS` — The vector dimension size (must match your vector store)
  * `EMBEDDING_API_KEY` — Your API key (falls back to `LLM_API_KEY` if not set)
  * `EMBEDDING_ENDPOINT` — Custom endpoint URL (for Azure, Ollama, or custom providers)
  * `EMBEDDING_API_VERSION` — API version (for Azure OpenAI)
  * `EMBEDDING_MAX_COMPLETION_TOKENS` — Maximum tokens per embedding request; used for tokenizer-based chunk sizing (optional, default `8191`)
  * `HUGGINGFACE_TOKENIZER` — HuggingFace Hub model ID used for token counting when `EMBEDDING_PROVIDER` is `ollama`
</Accordion>
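
As a rough illustration of how these variables relate, the sketch below resolves a settings dictionary from an environment mapping. The helper `resolve_embedding_settings` is hypothetical and is not part of Cognee's API. It only mirrors the rules listed above (the `LLM_API_KEY` fallback and the `8191` default) and ignores any provider-specific exceptions.

```python
from typing import Mapping, Optional


def resolve_embedding_settings(env: Mapping[str, str]) -> dict:
    """Hypothetical helper: mirror the documented defaults and fallbacks."""
    # EMBEDDING_API_KEY falls back to LLM_API_KEY when it is not set.
    api_key: Optional[str] = env.get("EMBEDDING_API_KEY") or env.get("LLM_API_KEY")

    # Values in a .env file are strings; the dimension must be an integer.
    raw_dimensions = env.get("EMBEDDING_DIMENSIONS")
    dimensions = int(raw_dimensions) if raw_dimensions else None

    return {
        "provider": env.get("EMBEDDING_PROVIDER", "openai"),  # OpenAI is the default
        "model": env.get("EMBEDDING_MODEL"),
        "dimensions": dimensions,
        "api_key": api_key,
        "endpoint": env.get("EMBEDDING_ENDPOINT"),
        "max_tokens": int(env.get("EMBEDDING_MAX_COMPLETION_TOKENS", "8191")),
    }


settings = resolve_embedding_settings(
    {
        "EMBEDDING_MODEL": "openai/text-embedding-3-large",
        "EMBEDDING_DIMENSIONS": "3072",
        "LLM_API_KEY": "sk-example",  # placeholder value, not a real key
    }
)
print(settings["api_key"], settings["dimensions"], settings["max_tokens"])
```

Because no `EMBEDDING_API_KEY` is supplied in the example, the resolved key is the `LLM_API_KEY` value.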

## Provider Setup Guides

<AccordionGroup>
  <Accordion title="OpenAI (Default)">
    OpenAI is the default embedding provider. The example below uses `text-embedding-3-large` with 3072-dimensional vectors.

    ```dotenv theme={null}
    EMBEDDING_PROVIDER="openai"
    EMBEDDING_MODEL="openai/text-embedding-3-large"
    EMBEDDING_DIMENSIONS="3072"
    # Optional
    # EMBEDDING_API_KEY=sk-...   # falls back to LLM_API_KEY if omitted
    # EMBEDDING_ENDPOINT=https://api.openai.com/v1
    # EMBEDDING_API_VERSION=
    # EMBEDDING_MAX_COMPLETION_TOKENS=8191
    ```
  </Accordion>

  <Accordion title="Azure OpenAI Embeddings">
    Use Azure OpenAI Service for embeddings with your own deployment.

    ```dotenv theme={null}
    EMBEDDING_PROVIDER="openai"
    EMBEDDING_MODEL="azure/text-embedding-3-large"
    EMBEDDING_ENDPOINT="https://<your-az>.cognitiveservices.azure.com/openai/deployments/text-embedding-3-large"
    EMBEDDING_API_KEY="az-..."
    EMBEDDING_API_VERSION="2023-05-15"
    EMBEDDING_DIMENSIONS="3072"
    ```
  </Accordion>

  <Accordion title="Google Gemini">
    Use Google's embedding models for semantic search.

    ```dotenv theme={null}
    EMBEDDING_PROVIDER="gemini"
    EMBEDDING_MODEL="gemini/gemini-embedding-001"
    EMBEDDING_API_KEY="AIza..."
    EMBEDDING_DIMENSIONS="768"
    ```
  </Accordion>

  <Accordion title="Mistral">
    Use Mistral's embedding models for high-quality vector representations.

    ```dotenv theme={null}
    EMBEDDING_PROVIDER="mistral"
    EMBEDDING_MODEL="mistral/mistral-embed"
    EMBEDDING_API_KEY="sk-mis-..."
    EMBEDDING_DIMENSIONS="1024"
    ```

    **Installation**: Install the required dependency:

    ```bash theme={null}
    pip install "mistral-common[sentencepiece]"
    ```
  </Accordion>

  <Accordion title="AWS Bedrock">
    Use embedding models provided by the AWS Bedrock service.

    ```dotenv theme={null}
    EMBEDDING_PROVIDER="bedrock"
    EMBEDDING_MODEL="<your_model_name>"
    EMBEDDING_DIMENSIONS="<dimensions_of_the_model>"
    EMBEDDING_API_KEY="<your_api_key>"
    EMBEDDING_MAX_COMPLETION_TOKENS="<max_tokens_of_your_model>"
    ```
  </Accordion>

  <Accordion title="Ollama (Local)">
    Run embedding models locally with Ollama for privacy and cost control.

    ```dotenv theme={null}
    EMBEDDING_PROVIDER="ollama"
    EMBEDDING_MODEL="nomic-embed-text:latest"
    EMBEDDING_ENDPOINT="http://localhost:11434/api/embed"
    EMBEDDING_DIMENSIONS="768"
    HUGGINGFACE_TOKENIZER="nomic-ai/nomic-embed-text-v1.5"
    ```

    `HUGGINGFACE_TOKENIZER` is the HuggingFace repo ID of the tokenizer used for token-length counting when sending requests to the Ollama embedding endpoint.

    **Installation**: Install Ollama from [ollama.ai](https://ollama.ai) and pull your desired embedding model:

    ```bash theme={null}
    ollama pull nomic-embed-text:latest
    ```

    `HUGGINGFACE_TOKENIZER` is required when using Ollama. It should be the HuggingFace repo ID for the tokenizer that matches your embedding model. See the **`HUGGINGFACE_TOKENIZER` environment variable** section under Additional Information for how to find the correct value for your model.

    If a text input exceeds the model's context window, the Ollama embedding engine automatically falls back by splitting the batch in half and retrying both halves. For a single overlong text, it splits the string into two overlapping segments and averages the resulting embeddings. Cognee no longer pre-truncates text before sending it to Ollama, so this fallback only activates when the server returns a context-length error.

    <Info>
      **Zero-API-key setup**: To run fully offline with no OpenAI key, you must configure both the LLM provider **and** the embedding provider to use local backends. See the [Local Setup guide](/guides/local-setup) for a complete combined `.env` example.
    </Info>

    <Tip>
      **Ollama falling behind?** Ollama processes requests sequentially. If it becomes unresponsive or returns errors under load, reduce `EMBEDDING_BATCH_SIZE` (default `36`) to send fewer chunks per call — values between `1` and `10` work well for most local hardware:

      ```dotenv theme={null}
      EMBEDDING_BATCH_SIZE="5"
      ```
    </Tip>
  </Accordion>

  <Accordion title="LM Studio (Local)">
    Run embedding models locally with LM Studio for privacy and cost control.

    ```dotenv theme={null}
    EMBEDDING_PROVIDER="custom"
    EMBEDDING_MODEL="lm_studio/text-embedding-nomic-embed-text-1.5"
    EMBEDDING_ENDPOINT="http://127.0.0.1:1234/v1"
    EMBEDDING_API_KEY="."
    EMBEDDING_DIMENSIONS="768"
    ```

    **Installation**: Install LM Studio from [lmstudio.ai](https://lmstudio.ai/) and download your desired model from LM Studio's interface. Then load the model and start the LM Studio server so that Cognee can connect to it.
  </Accordion>

  <Accordion title="Fastembed (Local)">
    Use Fastembed for CPU-friendly local embeddings without GPU requirements.

    ```dotenv theme={null}
    EMBEDDING_PROVIDER="fastembed"
    EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L6-v2"
    EMBEDDING_DIMENSIONS="384"
    ```

    **Installation**: Fastembed is included by default with Cognee.

    <Info>
      **Context window handling**: When a text input exceeds the model's context window, Fastembed automatically splits the batch and retries. For a single overlong text, it splits the string into two overlapping segments and averages the resulting embeddings. This mirrors the behavior of the OpenAI-compatible engine.
    </Info>
  </Accordion>

  <Accordion title="HuggingFace">
    Use embedding models from HuggingFace via the [HuggingFace Inference API](https://huggingface.co/docs/api-inference/index) (serverless) or dedicated [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index).

    <Tabs>
      <Tab title="Serverless">
        ```dotenv theme={null}
        EMBEDDING_PROVIDER="custom"
        EMBEDDING_MODEL="huggingface/BAAI/bge-large-en-v1.5"
        EMBEDDING_API_KEY="hf_..."
        EMBEDDING_DIMENSIONS="1024"
        ```
      </Tab>

      <Tab title="Dedicated Endpoint">
        ```dotenv theme={null}
        EMBEDDING_PROVIDER="custom"
        EMBEDDING_MODEL="huggingface/BAAI/bge-large-en-v1.5"
        EMBEDDING_ENDPOINT="https://<your-endpoint-id>.<region>.aws.endpoints.huggingface.cloud"
        EMBEDDING_API_KEY="hf_..."
        EMBEDDING_DIMENSIONS="1024"
        ```
      </Tab>
    </Tabs>

    **Installation**: Install the HuggingFace extra for tokenizer support:

    ```bash theme={null}
    pip install "cognee[huggingface]"
    ```

    <Info>
      **HUGGINGFACE\_TOKENIZER with HuggingFace embeddings**: When using `EMBEDDING_PROVIDER="custom"` with a `huggingface/` model, Cognee automatically attempts to load a HuggingFace tokenizer from the model repo for token counting. If that fails, it falls back to the TikToken tokenizer. You do not need to set `HUGGINGFACE_TOKENIZER` manually for this provider — it is only required when using `EMBEDDING_PROVIDER="ollama"` (see the Ollama section above).
    </Info>
  </Accordion>

  <Accordion title="vLLM">
    Use vLLM to serve local or self-hosted embedding models with an OpenAI-compatible API.

    **Example with Qwen3-Embedding-4B on port 8001:**

    ```dotenv theme={null}
    EMBEDDING_PROVIDER="custom"
    EMBEDDING_MODEL="hosted_vllm/Qwen/Qwen3-Embedding-4B"
    EMBEDDING_ENDPOINT="http://localhost:8001/v1"
    EMBEDDING_API_KEY="."
    EMBEDDING_DIMENSIONS="2560"
    ```

    <Warning>
      **`hosted_vllm/` prefix required**: Include `hosted_vllm/` at the start of the model name so LiteLLM routes requests to your vLLM server. The model name after the prefix should match the model ID returned by your vLLM server's `/v1/models` endpoint.
    </Warning>

    **Tokenization**: Cognee automatically strips the `hosted_vllm/` prefix when loading the HuggingFace tokenizer, so no separate `HUGGINGFACE_TOKENIZER` setting is needed as long as the model name after the prefix is a valid HuggingFace model ID.

    To verify the model name your vLLM server exposes, run:

    ```bash theme={null}
    curl http://localhost:8001/v1/models
    ```

    See the [LiteLLM vLLM documentation](https://docs.litellm.ai/docs/providers/vllm) for more details.
  </Accordion>

  <Accordion title="OpenAI-Compatible Local Servers (llama.cpp, TEI, vLLM)">
    Use `EMBEDDING_PROVIDER="openai_compatible"` for any local inference server that exposes the standard `/v1/embeddings` endpoint. This provider talks directly to OpenAI-compatible embedding servers via the OpenAI Python SDK, bypassing LiteLLM.

    Use this provider for: llama.cpp (`llama-server --embedding`), vLLM, Hugging Face TEI, LocalAI, Infinity, and similar servers.

    <Tabs>
      <Tab title="llama.cpp">
        Start llama.cpp with embedding support:

        ```bash theme={null}
        llama-server --model your-model.gguf --embedding --port 8080
        ```

        ```dotenv theme={null}
        EMBEDDING_PROVIDER="openai_compatible"
        EMBEDDING_MODEL="default"
        EMBEDDING_ENDPOINT="http://localhost:8080/v1"
        EMBEDDING_API_KEY="no-key-required"
        EMBEDDING_DIMENSIONS="768"
        ```
      </Tab>

      <Tab title="vLLM">
        ```dotenv theme={null}
        EMBEDDING_PROVIDER="openai_compatible"
        EMBEDDING_MODEL="BAAI/bge-large-en-v1.5"
        EMBEDDING_ENDPOINT="http://localhost:8001/v1"
        EMBEDDING_API_KEY="."
        EMBEDDING_DIMENSIONS="1024"
        ```

        Unlike `EMBEDDING_PROVIDER="custom"` (LiteLLM), you do **not** need a `hosted_vllm/` prefix in the model name — use the model ID directly as reported by your vLLM server's `/v1/models` endpoint.
      </Tab>

      <Tab title="Hugging Face TEI">
        ```dotenv theme={null}
        EMBEDDING_PROVIDER="openai_compatible"
        EMBEDDING_MODEL="BAAI/bge-large-en-v1.5"
        EMBEDDING_ENDPOINT="http://localhost:8080/v1"
        EMBEDDING_API_KEY="."
        EMBEDDING_DIMENSIONS="1024"
        ```
      </Tab>
    </Tabs>

    <Info>
      **Endpoint normalisation**: The engine automatically appends `/v1` to `EMBEDDING_ENDPOINT` if it is missing, and strips a trailing `/embeddings` suffix. You can pass either `http://localhost:8080` or `http://localhost:8080/v1` — both work.
    </Info>

    <Info>
      **Tokenizer and token limits**: The `openai_compatible` engine automatically loads a tokenizer for chunk sizing. It first tries to load a HuggingFace tokenizer matching `EMBEDDING_MODEL`; if that fails (for example, because the model name is a local alias not on the HuggingFace Hub), it falls back to the TikToken tokenizer. The token limit passed to the tokenizer is controlled by `EMBEDDING_MAX_COMPLETION_TOKENS` (default `8191`). Set this to match your server's context window if it differs from the default:

      ```dotenv theme={null}
      EMBEDDING_MAX_COMPLETION_TOKENS="4096"
      ```
    </Info>

    <Info>
      **HUGGINGFACE\_TOKENIZER is not needed for this provider.** The engine automatically tries to load a HuggingFace tokenizer using the model name (e.g. `BAAI/bge-large-en-v1.5`) for token counting. If that fails — for example when using `EMBEDDING_MODEL="default"` with llama.cpp — it silently falls back to TikToken. You do not need to set `HUGGINGFACE_TOKENIZER` when using `EMBEDDING_PROVIDER="openai_compatible"`.
    </Info>
  </Accordion>

  <Accordion title="Custom Providers">
    Use OpenAI-compatible embedding endpoints from other providers such as DeepInfra, OpenRouter, or a company-internal server. These are routed through LiteLLM and require a provider prefix in the model name.

    <Tabs>
      <Tab title="DeepInfra">
        ```dotenv theme={null}
        EMBEDDING_PROVIDER="custom"
        EMBEDDING_MODEL="deepinfra/BAAI/bge-base-en-v1.5"
        EMBEDDING_ENDPOINT="https://api.deepinfra.com/v1/openai"
        EMBEDDING_API_KEY="<your-deepinfra-api-key>"
        EMBEDDING_DIMENSIONS="768"
        ```
      </Tab>

      <Tab title="OpenRouter">
        Use the `openrouter/` model prefix. Do not set `EMBEDDING_ENDPOINT`.

        ```dotenv theme={null}
        EMBEDDING_PROVIDER="custom"
        EMBEDDING_MODEL="openrouter/openai/text-embedding-3-small"
        EMBEDDING_API_KEY="sk-or-..."
        EMBEDDING_DIMENSIONS="1536"
        ```
      </Tab>

      <Tab title="Self-Hosted">
        ```dotenv theme={null}
        EMBEDDING_PROVIDER="custom"
        EMBEDDING_MODEL="openai/<your-internal-model-name>"
        EMBEDDING_ENDPOINT="https://embeddings.internal.example.com/v1"
        EMBEDDING_API_KEY="<internal-api-key>"
        EMBEDDING_DIMENSIONS="<match-your-model>"
        ```
      </Tab>
    </Tabs>

    <Info>
      **No endpoint normalisation for `custom`**: Unlike [`openai_compatible`](#openai-compatible-local-servers-llama-cpp-tei-vllm), the `custom` provider passes `EMBEDDING_ENDPOINT` directly to LiteLLM as `api_base` with no automatic `/v1` appending or `/embeddings` stripping. Set the endpoint to exactly the base URL your provider expects (e.g., `https://api.deepinfra.com/v1/openai`), or omit it entirely when using a named LiteLLM prefix such as `openrouter/`.
    </Info>
  </Accordion>
</AccordionGroup>
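
The context-window fallback described for the Ollama and Fastembed engines (split the batch in half, then split a single overlong text into two overlapping segments and average the results) can be sketched as follows. This is a minimal illustration, not Cognee's implementation: `embed_with_fallback`, `ContextLengthError`, `toy_embed`, and the character-based overlap size are all assumptions made for the example.

```python
from typing import Callable, List

Vector = List[float]


class ContextLengthError(Exception):
    """Stand-in for the context-length error a real server would return."""


def embed_with_fallback(
    texts: List[str],
    embed_batch: Callable[[List[str]], List[Vector]],
    overlap_chars: int = 4,  # illustrative overlap, not a Cognee setting
) -> List[Vector]:
    try:
        return embed_batch(texts)
    except ContextLengthError:
        if len(texts) > 1:
            # Split the batch in half and retry each half independently.
            mid = len(texts) // 2
            return embed_with_fallback(
                texts[:mid], embed_batch, overlap_chars
            ) + embed_with_fallback(texts[mid:], embed_batch, overlap_chars)

        text = texts[0]
        if len(text) < 2:
            raise  # nothing left to split
        # Single overlong text: two overlapping segments, then average.
        mid = len(text) // 2
        pad = min(overlap_chars // 2, max(mid - 1, 0))  # keeps segments shorter
        left = embed_with_fallback([text[: mid + pad]], embed_batch, overlap_chars)[0]
        right = embed_with_fallback([text[mid - pad:]], embed_batch, overlap_chars)[0]
        return [[(a + b) / 2 for a, b in zip(left, right)]]


def toy_embed(batch: List[str], limit: int = 10) -> List[Vector]:
    """Toy engine: rejects inputs longer than `limit` characters."""
    if any(len(t) > limit for t in batch):
        raise ContextLengthError("input exceeds context window")
    return [[float(len(t))] for t in batch]


vectors = embed_with_fallback(["short", "x" * 16], toy_embed)
print(vectors)
```

The fallback only runs after the engine raises the error, which matches the note that Cognee does not pre-truncate text for Ollama.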

## Additional Information

<Accordion title="Batch Size">
  `EMBEDDING_BATCH_SIZE` controls how many text chunks are grouped into a single embedding API call. Cognee splits all chunks into batches of this size and sends them concurrently to the embedding engine.

  | Variable               | Default | Description                   |
  | ---------------------- | ------- | ----------------------------- |
  | `EMBEDDING_BATCH_SIZE` | `36`    | Chunks per embedding API call |

  **Local inference (Ollama, llama.cpp, LM Studio)**: Local servers typically process one request at a time or support only limited concurrency, so the default of `36` can overwhelm them. Reduce the batch size if you see errors or slowdowns:

  ```dotenv theme={null}
  EMBEDDING_BATCH_SIZE="5"
  ```

  **Cloud providers**: Larger batches reduce the number of API calls and are efficient with cloud APIs. The default `36` suits most cloud providers.

  **Relationship to rate limiting**: Each batch counts as one request toward `EMBEDDING_RATE_LIMIT_REQUESTS`. A single file may produce many chunks — with `EMBEDDING_BATCH_SIZE=36`, a document split into 360 chunks generates 10 requests.
</Accordion>
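
The relationship between chunk count, batch size, and request count can be checked with a few lines of Python. This is only a sketch of the arithmetic. The `batched` helper is hypothetical and does not reproduce how Cognee schedules batches internally.

```python
import math
from typing import Iterator, List, Sequence


def batched(chunks: Sequence[str], batch_size: int = 36) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `batch_size` chunks."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(chunks), batch_size):
        yield list(chunks[start:start + batch_size])


chunks = [f"chunk-{i}" for i in range(360)]

# One embedding request is sent per batch.
default_requests = sum(1 for _ in batched(chunks, 36))
small_requests = sum(1 for _ in batched(chunks, 5))

# The same numbers follow from ceil(total_chunks / batch_size).
assert default_requests == math.ceil(len(chunks) / 36)
assert small_requests == math.ceil(len(chunks) / 5)
print(default_requests, small_requests)
```

A smaller batch size reduces the load per request but increases the number of requests, which matters once rate limiting is enabled.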

<Accordion title="Timeout and Retry Behavior">
  The `LiteLLMEmbeddingEngine` applies two layers of protection against slow or unreachable endpoints:

  | Limit               | Value       | Configurable   |
  | ------------------- | ----------- | -------------- |
  | Per-attempt timeout | 30 seconds  | No (hardcoded) |
  | Total retry window  | 128 seconds | No (hardcoded) |

  **How retries work**: Failed attempts are retried with exponential back-off starting at 2 seconds, with random jitter, until the 128-second window is exhausted.

  **What is not retried**: `404 Not Found` errors are raised immediately — they indicate a configuration problem (wrong model name or endpoint) rather than a transient failure.

  **Common error messages and causes**:

  * `EmbeddingException: Embedding request timed out. Check EMBEDDING_ENDPOINT connectivity.` — The endpoint did not respond within 30 seconds. Verify that `EMBEDDING_ENDPOINT` is reachable from your network.
  * `EmbeddingException: Cannot connect to embedding endpoint. Check EMBEDDING_ENDPOINT.` — TCP connection was refused or the server closed the connection before responding. Confirm the server is running and the URL is correct.
  * `EmbeddingException: Failed to index data points using model <model>` — The provider returned a 400 or 404 error. Common causes: wrong model name, missing `hosted_vllm/` prefix for vLLM, or an unsupported model at that endpoint.

  **Diagnosing slow local servers**: If you see repeated timeouts with Ollama, LM Studio, or vLLM, the 30-second per-attempt limit may be tight for large batches. Reduce `EMBEDDING_BATCH_SIZE` to send fewer texts per request:

  ```dotenv theme={null}
  EMBEDDING_BATCH_SIZE="5"
  ```

  <Note>
    The timeout and retry values are hardcoded in `LiteLLMEmbeddingEngine` and cannot be changed via environment variables. To use different limits, subclass `LiteLLMEmbeddingEngine` and override `embed_text` with a custom `@retry` decorator.
  </Note>
</Accordion>
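
To make the retry description concrete, here is a minimal sketch of exponential back-off with jitter inside a fixed total window. It is not the code used by `LiteLLMEmbeddingEngine`: the function `call_with_retry`, the `NotFoundError` class, the doubling factor, and the one-second jitter range are assumptions, and the 30-second per-attempt timeout is left out for brevity.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class NotFoundError(Exception):
    """Stand-in for a 404 response; raised immediately, never retried."""


def call_with_retry(
    request: Callable[[], T],
    total_window: float = 128.0,  # total retry window in seconds
    first_delay: float = 2.0,  # back-off starts at 2 seconds
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    deadline = clock() + total_window
    delay = first_delay
    while True:
        try:
            return request()
        except NotFoundError:
            raise  # configuration problem: wrong model name or endpoint
        except Exception:
            # Exponential back-off plus random jitter of up to one second.
            wait = delay + random.uniform(0.0, 1.0)
            if clock() + wait > deadline:
                raise  # retry window exhausted; surface the last error
            sleep(wait)
            delay *= 2


attempts = {"count": 0}
waits: List[float] = []


def flaky() -> str:
    """Fails twice with a transient error, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


# Record the waits instead of sleeping so the example finishes instantly.
result = call_with_retry(flaky, sleep=waits.append)
print(result, attempts["count"], len(waits))
```

Passing `sleep` and `clock` as parameters keeps the sketch testable without real delays.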

<Accordion title="Rate Limiting">
  Control client-side throttling for embedding calls to manage API usage and costs.

  <Warning>
    **Rate limiting is disabled by default.** You must explicitly set `EMBEDDING_RATE_LIMIT_ENABLED="true"` to activate it.
  </Warning>

  **Defaults** (the request and interval values apply only once rate limiting is enabled):

  | Variable                        | Default | Meaning                   |
  | ------------------------------- | ------- | ------------------------- |
  | `EMBEDDING_RATE_LIMIT_ENABLED`  | `false` | Off by default — opt-in   |
  | `EMBEDDING_RATE_LIMIT_REQUESTS` | `60`    | Max requests per interval |
  | `EMBEDDING_RATE_LIMIT_INTERVAL` | `60`    | Interval in seconds       |

  **What counts as one request?**

  One rate-limit request = one `embed_text()` API call = one batch of chunks (not one chunk). With the default `EMBEDDING_BATCH_SIZE=36`, processing 360 chunks produces 10 requests. See the [Batch Size](#batch-size) section for how to tune batch size.

  **Sizing guidance:**

  Set `EMBEDDING_RATE_LIMIT_INTERVAL` to `60` and `EMBEDDING_RATE_LIMIT_REQUESTS` to about 80–90% of your provider's advertised requests-per-minute (RPM) limit to leave headroom.

  **Example configurations for common provider tiers**

  These examples target embedding endpoints, such as OpenAI embedding models like `text-embedding-3-large`.

  <AccordionGroup>
    <Accordion title="OpenAI - Tier 1">
      ```dotenv theme={null}
      EMBEDDING_RATE_LIMIT_ENABLED="true"
      EMBEDDING_RATE_LIMIT_REQUESTS="2700"
      EMBEDDING_RATE_LIMIT_INTERVAL="60"
      ```
    </Accordion>

    <Accordion title="OpenAI - Free / Very Low Tier">
      ```dotenv theme={null}
      EMBEDDING_RATE_LIMIT_ENABLED="true"
      EMBEDDING_RATE_LIMIT_REQUESTS="180"
      EMBEDDING_RATE_LIMIT_INTERVAL="60"
      ```
    </Accordion>

    <Accordion title="Google Gemini - Free Tier">
      ```dotenv theme={null}
      EMBEDDING_RATE_LIMIT_ENABLED="true"
      EMBEDDING_RATE_LIMIT_REQUESTS="1350"
      EMBEDDING_RATE_LIMIT_INTERVAL="60"
      ```
    </Accordion>

    <Accordion title="Conservative Default">
      ```dotenv theme={null}
      EMBEDDING_RATE_LIMIT_ENABLED="true"
      EMBEDDING_RATE_LIMIT_REQUESTS="60"
      EMBEDDING_RATE_LIMIT_INTERVAL="60"
      ```
    </Accordion>
  </AccordionGroup>

  <Info>
    Always verify your exact tier limits in your provider's dashboard — limits vary by model, tier, and region. The examples above are approximations for common tiers and may change.
  </Info>
</Accordion>
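
The sizing guidance above amounts to simple arithmetic, sketched here in Python. The helper `rate_limit_env` is hypothetical, and the 3000 RPM figure is an example input rather than a statement about any provider's actual limit.

```python
def rate_limit_env(provider_rpm: int, headroom_percent: int = 90) -> dict:
    """Derive rate-limit settings that keep a margin below the provider's limit."""
    if not 0 < headroom_percent <= 100:
        raise ValueError("headroom_percent must be between 1 and 100")
    return {
        "EMBEDDING_RATE_LIMIT_ENABLED": "true",
        # Integer arithmetic avoids floating-point rounding surprises.
        "EMBEDDING_RATE_LIMIT_REQUESTS": str(provider_rpm * headroom_percent // 100),
        "EMBEDDING_RATE_LIMIT_INTERVAL": "60",
    }


env = rate_limit_env(3000)  # example: an advertised limit of 3000 RPM

# Each request carries one batch, so the chunk ceiling per interval is
# requests * batch size (36 is the default EMBEDDING_BATCH_SIZE).
max_chunks_per_interval = int(env["EMBEDDING_RATE_LIMIT_REQUESTS"]) * 36

for name, value in env.items():
    print(f'{name}="{value}"')
print(max_chunks_per_interval)
```

Lowering `headroom_percent` to `80` gives the more conservative end of the suggested range.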

<Accordion title="Testing and Development">
  ```dotenv theme={null}
  # Mock embeddings for testing (returns zero vectors)
  MOCK_EMBEDDING="true"
  ```
</Accordion>

<Accordion title="HUGGINGFACE_TOKENIZER environment variable">
  The `HUGGINGFACE_TOKENIZER` environment variable specifies which Hugging Face tokenizer to use for counting tokens before sending text to the embedding model. This is required when using the **Ollama** provider.

  **Value format**: The value is the Hugging Face model repository ID, which is the `{organization}/{model-name}` path that appears in the URL on [huggingface.co/models](https://huggingface.co/models). It should match the model that your Ollama embedding tag is built from.

  For example, if the Ollama model `nomic-embed-text:latest` is built from `nomic-ai/nomic-embed-text-v1.5` on Hugging Face, set:

  ```dotenv theme={null}
  HUGGINGFACE_TOKENIZER="nomic-ai/nomic-embed-text-v1.5"
  ```

  ### Common model-to-tokenizer mappings

  | Ollama model                       | `HUGGINGFACE_TOKENIZER` value            | Dimensions |
  | ---------------------------------- | ---------------------------------------- | ---------- |
  | `nomic-embed-text:latest`          | `nomic-ai/nomic-embed-text-v1.5`         | 768        |
  | `bge-m3:latest`                    | `BAAI/bge-m3`                            | 1024       |
  | `mxbai-embed-large:latest`         | `mixedbread-ai/mxbai-embed-large-v1`     | 1024       |
  | `avr/sfr-embedding-mistral:latest` | `Salesforce/SFR-Embedding-Mistral`       | 4096       |
  | `all-minilm:latest`                | `sentence-transformers/all-MiniLM-L6-v2` | 384        |

  ### Finding the tokenizer for any model

  1. Look up the model on [huggingface.co/models](https://huggingface.co/models).
  2. The repository ID is the `{organization}/{model-name}` part of the URL (e.g., `huggingface.co/BAAI/bge-m3` → `BAAI/bge-m3`).
  3. Use the repository ID that corresponds to the model your Ollama tag is built from. The Ollama model page typically links to the original Hugging Face repository.

  <Note>
    `HUGGINGFACE_TOKENIZER` is only used by the Ollama embedding engine. It is not needed for OpenAI, Fastembed, `openai_compatible`, or other providers. For `openai_compatible`, the engine automatically tries the model name as a HuggingFace tokenizer ID and falls back to TikToken if unavailable — no separate `HUGGINGFACE_TOKENIZER` value is required.
  </Note>

</Accordion>

## Important Notes

* **Dimension Consistency**: `EMBEDDING_DIMENSIONS` must match your vector store collection schema
* **API Key Fallback**: If `EMBEDDING_API_KEY` is not set, Cognee uses `LLM_API_KEY` (except for custom providers)
* **Tokenization**: `HUGGINGFACE_TOKENIZER` is required for the Ollama provider and must be set to the HuggingFace model repo ID that matches your embedding model
* **Performance**: Local providers (Ollama, Fastembed) are typically slower than hosted APIs but offer privacy and cost benefits

<Columns cols={3}>
  <Card title="LLM Providers" icon="brain" href="/setup-configuration/llm-providers">
    Configure LLM providers for text generation
  </Card>

  <Card title="Vector Stores" icon="database" href="/setup-configuration/vector-stores">
    Set up vector databases for embedding storage
  </Card>

  <Card title="Overview" icon="settings" href="/setup-configuration/overview">
    Return to setup configuration overview
  </Card>
</Columns>
