# Structured Output Backends

> Configure structured output frameworks for reliable data extraction in Cognee

Structured output backends convert free-form LLM responses into validated Pydantic models, which makes data extraction more reliable. Cognee supports three such frameworks for knowledge graph extraction and other tasks.

<Info>
  **New to configuration?**

  See the [Setup Configuration Overview](./overview) for the complete workflow:

  install extras → create `.env` → choose providers → handle pruning.
</Info>

## Supported Frameworks

Cognee supports three structured output approaches:

* **LiteLLM + Instructor** — Provider-agnostic client with Pydantic coercion (default)
* **BAML** — DSL-based framework with type registry and guardrails
* **LiteLLM Native** — Validates responses into Pydantic models using LiteLLM's own `response_format`, without the `instructor` dependency (opt-in)

All three frameworks produce the same Pydantic-validated outputs, so your application code remains unchanged regardless of which backend you choose.

## How It Works

Cognee uses a unified interface that abstracts the underlying framework:

```python theme={null}
from cognee.infrastructure.llm.LLMGateway import LLMGateway

# Call from inside an async function; `response_model` is the type to validate into.
result = await LLMGateway.acreate_structured_output(text, system_prompt, response_model)
```

The `STRUCTURED_OUTPUT_FRAMEWORK` environment variable determines which backend processes your requests, but the API remains identical.
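
A fuller sketch of this call, assuming an LLM provider is already configured. The `Person` model, the prompt text, and the `extract_person` helper are hypothetical names used only for illustration; they are not part of Cognee. The gateway import is deferred into the function so the model definition can be loaded without a configured LLM:

```python
from pydantic import BaseModel


class Person(BaseModel):
    """Hypothetical response model; any Pydantic model can take its place."""

    name: str
    role: str


async def extract_person(text: str) -> Person:
    # Imported lazily so this module loads even before Cognee is configured.
    from cognee.infrastructure.llm.LLMGateway import LLMGateway

    system_prompt = "Extract the person mentioned in the text."
    # Positional order follows the documented call:
    # text, system prompt, response model.
    return await LLMGateway.acreate_structured_output(text, system_prompt, Person)
```

With a provider and API key in place, `asyncio.run(extract_person("Ada Lovelace was a mathematician."))` should return a validated `Person` instance, whichever backend is selected.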

## Configuration

<Tabs>
  <Tab title="LiteLLM + Instructor (Default)">
    The default framework — no extra install needed. Uses LiteLLM and the `instructor` library to coerce LLM responses into Pydantic models.

    ```dotenv theme={null}
    STRUCTURED_OUTPUT_FRAMEWORK=instructor
    ```

    Optionally, control how the model is prompted for structured output:

    ```dotenv theme={null}
    # Override instructor mode (e.g. json_mode, tool_call, markdown_json_mode)
    # Leave unset to use the provider's default — see "Instructor Modes" below.
    LLM_INSTRUCTOR_MODE=json_schema_mode
    ```
  </Tab>

  <Tab title="BAML">
    BAML is an alternative structured output framework that uses a DSL-based type registry to extract data. It is particularly useful when small local models (such as Ollama models like `llama3.1:8b` or `qwen3.5:0.8b`) struggle to produce valid structured output with instructor, causing repeated `InstructorRetryException` errors.

    **Installation**: BAML requires a separate install:

    ```bash theme={null}
    pip install "cognee[baml]"
    ```

    **Configuration**: BAML uses its own LLM settings, independent of the main `LLM_*` variables:

    ```dotenv theme={null}
    STRUCTURED_OUTPUT_FRAMEWORK=baml

    # BAML-specific LLM settings (required)
    BAML_LLM_PROVIDER=openai
    BAML_LLM_MODEL=gpt-4o-mini
    BAML_LLM_API_KEY=sk-...

    # Optional BAML overrides
    # BAML_LLM_ENDPOINT=https://api.openai.com/v1
    # BAML_LLM_API_VERSION=
    # BAML_LLM_TEMPERATURE=0.0
    ```

    `BAML_LLM_PROVIDER` and `BAML_LLM_MODEL` accept the same provider names and model identifiers as the main LLM configuration. You can point BAML at a different model than your main LLM — for example, use a small Ollama model for general text generation while routing structured extraction through a cloud model.

    <Warning>
      If `STRUCTURED_OUTPUT_FRAMEWORK=baml` is set but the `cognee[baml]` extra is not installed, Cognee will raise an `ImportError` on startup. Run `pip install "cognee[baml]"` to resolve it.
    </Warning>

    <Accordion title="Using BAML for Small Local Models">
      Small Ollama models (e.g. `llama3.1:8b`, `qwen3.5:0.8b`) often fail to produce valid JSON-structured output when using the default instructor backend, resulting in repeated `InstructorRetryException` errors during `cognify` for types like `KnowledgeGraph` or `SummarizedContent`.

      Switching to BAML bypasses instructor entirely and uses BAML's own extraction pipeline, which is more forgiving with smaller models:

      ```dotenv theme={null}
      # Main LLM — small local model via Ollama
      LLM_PROVIDER=ollama
      LLM_MODEL=llama3.1:8b
      LLM_ENDPOINT=http://localhost:11434/v1
      LLM_API_KEY=ollama

      # Use BAML for structured extraction (can point to a different, more capable model)
      STRUCTURED_OUTPUT_FRAMEWORK=baml
      BAML_LLM_PROVIDER=openai
      BAML_LLM_MODEL=gpt-4o-mini
      BAML_LLM_API_KEY=sk-...
      ```

      See the [Local Setup guide](/guides/local-setup) for a complete Ollama configuration including embeddings.
    </Accordion>
  </Tab>

  <Tab title="LiteLLM Native">
    LiteLLM Native is an opt-in framework that validates responses into Pydantic models using LiteLLM's own `response_format`, **without** the `instructor` library. No extra install is needed beyond the base package.

    ```dotenv theme={null}
    STRUCTURED_OUTPUT_FRAMEWORK=litellm_native
    ```

    It reuses the standard `LLM_*` settings (provider, model, API key, endpoint) — no separate configuration block. Behavior depends on the provider:

    * **Schema-native path** — For providers LiteLLM reports as schema-capable (OpenAI, Azure, Gemini, Mistral, Bedrock, and others), your Pydantic model is passed directly as `response_format` and the returned JSON is validated against it.
    * **JSON-object fallback** — For providers without native schema support (Ollama, llama.cpp, custom OpenAI-compatible endpoints), Cognee requests a JSON object, injects the model's JSON Schema into the prompt, and validates the result. On a validation failure it feeds the error back to the model and retries so it can self-correct, up to 3 attempts before raising the last validation error.

    Routing between the two paths is automatic per model (via `litellm.supports_response_schema`); you don't select the path yourself.

    <Note>
      The default framework remains `instructor` — setting `STRUCTURED_OUTPUT_FRAMEWORK=litellm_native` is required to opt in. Transient errors (including rate limits) are retried with backoff, while budget/quota exhaustion is surfaced as `LLMPaymentRequiredError` and is **not** retried. Content-policy violations fall back only when the standard fallback settings are configured (at minimum `FALLBACK_MODEL` and `FALLBACK_API_KEY`); otherwise Cognee raises `ContentPolicyFilterError` immediately.
    </Note>
  </Tab>
</Tabs>
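
The JSON-object fallback described in the LiteLLM Native tab (request JSON, validate, feed the error back, stop after 3 attempts) can be sketched in plain Python. This is not Cognee's implementation: `call_model`, `validate_person`, and the retry prompt wording are hypothetical stand-ins, and the simple dictionary check takes the place of Pydantic validation.

```python
import json
from typing import Callable

MAX_ATTEMPTS = 3  # mirrors the "up to 3 attempts" limit described above


def validate_person(payload: object) -> dict:
    """Stand-in for Pydantic validation: require a string `name` field."""
    if not isinstance(payload, dict) or not isinstance(payload.get("name"), str):
        raise ValueError("field 'name' must be a string")
    return payload


def extract_with_retries(call_model: Callable[[str], str], prompt: str) -> dict:
    """Ask for JSON, validate it, and feed validation errors back on failure."""
    last_error = ValueError("no attempts made")
    for _ in range(MAX_ATTEMPTS):
        raw = call_model(prompt)
        try:
            return validate_person(json.loads(raw))
        except ValueError as error:  # json.JSONDecodeError subclasses ValueError
            last_error = error
            # Append the error so the model can self-correct on the next attempt.
            prompt = f"{prompt}\n\nPrevious reply was invalid: {error}. Return valid JSON."
    # All attempts failed: surface the last validation error.
    raise last_error
```

Passing the model call in as a function keeps the loop independent of any provider, which is also what makes it easy to exercise with a fake model in tests.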

## Instructor Modes

When `STRUCTURED_OUTPUT_FRAMEWORK=instructor`, the **instructor mode** controls *how* Cognee asks the model for structured output — for example via the model's native JSON-schema response, a plain JSON object, or a tool/function call. The value of `LLM_INSTRUCTOR_MODE` is passed directly to the `instructor` library's `Mode`, so it must be one of instructor's supported mode strings.

`LLM_INSTRUCTOR_MODE` is **empty by default**. When it is unset, Cognee either applies a provider-specific mode or defers to the underlying Instructor/LiteLLM default, so in most cases you don't need to set it at all:

| `LLM_PROVIDER`                                              | Behavior when `LLM_INSTRUCTOR_MODE` is unset |
| ----------------------------------------------------------- | -------------------------------------------- |
| `openai`, `azure` with `gpt-5` models                       | `json_schema_mode`                           |
| `openai`, `azure` with other models                         | use Instructor/LiteLLM default               |
| AWS Bedrock                                                 | `json_schema_mode`                           |
| `ollama`, `gemini`, `custom` (OpenAI-compatible), llama.cpp | `json_mode`                                  |
| `anthropic`                                                 | `anthropic_tools`                            |
| `mistral`                                                   | `mistral_tools`                              |

Common values you can set explicitly include `json_schema_mode`, `json_mode`, `tool_call`, and `markdown_json_mode`.

<Tip>
  **Which mode for OpenAI models (e.g. `gpt-5-mini`)?** Leave `LLM_INSTRUCTOR_MODE` unset, or set `json_schema_mode` — Cognee applies `json_schema_mode` to `gpt-5` models, and it is the recommended mode for OpenAI models that support native JSON-schema responses. Only override it when you point Cognee at a custom or local OpenAI-compatible endpoint that rejects JSON-schema responses; in that case try `json_mode` first, then `markdown_json_mode` or `tool_call`.
</Tip>
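
The defaults table can be read as a small lookup. The sketch below only restates that table and is not Cognee's actual resolution code: the function name, the `"bedrock"` and `"llama_cpp"` provider spellings, and the substring check for `gpt-5` are assumptions made for illustration.

```python
from typing import Optional

# Provider-level defaults copied from the table above.
# The "bedrock" and "llama_cpp" keys are assumed spellings for illustration.
PROVIDER_DEFAULTS = {
    "bedrock": "json_schema_mode",
    "ollama": "json_mode",
    "gemini": "json_mode",
    "custom": "json_mode",
    "llama_cpp": "json_mode",
    "anthropic": "anthropic_tools",
    "mistral": "mistral_tools",
}


def effective_instructor_mode(provider: str, model: str, override: str = "") -> Optional[str]:
    """Return the mode the table implies, or None to defer to the library default."""
    if override:
        return override  # an explicitly set LLM_INSTRUCTOR_MODE takes precedence
    if provider in ("openai", "azure"):
        # Only gpt-5 models get json_schema_mode; other models defer to the default.
        return "json_schema_mode" if "gpt-5" in model else None
    # Providers not listed in the table also fall through to None here.
    return PROVIDER_DEFAULTS.get(provider)
```

A `None` result corresponds to the "use Instructor/LiteLLM default" row: no mode is forced and the underlying library decides.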

## Setting Structured Output in a Script

You don't have to use `.env` — the same settings can be configured directly in Python. Both the framework and the instructor mode are attributes on the internal `LLMConfig`.

<Tabs>
  <Tab title="set_llm_config">
    Pass the exact attribute names (`structured_output_framework`, `llm_instructor_mode`) to `cognee.config.set_llm_config()`:

    ```python theme={null}
    import cognee

    cognee.config.set_llm_config({
        "structured_output_framework": "instructor",  # or "baml", "litellm_native"
        "llm_instructor_mode": "json_mode",           # any instructor Mode string
    })
    ```
  </Tab>

  <Tab title="os.environ">
    Set the environment variables **before** importing cognee:

    ```python theme={null}
    import os

    os.environ["STRUCTURED_OUTPUT_FRAMEWORK"] = "instructor"
    os.environ["LLM_INSTRUCTOR_MODE"] = "json_mode"

    import cognee  # reads the variables on first config access
    ```
  </Tab>
</Tabs>

<Warning>
  Switching to **BAML** at runtime via `set_llm_config()` does not initialize BAML's client registry, which is built when the config is first constructed. To use BAML, set `STRUCTURED_OUTPUT_FRAMEWORK=baml` (and the `BAML_LLM_*` variables) in `.env` or via `os.environ` **before** importing cognee.
</Warning>
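
Because BAML must be configured before Cognee is imported, a small pre-flight check can catch missing settings early. This is a minimal sketch rather than a Cognee feature: `missing_baml_vars` is a hypothetical helper, and the list of required variables is taken from the BAML tab above.

```python
import os
from typing import Mapping

# Variables the BAML configuration block marks as required.
REQUIRED_BAML_VARS = ("BAML_LLM_PROVIDER", "BAML_LLM_MODEL", "BAML_LLM_API_KEY")


def missing_baml_vars(env: Mapping[str, str]) -> list[str]:
    """Return required BAML variables that are absent or empty when BAML is selected."""
    if env.get("STRUCTURED_OUTPUT_FRAMEWORK", "") != "baml":
        return []  # BAML is not selected, so there is nothing to check
    return [name for name in REQUIRED_BAML_VARS if not env.get(name)]


# Run this before `import cognee` so problems surface early.
missing = missing_baml_vars(os.environ)
if missing:
    print("Missing BAML settings:", ", ".join(missing))
```

The check only inspects environment variables. It does not verify that the `cognee[baml]` extra is installed.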

## Important Notes

* **Unified Interface**: Your application code uses the same `acreate_structured_output()` call regardless of framework
* **Provider Flexibility**: LiteLLM + Instructor and LiteLLM Native reuse the standard `LLM_*` provider settings; BAML uses its own `BAML_LLM_*` block
* **Output Consistency**: All three produce Pydantic-validated results
* **Performance**: Framework choice doesn't significantly impact performance

## Troubleshooting

<Accordion title="`1 validation error for Response: content Input should be a valid string ... input_type=dict`">
  This error appears during `recall()` / `search()` with completion search types such as `GRAPH_COMPLETION`, `GRAPH_SUMMARY_COMPLETION`, `GRAPH_COMPLETION_COT`, and `RAG_COMPLETION`.

  **Cause.** These search types ask the LLM for a plain-text answer (the retriever uses `response_model=str`). When the configured instructor mode doesn't match what your model/provider actually supports, the model wraps its answer in a JSON object instead of returning plain text. The `instructor` backend then can't coerce that `dict` into the expected string field, so Pydantic raises `Input should be a valid string ... input_type=dict`. This is common with OpenAI-compatible, custom, and local (Ollama / LM Studio) endpoints.

  **Fixes:**

  * **Align the instructor mode with your provider.** OpenAI/Azure `gpt-4o`/`gpt-5` models work with `json_schema_mode`, which Cognee applies automatically to `gpt-5` models. Endpoints that don't support JSON-schema responses usually need a different mode:
    ```dotenv theme={null}
    # Try one of these for OpenAI-compatible / local endpoints
    LLM_INSTRUCTOR_MODE=json_mode
    # LLM_INSTRUCTOR_MODE=markdown_json_mode
    # LLM_INSTRUCTOR_MODE=tool_call
    ```
  * **Switch to BAML** if a small/local model keeps wrapping answers in JSON. BAML bypasses instructor's coercion and is more forgiving of loose model output:
    ```dotenv theme={null}
    STRUCTURED_OUTPUT_FRAMEWORK=baml
    BAML_LLM_PROVIDER=openai
    BAML_LLM_MODEL=gpt-4o-mini
    BAML_LLM_API_KEY=sk-...
    # For local/OpenAI-compatible endpoints:
    # BAML_LLM_ENDPOINT=http://localhost:11434/v1
    # BAML_LLM_API_KEY=ollama
    ```
  * **Skip the LLM completion step** to confirm retrieval works independently of model output formatting. Pass `only_context=True` to return the retrieved context directly — see [Search Basics](/guides/search-basics). If retrieval succeeds with `only_context=True`, the problem is the structured-output configuration above, not your graph.
</Accordion>

<Accordion title="`AttributeError: type object 'str' has no attribute 'model_json_schema'` (Gemini)">
  This error is raised from `GeminiAdapter.acreate_structured_output` during `recall()` / `search()` with completion search types (`GRAPH_COMPLETION`, `RAG_COMPLETION`, etc.).

  **Cause.** These search types request a plain-text answer with `response_model=str`. Gemini's default instructor mode is `json_mode`, which handles `str` correctly. If you override `LLM_INSTRUCTOR_MODE` with a schema- or tool-based mode (`json_schema_mode`, `tool_call`, `mistral_tools`, …), Instructor tries to call `str.model_json_schema()` — a method that only exists on Pydantic models — and crashes.

  **Fix — force `json_mode` for Gemini.** Either leave `LLM_INSTRUCTOR_MODE` unset so Gemini falls back to its `json_mode` default, or set it explicitly:

  ```dotenv theme={null}
  LLM_PROVIDER="gemini"
  LLM_MODEL="gemini/gemini-2.0-flash"
  LLM_API_KEY="AIza..."
  LLM_INSTRUCTOR_MODE="json_mode"
  ```

  To set it in-script, pass it through `set_llm_config` — `llm_instructor_mode` is a valid key on the LLM config:

  ```python theme={null}
  import cognee

  cognee.config.set_llm_config({"llm_instructor_mode": "json_mode"})
  ```

  <Warning>
    `cognee.config.set("llm_instructor_mode", "json_mode")` raises `InvalidConfigAttributeError: 'llm_instructor_mode' is not a valid attribute of the configuration` — the generic `config.set()` only accepts a fixed set of keys. Use `cognee.config.set_llm_config({...})` (or the `LLM_INSTRUCTOR_MODE` env var) instead.
  </Warning>

  See [LLM Instructor Modes](/setup-configuration/llm-providers#llm-instructor-modes) for the full list of modes and per-provider defaults.
</Accordion>

<Accordion title="`ValueError: Unsupported type for BAML mapping: str | None`">
  This error is raised with `STRUCTURED_OUTPUT_FRAMEWORK=baml` while BAML builds a dynamic type for a response model that has a PEP 604 optional field (`X | None`). In practice it surfaces during `recall()` / `search()` with completion search types such as `GRAPH_COMPLETION`, because the completion response model contains `str | None` fields — so recall fails on local/Ollama + BAML setups.

  **Cause.** BAML's dynamic type builder previously recognized only `typing.Union` / `typing.Optional`. PEP 604 unions written as `X | None` have a different origin (`types.UnionType`), so they missed the Optional/Union branch and fell through to the unsupported-type error. `typing.Optional[str]` worked; the equivalent `str | None` did not.

  **Fix.** Upgrade Cognee — BAML now maps PEP 604 `X | None` unions the same way it maps `typing.Optional`, so optional fields work with either syntax. No configuration change is required.
</Accordion>

<Columns cols={3}>
  <Card title="LLM Providers" icon="brain" href="/setup-configuration/llm-providers">
    Configure LLM providers for text generation
  </Card>

  <Card title="Overview" icon="settings" href="/setup-configuration/overview">
    Return to setup configuration overview
  </Card>

  <Card title="Custom Prompts" icon="text-wrap" href="/guides/custom-prompts">
    Learn about custom prompt configuration
  </Card>
</Columns>