Instructor Integration

Instructor enables structured outputs from any LLM provider using Pydantic models. It offers a Python-native approach that feels familiar to Python developers, though it requires more manual prompt and validation management than BAML. While BAML is our recommended framework for new projects, Instructor remains fully supported for existing applications and for teams that prefer Pydantic-based validation.

Key Features

Pydantic Models

Familiar Syntax: Define outputs using Python data classes and Pydantic validation.

Validation

Automatic Validation: Validation with detailed error messages and retry logic comes built in.

OpenAI Compatible

Function Calling: Works with OpenAI function calling and other compatible providers.

Python Native

Developer Friendly: A good fit for Python developers familiar with Pydantic and type hints.

Framework Status

Current Status: Instructor is fully supported but not recommended for new projects. Consider migrating to BAML for better reliability, maintainability, and reduced boilerplate.

Why We Recommend BAML Instead

Instructor Limitations
  • Prompts live in Python strings and must be written and maintained by hand
  • Validation, retries, and fallbacks require hand-written boilerplate
  • Weaker type safety across the prompt/response boundary compared to BAML

Configuration

Enable Instructor Framework
# Set Instructor as structured output framework
export STRUCTURED_OUTPUT_FRAMEWORK=instructor

# Use with existing LLM configuration
export LLM_PROVIDER=openai
export LLM_API_KEY=your_openai_api_key
export LLM_MODEL=gpt-4o-mini
Or configure it programmatically from Python:

import asyncio
import os

import cognee

# Configure Instructor
os.environ["STRUCTURED_OUTPUT_FRAMEWORK"] = "instructor"

async def main():
    # Cognee automatically uses Instructor for structured extraction
    await cognee.add("Your content here")
    await cognee.cognify()  # Uses Instructor + Pydantic models

asyncio.run(main())

Usage Examples

import asyncio
import os

import cognee
from pydantic import BaseModel

# Configure Instructor
os.environ["STRUCTURED_OUTPUT_FRAMEWORK"] = "instructor"

# The kind of Pydantic model Instructor validates against; Cognee
# supplies its own extraction models internally, so this is illustrative
class SimpleEntity(BaseModel):
    name: str
    type: str

async def main():
    # Use with Cognee
    await cognee.add("Apple Inc. develops innovative technology products.")
    await cognee.cognify()  # Uses Instructor for extraction

    # Search results benefit from structured extraction
    results = await cognee.search("technology companies")
    print(f"Found {len(results)} results")

asyncio.run(main())

Integration with Cognee

Automatic Integration

Once STRUCTURED_OUTPUT_FRAMEWORK is set to instructor, Cognee uses Instructor transparently: cognify performs structured extraction through Instructor and Pydantic models, and search benefits from the resulting structured data. No Instructor-specific code is needed in your application.

Migration to BAML

1. Assess Current Usage
   Evaluate your setup: review your current Instructor implementation and identify pain points.

2. Test BAML
   Try the modern approach: set up BAML in a test environment and compare results with your Instructor setup.

3. Gradual Migration
   Incremental transition: migrate one component at a time, starting with the most problematic areas.

4. Validate Results
   Quality assurance: ensure BAML provides equal or better results before full migration.

Migration Example

The current Instructor approach you would migrate from, with the prompt, model call, and response model all managed manually in Python:

import instructor
from openai import OpenAI
from pydantic import BaseModel

class EntityModel(BaseModel):
    entities: list[str]
    relationships: list[dict]

client = instructor.from_openai(OpenAI())

content = "Apple Inc. develops innovative technology products."

# Manual prompt management
result = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=EntityModel,
    messages=[{
        "role": "system",
        "content": "Extract entities and relationships from the text"
    }, {
        "role": "user",
        "content": f"Text: {content}"
    }]
)

print(result.entities)

Best Practices

Model Design

Pydantic Best Practices (a sketch follows this list)
  • Use descriptive field names and docstrings
  • Add validation constraints with Field()
  • Include custom validators for complex rules
  • Use type hints for better IDE support
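A minimal sketch of these practices applied to an extraction model, assuming Pydantic v2 (Field constraints plus field_validator); the ExtractedEntity model and its fields are illustrative, not part of Cognee's API.

from pydantic import BaseModel, Field, field_validator

class ExtractedEntity(BaseModel):
    """A single entity extracted from text."""

    name: str = Field(..., min_length=1, description="Canonical entity name")
    type: str = Field(..., description="Entity category, e.g. 'organization' or 'person'")
    confidence: float = Field(0.5, ge=0.0, le=1.0, description="Extraction confidence score")

    @field_validator("type")
    @classmethod
    def normalize_type(cls, value: str) -> str:
        # Custom validator for a rule Field() cannot express: normalize casing and whitespace
        return value.strip().lower()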

Error Handling

Robust Validation (a sketch follows this list)
  • Handle ValidationError exceptions gracefully
  • Use retry logic for transient failures
  • Provide fallback strategies for validation failures
  • Log validation issues for debugging
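A hedged sketch of this error handling with Instructor used directly; max_retries is Instructor's built-in re-ask mechanism, while the try/except structure and empty-result fallback are illustrative choices rather than Cognee's internal behavior.

import instructor
from openai import OpenAI
from pydantic import BaseModel, ValidationError

class EntityModel(BaseModel):
    entities: list[str]

client = instructor.from_openai(OpenAI())

def extract_entities(text: str) -> EntityModel:
    try:
        # Instructor re-asks the model when validation fails, up to max_retries times
        return client.chat.completions.create(
            model="gpt-4o-mini",
            response_model=EntityModel,
            max_retries=2,
            messages=[{"role": "user", "content": f"Extract entities from: {text}"}],
        )
    except ValidationError as exc:
        # Log validation issues for debugging
        print(f"Validation failed after retries: {exc}")
    except Exception as exc:
        # Transient API errors; retry exhaustion may also surface here depending on the Instructor version
        print(f"Extraction failed: {exc}")
    # Fallback strategy: return an empty, well-typed result
    return EntityModel(entities=[])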

Troubleshooting

Common Issues
  • STRUCTURED_OUTPUT_FRAMEWORK not set or misspelled: Cognee falls back to its default structured output framework and Instructor is never used
  • ValidationError during extraction: the LLM output did not match the expected Pydantic schema; retries usually resolve transient cases
  • Provider compatibility: the configured LLM_PROVIDER must support function calling or an OpenAI-compatible API

A quick environment check (below) helps narrow these down.
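A simple way to confirm the relevant environment variables before digging deeper; the variable names are the ones used in the Configuration section above.

import os

# Print the structured output and LLM settings Cognee will see
for var in ("STRUCTURED_OUTPUT_FRAMEWORK", "LLM_PROVIDER", "LLM_MODEL"):
    print(f"{var}={os.getenv(var)}")

# The API key should be present but is not printed in full
print("LLM_API_KEY set:", bool(os.getenv("LLM_API_KEY")))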

Next Steps

Legacy Framework: While Instructor is fully supported, we recommend BAML for new projects due to its superior prompt management, type safety, and developer experience.