BAML Integration

BAML (BoundaryML) is a domain-specific language (DSL) designed specifically for building applications with Large Language Models. Unlike traditional approaches that rely on string-based prompts and unreliable JSON parsing, BAML provides a structured, type-safe way to define LLM functions and ensure predictable outputs.
Why BAML? Let’s be honest: Instructor + LiteLLM feels dated. BAML addresses these reliability challenges with a robust framework for structured LLM interactions, removing manual prompt management and keeping prompts, types, and code in sync.

Key Benefits

Type Safety

Reliable Outputs: Structured outputs eliminate parsing errors through declarative BAML syntax and runtime type validation.

Easy Prompt Management

No More Boilerplate: Avoids the boilerplate otherwise needed to manage prompts, read them, version them, and keep everything in sync.

Functional Paradigm

Clean Architecture: Easily switch between different prompts in a functional paradigm with modularized templates.

Enterprise Ready

Production Features: Automatic retries, error handling, streaming support, and dynamic model switching.
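To make the "reliable outputs" benefit concrete, here is a minimal sketch in plain Python (stdlib only, not BAML itself) of the failure mode that schema-aware parsing avoids: a raw LLM reply that wraps JSON in prose breaks naive `json.loads`, while a typed, validating parser recovers a well-formed value. The `Entity` class and `parse_entity` helper are illustrative stand-ins, not Cognee or BAML APIs.

```python
import json
from dataclasses import dataclass

# A raw LLM reply: valid prose, but not the strict JSON the caller hoped for.
raw_reply = 'Sure! Here is the JSON: {"name": "Apple Inc.", "type": "Organization"}'

# String-based approach: json.loads on the raw reply fails outright.
try:
    json.loads(raw_reply)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False

@dataclass
class Entity:
    name: str
    type: str

def parse_entity(reply: str) -> Entity:
    """Toy stand-in for schema-aware parsing: locate the JSON object,
    then validate required fields before constructing a typed value."""
    start, end = reply.index("{"), reply.rindex("}") + 1
    data = json.loads(reply[start:end])
    missing = {"name", "type"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return Entity(name=data["name"], type=data["type"])

entity = parse_entity(raw_reply)  # a typed Entity, not a loose dict
```

BAML performs this kind of schema-driven extraction and validation for you, generating the typed client from your function declarations.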

BAML Features in Cognee

1. Knowledge Graph Extraction

Our BAML implementation includes templates for extracting structured knowledge graphs:
function ExtractContentGraphGeneric(
    content: string,
    mode: "simple" | "base" | "guided" | "strict" | "custom"?
) -> KnowledgeGraph {
    client OpenAI
    // Multiple extraction modes for different use cases
}
Multiple Extraction Modes:
Simple: Fast & High-Quality
# Simple mode for general use cases
graph = await LLMGateway.extract_content_graph(
    content="Apple Inc. was founded by Steve Jobs in Cupertino.",
    response_model=KnowledgeGraph,
    mode="simple"
)

# Fast extraction with good quality
# Best for: General content, quick processing

2. Content Classification

Content type detection with structured outputs:
class ContentLabel {
    content_type "text" | "audio" | "image" | "video" | "multimedia" | "3d_model" | "procedural"
    type string
    subclass string[]
}

function ClassifyContent(content: string) -> ContentLabel {
    client OpenAI
    // Automatic content type detection
}
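For intuition, the `ContentLabel` class above maps onto a typed Python value on the client side. The sketch below mirrors that shape as a plain dataclass with a literal check on `content_type`; it is illustrative only, and the actual types emitted by the generated BAML client may differ.

```python
from dataclasses import dataclass, field
from typing import List

# Allowed literals from the BAML union above.
CONTENT_TYPES = {"text", "audio", "image", "video", "multimedia", "3d_model", "procedural"}

@dataclass
class ContentLabel:
    """Python mirror of the BAML ContentLabel class (illustrative)."""
    content_type: str
    type: str
    subclass: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Runtime validation of the literal union, as BAML enforces.
        if self.content_type not in CONTENT_TYPES:
            raise ValueError(f"unexpected content_type: {self.content_type!r}")

label = ContentLabel(content_type="text", type="article", subclass=["news", "technology"])
```

Because the union is enforced at parse time, a reply claiming an unknown content type fails loudly instead of flowing silently into the pipeline.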

3. Code Summarization

Specialized code understanding with structured analysis:
class SummarizedCode {
    high_level_summary string
    key_features string[]
    imports string[]
    classes SummarizedClass[]
    functions SummarizedFunction[]
    workflow_description string?
}

function SummarizeCode(code: string) -> SummarizedCode {
    client OpenAI
    // Structured code analysis
}
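The LLM fills the semantic fields (`high_level_summary`, `workflow_description`) from understanding the code; the structural fields are the kind of thing you could also recover deterministically. As a sketch of what `SummarizedCode` captures, here is a stdlib `ast`-based walk that collects imports, class names, and function names (a helper invented for illustration, not part of Cognee):

```python
import ast

def structural_summary(code: str) -> dict:
    """Collect the structural fields of SummarizedCode (imports, classes,
    functions); the semantic fields require the LLM."""
    tree = ast.parse(code)
    imports, classes, functions = [], [], []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            imports.append(node.module or "")
        elif isinstance(node, ast.ClassDef):
            classes.append(node.name)
        elif isinstance(node, ast.FunctionDef):
            functions.append(node.name)
    return {"imports": imports, "classes": classes, "functions": functions}

sample = """
import os
from pathlib import Path

class Loader:
    def read(self, name):
        return Path(name).read_text()

def main():
    print(os.getcwd())
"""
summary = structural_summary(sample)
```

The value of `SummarizeCode` is that it returns both kinds of fields in one typed object, so downstream tasks never re-parse free-form prose.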

Configuration

Environment Setup

Configure BAML as your structured output framework:
# Set BAML as the structured output framework
export STRUCTURED_OUTPUT_FRAMEWORK=BAML

# Configure your LLM provider (use existing Cognee LLM config)
export LLM_PROVIDER=openai
export LLM_API_KEY=your_openai_api_key
export LLM_MODEL=gpt-4o-mini

Framework Selection Logic

Cognee automatically selects the appropriate framework based on your configuration:
Smart Framework Switching
def extract_content_graph(content: str, response_model: Type[BaseModel], mode: str = "simple"):
    llm_config = get_llm_config()
    
    if llm_config.structured_output_framework.upper() == "BAML":
        # Use BAML implementation with advanced features
        return extract_content_graph_baml(
            content=content, 
            response_model=response_model, 
            mode=mode
        )
    else:
        # Fall back to Instructor implementation
        return extract_content_graph_instructor(
            content=content, 
            response_model=response_model
        )

Developer Experience

Getting Started with BAML

1. Configure Framework: Set BAML as your structured output framework in environment variables.
2. Set LLM Provider: Configure your preferred LLM provider (OpenAI, Anthropic, etc.) for BAML.
3. Choose Extraction Mode: Select the extraction mode that fits your use case (simple, guided, strict, custom).
4. Extract Knowledge: Use BAML-powered extraction in your Cognee pipeline for reliable results.
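Putting the steps together, here is a minimal Python sketch of steps 1 and 2 using environment variables (values are placeholders). The extraction call in steps 3–4 needs a configured Cognee install and a real API key, so it is shown as a comment rather than executed.

```python
import os

# Step 1: choose BAML as the structured output framework.
os.environ["STRUCTURED_OUTPUT_FRAMEWORK"] = "BAML"

# Step 2: point Cognee at your LLM provider (placeholder values).
os.environ["LLM_PROVIDER"] = "openai"
os.environ["LLM_API_KEY"] = "your_openai_api_key"
os.environ["LLM_MODEL"] = "gpt-4o-mini"

# Steps 3-4: pick a mode and extract (requires a running Cognee setup):
# graph = await LLMGateway.extract_content_graph(
#     content="...", response_model=KnowledgeGraph, mode="simple"
# )

# Cognee's framework selection reads this setting, as shown earlier.
framework = os.environ["STRUCTURED_OUTPUT_FRAMEWORK"]
use_baml = framework.upper() == "BAML"
```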

Advanced BAML Features

Flexible Processing
from cognee.infrastructure.llm import LLMGateway

# Compare different extraction modes
content = "Tesla revolutionized electric vehicles with the Model S, Model 3, and Cybertruck."

# Simple mode - fast and reliable
simple_result = await LLMGateway.extract_content_graph(
    content=content,
    response_model=KnowledgeGraph,
    mode="simple"
)

# Strict mode - conservative extraction
strict_result = await LLMGateway.extract_content_graph(
    content=content,
    response_model=KnowledgeGraph,
    mode="strict"
)

# Guided mode - detailed instructions
guided_result = await LLMGateway.extract_content_graph(
    content=content,
    response_model=KnowledgeGraph,
    mode="guided"
)

print(f"Simple mode: {len(simple_result.entities)} entities")
print(f"Strict mode: {len(strict_result.entities)} entities")
print(f"Guided mode: {len(guided_result.entities)} entities")

BAML Advantages in Practice

Reliability

Production Ready
# No more JSON parsing errors
try:
    graph = await baml.ExtractContentGraphGeneric(
        content=content, mode="strict"
    )
    # Always get properly typed results
except Exception as e:
    # Structured error handling
    print(f"Extraction failed: {e}")

Maintainability

Easy Updates
// Update prompts in BAML files
function ExtractEntities(content: string) -> EntityList {
    client OpenAI
    prompt #"
        Updated extraction logic here...
    "#
}
// No code changes needed!

Testing

Built-in Testing
# Test BAML functions
test_result = await baml.test_function(
    function_name="ExtractContentGraphGeneric",
    test_cases=[
        {"content": "test input", "expected_entities": 3}
    ]
)

Performance

Optimized Execution
# Streaming support for large content
async for chunk in baml.stream_extraction(
    content=large_document,
    mode="simple"
):
    await process_chunk(chunk)
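The consumption pattern above is an ordinary `async for` over an async generator. Here is a runnable sketch with a stubbed stream standing in for the BAML call (check the actual streaming method name against your generated BAML client); each partial result can be processed as soon as it arrives rather than waiting for the full document.

```python
import asyncio

# Stand-in for a BAML streaming call: yields partial results as they arrive.
async def fake_stream_extraction(content: str, chunk_size: int = 16):
    for i in range(0, len(content), chunk_size):
        await asyncio.sleep(0)  # simulate waiting on the network
        yield content[i:i + chunk_size]

async def main():
    chunks = []
    async for chunk in fake_stream_extraction("A long document " * 4):
        chunks.append(chunk)  # process each partial result as it lands
    return chunks

chunks = asyncio.run(main())
```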

Looking Forward

Future BAML Features: We’re working on exciting enhancements including:
  • On-the-fly recompilation: Update BAML functions without restarts
  • Search customization: BAML-powered search result enhancement
  • Dynamic pipelines: Runtime pipeline modification with BAML
  • Advanced modeling: More sophisticated knowledge graph models

Why We Love BAML

IDE Integration & Type Safety: The BAML IDE client is a pleasure to work with:
  • Full autocomplete and IntelliSense
  • Real-time error checking
  • Integrated testing and debugging
  • Visual prompt editing

Next Steps

Ready to try BAML in Cognee? Set STRUCTURED_OUTPUT_FRAMEWORK=BAML in your environment and start building more reliable AI applications today!