A minimal guide to enabling translation during ingestion. Cognee includes a built-in translation pipeline that detects languages and translates content before graph extraction, so non-English documents are indexed as English knowledge.

Before you start:
  • Complete Quickstart to understand basic operations
  • Ensure you have LLM Providers configured
  • Have non-English text or documents to process

What Translation Does

  • Detects language automatically using the langdetect library
  • Skips chunks already in the target language
  • Translates using one of three providers: llm (default), google, or azure
  • Stores original text alongside the translation in the knowledge graph
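The detect-then-skip decision above boils down to a threshold check. A minimal sketch (the function name and signature are illustrative, not Cognee's actual implementation):

```python
def should_translate(detected_lang, confidence, target_language="en", confidence_threshold=0.8):
    """Decide whether a chunk needs translation.

    Skip when the chunk is already in the target language, or when
    language detection is not confident enough to act on.
    """
    if detected_lang == target_language:
        return False  # already in the target language
    if confidence < confidence_threshold:
        return False  # detection too uncertain to translate safely
    return True


print(should_translate("fr", 0.99))  # French with high confidence: translate
print(should_translate("en", 0.99))  # already English: skip
print(should_translate("fr", 0.5))   # below the confidence threshold: skip
```

The confidence gate matters for short chunks, where langdetect's guesses are least reliable.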

Configuration

Set these environment variables in your .env file:
# Provider: "llm" (default), "google", or "azure"
TRANSLATION_PROVIDER=llm

# Target language ISO 639-1 code (default: "en")
TARGET_LANGUAGE=en

# Minimum detection confidence to trigger translation (default: 0.8)
CONFIDENCE_THRESHOLD=0.8
The llm provider uses your existing LLM configuration — no additional keys needed.
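These are ordinary environment variables, so they can be read with `os.getenv` and the documented defaults. A small sketch (the helper name is assumed, not part of the Cognee API):

```python
import os


def load_translation_config():
    """Read translation settings from the environment, falling back to the documented defaults."""
    return {
        "provider": os.getenv("TRANSLATION_PROVIDER", "llm"),
        "target_language": os.getenv("TARGET_LANGUAGE", "en"),
        "confidence_threshold": float(os.getenv("CONFIDENCE_THRESHOLD", "0.8")),
    }


config = load_translation_config()
print(config["provider"])  # "llm" unless overridden in .env
```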

Using Translation in a Pipeline

Insert translate_content as a pipeline task between chunk extraction and graph building:
import asyncio
import os
import cognee
from cognee.infrastructure.llm import get_max_chunk_tokens
from cognee.tasks.documents import classify_documents, extract_chunks_from_documents
from cognee.shared.data_models import KnowledgeGraph
from cognee.tasks.translation import translate_content
from cognee.modules.pipelines import Task, run_pipeline
from cognee.tasks.graph import extract_graph_from_data
from cognee.tasks.storage import add_data_points

# After translate_content runs, each chunk's text holds the translated content and the
# original is attached as translation metadata; clear it before graph extraction.
async def drop_translation_metadata(data_chunks):
    for chunk in data_chunks:
        chunk.contains = None
    return data_chunks


async def main():
    await cognee.prune.prune_data()
    await cognee.prune.prune_system(metadata=True)
    text_fr = "La mémoire artificielle permet aux agents IA de retenir des informations complexes."

    tasks = [
        Task(classify_documents),
        Task(extract_chunks_from_documents, max_chunk_size=get_max_chunk_tokens()),
        Task(translate_content, target_language="en", translation_provider="llm"),
        Task(drop_translation_metadata),
        Task(extract_graph_from_data, graph_model=KnowledgeGraph),
        Task(add_data_points),
    ]

    async for _ in run_pipeline(tasks=tasks, datasets=["multilingual"]):
        pass

    visualize_graph_path = os.path.join(
        os.path.dirname(__file__), ".artifacts", "multilingual.html"
    )
    await cognee.visualize_graph(visualize_graph_path)

asyncio.run(main())
translate_content mutates chunks in-place: chunk.text is replaced with the translation and the original is preserved in a TranslatedContent data point attached to the chunk.
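The in-place mutation pattern can be illustrated with a minimal mock (the `Chunk` and `TranslatedContent` classes below are simplified stand-ins, not Cognee's real data models):

```python
from dataclasses import dataclass, field


@dataclass
class TranslatedContent:
    """Simplified stand-in for the data point that preserves the original text."""
    original_text: str
    source_language: str


@dataclass
class Chunk:
    text: str
    contains: list = field(default_factory=list)


def translate_in_place(chunk, translated, source_language):
    """Mimic the mutation pattern: attach the original, then replace chunk.text."""
    chunk.contains.append(TranslatedContent(chunk.text, source_language))
    chunk.text = translated
    return chunk


chunk = Chunk(text="Bonjour le monde!")
translate_in_place(chunk, "Hello world!", "fr")
print(chunk.text)                       # "Hello world!"
print(chunk.contains[0].original_text)  # "Bonjour le monde!"
```

Because the original survives as an attached data point, graph extraction sees English text while provenance is retained unless you strip it, as `drop_translation_metadata` does above.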

Additional Information

For one-off translation without a pipeline, use translate_text:
from cognee.tasks.translation import translate_text

result = await translate_text("Bonjour le monde!", target_language="en")
print(result.translated_text)   # "Hello world!"
print(result.source_language)   # "fr"

LLM Provider (default)

Uses your existing LLM — no extra configuration needed. Works with any provider configured via LLM_PROVIDER and LLM_API_KEY.
TRANSLATION_PROVIDER=llm

Google Provider

Requires the google-cloud-translate package and a Google Cloud project.
pip install google-cloud-translate
TRANSLATION_PROVIDER=google
GOOGLE_TRANSLATE_API_KEY=your_api_key
GOOGLE_PROJECT_ID=your_project_id

Azure Provider

Requires an Azure Cognitive Services resource.
TRANSLATION_PROVIDER=azure
AZURE_TRANSLATOR_KEY=your_key
AZURE_TRANSLATOR_REGION=eastus
# Endpoint defaults to https://api.cognitive.microsofttranslator.com
AZURE_TRANSLATOR_ENDPOINT=https://api.cognitive.microsofttranslator.com
Additional tuning variables:

Variable                      Default   Description
TRANSLATION_BATCH_SIZE        10        Chunks per translation batch
TRANSLATION_MAX_RETRIES       3         Retry attempts on failure
TRANSLATION_TIMEOUT_SECONDS   30        Request timeout
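Batch size and retry limits like these are typically applied per request. A hedged sketch of batching with bounded retries (the function names are illustrative, not Cognee's API):

```python
def batched(items, batch_size=10):
    """Split chunks into translation batches of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i : i + batch_size]


def translate_batch_with_retries(batch, translate, max_retries=3):
    """Call translate(batch), retrying up to max_retries times before giving up."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return translate(batch)
        except Exception as error:  # in practice, catch the provider's specific error type
            last_error = error
    raise last_error


chunks = [f"chunk-{i}" for i in range(25)]
print([len(b) for b in batched(chunks, batch_size=10)])  # [10, 10, 5]
```

Smaller batches reduce the blast radius of a failed request; larger ones reduce per-request overhead.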

Next Steps

  • Custom Pipelines: learn to build custom task pipelines
  • LLM Providers: configure your LLM provider
  • Core Concepts: understand knowledge graph fundamentals