A minimal guide to enabling translation during ingestion. Cognee includes a built-in translation pipeline that detects languages and translates content before graph extraction, so non-English documents are indexed as English knowledge.

Before you start:
- Complete Quickstart to understand basic operations
- Ensure you have LLM Providers configured
- Have non-English text or documents to process
What Translation Does
- Detects language automatically using the `langdetect` library
- Skips chunks already in the target language
- Translates using one of three providers: `llm` (default), `google`, or `azure`
- Stores original text alongside the translation in the knowledge graph
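
The detect-and-skip behavior in the list above can be pictured with a small sketch. This is not Cognee's implementation: `chunks_needing_translation` and `toy_detect` are hypothetical names, and the detector is injected so the example runs without extra packages. In a real setup you could likely pass `langdetect.detect` as the `detect` argument.

```python
from typing import Callable, Iterable, List


def chunks_needing_translation(
    chunks: Iterable[str],
    target_language: str,
    detect: Callable[[str], str],
) -> List[str]:
    """Return only the chunks whose detected language differs from the target."""
    target = target_language.lower()
    pending: List[str] = []
    for text in chunks:
        if not text.strip():
            continue  # nothing to detect in empty or whitespace-only text
        if detect(text).lower() != target:
            pending.append(text)
    return pending


def toy_detect(text: str) -> str:
    """Toy detector for demonstration only (hypothetical, not a real library call)."""
    return "de" if "und" in text.lower().split() else "en"


sample = ["The cat and the dog.", "Der Hund und die Katze.", "   "]
print(chunks_needing_translation(sample, "en", toy_detect))
```

Only the German sentence is returned; the English chunk is skipped because it already matches the target language.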
Configuration
Translation is configured through environment variables in your `.env` file, such as `TRANSLATION_PROVIDER` and `TARGET_LANGUAGE`.

The `llm` provider uses your existing LLM configuration, so no additional keys are needed.
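
As a rough illustration of how these settings fit together, the sketch below reads `TRANSLATION_PROVIDER` and `TARGET_LANGUAGE` from an environment-like mapping and validates the provider name. `TranslationSettings` and `load_translation_settings` are hypothetical helpers, not part of Cognee, and the `"en"` fallback for the target language is an assumption because this page does not state the default.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# The three provider names listed on this page.
SUPPORTED_PROVIDERS = ("llm", "google", "azure")


@dataclass(frozen=True)
class TranslationSettings:
    provider: str
    target_language: str


def load_translation_settings(env: Mapping[str, str] = os.environ) -> TranslationSettings:
    """Read translation settings from an environment-like mapping."""
    # "llm" is the documented default provider.
    provider = env.get("TRANSLATION_PROVIDER", "llm").strip().lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unknown TRANSLATION_PROVIDER: {provider!r}")
    # Assumption: fall back to English; the real default is not stated here.
    target = env.get("TARGET_LANGUAGE", "en").strip()
    return TranslationSettings(provider=provider, target_language=target)


print(load_translation_settings({"TRANSLATION_PROVIDER": "google", "TARGET_LANGUAGE": "en"}))
```

Validating the provider name early turns a typo in `.env` into an immediate error rather than a failure partway through ingestion.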
Using Translation in a Pipeline
Insert `translate_content` as a pipeline task between chunk extraction and graph building:
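
The exact task list depends on your Cognee version, so the following is only a stand-in sketch of what a translation step placed between chunking and graph building does to each chunk. `Chunk`, `TranslatedContent`, their field names, and `translate_chunks_in_place` are simplified, hypothetical stand-ins rather than Cognee's real classes or task signature.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TranslatedContent:
    """Stand-in for the data point that keeps the original text."""
    original_text: str
    source_language: str
    target_language: str


@dataclass
class Chunk:
    """Stand-in for a document chunk; not Cognee's real chunk class."""
    text: str
    translation: Optional[TranslatedContent] = None


def translate_chunks_in_place(
    chunks: List[Chunk],
    target_language: str,
    detect: Callable[[str], str],
    translate: Callable[[str, str, str], str],
) -> List[Chunk]:
    """Overwrite chunk.text with the translation and keep the original attached."""
    for chunk in chunks:
        source = detect(chunk.text)
        if source == target_language:
            continue  # already in the target language, leave untouched
        original = chunk.text
        chunk.text = translate(original, source, target_language)
        chunk.translation = TranslatedContent(original, source, target_language)
    return chunks


# Toy detector and translator, for demonstration only.
PHRASES = {"Hallo Welt": "Hello world"}
chunks = [Chunk("Hallo Welt"), Chunk("Hello again")]
translate_chunks_in_place(
    chunks,
    "en",
    detect=lambda text: "de" if text in PHRASES else "en",
    translate=lambda text, src, dst: PHRASES[text],
)
print([c.text for c in chunks])
print(chunks[0].translation.original_text)
```

Because the step rewrites `text` before graph building runs, downstream tasks see only the translated text, while the original remains reachable through the attached record.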
`translate_content` mutates chunks in place: `chunk.text` is replaced with the translation, and the original is preserved in a `TranslatedContent` data point attached to the chunk.

Additional Information
Translating Individual Strings
For one-off translation without a pipeline, use `translate_text`.

Choosing a Provider
All three providers translate non-English chunks to your `TARGET_LANGUAGE`. Pick based on cost, setup, and quality trade-offs:

| Provider | Setup | Cost | Best for |
|---|---|---|---|
| `llm` (default) | None; reuses your LLM config | Per-token LLM usage; higher quality, slower | Mixed/long-form documents where context-aware translation matters |
| `google` | Install `google-cloud-translate`, Google Cloud project | Per-character pricing; fast batch translation | High-volume ingestion across many languages |
| `azure` | Azure Cognitive Services key + region | Per-character pricing; fast batch translation | Enterprise deployments already on Azure |

Supported languages: detection uses `langdetect` (~55 languages). The `llm` provider supports any language the underlying model handles. Google Translate and Azure Translator each support 130+ language codes, including locale-specific variants such as zh-CN and zh-TW; see the Google Cloud Translation language list and Azure Translator language list for the full set.

Set `TRANSLATION_PROVIDER` in `.env` to switch providers; no code changes are required.

Provider-Specific Setup
LLM Provider (default)
Uses your existing LLM, so no extra configuration is needed. Works with any provider configured via `LLM_PROVIDER` and `LLM_API_KEY`.

Google Cloud Translation
Requires the `google-cloud-translate` package and a Google Cloud project.

Azure Translator
Requires an Azure Cognitive Services resource.
Advanced Options
| Variable | Default | Description |
|---|---|---|
| `TRANSLATION_BATCH_SIZE` | 10 | Chunks per translation batch |
| `TRANSLATION_MAX_RETRIES` | 3 | Retry attempts on failure |
| `TRANSLATION_TIMEOUT_SECONDS` | 30 | Request timeout in seconds |
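
To make the batch and retry settings concrete, here is a minimal sketch of how a batch size of 10 and a retry limit of 3 might shape translation calls. `batched` and `translate_with_retries` are hypothetical helpers, not Cognee functions. The sketch assumes the retry limit counts attempts after the first call, which this page does not specify, and it does not model the request timeout.

```python
import time
from typing import Callable, Iterator, List, Optional, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def translate_with_retries(
    batch: List[str],
    translate_batch: Callable[[List[str]], List[str]],
    max_retries: int = 3,
    backoff_seconds: float = 0.0,
) -> List[str]:
    """Call translate_batch, retrying up to max_retries times after a failure."""
    last_error: Optional[Exception] = None
    # Assumption: one initial attempt plus max_retries retries.
    for attempt in range(max_retries + 1):
        try:
            return translate_batch(batch)
        except Exception as error:
            last_error = error
            if attempt < max_retries:
                time.sleep(backoff_seconds * (attempt + 1))  # linear backoff
    raise RuntimeError(f"translation failed after {max_retries} retries") from last_error


chunks = [f"chunk {i}" for i in range(25)]
print([len(b) for b in batched(chunks, 10)])  # batch sizes: [10, 10, 5]
```

Larger batches mean fewer provider calls but more work lost when a single call fails, which is the trade-off to weigh when tuning batch size against the retry limit.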
- Custom Pipelines: Learn to build custom task pipelines
- LLM Providers: Configure your LLM provider
- Core Concepts: Understand knowledge graph fundamentals