Translate non-English content before building the knowledge graph
A minimal guide to enabling translation during ingestion. Cognee includes a built-in translation pipeline that detects languages and translates content before graph extraction, so non-English documents are indexed as English knowledge.

Before you start:
Complete the Quickstart to understand basic operations.
Insert translate_content as a pipeline task between chunk extraction and graph building:
import asyncio
import os

import cognee
from cognee.infrastructure.llm import get_max_chunk_tokens
from cognee.tasks.documents import classify_documents, extract_chunks_from_documents
from cognee.shared.data_models import KnowledgeGraph
from cognee.tasks.translation import translate_content
from cognee.modules.pipelines import Task, run_pipeline
from cognee.tasks.graph import extract_graph_from_data
from cognee.tasks.storage import add_data_points


# The translated text is already in data_chunks[].text, so clear chunk.contains
# to drop the attached translation metadata before graph extraction.
async def drop_translation_metadata(data_chunks):
    for chunk in data_chunks:
        chunk.contains = None
    return data_chunks


async def main():
    # Start from a clean state.
    await cognee.prune.prune_data()
    await cognee.prune.prune_system(metadata=True)

    text_fr = "La mémoire artificielle permet aux agents IA de retenir des informations complexes."
    # Ingest the French sample into the dataset the pipeline reads from.
    await cognee.add(text_fr, dataset_name="multilingual")

    tasks = [
        Task(classify_documents),
        Task(extract_chunks_from_documents, max_chunk_size=get_max_chunk_tokens()),
        # Translate chunks after chunking and before graph extraction.
        Task(translate_content, target_language="en", translation_provider="llm"),
        Task(drop_translation_metadata),
        Task(extract_graph_from_data, graph_model=KnowledgeGraph),
        Task(add_data_points),
    ]

    async for _ in run_pipeline(tasks=tasks, datasets=["multilingual"]):
        pass

    visualize_graph_path = os.path.join(
        os.path.dirname(__file__), ".artifacts", "multilingual.html"
    )
    await cognee.visualize_graph(visualize_graph_path)


asyncio.run(main())
translate_content mutates chunks in place: each chunk's text is replaced with the translation, and the original text is preserved in a TranslatedContent data point attached to the chunk.
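The in-place behavior described above can be sketched in plain Python. This is not Cognee's implementation: `Chunk`, `TranslatedContent`, and `translate_chunks_in_place` are hypothetical stand-ins, the detector and translator are toy functions, and skipping chunks that are already in the target language is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TranslatedContent:
    # Hypothetical stand-in for the data point that keeps the pre-translation text.
    original_text: str
    source_language: str
    target_language: str


@dataclass
class Chunk:
    # Hypothetical minimal chunk with only the fields this sketch needs.
    text: str
    contains: List[object] = field(default_factory=list)


def translate_chunks_in_place(
    chunks: List[Chunk],
    detect: Callable[[str], str],
    translate: Callable[[str, str], str],
    target_language: str = "en",
) -> List[Chunk]:
    """Replace chunk.text with its translation and attach the original."""
    for chunk in chunks:
        source_language = detect(chunk.text)
        if source_language == target_language:
            continue  # assumption: text already in the target language is left as-is
        original = chunk.text
        chunk.text = translate(original, target_language)  # mutate the chunk itself
        chunk.contains.append(
            TranslatedContent(original, source_language, target_language)
        )
    return chunks


# Toy detector and translator, for illustration only.
def toy_detect(text: str) -> str:
    return "fr" if "mémoire" in text else "en"


def toy_translate(text: str, target_language: str) -> str:
    return {"La mémoire artificielle": "Artificial memory"}.get(text, text)


chunks = [Chunk("La mémoire artificielle"), Chunk("Already English")]
translate_chunks_in_place(chunks, toy_detect, toy_translate)
print(chunks[0].text)                       # Artificial memory
print(chunks[0].contains[0].original_text)  # La mémoire artificielle
```

Because the original text travels with the chunk as an attached object, clearing `contains` afterwards (as `drop_translation_metadata` does in the pipeline) leaves only the translated text for graph extraction.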
To use Azure Translator as the translation provider, set these environment variables:

TRANSLATION_PROVIDER=azure
AZURE_TRANSLATOR_KEY=your_key
AZURE_TRANSLATOR_REGION=eastus
# Endpoint defaults to https://api.cognitive.microsofttranslator.com
AZURE_TRANSLATOR_ENDPOINT=https://api.cognitive.microsofttranslator.com
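To see what these variables correspond to, here is a minimal sketch that builds (but does not send) a request to the Azure Translator REST API v3 `/translate` operation. It is an illustration, not Cognee's Azure provider: `build_azure_translate_request` is a hypothetical helper, and Cognee's own client may call the API differently.

```python
import json
import os
import urllib.parse

# Default endpoint, matching the comment in the configuration.
DEFAULT_ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def build_azure_translate_request(texts, target_language="en", env=None):
    """Return (url, headers, body) for an Azure Translator v3 /translate call."""
    env = os.environ if env is None else env
    endpoint = env.get("AZURE_TRANSLATOR_ENDPOINT", DEFAULT_ENDPOINT).rstrip("/")
    query = urllib.parse.urlencode({"api-version": "3.0", "to": target_language})
    headers = {
        "Ocp-Apim-Subscription-Key": env["AZURE_TRANSLATOR_KEY"],
        "Content-Type": "application/json",
    }
    region = env.get("AZURE_TRANSLATOR_REGION")
    if region:
        # Azure requires the region header for regional and multi-service resources.
        headers["Ocp-Apim-Subscription-Region"] = region
    # The API takes a JSON array of {"Text": ...} objects.
    body = json.dumps([{"Text": text} for text in texts]).encode("utf-8")
    return f"{endpoint}/translate?{query}", headers, body


# Example with an explicit mapping instead of the real environment.
env = {"AZURE_TRANSLATOR_KEY": "your_key", "AZURE_TRANSLATOR_REGION": "eastus"}
url, headers, body = build_azure_translate_request(["Bonjour"], "en", env)
print(url)  # https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&to=en
```

To actually send it, you could wrap these values in `urllib.request.Request(url, data=body, headers=headers, method="POST")` and pass that to `urllib.request.urlopen`; the API responds with a JSON list containing a `translations` array for each input text.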