The global context index is an optional summary layer that helps Cognee answer questions that depend on the broader shape of a dataset, not only the closest graph facts. Normal graph retrieval is local: Cognee searches for graph edges, chunks, summaries, and entities that match the query. That works well when the answer is near a few specific facts. The global context index adds a higher-level map: semantic buckets of TextSummary nodes and a root summary of the dataset.

Why use it

Use the global context index when answers often depend on document-wide or dataset-wide context:
  • long documents where important details are spread across chapters or sections
  • evolving conversations where the final state depends on earlier updates
  • project memory where the answer needs the overall plan, risks, and current status
  • policy or research corpora where local facts need broader framing
It is most useful when you want retrieval to include both:
  • local evidence from the graph
  • global orientation from compact dataset summaries

How it works

During normal ingestion and enrichment, Cognee creates DocumentChunk and TextSummary datapoints. The global context index adds the GlobalContextSummary layers shown inside the dashed lines:
--------------------------------------------------
root GlobalContextSummary
  -> optional higher-level GlobalContextSummary bucket
    -> GlobalContextSummary bucket
--------------------------------------------------
      -> TextSummary
        -> DocumentChunk
The build process is bottom-up, starting from TextSummary nodes. The retrieval hierarchy is top-down, starting from the root summary.
The index groups TextSummary nodes, not raw DocumentChunk nodes directly.
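The bottom-up build can be pictured with a small sketch. This is not Cognee's implementation: the `Node` class, the fixed-size grouping, and the `summarize` stand-in (which joins text where Cognee clusters summaries and calls the LLM) are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a TextSummary or GlobalContextSummary."""
    kind: str
    text: str
    children: list["Node"] = field(default_factory=list)


def summarize(texts: list[str]) -> str:
    # Placeholder for an LLM summarization call: just join the inputs.
    return " | ".join(texts)


def group(level: list[Node], bucket_size: int) -> list[Node]:
    # Fixed-size grouping stands in for semantic clustering (assumption).
    chunks = [level[i:i + bucket_size] for i in range(0, len(level), bucket_size)]
    return [
        Node("bucket", summarize([n.text for n in chunk]), chunk)
        for chunk in chunks
    ]


def build_index(text_summaries: list[str], bucket_size: int = 2) -> Node:
    """Build buckets bottom-up from TextSummary texts, then add a root."""
    if not text_summaries:
        raise ValueError("need at least one TextSummary")
    if bucket_size < 2:
        raise ValueError("bucket_size must be at least 2")
    leaves = [Node("TextSummary", t) for t in text_summaries]
    level = group(leaves, bucket_size)      # first bucket layer
    while len(level) > bucket_size:         # optional higher-level buckets
        level = group(level, bucket_size)
    return Node("root", summarize([n.text for n in level]), level)


def depth(node: Node) -> int:
    """Number of layers from this node down to the deepest leaf."""
    return 1 + max((depth(c) for c in node.children), default=0)


root = build_index(["a", "b", "c", "d", "e"])
```

With five summaries and a bucket size of two, the sketch produces a root, one higher-level bucket layer, one bucket layer, and the TextSummary leaves. Retrieval would then walk this structure from `root` downward.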

What retrieval adds

When enabled for graph completion search, Cognee prepends a global context prelude to the usual graph context:
World summary:
...

Relevant areas:
...

<normal graph context follows>
The World summary comes from the root GlobalContextSummary. The Relevant areas are the top matching non-root GlobalContextSummary bucket texts for the query. This gives the model a compact map before it reads the local graph facts.
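As a rough illustration of how such a prelude could be assembled, the sketch below joins a root summary, the top-k bucket texts, and the graph context into one string. The function name `build_context`, its parameters, and the exact separators are hypothetical. Cognee's internal formatting and ranking may differ, and the bucket texts are assumed to arrive already ranked by relevance.

```python
def build_context(
    world_summary: str,
    ranked_bucket_texts: list[str],
    graph_context: str,
    top_k: int = 3,
) -> str:
    """Prepend a global context prelude to the local graph context."""
    # Keep only the best-matching bucket summaries (input is pre-ranked).
    relevant_areas = "\n".join(ranked_bucket_texts[:top_k])
    return (
        f"World summary:\n{world_summary}\n\n"
        f"Relevant areas:\n{relevant_areas}\n\n"
        f"{graph_context}"
    )


context = build_context(
    world_summary="Rollout is in phase 2.",
    ranked_bucket_texts=["Risks", "Timeline", "Budget", "Staffing"],
    graph_context="<normal graph context follows>",
)
print(context)
```

Only the first three bucket texts appear in the result, mirroring the role of `global_context_index_top_k`.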

Build the index

The index is opt-in. Build it after memory has been created:
import cognee

# Run inside an async function, or in a notebook that supports top-level await.
await cognee.improve(
    dataset="product_docs",
    build_global_context_index=True,
)
improve() first runs the normal enrichment pass, then builds the global context index.
build_global_context_index=True is skipped when run_in_background=True, because ordered background pipeline chaining is not currently supported for this step.
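Because the build is skipped silently in background mode, a small pre-flight check in calling code can make the interaction visible. The helper below is a hypothetical sketch, not part of Cognee. It only encodes the rule stated above, and its name and parameters are assumptions.

```python
def index_will_build(
    build_global_context_index: bool = False,
    run_in_background: bool = False,
) -> bool:
    """Return True only if the index build would actually run."""
    # Per the note above, the build step is skipped for background runs.
    return build_global_context_index and not run_in_background


options = {"build_global_context_index": True, "run_in_background": True}
if options["build_global_context_index"] and not index_will_build(**options):
    print("Global context index will be skipped; rerun in the foreground.")
```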
To use the index at query time, enable it through retriever_specific_config on a graph completion search:
import cognee
from cognee import SearchType

results = await cognee.recall(
    query_text="What is the current state of the rollout plan?",
    query_type=SearchType.GRAPH_COMPLETION,
    datasets=["product_docs"],
    retriever_specific_config={
        "include_global_context_index": True,
        "global_context_index_top_k": 3,
    },
)
To inspect exactly what will be sent as context, use only_context=True:
context = await cognee.recall(
    query_text="What changed after the second meeting?",
    query_type=SearchType.GRAPH_COMPLETION,
    datasets=["product_docs"],
    only_context=True,
    retriever_specific_config={
        "include_global_context_index": True,
        "global_context_index_top_k": 3,
    },
)

Configuration

| Option | Default | Where it is used | What it does |
| --- | --- | --- | --- |
| build_global_context_index | False | cognee.improve() | Builds the bucket and root summaries after enrichment. |
| include_global_context_index | False | retriever_specific_config | Prepends global context during GRAPH_COMPLETION retrieval. |
| global_context_index_top_k | 3 | retriever_specific_config | Number of non-root bucket summaries to include as relevant areas. |
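If the same retrieval options are reused across many queries, one option is to build the retriever_specific_config dictionary in a single place. The helper below is a hypothetical convenience, not a Cognee function. It uses the option names and defaults listed above, and the check that top_k is at least 1 is an added assumption.

```python
def global_context_config(enabled: bool = False, top_k: int = 3) -> dict:
    """Build a retriever_specific_config dict for the global context index."""
    # Assumption: a non-positive top_k is treated as a caller mistake here.
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    return {
        "include_global_context_index": enabled,
        "global_context_index_top_k": top_k,
    }


config = global_context_config(enabled=True, top_k=5)
print(config)
```

The resulting dictionary can be passed as retriever_specific_config in the recall calls shown earlier.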

Benefits and tradeoffs

The main benefit is better long-range coherence. The model can see a compact summary of the dataset before it reasons over the retrieved graph context. This can reduce failures where local retrieval finds a relevant fragment but misses the broader story.

The tradeoff is that the index is lossy. A bucket summary is an orientation aid, not a replacement for source chunks or graph facts. The most reliable answers still come from combining global context with precise retrieved evidence.

Building the index also adds work: Cognee clusters summaries and calls the LLM to summarize buckets and the root.

When not to use it

You may not need the global context index when:
  • your dataset is small enough that normal retrieval already has enough context
  • queries are mostly simple fact lookups
  • you need the fastest possible enrichment pass
  • you want retrieval context to contain only direct local graph evidence
For small datasets, start without it. Add it when you see questions that need broader orientation or multi-part memory.