search lets you ask questions over everything you’ve ingested and cognified.
Under the hood, Cognee blends vector similarity, graph structure, and LLM reasoning to return answers with context and provenance.

The big picture

  • Dataset-aware: searches run against one or more datasets you can read (requires ENABLE_BACKEND_ACCESS_CONTROL=true)
  • Multiple modes: from simple chunk lookup to graph-aware Q&A
  • Hybrid retrieval: vectors find relevant pieces; graphs provide structure; LLMs compose answers
  • Conversational memory: for GRAPH_COMPLETION, RAG_COMPLETION, and TRIPLET_COMPLETION, use session_id to maintain conversation history across searches (requires caching enabled). When caching is on, omitting session_id uses default_session and still stores history. Other search types do not use session history.
  • Safe by default: permissions are checked before any retrieval
  • Observability: telemetry is emitted for query start/completion
Dataset scoping requires specific configuration. See permissions system for details on access control requirements and supported database setups.
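For example, a dataset-scoped, session-aware call could look like the sketch below. This is a sketch, not a verified script: the dataset name and session id are placeholders, and exact parameter names can vary by version.

```python
import asyncio

import cognee
from cognee import SearchType  # import path may vary by version


async def main():
    # Scope the search to one or more datasets you can read.
    # GRAPH_COMPLETION supports session history, so reusing the same
    # session_id keeps conversational context across calls (caching
    # must be enabled; omitting session_id falls back to default_session).
    answer = await cognee.search(
        query_text="What changed in the latest release?",
        query_type=SearchType.GRAPH_COMPLETION,
        datasets=["support_docs"],    # placeholder dataset name
        session_id="support-chat-1",  # placeholder session id
    )
    print(answer)


asyncio.run(main())
```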

Where search fits

Use search after you’ve run .add and .cognify. At that point, your dataset has chunks, summaries, embeddings, and a knowledge graph—so queries can leverage both similarity and structure.
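A minimal end-to-end flow, sketched with placeholder text and dataset names (the call signatures follow common Cognee examples and may differ slightly in your version):

```python
import asyncio

import cognee
from cognee import SearchType  # import path may vary by version


async def main():
    # 1. Ingest raw data into a dataset.
    await cognee.add(
        "Cognee turns documents into a searchable knowledge graph.",
        dataset_name="demo",
    )

    # 2. Build chunks, summaries, embeddings, and the knowledge graph.
    await cognee.cognify(["demo"])

    # 3. Search can now use both vector similarity and graph structure.
    results = await cognee.search(
        query_text="What does Cognee do?",
        query_type=SearchType.GRAPH_COMPLETION,
        datasets=["demo"],
    )
    print(results)


asyncio.run(main())
```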

How it works (conceptually)

  1. Scope & permissions
    Resolve target datasets (by name or id) and enforce read access.
  2. Mode dispatch
    Pick a search mode (default: graph-aware completion) and route to its retriever.
  3. Retrieve → (optional) generate
    Collect context via vectors and/or graph traversal; some modes then ask an LLM to compose a final answer.
  4. Return results
    Depending on mode: answers, chunks/summaries with metadata, graph records, Cypher results, or code contexts.
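As illustrative pseudocode only (not Cognee's actual internals), the four steps compose roughly like this:

```python
# Illustrative pseudocode; names other than the three retriever
# methods (documented in the Retrievers section below) are invented
# for this sketch.
async def search(query_text, query_type, datasets, user):
    scoped = resolve_datasets(datasets)              # 1. scope...
    check_read_access(user, scoped)                  # ...and permissions

    retriever = RETRIEVERS[query_type]()             # 2. mode dispatch

    objects = await retriever.get_retrieved_objects(query_text)   # 3. retrieve,
    context = await retriever.get_context_from_objects(objects)
    answer = await retriever.get_completion_from_context(query_text, context)  # then generate

    return answer                                    # 4. return results
```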
For a practical guide to using search with examples and detailed parameter explanations, see Search Basics.

Retrievers

Each search type is handled by a retriever. The pipeline is: get_retrieved_objects → get_context_from_objects → get_completion_from_context (skipped when only_context=True).
Search type                           Retriever
GRAPH_COMPLETION                      GraphCompletionRetriever
RAG_COMPLETION                        CompletionRetriever
CHUNKS                                ChunksRetriever
SUMMARIES                             SummariesRetriever
GRAPH_SUMMARY_COMPLETION              GraphSummaryCompletionRetriever
GRAPH_COMPLETION_COT                  GraphCompletionCotRetriever
GRAPH_COMPLETION_CONTEXT_EXTENSION    GraphCompletionContextExtensionRetriever
TRIPLET_COMPLETION                    TripletRetriever
CHUNKS_LEXICAL                        JaccardChunksRetriever
CODING_RULES                          CodingRulesRetriever
TEMPORAL                              TemporalRetriever
CYPHER                                CypherSearchRetriever
NATURAL_LANGUAGE                      NaturalLanguageRetriever
You can register a custom retriever for a search type via use_retriever(SearchType, RetrieverClass); the class must implement the same three-step interface (BaseRetriever). See the API reference for BaseRetriever and register_retriever.
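Sketched registration of a custom retriever; the import paths and method signatures here are assumptions inferred from the three-step pipeline above, so verify them against the BaseRetriever API reference before copying:

```python
from cognee import SearchType  # import path may vary by version
from cognee.modules.retrieval.base_retriever import BaseRetriever  # assumed path
from cognee.modules.search import use_retriever  # assumed path


class KeywordBoostRetriever(BaseRetriever):
    """Hypothetical retriever, used only to illustrate the interface."""

    async def get_retrieved_objects(self, query):
        # Fetch candidate objects for the query (your own lookup logic).
        ...

    async def get_context_from_objects(self, objects):
        # Turn retrieved objects into text context for the LLM.
        ...

    async def get_completion_from_context(self, query, context):
        # Compose a final answer from the assembled context.
        ...


# Route an existing search type to the custom retriever.
use_retriever(SearchType.CHUNKS, KeywordBoostRetriever)
```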

Multi-query (batch)

GraphCompletionRetriever, GraphCompletionCotRetriever, and GraphCompletionContextExtensionRetriever support batch mode: pass query_batch (a non-empty list of strings) instead of query. You get one result per query; session cache is not used in batch mode. The public cognee.search() API accepts only a single query_text; batch is available when you use the retrievers directly (e.g. in custom pipelines).
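Used directly, batch mode might look like the following sketch. Only query_batch itself is documented above; the import path, constructor, and exact call pattern are assumptions:

```python
from cognee.modules.retrieval.graph_completion_retriever import (  # assumed path
    GraphCompletionRetriever,
)


async def answer_many(questions: list[str]):
    retriever = GraphCompletionRetriever()  # constructor args omitted

    # query_batch replaces query: one result is returned per input
    # string, and the session cache is not used in batch mode.
    objects = await retriever.get_retrieved_objects(query_batch=questions)
    context = await retriever.get_context_from_objects(objects)
    return await retriever.get_completion_from_context(questions, context)
```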

GRAPH_COMPLETION (default)

Graph-aware question answering.
  • What it does: Finds relevant graph triplets using vector hints across indexed fields, resolves them into readable context, and asks an LLM to answer your question grounded in that context.
  • Why it’s useful: Combines fuzzy matching (vectors) with precise structure (graph) so answers reflect relationships, not just nearby text.
  • Typical output: A natural-language answer with references to the supporting graph context.
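For example (inside an async context; assuming only_context is forwarded to the retriever pipeline described above):

```python
import cognee
from cognee import SearchType  # import path may vary by version

# Default mode: graph-aware completion.
answer = await cognee.search(query_text="How are orders linked to invoices?")

# Same retrieval, but skip the final LLM completion step.
context_only = await cognee.search(
    query_text="How are orders linked to invoices?",
    query_type=SearchType.GRAPH_COMPLETION,
    only_context=True,  # skips get_completion_from_context
)
```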

RAG_COMPLETION

Retrieve-then-generate over text chunks.
  • What it does: Pulls top-k chunks via vector search, stitches a context window, then asks an LLM to answer.
  • When to use: You want fast, text-only RAG without graph structure.
  • Output: An LLM answer grounded in retrieved chunks.

CHUNKS

Direct chunk retrieval.
  • What it does: Returns the most similar text chunks to your query via vector search.
  • When to use: You want raw passages/snippets to display or post-process.
  • Output: Chunk objects with metadata.

SUMMARIES

Search over precomputed summaries.
  • What it does: Vector search on TextSummary content for concise, high-signal hits.
  • When to use: You prefer short summaries instead of full chunks.
  • Output: Summary objects with provenance.

GRAPH_SUMMARY_COMPLETION

Graph-aware summary answering.
  • What it does: Builds graph context like GRAPH_COMPLETION, then condenses it before answering.
  • When to use: You want a tighter, summary-first response.
  • Output: A concise answer grounded in graph context.

GRAPH_COMPLETION_COT

Chain-of-thought over the graph.
  • What it does: Iterative rounds of graph retrieval and LLM reasoning to refine the answer.
  • When to use: Complex questions that benefit from stepwise reasoning.
  • Output: A refined answer produced through multiple reasoning steps.

GRAPH_COMPLETION_CONTEXT_EXTENSION

Iterative context expansion.
  • What it does: Starts with initial graph context, lets the LLM suggest follow-ups, fetches more graph context, repeats.
  • When to use: Open-ended queries that need broader exploration.
  • Output: An answer assembled after expanding the relevant subgraph.

NATURAL_LANGUAGE

Natural language to Cypher to execution.
  • What it does: Infers a Cypher query from your question using the graph schema, runs it, returns the results.
  • When to use: You want structured graph answers without writing Cypher.
  • Output: Executed graph results.

CYPHER

Run Cypher directly.
  • What it does: Executes your Cypher query against the graph database.
  • When to use: You know the schema and want full control.
  • Output: Raw query results.
CYPHER and NATURAL_LANGUAGE are disabled when ALLOW_CYPHER_QUERY=false (environment variable).
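In this sketch both modes receive their input through query_text, which is an assumption; the Cypher uses a hypothetical Person/KNOWS schema for illustration, and ALLOW_CYPHER_QUERY must not be disabled:

```python
import cognee
from cognee import SearchType  # import path may vary by version

# Inside an async context:

# Let Cognee infer the Cypher from your question and run it.
rows = await cognee.search(
    query_text="Which people know each other?",
    query_type=SearchType.NATURAL_LANGUAGE,
)

# Or run Cypher directly (hypothetical schema for illustration).
rows = await cognee.search(
    query_text="MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name, b.name LIMIT 10",
    query_type=SearchType.CYPHER,
)
```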

CODING_RULES

Code-focused retrieval (coding rules / codebase search).
  • What it does: Retrieves rules or code context from the coding_agent_rules nodeset and returns structured code information.
  • When to use: You’re querying codebases or coding guidelines that Cognee has indexed.
  • Output: Structured code contexts and related graph information.
  • Prereq: The coding_agent_rules nodeset must be populated (e.g. via memify).

TRIPLET_COMPLETION

Triplet-based retrieval with LLM completion (no full graph traversal).
  • What it does: Retrieves graph triplets by vector similarity, resolves them to text, and asks an LLM to answer.
  • When to use: You want triplet-level context without full graph expansion.
  • Output: An LLM answer grounded in retrieved triplets.
  • Prereq: Triplet embeddings must exist—set TRIPLET_EMBEDDING=true before running cognify or run the memify pipeline create_triplet_embeddings (retriever uses the Triplet_text collection).
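A sketch combining the prerequisite with the search call (placeholder text and dataset name; set the environment variable before cognify runs):

```python
import os

import cognee
from cognee import SearchType  # import path may vary by version

# Enable triplet embeddings so cognify populates the Triplet_text
# collection that TripletRetriever reads from.
os.environ["TRIPLET_EMBEDDING"] = "true"

# Inside an async context:
await cognee.add("Alice works with Bob on the demo project.", dataset_name="demo")
await cognee.cognify(["demo"])

answer = await cognee.search(
    query_text="Who does Alice work with?",
    query_type=SearchType.TRIPLET_COMPLETION,
)
```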

CHUNKS_LEXICAL

Lexical (keyword-style) chunk search.
  • What it does: Returns chunks that match your query using token-based similarity (e.g. Jaccard), not semantic embeddings.
  • When to use: Exact-term or keyword-style lookups; stopword-aware search.
  • Output: Ranked text chunks, optionally with scores.

TEMPORAL

Time-aware retrieval.
  • What it does: Retrieves and ranks content by temporal relevance (dates, events) and answers with time context.
  • When to use: Queries about “before/after X”, “in 2020”, or event timelines.
  • Output: An answer grounded in time-filtered graph context. See Time-awareness for setup.
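A time-scoped query is an ordinary search call with the TEMPORAL type (a sketch; see Time-awareness for the required setup):

```python
import cognee
from cognee import SearchType  # import path may vary by version

# Inside an async context:
answer = await cognee.search(
    query_text="What milestones did the project hit before 2020?",
    query_type=SearchType.TEMPORAL,
)
```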

FEELING_LUCKY

Automatic mode selection.
  • What it does: Uses an LLM to pick the most suitable search mode for your query, then runs it.
  • When to use: You’re not sure which mode fits best.
  • Output: Results from the selected mode.
Feedback is handled via Sessions and the Feedback System—use cognee.session.add_feedback and cognee.session.delete_feedback. See the Sessions Guide and Feedback System for full details.