
What is the cognify operation

The .cognify operation takes data ingested with Add and turns plain text into structured knowledge: chunks, embeddings, summaries, nodes, and edges that live in Cognee’s vector and graph stores. It prepares your data for downstream operations like Search.
  • Transforms ingested data: builds chunks, embeddings, and summaries
  • Graph creation: extracts entities and relationships to form a knowledge graph
  • Vector indexing: makes everything searchable via embeddings
  • Dataset-scoped: runs per dataset, respecting ownership and permissions
.cognify can be run multiple times as the dataset grows, and Cognee will skip what’s already processed. Read more about incremental loading under Examples and details.
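Conceptually, this skip behavior works like a content-hash check: unchanged content is recognized and not reprocessed. The sketch below is illustrative only; Cognee tracks processed data in its relational store, and this helper is not part of its API:

```python
import hashlib

processed: set[str] = set()  # hashes of content already cognified

def needs_processing(content: str) -> bool:
    """Return True the first time a piece of content is seen."""
    digest = hashlib.sha256(content.encode()).hexdigest()
    if digest in processed:
        return False  # unchanged content is skipped on re-runs
    processed.add(digest)
    return True
```

On a second run over the same dataset, only files whose content produces a new hash would be pushed through the pipeline.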

What happens under the hood

The .cognify pipeline is made of six ordered Tasks. Each task takes the output of the previous one and moves your data closer to becoming a searchable knowledge graph.
  1. Classify documents — wrap each ingested file as a Document object with metadata and optional node sets
  2. Check permissions — enforce that you have write access to the target dataset
  3. Extract chunks — split documents into smaller pieces (paragraphs, sections)
  4. Extract graph — use LLMs to identify entities and relationships, inserting them into the graph DB
  5. Summarize text — generate summaries for each chunk, stored as TextSummary DataPoints
  6. Add data points — embed nodes and summaries, write them into the vector store, and update graph edges
The result is a fully searchable, structured knowledge graph connected to your data.
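As a mental model, the pipeline is a simple fold: each task consumes the previous task's output. The task bodies below are illustrative stand-ins (the real tasks are async and operate on Cognee's internal Document and Chunk types):

```python
from typing import Any, Callable

# Illustrative stand-ins for cognify tasks; real tasks also handle
# permissions, summaries, and vector writes.
def classify(files):
    return [{"doc": f} for f in files]

def chunk(docs):
    return [{"chunk": d["doc"] + ":part1"} for d in docs]

def extract_graph(chunks):
    return [{"node": c["chunk"]} for c in chunks]

def run_pipeline(data: Any, tasks: list[Callable]) -> Any:
    # Each task receives the previous task's output
    for task in tasks:
        data = task(data)
    return data

result = run_pipeline(["a.txt"], [classify, chunk, extract_graph])
```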

After cognify finishes

When .cognify completes for a dataset:
  • DocumentChunks exist in memory as the granular breakdown of your files
  • Summaries are stored and indexed in the vector database for semantic search
  • Knowledge graph nodes and edges are committed to the graph database
  • Dataset metadata is updated with token counts and pipeline status
  • Your dataset is now query-ready: you can run Search or graph queries immediately

Examples and details

  1. Classify documents
    • Turns raw Data rows into Document objects
    • Chooses the right document type (PDF, text, image, audio, etc.)
    • Attaches metadata and optional node sets
  2. Check permissions
    • Verifies that the user has write access to the dataset
  3. Extract chunks
    • Splits documents into DocumentChunks using a chunker
    • You can customize the chunk size and strategy — see Chunkers for details
    • Updates token counts in the relational DB
  4. Extract graph
    • Calls the LLM to extract entities and relationships
    • Deduplicates nodes and edges, commits to the graph DB
  5. Summarize text
    • Generates concise summaries per chunk
    • Stores them as TextSummary DataPoints for vector search
  6. Add data points
    • Converts summaries and other DataPoints into graph + vector nodes
    • Embeds them in the vector store, persists in the graph DB
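Step 4's deduplication can be sketched as keying nodes by their id and edges by their (source, target, relationship) triple, so repeated extractions collapse to one entry. The dict shapes here are illustrative, not Cognee's internal types:

```python
def dedupe_graph(nodes: list[dict], edges: list[dict]) -> tuple[list, list]:
    """Drop duplicate nodes (by id) and edges (by source, target, name)."""
    unique_nodes = {n["id"]: n for n in nodes}
    unique_edges = {(e["source"], e["target"], e["name"]): e for e in edges}
    return list(unique_nodes.values()), list(unique_edges.values())
```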
Cognee ships with several built-in system prompts for entity and relationship extraction, stored in cognee/infrastructure/llm/prompts/. The active prompt is controlled by the GRAPH_PROMPT_PATH environment variable (default: generate_graph_prompt.txt).
  • generate_graph_prompt.txt (default balanced extraction): extracts entities and relationships using the standard Cognee rules: basic node types, human-readable IDs, normalized dates, snake_case relationships, and coreference consistency.
  • generate_graph_prompt_simple.txt (lightweight extraction): uses a shorter, more compact rule set for straightforward graph extraction while keeping the same core conventions around node types, IDs, dates, and relationship naming.
  • generate_graph_prompt_strict.txt (tighter schema control): applies a more explicit prompt with named node categories, stronger relationship constraints, examples, and a strict instruction not to infer facts that are not present in the text.
  • generate_graph_prompt_guided.txt (more directed graph shaping): adds guidance for edge direction, allows multi-word entity labels, and encourages logically implied facts when they improve graph clarity without repeating the same fact.
To switch to a different built-in prompt, set the environment variable:
GRAPH_PROMPT_PATH=generate_graph_prompt_strict.txt
Or configure it at runtime via cognee.config:
import cognee

cognee.config.llm_config.graph_prompt_path = "generate_graph_prompt_strict.txt"
If you need to use a custom prompt, refer to our Custom Prompts guide.
  • Cognify always runs on a dataset
  • You must have write access to the target dataset
  • Permissions are enforced at pipeline start
  • Each dataset maintains its own cognify status and token counts
  • By default, .cognify processes all data in a dataset
  • With incremental_loading=True, only new or updated files are processed
  • Saves time and compute for large, evolving datasets
During the Extract graph step, Cognee asks the LLM to turn each chunk into graph nodes and edges. The names and types in that graph are inferred from your content rather than fixed in advance.
  • Node (vertex): id, a unique identifier derived from the entity name; name, a human-readable label; type, a semantic category such as Person or Organization; description, a short summary
  • Edge (relationship): source_node_id, target_node_id, and relationship_name, a free-text verb phrase such as works_at or produces
The extraction prompt instructs the model to:
  • Capture entities, names, nouns, and implied mentions exhaustively
  • Form relationships as (start_node, relationship_name, end_node) triplets using explicit and inferred connections
  • Avoid duplicates and overly generic terms
That means the resulting graph schema emerges from the data you ingest. Different datasets, prompts, or LLMs can produce slightly different node types and relationship names for similar content.
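Using the fields listed above, the extracted elements can be modeled as plain dataclasses. The example entities below are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Node:
    id: str           # derived from the entity name, e.g. "ada_lovelace"
    name: str         # human-readable label
    type: str         # semantic category such as "Person"
    description: str  # short summary

@dataclass
class Edge:
    source_node_id: str
    target_node_id: str
    relationship_name: str  # snake_case verb phrase, e.g. "works_at"

# A (start_node, relationship_name, end_node) triplet as two nodes + an edge
ada = Node("ada_lovelace", "Ada Lovelace", "Person", "19th-century mathematician")
babbage = Node("charles_babbage", "Charles Babbage", "Person", "inventor")
link = Edge(ada.id, babbage.id, "collaborated_with")
```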
If you need tighter control over naming, use an OWL ontology or a custom graph model. See Ontologies and Custom Graph Model.
Once .cognify finishes, the graph schema is inspectable because the extracted node types and relationship names now exist in the graph store.

Python SDK

Use the graph engine directly to inspect the stored nodes and edges:
from cognee.infrastructure.databases.graph import get_graph_engine

graph_engine = await get_graph_engine()

# Returns all nodes and edges
nodes, edges = await graph_engine.get_graph_data()

# Inspect unique node types
node_types = {props.get("type") for _, props in nodes if props.get("type")}
print("Node types:", node_types)

# Inspect unique relationship names
relationship_names = {rel_name for _, _, rel_name, _ in edges}
print("Relationship names:", relationship_names)
get_graph_data() returns:
  • Nodes as (node_id: str, properties: dict)
  • Edges as (source_id: str, target_id: str, relationship_name: str, properties: dict)
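Given those tuple shapes, you can compute simple per-node statistics directly. The sample nodes and edges below are made up for illustration:

```python
from collections import Counter

# Sample data in the (id, properties) / (src, dst, name, properties) shapes
nodes = [("n1", {"type": "Person"}), ("n2", {"type": "Organization"})]
edges = [("n1", "n2", "works_at", {})]

degree: Counter = Counter()
for src, dst, _, _ in edges:
    degree[src] += 1  # out-degree contribution
    degree[dst] += 1  # in-degree contribution
```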
If you only need aggregate information, inspect graph metrics instead:
metrics = await graph_engine.get_graph_metrics()
# Returns: num_nodes, num_edges, mean_degree, edge_density,
#          num_connected_components, sizes_of_connected_components

metrics = await graph_engine.get_graph_metrics(include_optional=True)
print(metrics)
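For reference, mean_degree and edge_density follow the standard graph-theory definitions. A sketch from raw counts (Cognee's exact conventions, e.g. directed vs. undirected counting, may differ):

```python
def graph_metrics(num_nodes: int, num_edges: int) -> dict:
    """Basic metrics from node/edge counts; density uses the directed formula."""
    mean_degree = 2 * num_edges / num_nodes if num_nodes else 0.0
    edge_density = (
        num_edges / (num_nodes * (num_nodes - 1)) if num_nodes > 1 else 0.0
    )
    return {"mean_degree": mean_degree, "edge_density": edge_density}
```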

HTTP server mode

When you run the Cognee HTTP server, you can inspect graph data through the dataset graph endpoint:
GET /api/v1/datasets/{dataset_id}/graph
To explore the same graph visually, use the Graph Visualization guide.
If you update your data model (e.g., add new entity fields or relationships) and want to reprocess existing data:
  1. Delete the dataset first, then re-add and re-cognify:
    # Clear existing processed data
    await cognee.datasets.empty_dataset(dataset_id=my_dataset.id)
    
    # Re-add source files
    await cognee.add(source_files, dataset_name="my_dataset")
    
    # Re-cognify with the updated schema
    await cognee.cognify()
    
  2. Alternatively, use Memify for additive enrichment — it runs extraction and enrichment tasks over the existing graph without re-ingesting data. This is useful when you want to add new derived facts without reprocessing from scratch.
.cognify skips already-processed data by default. Simply re-running .cognify on unchanged files will not pick up schema changes. You must delete and re-add the data, or use memify for enrichment.
  • Vector database contains embeddings for summaries and nodes
  • Graph database contains entities and relationships
  • Relational database tracks token counts and pipeline run status
  • Your dataset is now ready for Search (semantic or graph-based)
The default cognify pipeline makes 2 LLM calls per chunk:
  1. Graph extraction — identifies entities and relationships from the chunk text
  2. Summarization — generates a concise summary of the chunk
Estimating total calls
The number of chunks depends on your document size and the configured chunk_size:
chunks          = ceil(document_tokens / chunk_size)
total_llm_calls = chunks × 2
When chunk_size is not set explicitly, Cognee auto-calculates it as:
chunk_size = min(embedding_model_max_tokens, llm_max_tokens ÷ 2)
With typical defaults (e.g., gpt-4o-mini + text-embedding-3-small) this usually falls in the 1,024–8,192 token range. See Chunkers for details.
Example estimates at chunk_size = 1024:
  • 100 tokens → 1 chunk → 2 LLM calls
  • 1,000 tokens → 1 chunk → 2 LLM calls
  • 10,000 tokens → 10 chunks → 20 LLM calls
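The arithmetic above is easy to wrap in a small estimator. These helpers are not part of the Cognee SDK, just a sketch of the formulas:

```python
import math

def auto_chunk_size(embedding_max_tokens: int, llm_max_tokens: int) -> int:
    # Cognee's default when chunk_size is not set explicitly
    return min(embedding_max_tokens, llm_max_tokens // 2)

def estimate_llm_calls(document_tokens: int, chunk_size: int = 1024,
                       summarize: bool = True) -> tuple[int, int]:
    """Return (chunks, total LLM calls) for the default cognify pipeline."""
    chunks = max(1, math.ceil(document_tokens / chunk_size))
    calls_per_chunk = 2 if summarize else 1  # graph extraction (+ summarization)
    return chunks, chunks * calls_per_chunk
```

For example, a 10,000-token document at chunk_size=1024 yields 10 chunks and 20 calls; dropping summarization halves that to 10.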
Tips for reducing API usage
  • Increase chunk_size — fewer, larger chunks mean fewer calls:
    await cognee.cognify(chunk_size=4096)
    
  • Skip summarization — use a custom pipeline that omits the summarize_text task, reducing calls to 1 per chunk.
  • Enable rate limiting — set LLM_RATE_LIMIT_ENABLED=true to avoid bursting your provider quota when processing many chunks in parallel.

Related

  • Add: first bring data into Cognee
  • Search: query embeddings or graph structures built by Cognify
  • Memify: enrich your graph with derived facts after cognify