What is the cognify operation
The `.cognify` operation takes the data you ingested with Add and turns plain text into structured knowledge: chunks, embeddings, summaries, nodes, and edges that live in Cognee’s vector and graph stores. It prepares your data for downstream operations like Search.
- Transforms ingested data: builds chunks, embeddings, and summaries; always comes after Add
- Graph creation: extracts entities and relationships to form a knowledge graph
- Vector indexing: makes everything searchable via embeddings
- Dataset-scoped: runs per dataset, respecting ownership and permissions
- Incremental loading: you can run `.cognify` multiple times as your dataset grows, and Cognee will skip what’s already processed
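For orientation, a minimal end-to-end flow might look like the sketch below. It assumes the async `cognee.add` and `cognee.cognify` entry points from the Python SDK; exact signatures may differ between versions:

```python
import asyncio
import cognee

async def main():
    # Ingest raw text into the default dataset (Add always precedes cognify).
    await cognee.add("Cognee turns documents into a searchable knowledge graph.")

    # Build chunks, embeddings, summaries, and graph nodes/edges from what was added.
    await cognee.cognify()

asyncio.run(main())
```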
What happens under the hood
The `.cognify` pipeline is made of six ordered Tasks. Each task takes the output of the previous one and moves your data closer to becoming a searchable knowledge graph.
- Classify documents — wrap each ingested file as a `Document` object with metadata and optional node sets
- Check permissions — enforce that you have the right to modify the target dataset
- Extract chunks — split documents into smaller pieces (paragraphs, sections)
- Extract graph — use LLMs to identify entities and relationships, inserting them into the graph DB
- Summarize text — generate summaries for each chunk, stored as `TextSummary` DataPoints
- Add data points — embed nodes and summaries, write them into the vector store, and update graph edges
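To make the ordering concrete, here is a toy sketch of the same six steps chained together. Every function name and data shape below is a hypothetical stand-in, not Cognee's internal API; the point is only that each task consumes the previous task's output:

```python
import asyncio

# Hypothetical stand-ins for Cognee's internal cognify tasks; names and shapes
# are illustrative only, not the library's actual implementation.
async def classify_documents(files):
    return [{"document": f, "type": "text"} for f in files]

async def check_permissions(dataset, user):
    pass  # would raise if `user` lacks write access to `dataset`

async def extract_chunks(documents):
    return [{"text": d["document"], "tokens": 42} for d in documents]

async def extract_graph(chunks):
    return {"nodes": [], "edges": []}  # LLM-driven entity/relationship extraction

async def summarize_text(chunks):
    return [{"summary": c["text"][:80]} for c in chunks]

async def add_data_points(graph, summaries):
    pass  # embed summaries/nodes, write to the vector store, update graph edges

async def cognify_pipeline(files, dataset="main_dataset", user="default_user"):
    documents = await classify_documents(files)   # 1. wrap each file as a Document
    await check_permissions(dataset, user)         # 2. enforce write access on the dataset
    chunks = await extract_chunks(documents)       # 3. split documents into DocumentChunks
    graph = await extract_graph(chunks)            # 4. extract entities and relationships
    summaries = await summarize_text(chunks)       # 5. one TextSummary per chunk
    await add_data_points(graph, summaries)        # 6. persist to vector and graph stores

asyncio.run(cognify_pipeline(["meeting_notes.txt"]))
```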
After cognify finishes
When `.cognify` completes for a dataset:
- DocumentChunks exist in memory as the granular breakdown of your files
- Summaries are stored and indexed in the vector database for semantic search
- Knowledge graph nodes and edges are committed to the graph database
- Dataset metadata is updated with token counts and pipeline status
- Your dataset is now query-ready: you can run Search or graph queries immediately
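As a quick check that the dataset is query-ready, a semantic search can run immediately after cognify. A minimal sketch, assuming `cognee.search` and a `SearchType` enum exported at the package root (argument names and enum members may differ between SDK versions):

```python
import asyncio
import cognee
from cognee import SearchType  # assumption: exported at the package root in your version

async def ask():
    # Assumes .cognify has already completed for this dataset.
    # Semantic search over the chunks and summaries produced by cognify.
    results = await cognee.search(
        query_text="What does Cognee build?",
        query_type=SearchType.CHUNKS,  # assumed enum member; adjust to your version
    )
    for result in results:
        print(result)

asyncio.run(ask())
```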
Examples and details
Pipeline tasks (detailed)
- Classify documents
  - Turns raw `Data` rows into `Document` objects
  - Chooses the right document type (PDF, text, image, audio, etc.)
  - Attaches metadata and optional node sets
- Check permissions
  - Verifies that the user has write access to the dataset
- Extract chunks
  - Splits documents into `DocumentChunk`s using a chunker
  - Updates token counts in the relational DB
- Extract graph
  - Calls the LLM to extract entities and relationships
  - Deduplicates nodes and edges, commits to the graph DB
- Summarize text
  - Generates concise summaries per chunk
  - Stores them as `TextSummary` DataPoints for vector search
- Add data points
  - Converts summaries and other DataPoints into graph + vector nodes
  - Embeds them in the vector store, persists in the graph DB
Datasets and permissions
- Cognify always runs on a dataset
- You must have write access to the dataset
- Permissions are enforced at pipeline start
- Each dataset maintains its own cognify status and token counts
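A hedged sketch of running cognify against a named dataset, assuming `cognee.add` takes a `dataset_name` argument and `cognee.cognify` takes a `datasets` list, as in recent SDK versions (the dataset name below is made up):

```python
import asyncio
import cognee

async def build_dataset():
    # Ingest into a specific dataset ("product_docs" is a made-up name).
    await cognee.add("Our API supports OAuth2 and API keys.", dataset_name="product_docs")

    # Cognify only that dataset; write permission is checked before any task modifies it.
    await cognee.cognify(datasets=["product_docs"])

asyncio.run(build_dataset())
```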
Incremental loading
- By default, `.cognify` processes all data in a dataset
- With `incremental_loading=True`, only new or updated files are processed
- Saves time and compute for large, evolving datasets
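The `incremental_loading=True` flag is passed straight to cognify. A sketch of a refresh run (the dataset name and added content are made up; the `datasets` and `dataset_name` arguments are assumptions based on recent SDK versions):

```python
import asyncio
import cognee

async def refresh():
    # New material lands in the existing dataset (text here; file paths work too).
    await cognee.add("June meeting notes: roadmap review.", dataset_name="product_docs")

    # Only new or updated files are reprocessed; everything else is skipped.
    await cognee.cognify(datasets=["product_docs"], incremental_loading=True)

asyncio.run(refresh())
```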
Final outcome
- Vector database contains embeddings for summaries and nodes
- Graph database contains entities and relationships
- Relational database tracks token counts and pipeline run status
- Your dataset is now ready for Search (semantic or graph-based)
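For graph-based queries, the same dataset can be asked about relationships rather than raw text. The sketch below assumes an `INSIGHTS`-style `SearchType` member that returns graph relationships; adjust to whatever your installed version exposes:

```python
import asyncio
import cognee
from cognee import SearchType  # assumed export; adjust to your installed version

async def explore_graph():
    # Graph-based query: draws on the entities and edges cognify committed
    # to the graph database (INSIGHTS is an assumed enum member).
    insights = await cognee.search(
        query_text="How are the extracted entities related?",
        query_type=SearchType.INSIGHTS,
    )
    for insight in insights:
        print(insight)

asyncio.run(explore_graph())
```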