Core Concepts
What cognee does is not trivial, but it is not rocket science either.
These fundamental concepts will help you understand how cognee works and how to leverage its capabilities effectively.
How Cognee Works - The Big Picture
Before diving into details, here’s how the main concepts work together:
The Flow: Raw Data → Tasks → DataPoints → Graph + Vector Storage → Intelligent Search
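In code, that flow corresponds to cognee's three top-level calls. Here is a minimal sketch using the async API (add, cognify, search); exact parameter names and defaults may differ between versions, so check the API reference for your release:

```python
import asyncio

import cognee

async def main():
    # Raw data in: text, files, or whole directories.
    await cognee.add("Natural language processing (NLP) is a subfield of AI.")

    # Tasks run inside the cognify pipeline, producing DataPoints
    # that are written to graph and vector storage.
    await cognee.cognify()

    # Intelligent search draws on both stores to answer the query.
    results = await cognee.search(query_text="What is NLP?")
    print(results)

asyncio.run(main())
```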
Key Components Working Together
Tasks are individual processing steps (like “extract entities” or “create embeddings”)
- Think of them as specialized functions that do one thing well
- They can be any Python function
- Examples: chunk text, extract relationships, generate summaries
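For instance, the "generate summaries" step could be an ordinary function like this hypothetical one; cognee simply treats such focused functions as tasks in a pipeline:

```python
# A hypothetical task: one focused function, input in, transformed output out.
async def generate_summary(chunk: str, max_sentences: int = 2) -> str:
    """Return a naive summary built from the first sentences of a chunk."""
    sentences = [s.strip() for s in chunk.split(".") if s.strip()]
    return ". ".join(sentences[:max_sentences]) + "."
```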
Pipelines chain tasks together to complete complex workflows
- Like assembly lines that transform raw data into knowledge
- Examples: data ingestion pipeline, cognify pipeline, code analysis pipeline
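Conceptually, a pipeline just feeds each task's output into the next task. The sketch below is illustrative only: the hypothetical run_pipeline helper is not cognee's pipeline runner, which also handles batching, persistence, and error handling.

```python
from typing import Any, Awaitable, Callable, Iterable

async def run_pipeline(data: Any, tasks: Iterable[Callable[[Any], Awaitable[Any]]]) -> Any:
    """Assembly line: each task receives the previous task's output."""
    for task in tasks:
        data = await task(data)
    return data

# Usage idea (task names are hypothetical):
# result = await run_pipeline(raw_text, [chunk_text, extract_entities, generate_summary])
```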
DataPoints are the structured output that becomes your knowledge graph
- Standardized format that carries both data and metadata
- Extend Pydantic, giving you base handling of your data model plus the flexibility to add your own fields
- Each DataPoint becomes a node in your graph with vector embeddings
- Over time your DataPoints define thousands of nodes and edges, which together form your semantic memory (see the sketch below)
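A minimal sketch of custom DataPoints, assuming the DataPoint base class is importable from cognee.infrastructure.engine and that an index_fields metadata entry marks which fields get embedded; both details may differ across cognee versions:

```python
from cognee.infrastructure.engine import DataPoint  # import path may vary by version

class Person(DataPoint):
    name: str
    role: str
    # index_fields marks which fields are embedded for vector search (assumed convention)
    metadata: dict = {"index_fields": ["name", "role"]}

class Organization(DataPoint):
    name: str
    members: list[Person] = []  # nested DataPoints become edges in the graph
    metadata: dict = {"index_fields": ["name"]}
```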
Dual Storage enables both relationship-based and semantic search
- Graph database stores explicit relationships between concepts
- Vector database stores semantic embeddings for similarity search
- Together they provide comprehensive, intelligent retrieval
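In practice, the same search call can be pointed at either store. The sketch below assumes the SearchType enum and keyword names used in recent cognee releases (INSIGHTS for graph relationships, CHUNKS for vector similarity); check your version's API reference:

```python
import cognee
from cognee import SearchType  # exported location may differ between versions

async def ask(question: str):
    # Graph database: follow explicit relationships between concepts.
    insights = await cognee.search(query_type=SearchType.INSIGHTS, query_text=question)
    # Vector database: retrieve chunks semantically similar to the question.
    chunks = await cognee.search(query_type=SearchType.CHUNKS, query_text=question)
    return insights, chunks
```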
Think of it like a smart library: Tasks are the librarians who organize books (data), Pipelines are the organizational systems they follow, DataPoints are the properly catalogued books, and the dual storage is having both a structured catalog (graph) and a content-based search system (vector) working together.
We will explain the key components and principles that power cognee’s knowledge graph system.
Let’s start with how data is indexed, then look at how search works:
- Data to Memory is the process of converting and ingesting your raw data into cognee’s memory system.
- Node Sets provide a simple yet powerful tagging mechanism that helps in managing the growing complexity of your knowledge base as you add more content.
- Ontologies structure and organize knowledge in meaningful ways.
- Chunking is how cognee breaks down large datasets into manageable pieces for efficient processing and analysis.
- Memory Processing encompasses the computational workflows that transform raw data into structured, queryable knowledge.
- Tasks are the building blocks of cognee’s data processing pipeline.
- Pipelines are the data processing workflows that transform raw information into structured knowledge graphs.
- DataPoints are the fundamental units of information that carry metadata and relationships.
- Search Memory enables you to query and retrieve information from your knowledge graphs.
- Architecture shows cognee’s system design, components, and how they work together to create a powerful knowledge graph platform.