Core Concepts
What cognee does is not trivial, but it is not rocket science either.
These fundamental concepts will help you understand how cognee works and how to leverage its capabilities effectively.
How Cognee Works - The Big Picture
Before diving into details, here’s how the main concepts work together:
The Flow: Raw Data → Tasks → DataPoints → Graph + Vector Storage → Intelligent Search
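In code, that flow corresponds to cognee's three top-level calls. Here is a minimal sketch using the async API (add, cognify, search); exact parameter names and defaults may differ between versions, so check the API reference for your release:

```python
import asyncio

import cognee

async def main():
    # Raw data in: text, files, or whole directories.
    await cognee.add("Natural language processing (NLP) is a subfield of AI.")

    # Tasks run inside the cognify pipeline, producing DataPoints
    # that are written to graph and vector storage.
    await cognee.cognify()

    # Intelligent search draws on both stores to answer the query.
    results = await cognee.search(query_text="What is NLP?")
    print(results)

asyncio.run(main())
```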
Key Components Working Together
Tasks are individual processing steps (like “extract entities” or “create embeddings”)
- Think of them as specialized functions that do one thing well
- They can be any Python function
- Examples: chunk text, extract relationships, generate summaries
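For instance, the "generate summaries" step could be an ordinary function like this hypothetical one; cognee simply treats such focused functions as tasks in a pipeline:

```python
# A hypothetical task: one focused function, input in, transformed output out.
async def generate_summary(chunk: str, max_sentences: int = 2) -> str:
    """Return a naive summary built from the first sentences of a chunk."""
    sentences = [s.strip() for s in chunk.split(".") if s.strip()]
    return ". ".join(sentences[:max_sentences]) + "."
```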
Pipelines chain tasks together to complete complex workflows
- Like assembly lines that transform raw data into knowledge
- Examples: data ingestion pipeline, cognify pipeline, code analysis pipeline
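Conceptually, a pipeline just feeds each task's output into the next task. The sketch below is illustrative only: the hypothetical run_pipeline helper is not cognee's pipeline runner, which also handles batching, persistence, and error handling.

```python
from typing import Any, Awaitable, Callable, Iterable

async def run_pipeline(data: Any, tasks: Iterable[Callable[[Any], Awaitable[Any]]]) -> Any:
    """Assembly line: each task receives the previous task's output."""
    for task in tasks:
        data = await task(data)
    return data

# Usage idea (task names are hypothetical):
# result = await run_pipeline(raw_text, [chunk_text, extract_entities, generate_summary])
```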
DataPoints are the structured output that becomes your knowledge graph
- Standardized format that carries both data and metadata
- Extend Pydantic, giving you base handling of your data model plus the flexibility to add your own fields
- Each DataPoint becomes a node in your graph with vector embeddings
- Over time your DataPoints define thousands of nodes and edges, which together form your semantic memory (see the sketch below)
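A minimal sketch of custom DataPoints, assuming the DataPoint base class is importable from cognee.infrastructure.engine and that an index_fields metadata entry marks which fields get embedded; both details may differ across cognee versions:

```python
from cognee.infrastructure.engine import DataPoint  # import path may vary by version

class Person(DataPoint):
    name: str
    role: str
    # index_fields marks which fields are embedded for vector search (assumed convention)
    metadata: dict = {"index_fields": ["name", "role"]}

class Organization(DataPoint):
    name: str
    members: list[Person] = []  # nested DataPoints become edges in the graph
    metadata: dict = {"index_fields": ["name"]}
```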
Dual Storage enables both relationship-based and semantic search
- Graph database stores explicit relationships between concepts
- Vector database stores semantic embeddings for similarity search
- Together they provide comprehensive, intelligent retrieval
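In practice, the same search call can be pointed at either store. The sketch below assumes the SearchType enum and keyword names used in recent cognee releases (INSIGHTS for graph relationships, CHUNKS for vector similarity); check your version's API reference:

```python
import cognee
from cognee import SearchType  # exported location may differ between versions

async def ask(question: str):
    # Graph database: follow explicit relationships between concepts.
    insights = await cognee.search(query_type=SearchType.INSIGHTS, query_text=question)
    # Vector database: retrieve chunks semantically similar to the question.
    chunks = await cognee.search(query_type=SearchType.CHUNKS, query_text=question)
    return insights, chunks
```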
Think of it like a smart library: Tasks are the librarians who organize books (data), Pipelines are the organizational systems they follow, DataPoints are the properly catalogued books, and the dual storage is having both a structured catalog (graph) and a content-based search system (vector) working together.
We will explain the key components and principles that power cognee’s knowledge graph system.
Let’s start with how data is indexed, then look at how search works:
- Data to Memory is the process of converting and ingesting your raw data into cognee’s memory system.
- Node Sets provide a simple yet powerful tagging mechanism that helps in managing the growing complexity of your knowledge base as you add more content.
- Ontologies structure and organize knowledge in meaningful ways.
- Chunking is how cognee breaks down large datasets into manageable pieces for efficient processing and analysis.
- Memory Processing encompasses the computational workflows that transform raw data into structured, queryable knowledge.
- Tasks are the building blocks of cognee’s data processing pipeline.
- Pipelines are the data processing workflows that transform raw information into structured knowledge graphs.
- DataPoints are the fundamental units of information that carry metadata and relationships.
- Search Memory enables you to query and retrieve information from your knowledge graphs.
- Architecture shows cognee’s system design, components, and how they work together to create a powerful knowledge graph platform.