cargo doc --no-deps --open). For what the system does see
operations; for how it fits together see
architecture.
Architecture: three stores
Cognee keeps memory in three complementary backends. Every cognify run writes to all three; search reads across them.

| Store | Role | Crate | Default backend |
|---|---|---|---|
| Relational | Document tracking, deduplication, provenance/lineage, sessions | cognee-database | SQLite via SeaORM (Postgres supported) |
| Vector | Semantic similarity over embeddings (chunks, entities, summaries, triplets) | cognee-vector | LanceDB embedded (brute-force on Android / :memory:; pgvector via feature) |
| Graph | Entity relationships — the knowledge graph itself | cognee-graph | Embedded Ladybug |
Building blocks
DataPoints
A DataPoint is the base storage-layer unit: a structured record that carries a stable UUID, timestamps, a type discriminator, free-form metadata, and
provenance fields (source_pipeline, source_task, source_node_set,
source_content_hash). Typed graph nodes — Entity, EntityType, EdgeType,
DocumentChunk, etc. — embed a DataPoint as their base, exposed through the
HasDataPoint trait so provenance stamping can walk any node uniformly. When a
DataPoint is indexed, its serialized form becomes the vector-store payload
(vector_metadata()), keeping the on-disk shape comparable to Python’s.
Rust: DataPoint / HasDataPoint in cognee-models.
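
The embedding pattern described above can be sketched as follows. This is a minimal illustration, not the definitions in cognee-models: the field types, the `Provenance` grouping, the accessor names, and the `stamp_provenance` helper are all assumptions chosen to keep the example dependency-free.

```rust
// Illustrative only: names and shapes here are assumptions, not the
// actual cognee-models definitions.

/// The provenance fields listed in the text above.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Provenance {
    pub source_pipeline: Option<String>,
    pub source_task: Option<String>,
    pub source_node_set: Option<String>,
    pub source_content_hash: Option<String>,
}

/// Base record. A real implementation would carry a UUID and timestamps;
/// a plain string id keeps this sketch free of external crates.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct DataPoint {
    pub id: String,
    pub type_name: String,
    pub provenance: Provenance,
}

/// Any node that embeds a DataPoint exposes it through this trait.
pub trait HasDataPoint {
    fn data_point(&self) -> &DataPoint;
    fn data_point_mut(&mut self) -> &mut DataPoint;
}

pub struct Entity {
    pub base: DataPoint,
    pub name: String,
}

pub struct DocumentChunk {
    pub base: DataPoint,
    pub text: String,
}

impl HasDataPoint for Entity {
    fn data_point(&self) -> &DataPoint { &self.base }
    fn data_point_mut(&mut self) -> &mut DataPoint { &mut self.base }
}

impl HasDataPoint for DocumentChunk {
    fn data_point(&self) -> &DataPoint { &self.base }
    fn data_point_mut(&mut self) -> &mut DataPoint { &mut self.base }
}

/// Stamp provenance on any node type without knowing its concrete shape.
pub fn stamp_provenance<T: HasDataPoint + ?Sized>(node: &mut T, pipeline: &str, task: &str) {
    let provenance = &mut node.data_point_mut().provenance;
    provenance.source_pipeline = Some(pipeline.to_string());
    provenance.source_task = Some(task.to_string());
}
```

Because the helper only depends on the trait, the same call works for an `Entity`, a `DocumentChunk`, or any future node type that embeds a `DataPoint`.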
Tasks
A Task is one reusable unit of work that transforms data — classify, chunk, extract, summarize, embed. Tasks come in eight execution flavours (sync/async × single/iterator/stream × single-value/batch) so a step can stream, fan out, or process whole batches. They are composed with optional per-task config (TaskInfo: name, batch size, weight, rate limiter) and the pipeline executor
routes values between them.
Rust: Task / TypedTask / TaskInfo in cognee-core.
Pipelines
A Pipeline is an orchestrated sequence of Tasks with shared context (database, graph, vector, cancellation, progress) and a watcher for status events. The concrete pipelines are:

| Pipeline | What it composes | Crate / entry point |
|---|---|---|
| add | ingest → hash → dedup → persist | cognee-ingestion (AddPipeline) |
| cognify | classify → chunk → extract → summarize → index → FK edges | cognee-cognify (cognify()) |
| memify | read graph → build triplets → embed → index | cognee-cognify (memify()) |
| search | route query → retrieve → (optionally) complete | cognee-search (SearchOrchestrator) |
Rust: the execution primitives (PipelineWatcher, ExecStatusManager,
thread pool) live in cognee-core. The end-to-end flow is
described in operations.
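
To make the Task / Pipeline relationship concrete, here is a small synchronous sketch of an executor that routes values between tasks in batches and reports to a watcher. All names (`Task`, `TaskInfo`, `Watcher`, `run_pipeline`) are hypothetical stand-ins; the real cognee-core types are async-capable and carry more configuration than shown here.

```rust
// Hypothetical sketch; cognee-core's Task / TaskInfo / PipelineWatcher
// have richer signatures that this example does not reproduce.

/// Per-task configuration, loosely following the fields listed for TaskInfo.
pub struct TaskInfo {
    pub name: String,
    pub batch_size: usize,
}

/// A task transforms one batch of values into another.
pub struct Task {
    pub info: TaskInfo,
    pub run: Box<dyn Fn(Vec<String>) -> Vec<String>>,
}

/// Receives a status event each time a task finishes.
pub trait Watcher {
    fn on_task_done(&mut self, task: &str, outputs: usize);
}

/// Watcher that simply records events in memory.
#[derive(Default)]
pub struct RecordingWatcher {
    pub events: Vec<String>,
}

impl Watcher for RecordingWatcher {
    fn on_task_done(&mut self, task: &str, outputs: usize) {
        self.events.push(format!("{task}:{outputs}"));
    }
}

/// Run tasks in order, feeding each task's output to the next one
/// in batches of that task's configured size.
pub fn run_pipeline(tasks: &[Task], input: Vec<String>, watcher: &mut dyn Watcher) -> Vec<String> {
    let mut values = input;
    for task in tasks {
        let mut next = Vec::new();
        // `chunks` panics on a size of zero, so clamp to at least one.
        for batch in values.chunks(task.info.batch_size.max(1)) {
            next.extend((task.run)(batch.to_vec()));
        }
        watcher.on_task_done(&task.info.name, next.len());
        values = next;
    }
    values
}
```

A two-step pipeline (split documents into words, then uppercase them) would pass a `Vec<Task>` with two boxed closures; the watcher then receives one event per task with the number of values it produced.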
Key concepts
Datasets
A Dataset is the organizational scope for memory operations: named, owned by a user, optionally tenant-scoped. Data and DataPoints belong to one or more datasets (DataPoint.belongs_to_set), and add / cognify / search / delete all
operate within a dataset scope. Dataset IDs are deterministic (UUID5 of name +
owner) for cross-SDK reproducibility.
Rust: Dataset in cognee-models; lifecycle helpers in
DatasetManager (cognee-lib api::datasets).
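
The sketch below shows how a name-based (version 5) UUID makes dataset IDs reproducible: the same name and owner always hash to the same ID. The text only says "UUID5 of name + owner", so the namespace used here (the RFC 4122 DNS namespace) and the plain concatenation of name and owner are assumptions; IDs produced by this sketch should not be expected to match the ones cognee generates. SHA-1 is written out by hand only to keep the example dependency-free.

```rust
/// Minimal SHA-1 (FIPS 180-4), included so the sketch needs no crates.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];
    // Padding: 0x80, zeros up to 56 mod 64, then the bit length as big-endian u64.
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());

    for block in msg.chunks(64) {
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }
        let [mut a, mut b, mut c, mut d, mut e] = h;
        for i in 0..80 {
            let (f, k) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999u32),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1u32),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDCu32),
                _ => (b ^ c ^ d, 0xCA62C1D6u32),
            };
            let t = a
                .rotate_left(5)
                .wrapping_add(f)
                .wrapping_add(e)
                .wrapping_add(k)
                .wrapping_add(w[i]);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = t;
        }
        for (slot, v) in h.iter_mut().zip([a, b, c, d, e]) {
            *slot = slot.wrapping_add(v);
        }
    }
    let mut out = [0u8; 20];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// RFC 4122 DNS namespace. Which namespace cognee uses is an assumption here.
pub const NAMESPACE_DNS: [u8; 16] = [
    0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1, 0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
];

/// Name-based UUID (version 5): SHA-1 over namespace bytes followed by the name.
pub fn uuid5(namespace: [u8; 16], name: &str) -> String {
    let mut input = namespace.to_vec();
    input.extend_from_slice(name.as_bytes());
    let digest = sha1(&input);
    let mut b = [0u8; 16];
    b.copy_from_slice(&digest[..16]);
    b[6] = (b[6] & 0x0F) | 0x50; // set version 5
    b[8] = (b[8] & 0x3F) | 0x80; // set RFC 4122 variant
    let hex: String = b.iter().map(|x| format!("{x:02x}")).collect();
    format!("{}-{}-{}-{}-{}", &hex[0..8], &hex[8..12], &hex[12..16], &hex[16..20], &hex[20..32])
}

/// Hypothetical helper: the exact way name and owner are combined is assumed.
pub fn dataset_id(name: &str, owner_id: &str) -> String {
    uuid5(NAMESPACE_DNS, &format!("{name}{owner_id}"))
}
```

Determinism is the point of the design: two SDKs that agree on the namespace and on how name and owner are combined derive the same ID without coordinating through a database.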
Sessions
A Session is a temporary memory context (search/answer history and feedback for a single conversational thread), distinct from permanent, graph-backed storage. Passing a --session-id to remember / recall scopes a turn to that
session and lets retrieval reuse prior context; omitting it persists input as
permanent graph memory (see the memory API in operations).
The store backend is pluggable.
Rust: SessionStore trait in cognee-session, with
FsSessionStore (feature fs), RedisSessionStore, and SeaOrmSessionStore
backends.
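
A pluggable session backend can be pictured as a small trait with interchangeable implementations. The sketch below is a synchronous, in-memory stand-in written for illustration: the method names, the `Turn` type, and the `remember` routing helper are assumptions and do not mirror the actual SessionStore trait in cognee-session.

```rust
use std::collections::HashMap;

/// One question/answer turn kept in a session.
#[derive(Debug, Clone, PartialEq)]
pub struct Turn {
    pub query: String,
    pub answer: String,
}

/// Hypothetical, synchronous stand-in for a pluggable session backend.
pub trait SessionStore {
    fn append(&mut self, session_id: &str, turn: Turn);
    fn history(&self, session_id: &str) -> Vec<Turn>;
}

/// Simplest possible backend: everything lives in a HashMap.
#[derive(Default)]
pub struct InMemorySessionStore {
    sessions: HashMap<String, Vec<Turn>>,
}

impl SessionStore for InMemorySessionStore {
    fn append(&mut self, session_id: &str, turn: Turn) {
        self.sessions.entry(session_id.to_string()).or_default().push(turn);
    }

    fn history(&self, session_id: &str) -> Vec<Turn> {
        self.sessions.get(session_id).cloned().unwrap_or_default()
    }
}

/// Where a turn ends up, following the rule described in the text above.
#[derive(Debug, PartialEq)]
pub enum Destination {
    Session,
    PermanentGraph,
}

/// With a session id the turn is scoped to that session; without one it
/// is treated as permanent memory (the real add/cognify path is omitted).
pub fn remember(store: &mut dyn SessionStore, session_id: Option<&str>, turn: Turn) -> Destination {
    match session_id {
        Some(id) => {
            store.append(id, turn);
            Destination::Session
        }
        None => Destination::PermanentGraph,
    }
}
```

Since callers only see the trait, swapping the in-memory map for a filesystem, Redis, or relational backend would not change the calling code.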
Node Sets
A node set is a tag attached to ingested data and the DataPoints derived from it, used to categorize and later scope the knowledge base. It is partially realized in Rust today:
- Tagging at ingest: add accepts a node_set (stored on Data.node_set and propagated to derived DataPoints as source_node_set); the pipeline executor can attach node_set provenance to task outputs (Tagged / TaggedMeta in cognee-core).
- Scoping memify: memify enrichment can be restricted to a subset of the graph by node type and node name via --node-type / --node-name (MemifyConfig::with_node_type_filter / with_node_name_filter, backed by the graph trait’s get_nodeset_subgraph). The internal persist_sessions step tags cached session data with a fixed node set.
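
As a rough picture of what scoping by node type and node name means, the sketch below filters a tiny in-memory graph. It is not the graph trait's get_nodeset_subgraph: the `Node` / `Edge` types and the `nodeset_subgraph` function are hypothetical, and the choice to include direct neighbours of the matched nodes is an assumption about what a useful subgraph contains.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub id: u32,
    pub node_type: String,
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Edge {
    pub from: u32,
    pub to: u32,
    pub label: String,
}

/// Select nodes of `node_type` whose name is in `names`, then keep every
/// edge touching one of them together with the node at the other end.
pub fn nodeset_subgraph(
    nodes: &[Node],
    edges: &[Edge],
    node_type: &str,
    names: &[&str],
) -> (Vec<Node>, Vec<Edge>) {
    // Seed set: nodes matching both the type filter and the name filter.
    let seeds: HashSet<u32> = nodes
        .iter()
        .filter(|n| n.node_type == node_type && names.contains(&n.name.as_str()))
        .map(|n| n.id)
        .collect();

    // Edges with at least one endpoint in the seed set.
    let kept_edges: Vec<Edge> = edges
        .iter()
        .filter(|e| seeds.contains(&e.from) || seeds.contains(&e.to))
        .cloned()
        .collect();

    // Seeds plus their direct neighbours, in the original node order.
    let mut keep = seeds.clone();
    for e in &kept_edges {
        keep.insert(e.from);
        keep.insert(e.to);
    }
    let kept_nodes: Vec<Node> = nodes.iter().filter(|n| keep.contains(&n.id)).cloned().collect();

    (kept_nodes, kept_edges)
}
```

Filtering on a node set named "finance", for example, would return that tag node plus the entities linked to it, leaving data tagged with other node sets out of the enrichment pass.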
Ontologies
An ontology grounds extracted entities in external, structured knowledge. cognee-rust loads RDF/OWL ontologies (Turtle, RDF/XML, N-Triples, JSON-LD) and uses them for fuzzy entity matching and subgraph enrichment during cognify. The default is a no-op resolver (no grounding), matching Python’s ontology_file=None.
Rust: OntologyResolver trait in cognee-ontology, with
NoOpOntologyResolver (default) and RdfLibOntologyResolver. Enabled per run
via cognify’s --ontology-file; see configuration.
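
The resolver split (a no-op default plus a grounding implementation) can be illustrated with a toy trait. Everything below is a sketch under assumptions: the trait method, the `FuzzyResolver` type, and the use of Levenshtein edit distance as the fuzzy-matching rule are illustrative choices, since the matching strategy of RdfLibOntologyResolver is not described here, and no RDF parsing is attempted.

```rust
/// Hypothetical resolver interface: map an extracted entity name to a
/// canonical ontology term, if one matches.
pub trait OntologyResolver {
    fn resolve(&self, name: &str) -> Option<String>;
}

/// Default behaviour: no grounding at all.
pub struct NoOpResolver;

impl OntologyResolver for NoOpResolver {
    fn resolve(&self, _name: &str) -> Option<String> {
        None
    }
}

/// Toy grounding resolver over a fixed list of ontology terms.
pub struct FuzzyResolver {
    pub terms: Vec<String>,
    pub max_distance: usize,
}

/// Classic dynamic-programming edit distance, one row at a time.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitute = prev[j] + if ca == cb { 0 } else { 1 };
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            cur.push(substitute.min(delete).min(insert));
        }
        prev = cur;
    }
    prev[b.len()]
}

impl OntologyResolver for FuzzyResolver {
    fn resolve(&self, name: &str) -> Option<String> {
        let needle = name.to_lowercase();
        self.terms
            .iter()
            .map(|term| (levenshtein(&needle, &term.to_lowercase()), term))
            .filter(|(distance, _)| *distance <= self.max_distance)
            .min_by_key(|(distance, _)| *distance)
            .map(|(_, term)| term.clone())
    }
}
```

With the no-op resolver every lookup returns `None`, so extraction output passes through unchanged; with a grounding resolver a slightly misspelled entity name can be mapped onto the canonical term from the ontology.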
Loaders & Chunkers
Loaders handle file-format reading at ingest: a loader registry dispatches by MIME type / extension to per-format loaders (text, PDF, CSV, HTML, image, audio, and the unstructured office formats), most behind feature flags. Chunkers
then segment a document into token-bounded pieces through a word → sentence →
paragraph hierarchy, sizing chunks with a pluggable TokenCounter
(WordCounter, or the feature-gated HuggingFace / tiktoken counters).
Rust: loaders (LoaderRegistry, DocumentLoader) in
cognee-ingestion; chunking (text_chunker,
TokenCounter) in cognee-chunking. Token-counter
selection is configured in configuration.
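
The paragraph → sentence → word fallback and the pluggable counter can be sketched as a greedy packer. This is an illustration only: `chunk_text`, the naive punctuation-based sentence splitter, and the trait shape are assumptions and are far simpler than the text_chunker in cognee-chunking.

```rust
/// Pluggable token counting; the trait shape here is an assumption.
pub trait TokenCounter {
    fn count(&self, text: &str) -> usize;
}

/// Counts whitespace-separated words as tokens.
pub struct WordCounter;

impl TokenCounter for WordCounter {
    fn count(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

/// Naive sentence splitter: cut after '.', '!' or '?'.
fn sentences(paragraph: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = String::new();
    for ch in paragraph.chars() {
        cur.push(ch);
        if matches!(ch, '.' | '!' | '?') {
            out.push(cur.trim().to_string());
            cur.clear();
        }
    }
    if !cur.trim().is_empty() {
        out.push(cur.trim().to_string());
    }
    out
}

/// Greedily pack sentences into chunks of at most `max_tokens`.
/// Sentences that are too long on their own are packed word by word.
/// A single piece larger than the budget is still emitted as-is.
pub fn chunk_text(text: &str, max_tokens: usize, counter: &dyn TokenCounter) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut cur = String::new();
    for paragraph in text.split("\n\n") {
        for sentence in sentences(paragraph) {
            let pieces: Vec<String> = if counter.count(&sentence) > max_tokens {
                sentence.split_whitespace().map(String::from).collect()
            } else {
                vec![sentence]
            };
            for piece in pieces {
                let candidate = if cur.is_empty() {
                    piece.clone()
                } else {
                    format!("{cur} {piece}")
                };
                if counter.count(&candidate) > max_tokens && !cur.is_empty() {
                    // Current chunk is full: emit it and start a new one.
                    chunks.push(std::mem::take(&mut cur));
                    cur = piece;
                } else {
                    cur = candidate;
                }
            }
        }
    }
    if !cur.is_empty() {
        chunks.push(cur);
    }
    chunks
}
```

Swapping `WordCounter` for a tokenizer-backed counter changes how chunk sizes are measured without touching the packing logic, which is the benefit of keeping the counter behind a trait.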
See also
- operations — what the pipelines and memory API actually do
- configuration — env vars and runtime config for every concept above
- architecture — crate layering and design patterns
- tools/backends — choosing relational / vector / graph backends
- roadmap/README.md — what is partial or not yet implemented