What is a dataset in Cognee?

A dataset is a named container that groups documents and their metadata. It is the main boundary for:
  • Organizing content
  • Running pipelines
  • Applying permissions

Datasets also shape each core operation (a code sketch follows this list):
  • Add:
    • Directs new content into a specific dataset (by name or ID)
    • If the dataset doesn’t exist, Cognee creates it and assigns you permissions on it
    • Ingested items are linked to that dataset and deduplicated within it
  • Cognify:
    • Choose which dataset(s) to transform into a knowledge graph
    • Cognee loads each dataset’s content, checks your access rights, and runs the pipeline per dataset
    • If no datasets are specified, it processes every dataset you’re authorized to use
    • Progress is tracked per dataset, so re-runs are reliable
  • Search:
    • Queries can be scoped to one or more datasets
    • Results and metrics remain separated by dataset
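
A minimal sketch of this flow in Python, assuming Cognee’s async API with a dataset_name argument on add, a datasets argument on cognify, and a query_type/query_text search signature; the datasets filter on search and the SearchType import path are assumptions and may differ between versions:

  import asyncio

  import cognee
  from cognee import SearchType  # import path assumed; may differ by version


  async def main():
      # Add: direct new content into a named dataset; Cognee creates the
      # dataset if it does not exist and links the ingested items to it.
      await cognee.add(
          "Cognee turns documents into a queryable knowledge graph.",
          dataset_name="product_docs",
      )

      # Cognify: build the knowledge graph for the chosen dataset(s).
      # Omitting `datasets` would process every dataset you are authorized to use.
      await cognee.cognify(datasets=["product_docs"])

      # Search: scope the query to the same dataset so results stay
      # separated from other datasets (the `datasets` filter is assumed).
      results = await cognee.search(
          query_type=SearchType.GRAPH_COMPLETION,
          query_text="What does Cognee do with documents?",
          datasets=["product_docs"],
      )
      print(results)


  asyncio.run(main())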

Access control

  • Permissions (read, write, share, delete) are enforced at the dataset level
  • Share one dataset with a team, keep another private
  • Independently manage who can modify or distribute content
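
The helper below is purely illustrative, not Cognee’s actual permissions API; it only sketches the model described above, where each grant ties a user, a dataset, and a set of allowed actions together, and every action is checked against the dataset it targets.

  from dataclasses import dataclass


  @dataclass
  class DatasetGrant:
      # Hypothetical structure for illustration only; Cognee's real
      # permission storage and API may look different.
      dataset: str
      user: str
      actions: set[str]  # subset of {"read", "write", "share", "delete"}


  def can(user: str, action: str, dataset: str, grants: list[DatasetGrant]) -> bool:
      """Return True if some grant gives `user` the `action` on `dataset`."""
      return any(
          g.user == user and g.dataset == dataset and action in g.actions
          for g in grants
      )


  grants = [
      DatasetGrant("team_research", "alice", {"read", "write", "share", "delete"}),
      DatasetGrant("team_research", "bob", {"read"}),  # shared read-only
  ]

  assert can("bob", "read", "team_research", grants)
  assert not can("bob", "write", "team_research", grants)  # private to alice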

Incremental processing

  • Processing status is tracked per dataset
  • After you add more data, Cognify focuses on new or changed items
  • Skips what’s already completed for that dataset
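
Continuing the earlier sketch (same assumed dataset_name and datasets arguments), an incremental update is just another add followed by another cognify; per-dataset status tracking means the second run only touches new or changed items:

  import asyncio

  import cognee


  async def update_dataset():
      # Add more content to the existing dataset; items already ingested
      # into "product_docs" are deduplicated rather than stored twice.
      await cognee.add(
          "A newly published design document.",
          dataset_name="product_docs",
      )

      # Re-running cognify on the same dataset skips work that is already
      # marked complete and processes only the new or changed items.
      await cognee.cognify(datasets=["product_docs"])


  asyncio.run(update_dataset())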

Datasets vs NodeSets

Datasets scope storage, permissions, and pipeline execution; NodeSets are semantic tags within a dataset.
  • During Add, you can label items with one or more NodeSet names (e.g., “AI”, “FinTech”)
  • Cognify propagates those labels into the graph by creating NodeSet nodes and linking derived chunks and entities via belongs_to_set relationships
  • This lets you slice a single dataset’s graph by topic or team without creating new datasets, while dataset-level permissions still control overall access
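
A minimal sketch of NodeSet tagging, assuming add accepts a node_set list of labels and that search can later be filtered to a NodeSet; the node_set and node_name arguments are assumptions and may differ by version:

  import asyncio

  import cognee
  from cognee import SearchType  # import path assumed; may differ by version


  async def tag_and_slice():
      # Tag items with NodeSet labels at Add time; both items live in the
      # same "research" dataset but carry different semantic tags.
      await cognee.add(
          "Notes on transformer architectures.",
          dataset_name="research",
          node_set=["AI"],  # `node_set` argument assumed
      )
      await cognee.add(
          "Notes on payment settlement systems.",
          dataset_name="research",
          node_set=["FinTech"],
      )

      # Cognify creates NodeSet nodes and links derived chunks and entities
      # to them through belongs_to_set relationships.
      await cognee.cognify(datasets=["research"])

      # Slice the single dataset's graph by tag; this NodeSet filter
      # argument is an assumption and may differ by version.
      results = await cognee.search(
          query_type=SearchType.GRAPH_COMPLETION,
          query_text="What do the AI notes cover?",
          node_name=["AI"],
      )
      print(results)


  asyncio.run(tag_and_slice())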