Datasets: The Core Unit of Data

A dataset is a logical container for related documents and their processed knowledge graphs. All data in Cognee belongs to a dataset. When you add documents to Cognee using cognee.add(), they are processed and stored within a specific dataset.
Dataset-scoped permissions — All permissions in Cognee are defined at the dataset level, never for individual documents.

Ownership

When a principal creates a dataset, they become its owner. A principal is any entity that can hold permissions: usually a user, but it can also be a tenant or a role (both are explained later). Once a dataset is created, its ownership cannot be changed. The owner can perform any operation on the dataset and can grant permissions to other principals.

Permission Types

Each permission applies to a whole dataset. There are four permission types:
  • Read — View documents and query the knowledge graph
  • Write — Add, modify, or remove documents and data
  • Delete — Remove the entire dataset
  • Share — Grant permissions to other principals
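The ownership and permission rules above can be modelled in a few lines. This is a minimal sketch under stated assumptions, not Cognee's implementation: the `Dataset` class, its `grants` mapping, and the `allows` and `grant` methods are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Permission(Enum):
    READ = "read"
    WRITE = "write"
    DELETE = "delete"
    SHARE = "share"


@dataclass
class Dataset:
    name: str
    owner: str  # set at creation; this sketch offers no way to change it
    # Hypothetical layout: principal id -> permissions held on this dataset.
    grants: dict[str, set[Permission]] = field(default_factory=dict)

    def allows(self, principal: str, permission: Permission) -> bool:
        # The owner can do anything with the dataset.
        if principal == self.owner:
            return True
        return permission in self.grants.get(principal, set())

    def grant(self, granter: str, principal: str, permission: Permission) -> None:
        # Only the owner or a principal holding Share may grant permissions.
        if not self.allows(granter, Permission.SHARE):
            raise PermissionError(f"{granter} cannot share {self.name}")
        self.grants.setdefault(principal, set()).add(permission)


reports = Dataset(name="reports", owner="alice")
reports.grant("alice", "bob", Permission.READ)  # the owner shares read access
```

Note that the check takes a dataset and a permission but never a document: in this model, as in the text, there is no per-document grant to look up.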

Default Behavior

When no dataset is specified, Cognee uses a default dataset named main_dataset, creating it automatically if it doesn’t exist. To target a different dataset, pass the dataset_name parameter to cognee.add(). Users can create additional datasets as needed to organize their data.
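The fallback behavior can be illustrated with a small stand-in. This sketch does not call Cognee: `add_to_store` and the in-memory `store` dictionary are hypothetical, and only the name `main_dataset` and the `dataset_name` parameter come from the text.

```python
from typing import Optional

DEFAULT_DATASET = "main_dataset"  # default name stated in the text


def add_to_store(
    store: dict[str, list[str]], data: str, dataset_name: Optional[str] = None
) -> str:
    """Hypothetical stand-in for an add operation; returns the dataset used."""
    name = dataset_name or DEFAULT_DATASET
    # Create the dataset on first use, then append the document to it.
    store.setdefault(name, []).append(data)
    return name


store: dict[str, list[str]] = {}
add_to_store(store, "Quarterly summary")                          # no name given
add_to_store(store, "Design notes", dataset_name="engineering")   # explicit name
```

After these two calls the store holds two datasets: `main_dataset`, created implicitly, and `engineering`, created by name.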

Dataset Creation

Cognee keeps core dataset metadata in the relational (SQL) database: each dataset row records its UUID, name, owner, and audit timestamps, and the Access Control List tables map principals (users, tenants, roles) to the permissions they hold on that dataset.
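To make the storage layout concrete, the following sketch builds a comparable pair of tables in an in-memory SQLite database. The table names, column names, and the `permissions_for` helper are assumptions made for illustration; Cognee's actual schema may differ.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE datasets (
        id         TEXT PRIMARY KEY,   -- UUID
        name       TEXT NOT NULL,
        owner_id   TEXT NOT NULL,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP,
        updated_at TEXT
    );
    CREATE TABLE acls (
        principal_id TEXT NOT NULL,    -- a user, tenant, or role
        dataset_id   TEXT NOT NULL REFERENCES datasets(id),
        permission   TEXT NOT NULL
            CHECK (permission IN ('read', 'write', 'delete', 'share')),
        PRIMARY KEY (principal_id, dataset_id, permission)
    );
    """
)

dataset_id = str(uuid.uuid4())
conn.execute(
    "INSERT INTO datasets (id, name, owner_id) VALUES (?, ?, ?)",
    (dataset_id, "reports", "alice"),
)
conn.executemany(
    "INSERT INTO acls (principal_id, dataset_id, permission) VALUES (?, ?, ?)",
    [("bob", dataset_id, "read"), ("bob", dataset_id, "write")],
)


def permissions_for(principal_id: str, dataset_id: str) -> list[str]:
    """Return the permissions a principal holds on one dataset, sorted by name."""
    rows = conn.execute(
        "SELECT permission FROM acls "
        "WHERE principal_id = ? AND dataset_id = ? ORDER BY permission",
        (principal_id, dataset_id),
    )
    return [row[0] for row in rows]
```

Every ACL row is keyed by a dataset ID, which is why a permission can never attach to a single document in this layout.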

Integration with Main Operations

Each of Cognee’s main operations works against datasets:
  • Add — Direct new content into a specific dataset (by name or ID)
  • Cognify — Choose which dataset(s) to transform into a knowledge graph
  • Search — Queries can be scoped by dataset
  • Memify — Optional semantic enrichment per dataset

Access Control

Permissions (read, write, share, delete) are enforced at the dataset level. This allows you to:
  • Share one dataset with a team, keep another private
  • Independently manage who can modify or distribute content
  • Control access granularly across different data collections
See ACL for how permissions are stored and checked.

Incremental Processing

Processing status is tracked per dataset. After you add more data, Cognify focuses on new or changed items, skipping what’s already completed for that dataset.
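One way to picture per-dataset status tracking is a record of what has already been processed, kept separately for each dataset. The sketch below uses content hashes to detect new or changed items; that mechanism, along with the `pending_items` and `mark_processed` helpers, is an assumption for illustration and not a description of how Cognee detects changes.

```python
import hashlib


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def pending_items(
    processed: dict[str, set[str]], dataset: str, items: list[str]
) -> list[str]:
    """Return the items whose content hash is not yet recorded for this dataset."""
    done = processed.get(dataset, set())
    return [item for item in items if content_hash(item) not in done]


def mark_processed(
    processed: dict[str, set[str]], dataset: str, items: list[str]
) -> None:
    """Record the given items as completed for this dataset only."""
    processed.setdefault(dataset, set()).update(content_hash(i) for i in items)


processed: dict[str, set[str]] = {}

# First run: everything is new, so everything is processed.
first_batch = ["doc A", "doc B"]
mark_processed(processed, "reports", pending_items(processed, "reports", first_batch))

# Second run: only the new item and the changed item remain.
second_batch = ["doc A", "doc B", "doc C", "doc B (revised)"]
todo = pending_items(processed, "reports", second_batch)
```

Because the record is keyed by dataset, the same document added to a different dataset would still count as unprocessed there.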

Limitations

  • Dataset ownership cannot be transferred
  • In access control mode, Cognee enforces Kùzu as the graph store and LanceDB as the vector store
  • Cross-dataset search: Queries are dataset-scoped; a search that spans several datasets runs separately within each dataset the caller is authorized to access
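The cross-dataset limitation can be sketched as a loop that runs one scoped query per authorized dataset. Everything here is hypothetical: `search_across`, the `readable` set, and the keyword matcher stand in for real components, and silently skipping unauthorized datasets (rather than raising an error) is an assumption.

```python
from typing import Callable


def search_across(
    requested: list[str],
    readable: set[str],
    search_one: Callable[[str, str], list[str]],
    query: str,
) -> dict[str, list[str]]:
    """Run the query once per requested dataset the caller may read."""
    results: dict[str, list[str]] = {}
    for name in requested:
        if name not in readable:
            continue  # assumption: unauthorized datasets are skipped
        results[name] = search_one(name, query)
    return results


# Toy corpus and matcher, used only to exercise the function.
corpus = {
    "reports": ["revenue grew", "costs fell"],
    "hr": ["revenue bonus policy"],
    "eng": ["build pipeline"],
}


def keyword_search(dataset: str, query: str) -> list[str]:
    return [doc for doc in corpus[dataset] if query in doc]


hits = search_across(
    ["reports", "hr", "eng"],
    readable={"reports", "eng"},
    search_one=keyword_search,
    query="revenue",
)
```

Results stay grouped by dataset, and the matching document in `hr` never appears because the caller has no read access to that dataset.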