
Datasets: The Core Unit of Data

A dataset is a logical container for related documents and their processed knowledge graphs. All data in Cognee belongs to a dataset. When you add documents to Cognee using cognee.add(), they are processed and stored within a specific dataset.
Dataset-scoped permissions: All permissions in Cognee are defined at the dataset level, never for individual documents.

Ownership and Permissions

When a principal creates a dataset, they become its owner. A principal is any entity that can have permissions, like a user, tenant, or role. Ownership cannot be changed. The owner has full control and can grant permissions to others. There are four types of permissions you can grant on a dataset:
  • Read — View documents and query the knowledge graph.
  • Write — Add, modify, or remove documents and data.
  • Delete — Remove the entire dataset.
  • Share — Grant permissions to other principals.
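The ownership and grant rules above can be illustrated with a small in-memory model. This is a minimal sketch, not Cognee's implementation: the `Permission` enum, the `Dataset` class, and the `allows` and `grant` methods are hypothetical names chosen for illustration, and principals are represented as plain strings.

```python
from dataclasses import dataclass, field
from enum import Enum


class Permission(Enum):
    READ = "read"
    WRITE = "write"
    DELETE = "delete"
    SHARE = "share"


@dataclass
class Dataset:
    """Toy stand-in for a dataset: an owner plus per-principal permission sets."""

    name: str
    owner: str
    acl: dict[str, set[Permission]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The creator becomes the owner and holds all four permissions.
        self.acl[self.owner] = set(Permission)

    def allows(self, principal: str, permission: Permission) -> bool:
        return permission in self.acl.get(principal, set())

    def grant(self, grantor: str, grantee: str, permission: Permission) -> None:
        # Only principals holding Share may grant permissions to others.
        if not self.allows(grantor, Permission.SHARE):
            raise PermissionError(f"{grantor} cannot share {self.name}")
        self.acl.setdefault(grantee, set()).add(permission)


reports = Dataset(name="reports", owner="alice")
reports.grant("alice", "bob", Permission.READ)
```

In this model, bob can read `reports` but cannot write to it or pass access on, because he was never given Write or Share.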

Dataset Isolation: How Access Is Enforced

Cognee can enforce strict data isolation between datasets, but only under the following conditions.
  • Isolation is optional: Dataset boundaries are only enforced when the ENABLE_BACKEND_ACCESS_CONTROL setting is true.
  • Without isolation: If this setting is false, dataset parameters are ignored during searches, and queries will run across all data in the system, regardless of permissions.
  • Database support: True isolation is currently supported when using Kùzu for the graph store, LanceDB for the vector store, and SQLite or Postgres for the relational database. Other database backends (like Neo4j or Qdrant) do not support dataset isolation.
See ACL for details on how permissions are stored and checked. For setup instructions, see Permissions Setup.
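The effect of the setting on searches can be sketched with a toy search function. This is an illustration only: the `RECORDS` list, `access_control_enabled`, and `search` are hypothetical, and the sketch assumes the setting is read from an environment variable whose value `true` turns isolation on.

```python
import os

# Toy corpus: each record is tagged with the dataset it belongs to.
RECORDS = [
    {"dataset": "hr", "text": "salary bands for 2024"},
    {"dataset": "eng", "text": "salary survey for engineers"},
]


def access_control_enabled() -> bool:
    # Assumption: the flag is an environment variable and "true" enables it.
    return os.environ.get("ENABLE_BACKEND_ACCESS_CONTROL", "false").lower() == "true"


def search(term: str, dataset: str) -> list[str]:
    hits = [r for r in RECORDS if term in r["text"]]
    if access_control_enabled():
        # Isolation on: only the requested dataset is visible.
        hits = [r for r in hits if r["dataset"] == dataset]
    # Isolation off: the dataset argument is ignored and all data is searched.
    return [r["text"] for r in hits]
```

With the flag set to `true`, `search("salary", "hr")` returns only the HR record; with it set to `false`, the same call returns both records even though a dataset was named.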

Using Datasets in Operations

Datasets integrate with Cognee’s main operations:
  • add: Direct new content into a specific dataset by name or ID. If no dataset is specified, a default main_dataset is used.
  • cognify: Choose which dataset(s) to transform into AI memory stored in the graph and vector stores.
  • search: Scope queries to run only against datasets you have read access to.
  • memify: Apply optional semantic enrichment on a per-dataset basis.

Technical Details

Different operations require different permissions:
  • add/cognify operations → require write permission
  • search operations → require read permission
  • delete operations → require delete permission
  • Permission management → requires share permission
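The mapping above amounts to a lookup followed by a membership check. The sketch below shows one way to express it; `REQUIRED_PERMISSION` and `authorize` are hypothetical names, and permissions are modeled as plain strings rather than whatever types Cognee uses internally.

```python
# Operation name -> permission it requires, following the list above.
REQUIRED_PERMISSION = {
    "add": "write",
    "cognify": "write",
    "search": "read",
    "delete": "delete",
    "manage_permissions": "share",
}


def authorize(operation: str, granted: set[str]) -> None:
    """Raise PermissionError unless `granted` covers what `operation` needs."""
    needed = REQUIRED_PERMISSION[operation]
    if needed not in granted:
        raise PermissionError(f"{operation} requires {needed} permission")
```

For example, `authorize("search", {"read"})` returns silently, while `authorize("add", {"read"})` raises because Write is missing.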
Cognee provides two helper methods for creating datasets:
  • create_dataset(): This is a lower-level function that only inserts the dataset record. It expects the caller to manage the Access Control List (ACL) entries separately.
  • create_authorized_dataset(): This is the recommended method for most user-facing flows. It wraps create_dataset() and then immediately grants the creator full read/write/delete/share permissions. This ensures the dataset is usable as soon as it’s created, especially when ENABLE_BACKEND_ACCESS_CONTROL is active.
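The relationship between the two helpers can be pictured as a thin wrapper around a record insert. The code below is a toy re-implementation over in-memory dictionaries, not Cognee's code: the function names are borrowed from the text, but the signatures, the `DATASETS` and `ACL` stores, and the string-based permissions are all assumptions made for illustration.

```python
import uuid

DATASETS: dict[str, dict] = {}  # dataset id -> record
ACL: dict[tuple[str, str], set[str]] = {}  # (dataset id, principal id) -> permissions


def create_dataset(name: str, owner_id: str) -> str:
    # Lower-level step: insert the record only; no ACL entries are written.
    dataset_id = str(uuid.uuid4())
    DATASETS[dataset_id] = {"name": name, "owner_id": owner_id}
    return dataset_id


def create_authorized_dataset(name: str, owner_id: str) -> str:
    # Wrapper: create the record, then grant the creator every permission.
    dataset_id = create_dataset(name, owner_id)
    ACL[(dataset_id, owner_id)] = {"read", "write", "delete", "share"}
    return dataset_id
```

A dataset made with the lower-level function has no ACL entry at all, so under access control even its creator could not use it until entries are added separately.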
The core dataset metadata is stored in a relational (SQL) database. The datasets table includes:
  • id: Unique identifier (UUID primary key)
  • name: Human-readable name
  • owner_id: ID of the principal who created the dataset
  • created_at: Timestamp when created
  • updated_at: Timestamp when last modified
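As a rough picture of that table, the following sketch creates an equivalent schema in an in-memory SQLite database. Only the column names come from the list above; the column types, the nullability constraints, and the sample values (`main_dataset`, `owner-123`) are assumptions, and the real schema may differ.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE datasets (
        id         TEXT PRIMARY KEY,  -- UUID stored as text (SQLite has no UUID type)
        name       TEXT NOT NULL,
        owner_id   TEXT NOT NULL,
        created_at TEXT NOT NULL,
        updated_at TEXT
    )
    """
)

# Insert one illustrative row and read it back.
now = datetime.now(timezone.utc).isoformat()
dataset_id = str(uuid.uuid4())
conn.execute(
    "INSERT INTO datasets (id, name, owner_id, created_at, updated_at) "
    "VALUES (?, ?, ?, ?, ?)",
    (dataset_id, "main_dataset", "owner-123", now, now),
)
row = conn.execute(
    "SELECT name, owner_id FROM datasets WHERE id = ?", (dataset_id,)
).fetchone()
```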

Limitations

  • Dataset ownership cannot be transferred.
  • When access control is enabled, Cognee forces the graph store to Kùzu and the vector store to LanceDB; other graph and vector backends cannot be used.
  • Cross-dataset searches are not supported directly. Queries are always scoped to a single dataset. To search multiple datasets, you must run separate queries for each one you have access to.
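The workaround in the last item, one query per dataset, can be wrapped in a small helper. This is a sketch under stated assumptions: `search_each` and `fake_search` are hypothetical, the single-dataset search is passed in as a callable taking a query and a dataset name, and a missing read permission is assumed to surface as `PermissionError`.

```python
from typing import Callable


def search_each(
    query: str,
    dataset_names: list[str],
    search_one: Callable[[str, str], list[str]],
) -> dict[str, list[str]]:
    """Run one single-dataset query per name and key the results by dataset."""
    results: dict[str, list[str]] = {}
    for name in dataset_names:
        try:
            results[name] = search_one(query, name)
        except PermissionError:
            # No read access to this dataset: skip it rather than fail the batch.
            continue
    return results


# Hypothetical stand-in for a real single-dataset search call.
def fake_search(query: str, dataset: str) -> list[str]:
    if dataset == "private":
        raise PermissionError(dataset)
    return [f"{dataset}: match for {query!r}"]
```

Keeping results keyed by dataset, instead of merging them into one list, preserves where each answer came from, which matters when the datasets have different audiences.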