
# Datasets

> Project-level containers for organization, permissions, and processing

## What is a dataset in Cognee?

A dataset is a named container that groups documents and their metadata. It is the main boundary for:

* Organizing content
* Running pipelines
* Applying permissions

<Warning>
  **Dataset isolation** requires specific configuration. See [permissions system](../multi-user-mode/multi-user-mode-overview) for details on access control requirements and supported database setups.
</Warning>

Each main operation works within this boundary:

* **[Remember](../main-operations/remember)**:
  * Direct new content into a specific dataset (by name or ID)
  * If the dataset doesn’t exist, Cognee creates it and grants you permissions on it
  * Ingested items are linked to that dataset and deduplicated within it

* **[Improve](../main-operations/improve)**:
  * Runs enrichment against a chosen dataset
  * Loads the dataset’s existing graph, checks your permissions on it, and runs the improvement pipeline in dataset scope
  * Lets you deepen or bridge memory without re-ingesting the source data

* **[Recall](../main-operations/recall)**:
  * Queries can be scoped by dataset
  * Results and metrics remain separated by dataset

* **[Forget](../main-operations/forget)**:
  * Removes memory at item, dataset, or full-user scope
  * Uses dataset permissions to decide what the current user can remove
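To make the dataset boundary concrete, here is a toy, in-memory sketch of the Remember and Recall behavior described above: a dataset is created on first write with the caller recorded as owner, items are deduplicated by content hash inside that dataset only, and lookups never cross dataset lines. This is not Cognee's API. `DatasetStore`, `remember`, and `recall` are hypothetical names, and the keyword match only stands in for real retrieval.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical in-memory stand-in for a dataset (not Cognee's model)."""
    name: str
    owner: str
    # content hash -> raw text; keying by hash deduplicates within this dataset
    items: dict = field(default_factory=dict)


class DatasetStore:
    """Toy store: datasets are keyed by name and created on first write."""

    def __init__(self):
        self.datasets = {}

    def remember(self, text, dataset_name, user):
        """Add text to a dataset; return True if it was new to that dataset."""
        ds = self.datasets.get(dataset_name)
        if ds is None:
            # Create the dataset on first use and record the caller as owner.
            ds = Dataset(name=dataset_name, owner=user)
            self.datasets[dataset_name] = ds
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        is_new = digest not in ds.items
        ds.items.setdefault(digest, text)
        return is_new

    def recall(self, keyword, dataset_name):
        """Search only the named dataset; other datasets are never consulted."""
        ds = self.datasets.get(dataset_name)
        if ds is None:
            return []
        return [t for t in ds.items.values() if keyword.lower() in t.lower()]


store = DatasetStore()
store.remember("Cognee builds knowledge graphs.", "research", "alice")
store.remember("Cognee builds knowledge graphs.", "research", "alice")  # duplicate, skipped
store.remember("Cognee builds knowledge graphs.", "sales", "alice")     # separate dataset, kept
```

The same text lands once in `research` and once in `sales`, because deduplication is scoped to a single dataset.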

## Access control

* Permissions (read, write, share, delete) are enforced at the dataset level
* Share one dataset with a team, keep another private
* Control separately, for each dataset, who can modify its content and who can share it
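A minimal sketch of dataset-level permissions, assuming a simple grant table keyed by `(dataset, user)`: the creator gets all four permissions, and only a holder of `share` can grant access to others. `Permission`, `DatasetACL`, and its methods are hypothetical illustrations, not Cognee's permission system, which has its own storage and configuration requirements.

```python
from enum import Enum


class Permission(Enum):
    READ = "read"
    WRITE = "write"
    SHARE = "share"
    DELETE = "delete"


class DatasetACL:
    """Hypothetical per-dataset access-control table (not Cognee's API)."""

    def __init__(self):
        # (dataset, user) -> set of Permission members
        self._grants = {}

    def create(self, dataset, owner):
        # The creator receives every permission on the new dataset.
        self._grants[(dataset, owner)] = set(Permission)

    def has(self, dataset, user, perm):
        return perm in self._grants.get((dataset, user), set())

    def share(self, dataset, granter, grantee, perms):
        # Only users holding SHARE on this specific dataset may grant access.
        if not self.has(dataset, granter, Permission.SHARE):
            raise PermissionError(f"{granter} cannot share {dataset}")
        self._grants.setdefault((dataset, grantee), set()).update(perms)


acl = DatasetACL()
acl.create("team_docs", "alice")
acl.create("private_notes", "alice")
# Share one dataset read-only with a teammate; the other stays private.
acl.share("team_docs", "alice", "bob", {Permission.READ})
```

Because grants are keyed per dataset, giving Bob read access to `team_docs` says nothing about `private_notes`.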

## Incremental processing

* Processing status is tracked per dataset
* After you remember more data, the underlying cognify step processes only new or changed items
* Items already processed for that dataset are skipped
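The bookkeeping behind incremental processing can be sketched as a per-dataset record of which items were processed and with what content. This toy version, assuming items are identified by an id and compared by content hash, reruns only new or changed items. `IncrementalProcessor` is a hypothetical name, and the comment marks where Cognee's real graph-building step would sit.

```python
import hashlib


class IncrementalProcessor:
    """Hypothetical sketch of per-dataset processing status."""

    def __init__(self):
        # dataset name -> {item id: content hash at last successful run}
        self.status = {}

    def process(self, dataset, items):
        """Process only new or changed items; return the ids that ran."""
        done = self.status.setdefault(dataset, {})
        processed = []
        for item_id, text in items.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if done.get(item_id) == digest:
                continue  # already completed for this dataset, content unchanged
            # ... the expensive graph-building step would run here ...
            done[item_id] = digest
            processed.append(item_id)
        return processed


p = IncrementalProcessor()
p.process("research", {"a": "one", "b": "two"})                # runs a, b
p.process("research", {"a": "one", "b": "two", "c": "three"})  # runs only c
```

Status lives under the dataset name, so the same item id in another dataset is tracked independently.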

## Datasets vs NodeSets

**Datasets** scope storage, permissions, and pipeline execution; **[NodeSets](../further-concepts/node-sets)** are semantic tags within a dataset.

* During `remember()`, you can label items with one or more NodeSet names (e.g., "AI", "FinTech")
* The underlying graph-building step propagates those labels into the graph by creating `NodeSet` nodes and linking derived chunks and entities via `belongs_to_set` relationships
* This lets you slice a single dataset’s graph by topic or team without creating new datasets, while dataset-level permissions still control overall access
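The NodeSet mechanism described above can be modeled with a small graph: one `NodeSet` node per label, and a `belongs_to_set` edge from each tagged chunk or entity to it. Filtering by topic is then just following those edges inside one dataset's graph. `ToyDatasetGraph` and the `nodeset:<name>` id scheme are hypothetical; Cognee's actual graph schema and storage differ.

```python
from collections import defaultdict


class ToyDatasetGraph:
    """Hypothetical graph for a single dataset, tagged with NodeSets."""

    def __init__(self):
        self.node_types = {}           # node id -> type label
        self.edges = defaultdict(set)  # (source id, relation) -> target ids

    def add_node(self, node_id, node_type, node_sets=()):
        self.node_types[node_id] = node_type
        for name in node_sets:
            set_id = f"nodeset:{name}"
            # One NodeSet node per label, shared by every member.
            self.node_types.setdefault(set_id, "NodeSet")
            self.edges[(node_id, "belongs_to_set")].add(set_id)

    def members(self, name):
        """Return the sorted ids of nodes linked to NodeSet `name`."""
        set_id = f"nodeset:{name}"
        return sorted(
            src
            for (src, relation), targets in self.edges.items()
            if relation == "belongs_to_set" and set_id in targets
        )


g = ToyDatasetGraph()
g.add_node("chunk-1", "chunk", ["AI"])
g.add_node("chunk-2", "chunk", ["AI", "FinTech"])
g.add_node("entity-acme", "entity", ["AI"])
g.add_node("chunk-3", "chunk")  # untagged
```

Slicing by `"AI"` or `"FinTech"` touches only this dataset's graph, which matches the split in responsibilities: NodeSets organize, datasets bound access.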

<Columns cols={2}>
  <Card title="Remember" icon="plus" href="/core-concepts/main-operations/remember">
    Direct content into a dataset
  </Card>

  <Card title="Improve" icon="brain-cog" href="/core-concepts/main-operations/improve">
    Enrich memory within a dataset
  </Card>

  <Card title="Recall" icon="search" href="/core-concepts/main-operations/recall">
    Scope queries by dataset
  </Card>

  <Card title="Forget" icon="trash" href="/core-concepts/main-operations/forget">
    Remove datasets and data
  </Card>
</Columns>
