cognee.datasets
Static class for managing datasets and their data.
Methods
datasets.list_datasets()
| Parameter | Type | Default | Notes |
|---|---|---|---|
| user | Optional[User] | None | If omitted, Cognee resolves the default user. |
datasets.discover_datasets()
| Parameter | Type | Default | Notes |
|---|---|---|---|
| directory_path | str | required | Local directory to scan for dataset-style subdirectories. |
datasets.list_data()
Returns the data records in a dataset.
This is the API to use when you want to read back DataItem fields stored during cognee.add(), such as label and external_metadata.
| Parameter | Type | Default | Notes |
|---|---|---|---|
| dataset_id | UUID | required | Dataset UUID to inspect. |
| user | Optional[User] | None | If omitted, Cognee resolves the default user before permission checks. |
datasets.has_data()
| Parameter | Type | Default | Notes |
|---|---|---|---|
| dataset_id | str | required | Dataset identifier to check. |
| user | Optional[User] | None | If omitted, Cognee resolves the default user before permission checks. |
datasets.get_status()
When pipeline_names is omitted, this method keeps the legacy flat shape and returns the status of cognify_pipeline only.
| Parameter | Type | Default | Notes |
|---|---|---|---|
| dataset_ids | list[UUID] | required | Dataset UUIDs to check. |
| pipeline_names | Optional[list[str]] | None | Pipeline names to query. If omitted, defaults to cognify_pipeline. Duplicate names are deduplicated while preserving order. |
With pipeline_names omitted or a single pipeline name, the method returns {str(dataset_id): PipelineRunStatus}.
With multiple pipeline names, it returns {str(dataset_id): {pipeline_name: PipelineRunStatus}}.
Possible values:
| Value | Meaning |
|---|---|
| DATASET_PROCESSING_INITIATED | Pipeline queued but not yet started |
| DATASET_PROCESSING_STARTED | Pipeline is running |
| DATASET_PROCESSING_COMPLETED | Indexing finished successfully |
| DATASET_PROCESSING_ERRORED | Processing failed |
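Because the return shape depends on how many pipeline names were requested, calling code may want to coerce both shapes into the nested one. The helper below is a minimal sketch of that idea, not part of Cognee: `normalize_status` and `DEFAULT_PIPELINE` are hypothetical names, and the helper detects the shape by checking whether each value is a mapping.

```python
from collections.abc import Mapping
from typing import Any, Optional, Sequence

# Default pipeline name taken from the parameter table above.
DEFAULT_PIPELINE = "cognify_pipeline"


def normalize_status(
    result: Mapping[str, Any],
    pipeline_names: Optional[Sequence[str]] = None,
) -> dict[str, dict[str, Any]]:
    """Return {dataset_id: {pipeline_name: status}} for either result shape."""
    # Drop duplicate names while keeping first-seen order.
    names = list(dict.fromkeys(pipeline_names or [DEFAULT_PIPELINE]))
    normalized: dict[str, dict[str, Any]] = {}
    for dataset_id, value in result.items():
        if isinstance(value, Mapping):
            # Nested shape: copy the per-pipeline mapping as-is.
            normalized[dataset_id] = dict(value)
        else:
            # Flat shape: file the single status under the one requested name.
            normalized[dataset_id] = {names[0]: value}
    return normalized
```

Passing the same `pipeline_names` list that was given to `get_status()` lets the helper label flat results with the right pipeline name.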
datasets.empty_dataset()
| Parameter | Type | Default | Notes |
|---|---|---|---|
| dataset_id | UUID | required | Dataset UUID to empty. |
| user | Optional[User] | None | If omitted, Cognee resolves the default user and checks delete permission. |
Notes
Despite the name, empty_dataset() does not leave an empty dataset record behind. It deletes graph content, data records, and the dataset entity itself.
datasets.delete_data()
| Parameter | Type | Default | Notes |
|---|---|---|---|
| dataset_id | UUID | required | Dataset UUID containing the target data item. |
| data_id | UUID | required | Data item UUID to delete. |
| user | Optional[User] | None | If omitted, Cognee resolves the default user and checks delete permission. |
| mode | str | soft | Kept for backward compatibility. The implementation warns against using "hard". |
| delete_dataset_if_empty | bool | False | If True, deletes the dataset when the removed item was its last remaining data item. |
Notes
datasets.delete_all()
| Parameter | Type | Default | Notes |
|---|---|---|---|
| user | Optional[User] | None | If omitted, Cognee resolves the default user. |
Examples
Basic dataset operations
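As a rough illustration of combining `list_datasets()` and `list_data()`, the sketch below counts the data records in each dataset. It takes the datasets API as an argument so it can be exercised without a running Cognee instance. `count_items_per_dataset` is a hypothetical helper, and it assumes that both methods are coroutines and that dataset objects expose `id` and `name` attributes.

```python
import asyncio
from typing import Any


async def count_items_per_dataset(datasets_api: Any) -> dict[str, int]:
    """Map each dataset name to the number of data records it holds."""
    counts: dict[str, int] = {}
    # Assumption: list_datasets() and list_data() are coroutines.
    for dataset in await datasets_api.list_datasets():
        records = await datasets_api.list_data(dataset.id)
        # Assumption: dataset objects expose .id and .name attributes.
        counts[dataset.name] = len(records)
    return counts


# With Cognee installed, a call would presumably look like:
# asyncio.run(count_items_per_dataset(cognee.datasets))
```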
Poll for indexing completion across parallel datasets
Use get_status() in a wait loop to confirm all datasets in a parallel batch have finished indexing before querying.
The same pattern works when indexing is triggered via the HTTP API: poll get_status() from a separate process until all datasets reach DATASET_PROCESSING_COMPLETED or DATASET_PROCESSING_ERRORED.
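The wait loop described above could be sketched as follows. This is an illustrative helper under stated assumptions, not Cognee's own code: `wait_until_done`, `status_name`, `interval`, and `timeout` are hypothetical names, `get_status` is assumed to be a coroutine returning the flat `{str(dataset_id): status}` shape, and statuses are assumed to be either plain strings or enum members whose `.value` matches the names in the table.

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence

TERMINAL = {"DATASET_PROCESSING_COMPLETED", "DATASET_PROCESSING_ERRORED"}


def status_name(status: Any) -> str:
    """Reduce an enum member or a plain string to its bare status name."""
    # Assumption: enum members carry the status name in .value.
    return str(getattr(status, "value", status))


async def wait_until_done(
    get_status: Callable[[Sequence[Any]], Awaitable[dict]],
    dataset_ids: Sequence[Any],
    interval: float = 2.0,
    timeout: float = 600.0,
) -> dict[str, str]:
    """Poll until every dataset is COMPLETED or ERRORED, else raise TimeoutError."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        raw = await get_status(dataset_ids)
        statuses = {key: status_name(value) for key, value in raw.items()}
        # A dataset missing from the result counts as not finished yet.
        if all(statuses.get(str(ds_id)) in TERMINAL for ds_id in dataset_ids):
            return statuses
        if loop.time() >= deadline:
            raise TimeoutError(f"datasets still processing: {statuses}")
        await asyncio.sleep(interval)
```

With Cognee installed, the call would presumably be `await wait_until_done(cognee.datasets.get_status, dataset_ids)`. Callers should still inspect the returned statuses, since the loop also stops on DATASET_PROCESSING_ERRORED.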
Read back DataItem metadata
external_metadata is stored on the relational Data record only. It is not placed into the vector store or knowledge graph and is not returned by cognee.search(). If you need metadata to be vector-searchable, define a custom DataPoint subclass and list the fields to embed in metadata.index_fields. See DataPoints.
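Reading those fields back from the records returned by `list_data()` might look like the sketch below. `extract_metadata` is a hypothetical helper. It assumes each record exposes `id`, `label`, and `external_metadata` as attributes, and it falls back to `None` when one is missing.

```python
from typing import Any, Iterable


def extract_metadata(records: Iterable[Any]) -> list[dict[str, Any]]:
    """Collect id, label, and external_metadata from data records."""
    rows: list[dict[str, Any]] = []
    for record in records:
        rows.append(
            {
                # Assumption: records expose these fields as attributes.
                "id": getattr(record, "id", None),
                "label": getattr(record, "label", None),
                "external_metadata": getattr(record, "external_metadata", None),
            }
        )
    return rows
```

With Cognee installed, the input would presumably come from `await cognee.datasets.list_data(dataset_id)`.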