What pipelines are
Pipelines coordinate ordered Tasks into a reproducible workflow. Default Cognee operations like Add and Cognify run on top of the same execution layer. You typically do not call low-level functions directly; you trigger pipelines through these operations (a sketch follows the prerequisites list).

Prerequisites
- Dataset: a container (name or UUID) where your data is stored and processed. Every document added to cognee belongs to a dataset.
- User: the identity for ownership and access control. A default user is created and used if none is provided.
- More details are available in the Users and Datasets sections below.
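For example, the high-level operations below each trigger a pipeline run under the hood. This is a minimal sketch assuming the async Python API (cognee.add and cognee.cognify); the dataset name is illustrative.

```python
import asyncio
import cognee

async def main():
    # Add writes raw data into a dataset, triggering the ingestion pipeline.
    await cognee.add(
        "Cognee turns documents into a knowledge graph.",
        dataset_name="demo_dataset",
    )

    # Cognify runs the default processing pipeline over the dataset(s) you name.
    await cognee.cognify(datasets=["demo_dataset"])

asyncio.run(main())
```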
 
How pipelines run
Somewhat unsurprisingly, the function used to run pipelines is called run_pipeline.
Cognee uses a layered execution model: a single call to run_pipeline orchestrates multi-dataset processing by running per-file pipelines through the sequence of tasks (a sketch follows the list below).
- Statuses are yielded as the pipeline runs and written to databases where appropriate
- User access to datasets and files is verified at each layer
- Pipeline run information includes dataset IDs, completion status, and any errors encountered
- Background execution uses queues to manage status updates and avoid database conflicts
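As a sketch of the call itself: the import path and parameter names below (tasks, datasets, data) are assumptions based on the description above and may differ across cognee versions. run_pipeline is treated as an async generator because statuses are yielded as the run progresses.

```python
from cognee.modules.pipelines import Task, run_pipeline  # assumed import path

async def capitalize(text: str) -> str:
    # A trivial custom task; each Task wraps a callable run in sequence.
    return text.upper()

async def run_custom():
    # Parameter names here are assumptions for illustration.
    pipeline = run_pipeline(
        tasks=[Task(capitalize)],
        datasets=["demo_dataset"],  # dataset names or UUIDs
        data="some raw text",
    )
    # Statuses are yielded as the pipeline runs (and persisted where appropriate).
    async for status in pipeline:
        print(status)
```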
 
Layered execution
- Innermost layer: individual task execution with telemetry and recursive task running in batches
- Middle layer: per-dataset pipeline management and task orchestration
- Outermost layer: multi-dataset orchestration and overall pipeline execution
- Execution modes: blocking (wait for completion) or background (return immediately with a “started” status); both are sketched below
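To contrast the two modes concretely, here is a hedged sketch; the run_in_background parameter is hypothetical shorthand for whatever your cognee version exposes for background runs.

```python
import cognee

async def demo_modes():
    # Blocking mode: the await returns only after the pipeline completes.
    await cognee.cognify(datasets=["demo_dataset"])

    # Background mode (hypothetical flag): returns immediately with a
    # "started" status; progress is tracked via queued status updates.
    run_info = await cognee.cognify(datasets=["demo_dataset"], run_in_background=True)
    print(run_info)
```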
 
Customization approaches and tips
Users
- Identity: represents who owns and acts on data. If omitted, a default user is used
- Ownership: every ingested item is tied to a user; content is deduplicated per owner
- Permissions: enforced per dataset (read/write/delete/share) during processing and API access; a sketch of passing an explicit user follows
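A minimal sketch of passing an explicit user, assuming the get_default_user helper (its import path may vary across versions):

```python
import cognee
from cognee.modules.users.methods import get_default_user  # path may vary

async def add_as_user():
    # When no user is supplied, cognee creates/uses this default user internally.
    user = await get_default_user()

    # Passing a user explicitly ties ownership and permissions to that identity;
    # deduplication happens per owner.
    await cognee.add("Some document text.", dataset_name="demo_dataset", user=user)
```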
 
Datasets
- Container: a named or UUID-scoped collection of related data and derived knowledge
- Scoping: Add writes into a specific dataset; Cognify processes the dataset(s) you pass
- Lifecycle: new names create datasets and grant the calling user permissions on them; UUIDs let you target existing datasets (given permission). See the sketch below.
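Putting the scoping rules together (dataset names are illustrative):

```python
import cognee

async def dataset_scoping():
    # First use of a new name creates the dataset and grants the calling
    # user permissions on it.
    await cognee.add("Q3 planning notes.", dataset_name="planning")
    await cognee.add("Q4 roadmap draft.", dataset_name="roadmap")

    # Cognify processes only the dataset(s) you pass.
    await cognee.cognify(datasets=["planning", "roadmap"])
```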