What pipelines are
Pipelines coordinate ordered Tasks into a reproducible workflow. Default Cognee operations like Add and Cognify run on top of the same execution layer. You typically do not call low-level functions directly; you trigger pipelines through these operations.
Prerequisites
- Dataset: a container (name or UUID) where your data is stored and processed. Every document added to cognee belongs to a dataset.
- User: the identity for ownership and access control. A default user is created and used if none is provided.
- Both are described in more detail in the Users and Datasets sections below
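To make the two prerequisites concrete, here is a minimal stdlib-only sketch of the Dataset/User relationship. These are illustrative classes and names, not cognee's actual models: every document belongs to exactly one dataset, and a default user is substituted when none is supplied.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class User:
    """Identity used for ownership and access control."""
    name: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)

@dataclass
class Dataset:
    """Container where data is stored and processed."""
    name: str
    owner: User
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    documents: list = field(default_factory=list)

# A default user exists so callers can omit the user argument.
DEFAULT_USER = User("default_user")

def resolve_user(user=None):
    # Fall back to the default user when none is provided.
    return user or DEFAULT_USER

def add_document(dataset: Dataset, doc: str):
    # Every added document belongs to exactly one dataset.
    dataset.documents.append(doc)

ds = Dataset("my_dataset", owner=resolve_user())
add_document(ds, "hello.txt")
```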
How pipelines run
Somewhat unsurprisingly, the function used to run pipelines is called run_pipeline.
Cognee uses a layered execution model: a single call to run_pipeline orchestrates multi-dataset processing by running per-file pipelines through the sequence of tasks.
- Statuses are yielded as the pipeline runs and written to databases where appropriate
- User access to datasets and files is carefully verified at each layer
- Pipeline run information includes dataset IDs, completion status, and error handling
- Background execution uses queues to manage status updates and avoid database conflicts
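The status-yielding behavior described above can be sketched with a plain generator. This is a toy stand-in for run_pipeline, not cognee's implementation; the PipelineRunInfo field names here are assumptions chosen to mirror the bullets (dataset ID, completion status, error handling).

```python
import uuid
from dataclasses import dataclass

@dataclass
class PipelineRunInfo:
    """Illustrative run-status record; field names are assumptions."""
    dataset_id: uuid.UUID
    status: str        # e.g. "started", "completed", "errored"
    error: str = ""

def run_pipeline_sketch(dataset_ids, tasks):
    """Yield a status as each dataset's pipeline starts and finishes."""
    for dataset_id in dataset_ids:
        yield PipelineRunInfo(dataset_id, "started")
        try:
            for task in tasks:
                task(dataset_id)   # run each task in order
            yield PipelineRunInfo(dataset_id, "completed")
        except Exception as exc:
            yield PipelineRunInfo(dataset_id, "errored", error=str(exc))

ids = [uuid.uuid4(), uuid.uuid4()]
statuses = [info.status for info in run_pipeline_sketch(ids, [lambda d: None])]
```

Consuming the generator as it runs is what lets callers observe progress before the whole multi-dataset run finishes.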
Layered execution
- Innermost layer: individual task execution with telemetry and recursive task running in batches
- Middle layer: per-dataset pipeline management and task orchestration
- Outermost layer: multi-dataset orchestration and overall pipeline execution
- Execution modes: blocking (wait for completion) or background (return immediately with “started” status)
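The three layers and the two execution modes can be sketched as nested functions. The function names below mirror the layers for readability; they are not cognee's real API, and the queue-backed background mode is reduced to returning a "started" status immediately.

```python
def run_task(task, items, batch_size=2):
    """Innermost layer: execute one task over its items in batches."""
    results = []
    for i in range(0, len(items), batch_size):
        batch = items[i:i + batch_size]
        results.extend(task(x) for x in batch)
    return results

def run_dataset_pipeline(tasks, items):
    """Middle layer: thread one dataset's items through the task sequence."""
    for task in tasks:
        items = run_task(task, items)
    return items

def run_multi_dataset(tasks, datasets, background=False):
    """Outermost layer: orchestrate the run across all datasets.

    Blocking mode waits for every dataset; a real background mode would
    enqueue the work and return a "started" status right away.
    """
    if background:
        return {name: "started" for name in datasets}
    return {name: run_dataset_pipeline(tasks, items)
            for name, items in datasets.items()}

out = run_multi_dataset([lambda x: x * 2], {"docs": [1, 2, 3]})
```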
Customization approaches and tips
Users
- Identity: represents who owns and acts on data. If omitted, a default user is used
- Ownership: every ingested item is tied to a user; content is deduplicated per owner
- Permissions: enforced per dataset (read/write/delete/share) during processing and API access
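The per-dataset permission check can be pictured as a small ACL lookup. This is a toy model, not cognee's permission system; only the four permission names come from the text above.

```python
# The four per-dataset permissions named in the docs.
PERMISSIONS = {"read", "write", "delete", "share"}

# Toy ACL: maps (user, dataset) -> set of granted permissions.
acl = {("alice", "docs"): {"read", "write"}}

def check_permission(user: str, dataset: str, action: str) -> bool:
    """Return True only if the user holds the permission on that dataset."""
    if action not in PERMISSIONS:
        raise ValueError(f"unknown permission: {action}")
    return action in acl.get((user, dataset), set())
```

A check like this would run both during processing and on API access, so a user can never reach data in a dataset they were not granted.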
Datasets
- Container: a named or UUID-scoped collection of related data and derived knowledge
- Scoping: Add writes into a specific dataset; Cognify processes the dataset(s) you pass
- Lifecycle: new names create datasets and grant the calling user permissions; UUIDs let you target existing datasets (given permission)
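The name-versus-UUID lifecycle can be sketched as a single resolver. This is an illustrative function, not cognee's actual code: a new name creates a dataset and grants the caller full permissions, while a UUID must refer to an existing dataset the caller can write to.

```python
import uuid

datasets = {}   # dataset_id -> name
acl = {}        # (user, dataset_id) -> set of permissions

def resolve_dataset(ref, user):
    """Resolve a dataset reference (name or UUID) to a dataset ID."""
    if isinstance(ref, uuid.UUID):
        # UUIDs target existing datasets, subject to permission.
        if ref not in datasets:
            raise KeyError("unknown dataset")
        if "write" not in acl.get((user, ref), set()):
            raise PermissionError("no write access")
        return ref
    # Names reuse an existing dataset if one matches...
    for ds_id, name in datasets.items():
        if name == ref:
            return ds_id
    # ...otherwise create it and grant the caller full permissions.
    ds_id = uuid.uuid4()
    datasets[ds_id] = ref
    acl[(user, ds_id)] = {"read", "write", "delete", "share"}
    return ds_id

notes_id = resolve_dataset("notes", "alice")
```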