What is a Pipeline

A Pipeline in Cognee is a coordinated sequence of Tasks that work together to transform data. Think of it as an assembly line where each Task handles a specific step, and data flows from one Task to the next until the final output is produced. Pipelines provide:
  • Orchestration — Managing the order and dependencies between Tasks
  • Data flow — Ensuring information passes correctly between processing steps
  • Error handling — Managing failures and retries across the entire workflow
  • Monitoring — Tracking progress and performance of the complete process
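The core idea can be sketched in a few lines of plain Python. This is an illustration of the concept, not Cognee's actual internals: each task is a function, and the pipeline runs them in order, feeding each task's output into the next.

```python
# Illustrative sketch (not Cognee's internals): a pipeline is an ordered
# chain of tasks, where each task's output becomes the next task's input.

def normalize(text):
    # Task 1: collapse whitespace and lowercase the text
    return " ".join(text.split()).lower()

def tokenize(text):
    # Task 2: split the cleaned text into tokens
    return text.split(" ")

def count_tokens(tokens):
    # Task 3: produce the pipeline's final output
    return len(tokens)

def run_pipeline(tasks, data):
    # Orchestration: execute each task in sequence, passing data along
    for task in tasks:
        data = task(data)
    return data

result = run_pipeline([normalize, tokenize, count_tokens],
                      "  Hello   Pipeline World ")
# result == 3
```

The assembly-line analogy maps directly: swapping, adding, or removing a function in the task list changes the workflow without touching the other steps.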

Default Pipelines

Cognee comes with three pre-built pipelines that handle the most common workflows:

Data Ingestion Pipeline

Handles the initial processing when you call .add(). This pipeline:

  • Processes incoming files and data sources
  • Applies chunking strategies to break down large content
  • Prepares data for subsequent processing stages
  • Manages metadata and provenance tracking
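The chunking step above can be illustrated with a minimal sketch. The chunk_size and overlap parameters here are hypothetical, chosen only to show the idea of overlapping chunks; they are not Cognee's exact configuration names.

```python
# Hedged sketch of a chunking strategy: split text into fixed-size,
# overlapping chunks so context is not lost at chunk boundaries.

def chunk_text(text, chunk_size=20, overlap=5):
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap  # how far each chunk advances
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final chunk reached the end of the text
    return chunks

chunks = chunk_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, overlap=2)
# chunks == ["abcdefghij", "ijklmnopqr", "qrstuvwxyz"]
```

Note how the last two characters of each chunk reappear at the start of the next one; that overlap is what keeps sentences split across a boundary recoverable in later stages.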

Main Processing Pipeline

Executes when you call .cognify(). This pipeline:
  • Extracts entities and relationships from your data
  • Generates embeddings for semantic search
  • Builds the knowledge graph structure
  • Creates connections between related concepts
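The graph-building step can be sketched with hard-coded triples. In the real pipeline the (subject, relation, object) triples are extracted from your data; here they are written out by hand purely to show how triples assemble into a graph structure.

```python
# Illustrative sketch: assemble extracted (subject, relation, object)
# triples into a simple adjacency-list knowledge graph.
from collections import defaultdict

triples = [
    ("Cognee", "is_a", "framework"),
    ("Pipeline", "composed_of", "Task"),
    ("Task", "transforms", "data"),
]

graph = defaultdict(list)
for subject, relation, obj in triples:
    # Each node keeps a list of outgoing (relation, target) edges
    graph[subject].append((relation, obj))

# graph["Pipeline"] == [("composed_of", "Task")]
```

Semantic search then layers on top of a structure like this: embeddings find the relevant nodes, and the edges connect them to related concepts.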

Code Analysis Pipeline

Specialized for codebases when you call .codify(). This pipeline:
  • Analyzes code structure and dependencies
  • Extracts function and class relationships
  • Maps code architecture and call graphs
  • Enables code-specific search and navigation
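The flavor of this analysis can be shown with Python's standard ast module: parse a snippet of source code, list its functions, and record which other functions each one calls. This is a simplified stand-in for the pipeline's analysis, not its actual implementation.

```python
# Sketch of static code analysis in the spirit of the code pipeline:
# extract a tiny call graph (function -> functions it calls) with ast.
import ast

source = """
def load(path):
    return open(path).read()

def process(path):
    data = load(path)
    return data.upper()
"""

tree = ast.parse(source)
calls = {}
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        # Collect the names of plain function calls inside this function
        callees = [
            n.func.id
            for n in ast.walk(node)
            if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
        ]
        calls[node.name] = callees

# calls == {"load": ["open"], "process": ["load"]}
```

A mapping like this is what makes code-specific navigation possible: from `process` you can jump to its dependency `load` without a text search.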

Execution Flow

All pipelines follow a similar execution pattern:
  1. Initialization — Set up the pipeline with configuration and resources
  2. Task Sequencing — Execute Tasks in the correct order based on dependencies
  3. Data Transformation — Pass data through each Task, transforming it step by step
  4. Completion — Finalize the process and prepare outputs for storage or further use
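The four stages above can be sketched as a small runner. The run_log record and the retry loop are illustrative stand-ins for the monitoring and error handling a real pipeline run performs; none of these names come from Cognee's API.

```python
# Sketch of the execution pattern: initialize, sequence tasks,
# transform data step by step, and finalize.

def run(tasks, data, retries=2):
    run_log = {"steps": [], "status": "running"}   # 1. Initialization
    for task in tasks:                             # 2. Task sequencing
        for attempt in range(retries + 1):
            try:
                data = task(data)                  # 3. Data transformation
                run_log["steps"].append(task.__name__)
                break
            except Exception:
                if attempt == retries:
                    run_log["status"] = "failed"
                    raise  # exhausted retries: surface the failure
    run_log["status"] = "completed"                # 4. Completion
    return data, run_log

def double(x):
    return x * 2

def inc(x):
    return x + 1

result, log = run([double, inc], 5)
# result == 11, log["status"] == "completed"
```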

Customization

While the default pipelines work for most use cases, you can:
  • Modify existing pipelines by adding, removing, or reordering Tasks
  • Create custom pipelines from scratch using your own Task combinations
  • Override pipeline behavior for specific data types or processing needs
  • Extend pipelines with domain-specific logic and workflows
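As a concrete illustration of the first two points, a custom pipeline can start from a default task list and extend it with a domain-specific step. The task names below are invented for the example and are not part of Cognee's API.

```python
# Hedged sketch of customization: take a "default" task list and
# extend it with a domain-specific task before running it.

def clean(text):
    return text.strip()

def lowercase(text):
    return text.lower()

def redact_emails(text):
    # Domain-specific extension: crude removal of email-like tokens
    return " ".join(w for w in text.split() if "@" not in w)

default_pipeline = [clean, lowercase]

# Adding, removing, or reordering entries changes the workflow
custom_pipeline = default_pipeline + [redact_emails]

data = "  Contact ALICE at alice@example.com  "
for task in custom_pipeline:
    data = task(data)
# data == "contact alice at"
```

Because tasks are just ordered steps, reordering is a list operation, and removing a step never requires touching the steps around it.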

Examples and details

For detailed information about each specific pipeline, see the dedicated pipeline pages:
  • Data Ingestion Pipeline details
  • Main Processing Pipeline details
  • Code Analysis Pipeline details