What is the add operation

The .add operation is how you bring content into Cognee. It takes your files, directories, or raw text, normalizes them into plain text, and records them in a dataset that Cognee can later expand into vectors and graphs with Cognify (see the example after the list below).
  • Ingestion-only: no embeddings, no graph yet
  • Flexible input: raw text, local files, directories, or S3 URIs
  • Normalized storage: everything is turned into text and stored consistently
  • Deduplicated: Cognee uses content hashes to skip content it has already ingested
  • Dataset-first: everything you add goes into a dataset
    • Datasets are how Cognee keeps different collections organized (e.g. “research-papers”, “customer-reports”)
    • Each dataset has its own ID, owner, and permissions for access control
    • You can read more about them below
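
For example, here is a minimal sketch of adding raw text and a local file to a named dataset. The file path is a placeholder, and the dataset_name parameter reflects current Cognee releases, so it may differ in your version:

```python
import asyncio

import cognee


async def main():
    # Raw text goes straight into a dataset; Cognee creates the dataset if needed.
    await cognee.add(
        "Cognee turns documents into searchable knowledge.",
        dataset_name="research-papers",
    )

    # Local files are ingested the same way; this path is a placeholder.
    await cognee.add("/path/to/paper.pdf", dataset_name="research-papers")


asyncio.run(main())
```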

Where add fits

  • First step before you run Cognify
  • Use it to create a dataset from scratch, or append new data over time
  • Ideal for both local experiments and programmatic ingestion from storage (e.g. S3), as shown in the sketch after this list
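
For instance, here is a sketch of appending to an existing dataset over time and ingesting straight from S3. The bucket, paths, and dataset name are placeholders, and S3 ingestion assumes your environment already provides the required AWS credentials:

```python
import asyncio

import cognee


async def ingest_more():
    # Append a newly arrived file to a dataset created earlier.
    await cognee.add(
        "/data/reports/2024-q3-summary.pdf",
        dataset_name="customer-reports",
    )

    # Ingest directly from object storage; the S3 URI is a placeholder.
    await cognee.add(
        "s3://my-bucket/customer-reports/",
        dataset_name="customer-reports",
    )


asyncio.run(ingest_more())
```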

What happens under the hood

  1. Expand your input
    • Directories are walked, S3 paths are expanded, raw text is passed through
    • Result: a flat list of items (files, text, handles)
  2. Ingest and register
    • Files are saved into Cognee’s storage and converted to text
      • Text extraction: Converts various file formats into plain text
      • Metadata preservation: Keeps file information like source, creation date, and format
      • Content normalization: Ensures consistent text encoding and formatting
    • Cognee computes a stable content hash to prevent duplicates
    • Each item becomes a record in the database and is attached to your dataset
  3. Return a summary
    • You get a pipeline run info object that tells you where everything went and which dataset is ready for the next step
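
To make the deduplication in step 2 concrete, here is an illustrative sketch of the idea rather than Cognee’s actual implementation: hashing normalized content means the same text added twice resolves to a single record.

```python
import hashlib


def content_hash(text: str) -> str:
    # Normalize whitespace before hashing so trivial formatting
    # differences don't produce distinct hashes.
    normalized = " ".join(text.split()).encode("utf-8")
    return hashlib.sha256(normalized).hexdigest()


registry: dict[str, str] = {}


def register(text: str) -> bool:
    """Return True if the content is new, False if it was already ingested."""
    digest = content_hash(text)
    if digest in registry:
        return False
    registry[digest] = text
    return True


assert register("The   same content, spaced differently.")
assert not register("The same content, spaced differently.")
```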

After add finishes

After .add completes, your data is ready for the next stage:
  • Files are safely stored in Cognee’s storage system with metadata preserved
  • Database records track each ingested item and link it to your dataset
  • Dataset is ready for transformation with Cognify, which will chunk, embed, and connect everything (see the handoff sketch below)
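
As a sketch of the handoff, you can keep the summary that .add returns and then run Cognify on the same dataset. The exact shape of the returned object and the datasets argument to cognify are based on current releases and may differ in your version:

```python
import asyncio

import cognee


async def build_graph():
    # Ingest first; .add returns a summary of the pipeline run.
    run_info = await cognee.add(
        "Knowledge graphs connect related facts.",
        dataset_name="research-papers",
    )
    print(run_info)

    # Then let Cognify chunk, embed, and connect the dataset.
    await cognee.cognify(datasets=["research-papers"])


asyncio.run(build_graph())
```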

Further details