What is the remember operation
The `remember` operation is the main ingestion entry point in Cognee v1.0. It stores information in memory with a single API call.
It has two modes:

- Permanent memory: without `session_id` provided, `remember()` runs the full ingestion pipeline for you — normalizing your data, building the knowledge graph, and enriching it for retrieval, all in one call.
- Session memory: with `session_id`, `remember()` writes to the session cache for fast short-term memory. If `self_improvement=True` (the default), it then starts a background Improve pass to bridge that session content into the permanent graph.
- Dataset-aware: permanent memory is written into a named dataset, which defaults to `main_dataset`.
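As a minimal sketch of both modes, assuming the async `cognee` client with the `remember()` entry point described above (the text and session ID are illustrative placeholders):

```python
import asyncio

import cognee


async def main():
    # Permanent memory: runs the full ingestion pipeline
    # (ingest -> graph build -> enrichment) into the default dataset.
    result = await cognee.remember("Ada Lovelace wrote the first algorithm.")
    print(result.status)  # e.g. "completed"

    # Session memory: fast write to the session cache; with
    # self_improvement=True (the default), a background Improve pass
    # bridges the content into the permanent graph.
    session_result = await cognee.remember(
        "User prefers metric units.",
        session_id="chat-42",  # hypothetical session ID
    )
    print(session_result.status)  # e.g. "session_stored"


asyncio.run(main())
```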
Where remember fits
- Use `remember()` when you want the simplest way to get data into Cognee.
- Use it for raw text, files, file lists, and `DataItem` objects.
- Use `session_id` when you want fast conversational memory first and long-term graph sync second.
- Use `self_improvement=False` if you want to skip the follow-up improvement pass.
What happens under the hood
Permanent memory
- Ingest — data is normalized and attached to a named dataset.
- Build the graph — documents are chunked, entities and relationships are extracted, and embeddings are created.
- Enrich — by default, Cognee runs a follow-up Improve pass that adds derived retrieval structures to the graph.
Session memory
- Store in session cache — the content is written to the cache keyed by user and session.
- Return immediately — the call completes quickly with a session-stored result.
- Optional background bridge — when `self_improvement=True` (default), Cognee immediately starts Improve in the background to sync that session into the permanent graph.
- Manual bridge when disabled — when `self_improvement=False`, nothing is bridged automatically; the content stays in session cache until you explicitly run `cognee.improve(dataset=..., session_ids=[...])`.
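The manual bridge can be sketched like this, assuming the `cognee.improve(dataset=..., session_ids=[...])` call shown above (the dataset and session names are hypothetical):

```python
import asyncio

import cognee


async def main():
    # Store session-only memory; disable the automatic background bridge.
    await cognee.remember(
        "Customer asked about the refund policy.",
        session_id="support-session-1",
        self_improvement=False,
    )

    # Later, explicitly bridge the cached session content into the
    # permanent graph for a chosen dataset.
    await cognee.improve(
        dataset="main_dataset",
        session_ids=["support-session-1"],
    )


asyncio.run(main())
```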
After remember finishes
- Permanent memory: your data is stored in a dataset, chunked, turned into graph structure, embedded for retrieval, and usually enriched with an immediate follow-up improve pass.
- Session memory: your data is available quickly through the session cache; it only becomes permanent graph memory if the improvement bridge runs.
Examples and details
Accepted inputs
- Raw text strings
- Lists of strings
- Local file paths
- URLs
- File-like binary streams
- `DataItem` objects and lists of `DataItem` objects
- Plain text is stored directly as memory content.
- File paths are ingested and normalized into text before graph building.
- HTTP/HTTPS URLs are fetched and passed through the ingestion pipeline.
- `DataItem` lets you attach labels, metadata, or explicit IDs per item.
- Dataset scoping still applies through `dataset_name`.
In session-memory mode, a URL string is stored in the session cache as text. It is not fetched or scraped. Permanent mode is required for URL ingestion to work end-to-end.
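A sketch of the common input shapes in permanent-memory mode (the file path, URL, and dataset name are hypothetical placeholders):

```python
import asyncio

import cognee


async def main():
    # Raw text is stored directly as memory content.
    await cognee.remember("Plain text is stored directly.")

    # A list mixing strings, a local file path, and a URL.
    # File paths are normalized into text before graph building;
    # HTTP(S) URLs are fetched — but only in permanent-memory mode.
    await cognee.remember(
        [
            "First note",
            "/tmp/report.txt",           # hypothetical local file
            "https://example.com/page",  # fetched by the ingestion pipeline
        ],
        dataset_name="research_notes",   # hypothetical dataset
    )


asyncio.run(main())
```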
Supported formats
In permanent-memory mode, `remember()` supports the following file formats out of the box:

| Loader | Extensions | Install extra |
|---|---|---|
| TextLoader | .txt .md .json .xml .yaml .yml .log | built-in |
| CsvLoader | .csv | built-in |
| PyPdfLoader | .pdf | built-in |
| ImageLoader | .png .jpg .jpeg .gif .webp .bmp .tif .tiff .heic .avif .ico .psd .apng .cr2 .dwg .xcf .jxr .jpx | built-in |
| AudioLoader | .mp3 .wav .aac .flac .ogg .m4a .mid .amr .aiff | built-in |
| UnstructuredLoader | .docx .doc .odt .xlsx .xls .ods .pptx .ppt .odp .rtf .html .htm .eml .msg .epub | pip install cognee[docs] |
| AdvancedPdfLoader | .pdf with layout-aware extraction | pip install cognee[docs] |
| BeautifulSoupLoader | .html | pip install cognee[scraping] |
| DoclingDocument | pre-converted DoclingDocument objects | pip install cognee[docling] |
This formats table applies only to permanent-memory `remember()`. In session-memory mode, `remember()` does not parse files or fetch URLs; it stores the provided content as session text, so plain text inputs are the recommended form. Non-text inputs become text representations rather than parsed source content.
Users and ownership
- Permanent-memory `remember()` writes data into a dataset owned by a user.
- If you do not pass a user explicitly, Cognee resolves or creates the default user context.
- Ownership affects who can later read, write, share, improve, or forget that dataset.
- In session-memory mode, entries are still scoped to the current user in the session cache, so the same `session_id` under different users does not mean shared memory.
What permanent remember produces
When `remember()` runs without `session_id`, the result is not just stored text. It produces the full retrieval-ready memory stack:

- Dataset records in the relational store
- Normalized source content attached to that dataset
- Chunks created from the ingested content
- Graph nodes and edges extracted from those chunks
- Embeddings and summaries used for retrieval
- Improvement artifacts from the follow-up improve pass when `self_improvement=True`, depending on the active enrichment configuration
What session remember produces
When `remember()` runs with `session_id`, it behaves differently:

- The content is written to the session cache for that user and session.
- The call returns quickly with a session-stored result.
- The content is available for session-aware Recall immediately.
- No permanent graph write happens unless the improvement bridge runs.
Session availability and configuration
Session memory is enabled by default in the current Cognee 1.0 configuration.

- `session_id` works through the cache layer used for session storage.
- The default cache backend is filesystem-based cache.
- If you want to disable session caching entirely, set `CACHING=false`.
- If you want a shared or production-oriented backend, set `CACHE_BACKEND=redis`.

If session caching is disabled or unavailable, `remember(..., session_id=...)` cannot persist session memory in the normal way. For broader cache behavior and adapters, see Sessions and Caching.

If the session cache is unavailable, `remember(..., session_id=...)` can still return a session-style result object, but the session content may not actually have been stored. If session memory matters for your workflow, make sure caching is enabled and available.
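For example, the cache settings can be supplied as environment variables, using the `CACHING` and `CACHE_BACKEND` names documented here:

```shell
# Disable session caching entirely (session-memory remember()
# will no longer persist content).
export CACHING=false

# Or keep caching on and switch to a shared Redis backend.
export CACHING=true
export CACHE_BACKEND=redis
```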
Latency and cost
The two remember modes have very different cost and latency profiles:
- Session memory is the fast path. It writes to session cache and returns quickly.
- Permanent memory is the heavier path. It runs ingestion, chunking, graph extraction, embedding, and usually improvement.

`remember()` is the right choice when you want durable graph memory, but it costs more time and more model work than session-only storage.

The biggest cost drivers are:

- How much data you ingest
- How many chunks the content becomes
- The graph extraction and summarization work during graph building
- Whether `self_improvement=True` adds a follow-up improvement pass

You can tune the permanent pipeline with `chunk_size`, `chunker`, and `custom_prompt`. If you want the lightest permanent ingestion path, set `self_improvement=False`.
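As a sketch, the lightest permanent ingestion path combines those knobs (parameter names as documented on this page; the chunk size value is an arbitrary example):

```python
import asyncio

import cognee


async def main():
    # Skip the follow-up Improve pass and use larger chunks so fewer
    # model calls are made during graph building.
    result = await cognee.remember(
        "A long document goes here...",
        self_improvement=False,  # no follow-up improvement pass
        chunk_size=2048,         # arbitrary example value
    )
    print(result.elapsed_seconds)


asyncio.run(main())
```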
Background execution and result object
- `remember()` returns a promise-like `RememberResult`.
- In blocking mode, it finishes only after the permanent pipeline completes.
- With `run_in_background=True`, it returns immediately and you can `await` the result later.
- Useful fields include status, dataset name, elapsed time, and the raw pipeline result.
RememberResult format
`remember()` returns a `RememberResult` object. In docs terms, it behaves like a small status object you can print, inspect, or await.

Common fields

| Field | What it means |
|---|---|
| `status` | Current state of the operation. Common values include `running`, `completed`, `errored`, and `session_stored`. |
| `dataset_name` | The target dataset name used for the operation. |
| `dataset_id` | Dataset UUID when available. |
| `session_ids` | Session IDs associated with the result when session bridging is involved. |
| `pipeline_run_id` | Pipeline run UUID when the permanent pipeline has produced one. |
| `elapsed_seconds` | Wall-clock time from start to completion. |
| `items_processed` | Number of processed items when available. |
| `content_hash` | Content hash of the first processed item when available. |
| `items` | Per-item metadata such as IDs, names, token counts, MIME type, or content hashes when available. |
| `raw_result` | The raw pipeline result payload for advanced inspection. |
| `error` | Error text when the operation fails. |
Permanent memory completed
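A sketch of inspecting a completed permanent-memory result, using the field names from the table above (actual values will differ per run):

```python
import asyncio

import cognee


async def main():
    result = await cognee.remember(
        "Graphs connect entities through relationships.",
        dataset_name="main_dataset",
    )
    if result.status == "completed":
        print(result.dataset_name)     # target dataset, e.g. "main_dataset"
        print(result.pipeline_run_id)  # pipeline run UUID
        print(result.elapsed_seconds)  # wall-clock duration
    elif result.status == "errored":
        print(result.error)


asyncio.run(main())
```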
Session memory stored in cache
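A sketch of the session-memory case (the session ID is a hypothetical placeholder):

```python
import asyncio

import cognee


async def main():
    result = await cognee.remember(
        "The user's favorite color is teal.",
        session_id="onboarding-1",  # hypothetical session ID
    )
    # Session writes return quickly with a session-stored status.
    if result.status == "session_stored":
        print(result.session_ids)  # sessions associated with this write


asyncio.run(main())
```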
Background execution running
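A sketch of the background case, assuming the promise-like result can be awaited directly as described above:

```python
import asyncio

import cognee


async def main():
    # Start the permanent pipeline without blocking.
    result = await cognee.remember(
        "A large batch of notes...",
        run_in_background=True,
    )
    print(result.status)  # "running" while the pipeline works

    # Await the same result object later to block until completion.
    final = await result
    print(final.status)


asyncio.run(main())
```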
Parameters
| Option | What it does |
|---|---|
| `data` | The content to store. Can be text, file paths, URLs, file-like objects, `DataItem`, or lists of those inputs depending on mode. |
| `dataset_name` | Chooses the target permanent dataset. Defaults to `main_dataset`. |
| `session_id` | Switches `remember()` into session-memory mode. |
| `self_improvement` | Controls whether Improve starts automatically after storage. In session mode, this is what enables or disables automatic bridging into the permanent graph. Defaults to `True`. |
| `run_in_background` | Starts the permanent pipeline asynchronously. |
| `chunk_size` / `chunker` | Customizes chunking for the permanent pipeline. |
| `custom_prompt` | Overrides the graph-extraction prompt used during graph building. |
| `session_ids` | Syncs new permanent graph knowledge back into specific sessions during the self-improvement pass. |
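Putting a few of the advanced options together (a sketch; the dataset name, prompt text, and session ID are hypothetical):

```python
import asyncio

import cognee


async def main():
    await cognee.remember(
        "Quarterly report text...",
        dataset_name="finance",                       # hypothetical dataset
        chunk_size=1024,                              # custom chunking
        custom_prompt="Extract companies and KPIs.",  # override graph extraction
        session_ids=["analyst-session-7"],            # sync new knowledge back
    )


asyncio.run(main())
```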
Under the hood — legacy operations
- Recall — query memory with auto-routing and session awareness
- Improve — enrich the graph and bridge session memory