
What is the remember operation

The .remember operation is the main ingestion entry point in Cognee v1.0. It stores information in memory with a single API call and has two modes:
  • Permanent memory: without a session_id, remember() runs the full ingestion pipeline for you, normalizing your data, building the knowledge graph, and enriching it for retrieval, all in one call.
  • Session memory: with a session_id, remember() writes to the session cache for fast short-term memory. If self_improvement=True (the default), it then starts a background Improve pass to bridge that session content into the permanent graph.
In both modes the operation is dataset-aware: permanent memory is written into a named dataset, which defaults to main_dataset.

Where remember fits

  • Use remember() when you want the simplest way to get data into Cognee.
  • Use it for raw text, files, file lists, and DataItem objects.
  • Use session_id when you want fast conversational memory first and long-term graph sync second.
  • Use self_improvement=False if you want to skip the follow-up improvement pass.
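The two entry points above can be sketched as follows. This is a minimal example of the API described in this page; like the other snippets here, it assumes you are inside an async context.

```python
import cognee

# Permanent memory: runs the full ingestion pipeline into a dataset
await cognee.remember("Einstein was born in Ulm.")

# Session memory: fast cache write; by default a background
# Improve pass later bridges it into the permanent graph
await cognee.remember(
    "The user prefers metric units.",
    session_id="chat_1",
)

# Skip the follow-up improvement pass for the lightest ingestion
await cognee.remember(
    "Quick note.",
    self_improvement=False,
)
```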

What happens under the hood

Permanent memory

  1. Ingest — data is normalized and attached to a named dataset.
  2. Build the graph — documents are chunked, entities and relationships are extracted, and embeddings are created.
  3. Enrich — by default, Cognee runs a follow-up Improve pass that adds derived retrieval structures to the graph.

Session memory

  1. Store in session cache — the content is written to the cache keyed by user and session.
  2. Return immediately — the call completes quickly with a session-stored result.
  3. Optional background bridge — when self_improvement=True (default), Cognee immediately starts Improve in the background to sync that session into the permanent graph.
  4. Manual bridge when disabled — when self_improvement=False, nothing is bridged automatically; the content stays in session cache until you explicitly run cognee.improve(dataset=..., session_ids=[...]).
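The session-memory flow, including the manual bridge from step 4, looks like this in code (the cognee.improve call mirrors the signature shown above):

```python
import cognee

# Session write without automatic bridging
await cognee.remember(
    "Draft note from this conversation.",
    session_id="chat_1",
    self_improvement=False,
)

# The content stays in the session cache until you bridge it explicitly
await cognee.improve(dataset="main_dataset", session_ids=["chat_1"])
```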

After remember finishes

  • Permanent memory: your data is stored in a dataset, chunked, turned into graph structure, embedded for retrieval, and usually enriched with an immediate follow-up improve pass.
  • Session memory: your data is available quickly through the session cache; it only becomes permanent graph memory if the improvement bridge runs.

Examples and details

remember() accepts the following input types:

  • Raw text strings
  • Lists of strings
  • Local file paths
  • URLs
  • File-like binary streams
  • DataItem objects and lists of DataItem objects
In permanent-memory mode:
  • Plain text is stored directly as memory content.
  • File paths are ingested and normalized into text before graph building.
  • HTTP/HTTPS URLs are fetched and passed through the ingestion pipeline.
  • DataItem lets you attach labels, metadata, or explicit IDs per item.
  • Dataset scoping still applies through dataset_name.
In session-memory mode, a URL string is stored in the session cache as text. It is not fetched or scraped. Permanent mode is required for URL ingestion to work end-to-end.
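A sketch of the permanent-mode input types listed above. The DataItem import path and constructor arguments are assumptions here; check your installed version for the exact interface.

```python
import cognee
from cognee import DataItem  # assumed import path; may differ by version

# Plain text is stored directly as memory content
await cognee.remember("Plain text goes in directly.")

# File paths and URLs are fetched/normalized into text before graph building
await cognee.remember(["/data/report.pdf", "https://example.com/article"])

# DataItem lets you attach metadata per item; dataset scoping still applies
await cognee.remember(
    DataItem(data="Labeled note", metadata={"source": "crm"}),  # hypothetical fields
    dataset_name="sales",
)
```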
In permanent-memory mode, remember() supports the following file formats out of the box:
  • TextLoader: .txt, .md, .json, .xml, .yaml, .yml, .log (built-in)
  • CsvLoader: .csv (built-in)
  • PyPdfLoader: .pdf (built-in)
  • ImageLoader: .png, .jpg, .jpeg, .gif, .webp, .bmp, .tif, .tiff, .heic, .avif, .ico, .psd, .apng, .cr2, .dwg, .xcf, .jxr, .jpx (built-in)
  • AudioLoader: .mp3, .wav, .aac, .flac, .ogg, .m4a, .mid, .amr, .aiff (built-in)
  • UnstructuredLoader: .docx, .doc, .odt, .xlsx, .xls, .ods, .pptx, .ppt, .odp, .rtf, .html, .htm, .eml, .msg, .epub (requires pip install cognee[docs])
  • AdvancedPdfLoader: .pdf with layout-aware extraction (requires pip install cognee[docs])
  • BeautifulSoupLoader: .html (requires pip install cognee[scraping])
  • DoclingDocument: pre-converted DoclingDocument objects (requires pip install cognee[docling])
This format list applies only to permanent-memory remember(). In session-memory mode, remember() does not parse files or fetch URLs; it stores the provided content as session text, so plain text inputs are the recommended form. Non-text inputs become text representations rather than parsed source content.
  • Permanent-memory remember() writes data into a dataset owned by a user.
  • If you do not pass a user explicitly, Cognee resolves or creates the default user context.
  • Ownership affects who can later read, write, share, improve, or forget that dataset.
  • In session-memory mode, entries are still scoped to the current user in the session cache, so the same session_id under different users does not mean shared memory.
If you need the deeper dataset-permissions model, see Datasets and the multi-user docs.
When remember() runs without session_id, the result is not just stored text. It produces the full retrieval-ready memory stack:
  • Dataset records in the relational store
  • Normalized source content attached to that dataset
  • Chunks created from the ingested content
  • Graph nodes and edges extracted from those chunks
  • Embeddings and summaries used for retrieval
  • Improvement artifacts from the follow-up improve pass when self_improvement=True, depending on the active enrichment configuration
In practice, this means the dataset becomes immediately usable by Recall.
When remember() runs with session_id, it behaves differently:
  • The content is written to the session cache for that user and session.
  • The call returns quickly with a session-stored result.
  • The content is available for session-aware Recall immediately.
  • No permanent graph write happens unless the improvement bridge runs.
So session memory is best thought of as fast short-term memory first, with optional long-term graph persistence second.
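Putting the session flow together: write, then query within the same session. This assumes the Recall operation accepts a matching session_id parameter, as described in the Recall docs.

```python
import cognee

# Fast short-term memory: available to session-aware Recall immediately
await cognee.remember("The user's cat is named Miso.", session_id="chat_1")

# Query against the same user and session
answer = await cognee.recall(
    "What is the user's cat called?",
    session_id="chat_1",
)
```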
Session memory is enabled by default in the current Cognee 1.0 configuration.
  • session_id works through the cache layer used for session storage.
  • The default cache backend is filesystem-based cache.
  • If you want to disable session caching entirely, set CACHING=false.
  • If you want a shared or production-oriented backend, set CACHE_BACKEND=redis.
# Disable session caching
CACHING=false

# Or keep sessions enabled and use Redis
CACHING=true
CACHE_BACKEND=redis
If session caching is disabled or unavailable, remember(..., session_id=...) cannot persist session memory in the normal way. It may still return a session-style result object, but the session content may not actually have been stored, so if session memory matters for your workflow, make sure caching is enabled and available. For broader cache behavior and adapters, see Sessions and Caching.
The two remember modes have very different cost and latency profiles:
  • Session memory is the fast path. It writes to session cache and returns quickly.
  • Permanent memory is the heavier path. It runs ingestion, chunking, graph extraction, embedding, and usually improvement.
That means permanent remember() is the right choice when you want durable graph memory, but it costs more time and more model work than session-only storage. The biggest cost drivers are:
  • How much data you ingest
  • How many chunks the content becomes
  • The graph extraction and summarization work during graph building
  • Whether self_improvement=True adds a follow-up improvement pass
If you want more control over chunking or prompt behavior, use chunk_size, chunker, and custom_prompt. If you want the lightest permanent ingestion path, set self_improvement=False.
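The tuning knobs mentioned above can be combined like this. The chunk_size value and prompt text are illustrative, not recommendations.

```python
import cognee

# Lightest permanent ingestion: skip the follow-up improve pass,
# use smaller chunks, and steer graph extraction with a custom prompt
await cognee.remember(
    "A long report about the company's 2024 operations...",
    self_improvement=False,
    chunk_size=512,  # example value; tune for your content
    custom_prompt="Extract only people, organizations, and dates.",
)
```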
  • remember() returns a promise-like RememberResult.
  • In blocking mode, it finishes only after the permanent pipeline completes.
  • With run_in_background=True, it returns immediately and you can await the result later.
  • Useful fields include status, dataset name, elapsed time, and the raw pipeline result.
result = await cognee.remember(
    "Einstein was born in Ulm.",
    run_in_background=True,
)

print(result)   # status='running'
await result
print(result)   # status='completed'
remember() returns a RememberResult object. In docs terms, it behaves like a small status object you can print, inspect, or await.

Common fields
  • status: current state of the operation. Common values include running, completed, errored, and session_stored.
  • dataset_name: the target dataset name used for the operation.
  • dataset_id: dataset UUID when available.
  • session_ids: session IDs associated with the result when session bridging is involved.
  • pipeline_run_id: pipeline run UUID when the permanent pipeline has produced one.
  • elapsed_seconds: wall-clock time from start to completion.
  • items_processed: number of processed items when available.
  • content_hash: content hash of the first processed item when available.
  • items: per-item metadata such as IDs, names, token counts, MIME type, or content hashes when available.
  • raw_result: the raw pipeline result payload for advanced inspection.
  • error: error text when the operation fails.
Typical shapes
{
  "status": "completed",
  "dataset_name": "main_dataset",
  "dataset_id": "...",
  "pipeline_run_id": "...",
  "items_processed": 1,
  "elapsed_seconds": 4.2
}
{
  "status": "session_stored",
  "dataset_name": "main_dataset",
  "session_ids": ["chat_1"],
  "elapsed_seconds": 0.02
}
{
  "status": "running",
  "dataset_name": "main_dataset"
}
Options
  • data: the content to store. Can be text, file paths, URLs, file-like objects, DataItem, or lists of those inputs depending on mode.
  • dataset_name: chooses the target permanent dataset. Defaults to main_dataset.
  • session_id: switches remember() into session-memory mode.
  • self_improvement: controls whether Improve starts automatically after storage. In session mode, this is what enables or disables automatic bridging into the permanent graph. Defaults to True.
  • run_in_background: starts the permanent pipeline asynchronously.
  • chunk_size / chunker: customizes chunking for the permanent pipeline.
  • custom_prompt: overrides the graph-extraction prompt used during graph building.
  • session_ids: syncs new permanent graph knowledge back into specific sessions during the self-improvement pass.
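The session_ids option works in the opposite direction from session_id: instead of writing into a session, it pushes newly ingested permanent knowledge back into existing sessions. A sketch, with illustrative names:

```python
import cognee

# Permanent ingestion that also syncs the new graph knowledge
# back into two active sessions during the self-improvement pass
await cognee.remember(
    "Company policy updated: remote work is allowed on Fridays.",
    dataset_name="policies",
    session_ids=["chat_1", "chat_2"],
)
```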
remember() runs Add, Cognify, and Improve under the hood. Use the legacy operations directly when you need explicit control over each step — for example, to inspect intermediate results, tune pipeline parameters independently, or integrate ingestion and graph-building into a more complex workflow.
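The explicit equivalent of a permanent-memory remember() call looks roughly like this; exact signatures of the legacy operations may differ by version, so treat this as a sketch.

```python
import cognee

# Step 1: Add — ingest and normalize into a dataset
await cognee.add("Einstein was born in Ulm.", dataset_name="main_dataset")

# Step 2: Cognify — chunk, extract entities/relationships, embed
await cognee.cognify()

# Step 3: Improve — the follow-up enrichment pass
await cognee.improve()
```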

Recall

Query memory with auto-routing and session awareness

Improve

Enrich the graph and bridge session memory