What is the remember operation
The `remember` operation is the main ingestion entry point in Cognee v1.0. It stores information in memory with a single API call.
It has two modes:

- Permanent memory: without `session_id` provided, `remember()` runs the full ingestion pipeline for you — normalizing your data, building the knowledge graph, and enriching it for retrieval, all in one call.
- Session memory: with `session_id`, `remember()` writes to the session cache for fast short-term memory. If `self_improvement=True` (the default), it then starts a background Improve pass to bridge that session content into the permanent graph.
- Dataset-aware: permanent memory is written into a named dataset, which defaults to `main_dataset`.
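As a minimal sketch of both modes, assuming the async `cognee` client with the `remember()` entry point described above (the text and session ID are illustrative placeholders):

```python
import asyncio

import cognee


async def main():
    # Permanent memory: runs the full ingestion pipeline
    # (ingest -> graph build -> enrichment) into the default dataset.
    result = await cognee.remember("Ada Lovelace wrote the first algorithm.")
    print(result.status)  # e.g. "completed"

    # Session memory: fast write to the session cache; with
    # self_improvement=True (the default), a background Improve pass
    # bridges the content into the permanent graph.
    session_result = await cognee.remember(
        "User prefers metric units.",
        session_id="chat-42",  # hypothetical session ID
    )
    print(session_result.status)  # e.g. "session_stored"


asyncio.run(main())
```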
Where remember fits
- Use `remember()` when you want the simplest way to get data into Cognee.
- Use it for raw text, files, file lists, and `DataItem` objects.
- Use `session_id` when you want fast conversational memory first and long-term graph sync second.
- Use `self_improvement=False` if you want to skip the follow-up improvement pass.
What happens under the hood
Permanent memory
- Ingest — data is normalized and attached to a named dataset.
- Build the graph — documents are chunked, entities and relationships are extracted, and embeddings are created.
- Enrich — by default, Cognee runs a follow-up Improve pass that adds derived retrieval structures to the graph.
Session memory
- Store in session cache — the content is written to the cache keyed by user and session.
- Return immediately — the call completes quickly with a session-stored result.
- Optional background bridge — when `self_improvement=True` (default), Cognee immediately starts Improve in the background to sync that session into the permanent graph.
- Manual bridge when disabled — when `self_improvement=False`, nothing is bridged automatically; the content stays in session cache until you explicitly run `cognee.improve(dataset=..., session_ids=[...])`.
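The manual bridge can be sketched like this, assuming the `cognee.improve(dataset=..., session_ids=[...])` call shown above (the dataset and session names are hypothetical):

```python
import asyncio

import cognee


async def main():
    # Store session-only memory; disable the automatic background bridge.
    await cognee.remember(
        "Customer asked about the refund policy.",
        session_id="support-session-1",
        self_improvement=False,
    )

    # Later, explicitly bridge the cached session content into the
    # permanent graph for a chosen dataset.
    await cognee.improve(
        dataset="main_dataset",
        session_ids=["support-session-1"],
    )


asyncio.run(main())
```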
After remember finishes
- Permanent memory: your data is stored in a dataset, chunked, turned into graph structure, embedded for retrieval, and usually enriched with an immediate follow-up improve pass.
- Session memory: your data is available quickly through the session cache; it only becomes permanent graph memory if the improvement bridge runs.
Examples and details
Accepted inputs
- Raw text strings
- Lists of strings
- Local file paths
- URLs
- File-like binary streams
- `DataItem` objects and lists of `DataItem` objects
- Plain text is stored directly as memory content.
- File paths are ingested and normalized into text before graph building.
- HTTP/HTTPS URLs are fetched and passed through the ingestion pipeline.
- `DataItem` lets you attach labels, metadata, or explicit IDs per item.
- Dataset scoping still applies through `dataset_name`.
In session-memory mode, a URL string is stored in the session cache as text. It is not fetched or scraped. Permanent mode is required for URL ingestion to work end-to-end.
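A sketch of the common input shapes in permanent-memory mode (the file path, URL, and dataset name are hypothetical placeholders):

```python
import asyncio

import cognee


async def main():
    # Raw text is stored directly as memory content.
    await cognee.remember("Plain text is stored directly.")

    # A list mixing strings, a local file path, and a URL.
    # File paths are normalized into text before graph building;
    # HTTP(S) URLs are fetched — but only in permanent-memory mode.
    await cognee.remember(
        [
            "First note",
            "/tmp/report.txt",           # hypothetical local file
            "https://example.com/page",  # fetched by the ingestion pipeline
        ],
        dataset_name="research_notes",   # hypothetical dataset
    )


asyncio.run(main())
```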
Supported formats
In permanent-memory mode, `remember()` supports the following file formats out of the box:

| Loader | Extensions | Install extra |
|---|---|---|
| TextLoader | .txt .md .json .xml .yaml .yml .log | built-in |
| CsvLoader | .csv | built-in |
| PyPdfLoader | .pdf | built-in |
| ImageLoader | .png .jpg .jpeg .gif .webp .bmp .tif .tiff .heic .avif .ico .psd .apng .cr2 .dwg .xcf .jxr .jpx | built-in |
| AudioLoader | .mp3 .wav .aac .flac .ogg .m4a .mid .amr .aiff | built-in |
| UnstructuredLoader | .docx .doc .odt .xlsx .xls .ods .pptx .ppt .odp .rtf .html .htm .eml .msg .epub | pip install cognee[docs] |
| AdvancedPdfLoader | .pdf with layout-aware extraction | pip install cognee[docs] |
| BeautifulSoupLoader | .html | pip install cognee[scraping] |
| DoclingDocument | pre-converted DoclingDocument objects | pip install cognee[docling] |
This formats table applies only to permanent-memory `remember()`. In session-memory mode, `remember()` does not parse files or fetch URLs; it stores the provided content as session text, so plain text inputs are the recommended form. Non-text inputs become text representations rather than parsed source content.
Users and ownership
- Permanent-memory `remember()` writes data into a dataset owned by a user.
- If you do not pass a user explicitly, Cognee resolves or creates the default user context.
- Ownership affects who can later read, write, share, improve, or forget that dataset.
- In session-memory mode, entries are still scoped to the current user in the session cache, so the same `session_id` under different users does not mean shared memory.
What permanent remember produces
When `remember()` runs without `session_id`, the result is not just stored text. It produces the full retrieval-ready memory stack:

- Dataset records in the relational store
- Normalized source content attached to that dataset
- Chunks created from the ingested content
- Graph nodes and edges extracted from those chunks
- Embeddings and summaries used for retrieval
- Improvement artifacts from the follow-up improve pass when `self_improvement=True`, depending on the active enrichment configuration
What session remember produces
When `remember()` runs with `session_id`, it behaves differently:

- The content is written to the session cache for that user and session.
- The call returns quickly with a session-stored result.
- The content is available for session-aware Recall immediately.
- No permanent graph write happens unless the improvement bridge runs.
Session availability and configuration
Session memory is enabled by default in the current Cognee 1.0 configuration.

- `session_id` works through the cache layer used for session storage.
- The default cache backend is filesystem-based cache.
- If you want to disable session caching entirely, set `CACHING=false`.
- If you want a shared or production-oriented backend, set `CACHE_BACKEND=redis`.

If session caching is disabled or unavailable, `remember(..., session_id=...)` cannot persist session memory in the normal way. For broader cache behavior and adapters, see Sessions and Caching.

If the session cache is unavailable, `remember(..., session_id=...)` can still return a session-style result object, but the session content may not actually have been stored. If session memory matters for your workflow, make sure caching is enabled and available.
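For example, the cache settings can be supplied as environment variables, using the `CACHING` and `CACHE_BACKEND` names documented here:

```shell
# Disable session caching entirely (session-memory remember()
# will no longer persist content).
export CACHING=false

# Or keep caching on and switch to a shared Redis backend.
export CACHING=true
export CACHE_BACKEND=redis
```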
Latency and cost
The two remember modes have very different cost and latency profiles:
- Session memory is the fast path. It writes to session cache and returns quickly.
- Permanent memory is the heavier path. It runs ingestion, chunking, graph extraction, embedding, and usually improvement.

`remember()` is the right choice when you want durable graph memory, but it costs more time and more model work than session-only storage.

The biggest cost drivers are:

- How much data you ingest
- How many chunks the content becomes
- The graph extraction and summarization work during graph building
- Whether `self_improvement=True` adds a follow-up improvement pass

You can tune the permanent pipeline with `chunk_size`, `chunker`, and `custom_prompt`. If you want the lightest permanent ingestion path, set `self_improvement=False`.
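As a sketch, the lightest permanent ingestion path combines those knobs (parameter names as documented on this page; the chunk size value is an arbitrary example):

```python
import asyncio

import cognee


async def main():
    # Skip the follow-up Improve pass and use larger chunks so fewer
    # model calls are made during graph building.
    result = await cognee.remember(
        "A long document goes here...",
        self_improvement=False,  # no follow-up improvement pass
        chunk_size=2048,         # arbitrary example value
    )
    print(result.elapsed_seconds)


asyncio.run(main())
```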
Background execution and result object
- `remember()` returns a promise-like `RememberResult`.
- In blocking mode, it finishes only after the permanent pipeline completes.
- With `run_in_background=True`, it returns immediately and you can `await` the result later.
- Useful fields include status, dataset name, elapsed time, and the raw pipeline result.
RememberResult format
`remember()` returns a `RememberResult` object. In docs terms, it behaves like a small status object you can print, inspect, or await.

Common fields

| Field | What it means |
|---|---|
| `status` | Current state of the operation. Common values include `running`, `completed`, `errored`, and `session_stored`. |
| `dataset_name` | The target dataset name used for the operation. |
| `dataset_id` | Dataset UUID when available. |
| `session_ids` | Session IDs associated with the result when session bridging is involved. |
| `pipeline_run_id` | Pipeline run UUID when the permanent pipeline has produced one. |
| `elapsed_seconds` | Wall-clock time from start to completion. |
| `items_processed` | Number of processed items when available. |
| `content_hash` | Content hash of the first processed item when available. |
| `items` | Per-item metadata such as IDs, names, token counts, MIME type, or content hashes when available. |
| `raw_result` | The raw pipeline result payload for advanced inspection. |
| `error` | Error text when the operation fails. |
Permanent memory completed
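A sketch of inspecting a completed permanent-memory result, using the field names from the table above (actual values will differ per run):

```python
import asyncio

import cognee


async def main():
    result = await cognee.remember(
        "Graphs connect entities through relationships.",
        dataset_name="main_dataset",
    )
    if result.status == "completed":
        print(result.dataset_name)     # target dataset, e.g. "main_dataset"
        print(result.pipeline_run_id)  # pipeline run UUID
        print(result.elapsed_seconds)  # wall-clock duration
    elif result.status == "errored":
        print(result.error)


asyncio.run(main())
```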
Session memory stored in cache
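A sketch of the session-memory case (the session ID is a hypothetical placeholder):

```python
import asyncio

import cognee


async def main():
    result = await cognee.remember(
        "The user's favorite color is teal.",
        session_id="onboarding-1",  # hypothetical session ID
    )
    # Session writes return quickly with a session-stored status.
    if result.status == "session_stored":
        print(result.session_ids)  # sessions associated with this write


asyncio.run(main())
```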
Background execution running
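A sketch of the background case, assuming the promise-like result can be awaited directly as described above:

```python
import asyncio

import cognee


async def main():
    # Start the permanent pipeline without blocking.
    result = await cognee.remember(
        "A large batch of notes...",
        run_in_background=True,
    )
    print(result.status)  # "running" while the pipeline works

    # Await the same result object later to block until completion.
    final = await result
    print(final.status)


asyncio.run(main())
```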
Parameters
| Option | What it does |
|---|---|
| `data` | The content to store. Can be text, file paths, URLs, file-like objects, `DataItem`, or lists of those inputs depending on mode. |
| `dataset_name` | Chooses the target permanent dataset. Defaults to `main_dataset`. |
| `session_id` | Switches `remember()` into session-memory mode. |
| `self_improvement` | Controls whether Improve starts automatically after storage. In session mode, this is what enables or disables automatic bridging into the permanent graph. Defaults to `True`. |
| `run_in_background` | Starts the permanent pipeline asynchronously. |
| `chunk_size` / `chunker` | Customizes chunking for the permanent pipeline. |
| `custom_prompt` | Overrides the graph-extraction prompt used during graph building. |
| `session_ids` | Syncs new permanent graph knowledge back into specific sessions during the self-improvement pass. |
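Putting a few of the advanced options together (a sketch; the dataset name, prompt text, and session ID are hypothetical):

```python
import asyncio

import cognee


async def main():
    await cognee.remember(
        "Quarterly report text...",
        dataset_name="finance",                       # hypothetical dataset
        chunk_size=1024,                              # custom chunking
        custom_prompt="Extract companies and KPIs.",  # override graph extraction
        session_ids=["analyst-session-7"],            # sync new knowledge back
    )


asyncio.run(main())
```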
Under the hood — legacy operations
- Recall — query memory with auto-routing and session awareness
- Improve — enrich the graph and bridge session memory