
# Remember

> Store data in Cognee's permanent graph or session memory

## What is the remember operation

The `remember()` operation is the main ingestion entry point in Cognee v1.0. It stores information in memory with a single API call.

It has two modes:

* **Permanent memory**: when no `session_id` is provided, `remember()` runs the full ingestion pipeline for you in one call: normalizing your data, building the knowledge graph, and enriching it for retrieval.
* **Session memory**: with `session_id`, `remember()` writes to the session cache for fast short-term memory. If `self_improvement=True` (the default), it then starts a background [Improve](/core-concepts/main-operations/improve) pass to bridge that session content into the permanent graph.

Permanent memory is also dataset-aware: it is written into a named dataset, which defaults to `main_dataset`.

## Where remember fits

* Use `remember()` when you want the simplest way to get data into Cognee.
* Use it for raw text, files, file lists, and `DataItem` objects.
* Use `session_id` when you want fast conversational memory first and long-term graph sync second.
* Use `self_improvement=False` if you want to skip the follow-up improvement pass.

## What happens under the hood

### Permanent memory

1. **Ingest** — data is normalized and attached to a named dataset.
2. **Build the graph** — documents are chunked, entities and relationships are extracted, and embeddings are created.
3. **Enrich** — by default, Cognee runs a follow-up [Improve](/core-concepts/main-operations/improve) pass that adds derived retrieval structures to the graph.

### Session memory

1. **Store in session cache** — the content is written to the cache keyed by user and session.
2. **Return immediately** — the call completes quickly with a session-stored result.
3. **Optional background bridge** — when `self_improvement=True` (default), Cognee immediately starts [Improve](/core-concepts/main-operations/improve) in the background to sync that session into the permanent graph.
4. **Manual bridge when disabled** — when `self_improvement=False`, nothing is bridged automatically; the content stays in session cache until you explicitly run `cognee.improve(dataset=..., session_ids=[...])`.
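
The manual bridge in step 4 can be sketched as follows. This is a minimal sketch, assuming the keyword arguments named on this page (`session_id`, `self_improvement`, and `cognee.improve(dataset=..., session_ids=[...])`). The helper name `remember_then_bridge` and its `client` parameter are hypothetical; `client` stands for any object with async `remember` and `improve` callables, which in real use would be the `cognee` module.

```python
import asyncio


async def remember_then_bridge(client, text, session_id, dataset="main_dataset"):
    """Store text in session memory only, then bridge it explicitly.

    `client` is a hypothetical stand-in for the `cognee` module: any object
    exposing async `remember` and `improve` callables works.
    """
    # Session mode with the automatic bridge disabled: nothing reaches the
    # permanent graph as a result of this call.
    result = await client.remember(
        text,
        session_id=session_id,
        self_improvement=False,
    )

    # Explicit bridge: sync the named session into the permanent graph.
    await client.improve(dataset=dataset, session_ids=[session_id])
    return result


# Usage, assuming a configured Cognee install:
#   import cognee
#   asyncio.run(remember_then_bridge(cognee, "User prefers dark mode.", "chat_1"))
```

Keeping the two calls separate lets you decide when the heavier graph work runs, for example at the end of a conversation instead of after every message.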

## After remember finishes

* **Permanent memory**: your data is stored in a dataset, chunked, turned into graph structure, embedded for retrieval, and, unless you set `self_improvement=False`, enriched by an immediate follow-up improve pass.
* **Session memory**: your data is available quickly through the session cache; it only becomes permanent graph memory if the improvement bridge runs.

## Examples and details

<Accordion title="Accepted inputs">
  * Raw text strings
  * Lists of strings
  * Local file paths
  * URLs
  * File-like binary streams
  * `DataItem` objects and lists of `DataItem` objects

  In permanent-memory mode:

  * Plain text is stored directly as memory content.
  * File paths are ingested and normalized into text before graph building.
  * HTTP/HTTPS URLs are fetched and passed through the ingestion pipeline.
  * `DataItem` lets you attach labels, metadata, or explicit IDs per item.
  * Dataset scoping still applies through `dataset_name`.

  <Note>
    In session-memory mode, a URL string is stored in the session cache as text. It is not fetched or scraped. Permanent mode is required for URL ingestion to work end-to-end.
  </Note>
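
  Because session mode stores URL strings as plain text, it can help to flag them before the call. The check below is a minimal sketch that uses only the standard library; `looks_like_http_url` and `unfetched_urls` are hypothetical helper names, not part of Cognee.

  ```python
  from urllib.parse import urlparse


  def looks_like_http_url(value):
      """Return True for strings that parse as HTTP/HTTPS URLs."""
      if not isinstance(value, str):
          return False
      parsed = urlparse(value.strip())
      return parsed.scheme in ("http", "https") and bool(parsed.netloc)


  def unfetched_urls(data, session_id=None):
      """List URL inputs that would be stored as text rather than fetched.

      Follows the rule in the note above: URLs are only fetched in
      permanent mode, that is, when no session_id is passed.
      """
      if session_id is None:
          return []
      items = data if isinstance(data, (list, tuple)) else [data]
      return [item for item in items if looks_like_http_url(item)]
  ```

  If `unfetched_urls(...)` returns anything, call `remember()` without `session_id` for those items so they go through the ingestion pipeline.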
</Accordion>

<Accordion title="Supported formats">
  In permanent-memory mode, `remember()` supports the following file formats out of the box:

  | Loader                  | Extensions                                                                                                                          | Install extra                  |
  | ----------------------- | ----------------------------------------------------------------------------------------------------------------------------------- | ------------------------------ |
  | **TextLoader**          | `.txt` `.md` `.json` `.xml` `.yaml` `.yml` `.log`                                                                                   | built-in                       |
  | **CsvLoader**           | `.csv`                                                                                                                              | built-in                       |
  | **PyPdfLoader**         | `.pdf`                                                                                                                              | built-in                       |
  | **ImageLoader**         | `.png` `.jpg` `.jpeg` `.gif` `.webp` `.bmp` `.tif` `.tiff` `.heic` `.avif` `.ico` `.psd` `.apng` `.cr2` `.dwg` `.xcf` `.jxr` `.jpx` | built-in                       |
  | **AudioLoader**         | `.mp3` `.wav` `.aac` `.flac` `.ogg` `.m4a` `.mid` `.amr` `.aiff`                                                                    | built-in                       |
  | **UnstructuredLoader**  | `.docx` `.doc` `.odt` `.xlsx` `.xls` `.ods` `.pptx` `.ppt` `.odp` `.rtf` `.html` `.htm` `.eml` `.msg` `.epub`                       | `pip install cognee[docs]`     |
  | **AdvancedPdfLoader**   | `.pdf` with layout-aware extraction                                                                                                 | `pip install cognee[docs]`     |
  | **BeautifulSoupLoader** | `.html`                                                                                                                             | `pip install cognee[scraping]` |
  | **DoclingDocument**     | pre-converted `DoclingDocument` objects                                                                                             | `pip install cognee[docling]`  |

  <Note>
    This formats table applies only to permanent-memory `remember()`. In session-memory mode, `remember()` does not parse files or fetch URLs; it stores the provided content as session text, so plain text inputs are the recommended form. Non-text inputs become text representations rather than parsed source content.
  </Note>
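
  To see ahead of time which rows of the table cover a given file, the table can be mirrored as a lookup. This is an illustrative sketch only: `LOADER_EXTENSIONS` and `candidate_loaders` are hypothetical names, the mapping is transcribed from the table above, and it does not reflect how Cognee selects loaders internally.

  ```python
  from pathlib import Path

  # Transcribed from the table above; not Cognee's internal registry.
  LOADER_EXTENSIONS = {
      "TextLoader": {".txt", ".md", ".json", ".xml", ".yaml", ".yml", ".log"},
      "CsvLoader": {".csv"},
      "PyPdfLoader": {".pdf"},
      "ImageLoader": {
          ".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".tif", ".tiff",
          ".heic", ".avif", ".ico", ".psd", ".apng", ".cr2", ".dwg", ".xcf",
          ".jxr", ".jpx",
      },
      "AudioLoader": {
          ".mp3", ".wav", ".aac", ".flac", ".ogg", ".m4a", ".mid", ".amr", ".aiff",
      },
      "UnstructuredLoader": {
          ".docx", ".doc", ".odt", ".xlsx", ".xls", ".ods", ".pptx", ".ppt",
          ".odp", ".rtf", ".html", ".htm", ".eml", ".msg", ".epub",
      },
      "AdvancedPdfLoader": {".pdf"},
      "BeautifulSoupLoader": {".html"},
  }


  def candidate_loaders(path):
      """Return the loaders from the table whose extensions match `path`."""
      extension = Path(path).suffix.lower()
      return [
          name
          for name, extensions in LOADER_EXTENSIONS.items()
          if extension in extensions
      ]
  ```

  An empty list means the extension is not in the table. More than one entry (for example for `.pdf` or `.html`) is a case where the `preferred_loaders` parameter is relevant.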
</Accordion>

<Accordion title="Users and ownership">
  * Permanent-memory `remember()` writes data into a dataset owned by a user.
  * If you do not pass a user explicitly, Cognee resolves or creates the default user context.
  * Ownership affects who can later read, write, share, improve, or forget that dataset.
  * In session-memory mode, entries are still scoped to the current user in the session cache, so the same `session_id` under different users does not mean shared memory.

  If you need the deeper dataset-permissions model, see [Datasets](/core-concepts/further-concepts/datasets) and the multi-user docs.
</Accordion>

<Accordion title="What permanent remember produces">
  When `remember()` runs without `session_id`, the result is not just stored text. It produces the full retrieval-ready memory stack:

  * **Dataset records** in the relational store
  * **Normalized source content** attached to that dataset
  * **Chunks** created from the ingested content
  * **Graph nodes and edges** extracted from those chunks
  * **Embeddings and summaries** used for retrieval
  * **Improvement artifacts** from the follow-up improve pass when `self_improvement=True`, depending on the active enrichment configuration

  In practice, this means the dataset becomes immediately usable by [Recall](/core-concepts/main-operations/recall).
</Accordion>

<Accordion title="What session remember produces">
  When `remember()` runs with `session_id`, it behaves differently:

  * The content is written to the session cache for that user and session.
  * The call returns quickly with a session-stored result.
  * The content is available for session-aware [Recall](/core-concepts/main-operations/recall) immediately.
  * No permanent graph write happens unless the improvement bridge runs.

  So session memory is best thought of as **fast short-term memory first**, with optional long-term graph persistence second.
</Accordion>

<Accordion title="Session availability and configuration">
  Session memory is **enabled by default** in the current Cognee 1.0 configuration.

  * `session_id` works through the cache layer used for session storage.
  * The default cache backend is filesystem-based.
  * If you want to disable session caching entirely, set `CACHING=false`.
  * If you want a shared or production-oriented backend, set `CACHE_BACKEND=redis`.

  ```dotenv theme={null}
  # Disable session caching
  CACHING=false

  # Or keep sessions enabled and use Redis
  CACHING=true
  CACHE_BACKEND=redis
  ```

  If session caching is disabled or unavailable, `remember(..., session_id=...)` has no cache to write session content to. For broader cache behavior and adapters, see [Sessions and Caching](/core-concepts/sessions-and-caching).

  <Note>
    If the session cache is unavailable, `remember(..., session_id=...)` can still return a session-style result object, but the session content may not actually have been stored. If session memory matters for your workflow, make sure caching is enabled and available.
  </Note>
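
  Since a session-style result does not guarantee the content was stored, a small preflight check on the environment can catch a disabled cache early. This is a minimal sketch: `session_cache_settings` is a hypothetical helper, and its parsing rule (only the value `false`, in any letter case, disables caching) is an assumption that may differ from how Cognee reads these variables.

  ```python
  import os


  def session_cache_settings(env=None):
      """Return (caching_enabled, backend) from CACHING and CACHE_BACKEND.

      Assumptions: an unset CACHING means enabled (the documented default),
      and an unset CACHE_BACKEND means Cognee's default backend is used,
      reported here as None.
      """
      env = os.environ if env is None else env
      enabled = env.get("CACHING", "true").strip().lower() != "false"
      backend = env.get("CACHE_BACKEND")
      return enabled, backend
  ```

  For example, `session_cache_settings({"CACHING": "false"})` returns `(False, None)`, which is a signal to avoid relying on `session_id` in that environment. The check only reads configuration; it does not verify that the backend is reachable.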
</Accordion>

<Accordion title="Latency and cost">
  The two remember modes have very different cost and latency profiles:

  * **Session memory** is the fast path. It writes to session cache and returns quickly.
  * **Permanent memory** is the heavier path. It runs ingestion, chunking, graph extraction, embedding, and, by default, a follow-up improvement pass.

  That means permanent `remember()` is the right choice when you want durable graph memory, but it costs more time and more model work than session-only storage.

  The biggest cost drivers are:

  * How much data you ingest
  * How many chunks the content becomes
  * The graph extraction and summarization work during graph building
  * Whether `self_improvement=True` adds a follow-up improvement pass

  If you want more control over chunking or prompt behavior, use `chunk_size`, `chunker`, and `custom_prompt`. If you want the lightest permanent ingestion path, set `self_improvement=False`.
</Accordion>

<Accordion title="Background execution and result object">
  * `remember()` returns a promise-like `RememberResult`.
  * In blocking mode, it finishes only after the permanent pipeline completes.
  * With `run_in_background=True`, it returns immediately and you can `await` the result later.
  * Useful fields include status, dataset name, elapsed time, and the raw pipeline result.

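  ```python theme={null}
  import cognee

  # Returns immediately; the pipeline keeps running in the background
  result = await cognee.remember(
      "Einstein was born in Ulm.",
      run_in_background=True,
  )

  print(result)   # status='running'
  await result    # wait for the background pipeline to finish
  print(result)   # status='completed'
  ```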
</Accordion>

<Accordion title="Concurrent recall while indexing is running">
  You can call [recall](/core-concepts/main-operations/recall) while a permanent-memory `remember()` run is still indexing in the background. Cognee does not place a global lock over ingestion and retrieval.

  * Existing indexed data remains searchable while new data is being processed.
  * Newly ingested content only becomes available to graph-backed `recall()` once its indexing work has completed.
  * In practice, that means a background `remember()` run may produce **partial retrieval coverage** until the dataset finishes processing.

  For multi-process or multi-instance production setups, prefer external graph and relational stores rather than local file-based defaults. See [Graph Stores](/setup-configuration/graph-stores), [Relational Databases](/setup-configuration/relational-databases), and the [deployment guides](/how-to-guides/cognee-sdk/deployment).
</Accordion>

<Accordion title="Checking indexing status">
  When `remember()` writes to permanent memory, it runs a multi-step indexing pipeline as part of that operation. For larger datasets, or when you use `run_in_background=True`, you may want to verify that indexing completed before calling [recall](/core-concepts/main-operations/recall).

  <Tabs>
    <Tab title="Python SDK">
      ```python theme={null}
      import cognee

      result = await cognee.remember(
          "Cognee turns documents into AI memory.",
          dataset_name="docs",
          run_in_background=True,
      )

      dataset_id = result.dataset_id
      status = await cognee.datasets.get_status([dataset_id])

      if status.get(str(dataset_id)) == "DATASET_PROCESSING_COMPLETED":
          answers = await cognee.recall(
              query_text="What does Cognee do?",
              datasets=["docs"],
          )
      ```
    </Tab>

    <Tab title="HTTP API">
      ```http theme={null}
      GET /api/v1/datasets/status?dataset=<dataset-uuid>
      GET /api/v1/activity/pipeline-runs?dataset_id=<dataset-uuid>
      ```

      * `GET /api/v1/datasets/status` returns the current indexing status for one or more datasets
      * `GET /api/v1/activity/pipeline-runs` returns recent pipeline runs with timestamps and run IDs
    </Tab>
  </Tabs>

  Possible status values:

  | Status                         | Meaning                          |
  | ------------------------------ | -------------------------------- |
  | `DATASET_PROCESSING_INITIATED` | Pipeline queued, not yet started |
  | `DATASET_PROCESSING_STARTED`   | Pipeline actively processing     |
  | `DATASET_PROCESSING_COMPLETED` | Indexing finished successfully   |
  | `DATASET_PROCESSING_ERRORED`   | Pipeline encountered an error    |

  For the low-level API reference, see [cognee.datasets.get\_status()](/python-api/datasets#datasetsget_status).
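
  The SDK example above checks the status once. For background runs you will usually want to poll until a terminal status is reached. The helper below is a minimal sketch: `wait_for_indexing` is a hypothetical name, and `get_status` is passed in as a parameter so the loop can be tried without a running Cognee instance. It assumes the callable behaves like `cognee.datasets.get_status` in the example above, returning a mapping keyed by `str(dataset_id)`.

  ```python
  import asyncio
  import time

  TERMINAL_STATUSES = {
      "DATASET_PROCESSING_COMPLETED",
      "DATASET_PROCESSING_ERRORED",
  }


  async def wait_for_indexing(get_status, dataset_id, timeout=300.0, interval=2.0):
      """Poll `get_status` until the dataset reaches a terminal status.

      Returns the terminal status string, or raises TimeoutError if the
      dataset is still processing once `timeout` seconds have passed.
      """
      deadline = time.monotonic() + timeout
      while True:
          statuses = await get_status([dataset_id])
          status = statuses.get(str(dataset_id))
          if status in TERMINAL_STATUSES:
              return status
          if time.monotonic() >= deadline:
              raise TimeoutError(
                  f"dataset {dataset_id} still {status!r} after {timeout}s"
              )
          await asyncio.sleep(interval)


  # Usage, assuming a configured Cognee install:
  #   status = await wait_for_indexing(cognee.datasets.get_status, result.dataset_id)
  ```

  Check the returned value before calling `recall()`: the loop also stops on `DATASET_PROCESSING_ERRORED`.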
</Accordion>

<Accordion title="RememberResult format">
  `remember()` returns a `RememberResult` object. It behaves like a small status object that you can print, inspect, or await.

  **Common fields**

  | Field             | What it means                                                                                                  |
  | ----------------- | -------------------------------------------------------------------------------------------------------------- |
  | `status`          | Current state of the operation. Common values include `running`, `completed`, `errored`, and `session_stored`. |
  | `dataset_name`    | The target dataset name used for the operation.                                                                |
  | `dataset_id`      | Dataset UUID when available.                                                                                   |
  | `session_ids`     | Session IDs associated with the result when session bridging is involved.                                      |
  | `pipeline_run_id` | Pipeline run UUID when the permanent pipeline has produced one.                                                |
  | `elapsed_seconds` | Wall-clock time from start to completion.                                                                      |
  | `items_processed` | Number of processed items when available.                                                                      |
  | `content_hash`    | Content hash of the first processed item when available.                                                       |
  | `items`           | Per-item metadata such as IDs, names, token counts, MIME type, or content hashes when available.               |
  | `raw_result`      | The raw pipeline result payload for advanced inspection.                                                       |
  | `error`           | Error text when the operation fails.                                                                           |

  **Typical shapes**

  <AccordionGroup>
    <Accordion title="Permanent memory completed">
      ```python theme={null}
      {
        "status": "completed",
        "dataset_name": "main_dataset",
        "dataset_id": "...",
        "pipeline_run_id": "...",
        "items_processed": 1,
        "elapsed_seconds": 4.2
      }
      ```
    </Accordion>

    <Accordion title="Session memory stored in cache">
      ```python theme={null}
      {
        "status": "session_stored",
        "dataset_name": "main_dataset",
        "session_ids": ["chat_1"],
        "elapsed_seconds": 0.02
      }
      ```
    </Accordion>

    <Accordion title="Background execution running">
      ```python theme={null}
      {
        "status": "running",
        "dataset_name": "main_dataset"
      }
      ```
    </Accordion>
  </AccordionGroup>
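
  Code that consumes the result typically branches on `status`. The function below is a minimal sketch of that branching, assuming the documented fields are readable as attributes (as with `result.dataset_id` in the status-check example). `describe_result` is a hypothetical helper, and the messages are illustrative.

  ```python
  def describe_result(result):
      """Summarize a RememberResult-like object in one line.

      Reads only fields documented in the table above: `status`,
      `dataset_name`, `session_ids`, `elapsed_seconds`, and `error`.
      """
      status = getattr(result, "status", None)
      dataset = getattr(result, "dataset_name", None)

      if status == "completed":
          return f"stored in {dataset} after {result.elapsed_seconds:.1f}s"
      if status == "session_stored":
          return f"cached for sessions {result.session_ids}; permanent only once the bridge runs"
      if status == "running":
          return f"still indexing {dataset}; await the result before relying on recall"
      if status == "errored":
          return f"failed: {getattr(result, 'error', None)}"
      return f"unrecognized status: {status!r}"
  ```

  The final branch is deliberate: the table lists common status values, so other values are reported here instead of being treated as success.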
</Accordion>

<Accordion title="Parameters">
  <Tabs>
    <Tab title="Basic Parameters">
      | Option                   | What it does                                                                                                                                                                                                              |
      | ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
      | `data`                   | The content to store. Can be text, file paths, URLs, file-like objects, `DataItem`, or lists of those inputs depending on mode.                                                                                           |
      | `dataset_name`           | Chooses the target permanent dataset. Defaults to `main_dataset`.                                                                                                                                                         |
      | `session_id`             | Switches `remember()` into session-memory mode.                                                                                                                                                                           |
      | `self_improvement`       | Controls whether [Improve](/core-concepts/main-operations/improve) starts automatically after storage. In session mode, this is what enables or disables automatic bridging into the permanent graph. Defaults to `True`. |
      | `run_in_background`      | Starts the permanent pipeline asynchronously.                                                                                                                                                                             |
      | `chunk_size` / `chunker` | Customizes chunking for the permanent pipeline.                                                                                                                                                                           |
      | `custom_prompt`          | Overrides the graph-extraction prompt used during graph building.                                                                                                                                                         |
      | `session_ids`            | Syncs new permanent graph knowledge back into specific sessions during the self-improvement pass.                                                                                                                         |
    </Tab>

    <Tab title="Advanced Parameters">
      | Option                                 | What it does                                                                                                                                         |
      | -------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
      | `dataset_id`                           | Targets a specific existing dataset by UUID instead of resolving only by name.                                                                       |
      | `graph_model`                          | Overrides the graph schema/model used during graph building.                                                                                         |
      | `node_set`                             | Attaches one or more node-set tags during permanent ingestion.                                                                                       |
      | `preferred_loaders`                    | Controls which loaders are preferred when ingesting source files.                                                                                    |
      | `incremental_loading`                  | Controls whether only new or changed data should be processed when reusing a dataset.                                                                |
      | `data_per_batch`                       | Tunes batching for ingestion and pipeline processing.                                                                                                |
      | `chunks_per_batch`                     | Tunes batching during chunk processing in the graph-building stage.                                                                                  |
      | `user`                                 | Runs `remember()` under a specific user context instead of the default user. This affects dataset ownership, permissions, and session-cache scoping. |
      | `vector_db_config` / `graph_db_config` | Overrides database backend configuration for the vector or graph stores.                                                                             |
    </Tab>
  </Tabs>
</Accordion>

<Accordion title="Under the hood — legacy operations">
  `remember()` runs [Add](/core-concepts/main-operations/legacy-operations/add) → [Cognify](/core-concepts/main-operations/legacy-operations/cognify) → [Improve](/core-concepts/main-operations/improve) under the hood.

  Use the legacy operations directly when you need explicit control over each step — for example, to inspect intermediate results, tune pipeline parameters independently, or integrate ingestion and graph-building into a more complex workflow.
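
  Running the three steps yourself might look like the sketch below. It is written against a `client` parameter (a hypothetical stand-in for the `cognee` module), and the keyword arguments `add(data, dataset_name=...)` and `cognify(datasets=[...])` are assumptions here; check the linked Add and Cognify pages for the exact signatures. `improve(dataset=...)` follows the call shown earlier on this page.

  ```python
  import asyncio


  async def remember_step_by_step(client, data, dataset_name="main_dataset"):
      """Run Add, Cognify, and Improve one at a time and keep each result.

      `client` is any object with async `add`, `cognify`, and `improve`
      callables; pass the `cognee` module in real use (signatures assumed).
      """
      results = {}

      # 1. Ingest: normalize the data and attach it to the dataset.
      results["add"] = await client.add(data, dataset_name=dataset_name)

      # 2. Build the graph: chunking, extraction, and embeddings.
      results["cognify"] = await client.cognify(datasets=[dataset_name])

      # 3. Enrich: the follow-up improvement pass.
      results["improve"] = await client.improve(dataset=dataset_name)

      return results


  # Usage, assuming a configured Cognee install:
  #   import cognee
  #   asyncio.run(remember_step_by_step(cognee, "Einstein was born in Ulm."))
  ```

  Returning the intermediate results is the point of the split: each stage can be inspected or skipped independently, which `remember()` does not expose.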
</Accordion>

<Accordion title="Inspect what you've stored">
  After calling `remember()`, you can list datasets and browse their contents using `cognee.datasets`:

  ```python theme={null}
  import cognee

  # List all datasets, then the data items inside each one
  datasets = await cognee.datasets.list_datasets()
  for ds in datasets:
      print(ds.name, ds.id)

      # Query items per dataset inside the loop, so every dataset is
      # covered and an empty dataset list does not raise a NameError
      items = await cognee.datasets.list_data(dataset_id=ds.id)
      for item in items:
          print("  ", item.id, item.name)
  ```

  See the [datasets API reference](/python-api/datasets) for the full set of listing and management methods.
</Accordion>

<Columns cols={2}>
  <Card title="Recall" icon="search" href="/core-concepts/main-operations/recall">
    Query memory with auto-routing and session awareness
  </Card>

  <Card title="Improve" icon="sparkles" href="/core-concepts/main-operations/improve">
    Enrich the graph and bridge session memory
  </Card>
</Columns>
