# Remember

> Store data in Cognee as permanent graph memory or session memory.

## What is the remember operation

The `.remember` operation is the main ingestion entry point in Cognee v1.0. It stores information in memory with a single API call.

It has two modes:

* **Permanent memory**: when no `session_id` is provided, `remember()` runs the full ingestion pipeline for you. It normalizes your data, builds the knowledge graph, and enriches it for retrieval, all in one call. The data is written into a named dataset (`main_dataset` by default), and Cognee does not create a session ID for this path.
* **Session memory**: with `session_id`, `remember()` writes to the session cache for fast short-term memory. If `self_improvement=True` (the default), it then starts a background [Improve](/core-concepts/main-operations/improve) pass to bridge that session content into the permanent graph.

## Where remember fits

* Use `remember()` when you want the simplest way to get data into Cognee.
* Use it for raw text, files, file lists, and `DataItem` objects.
* Use `session_id` when you want fast conversational memory first and long-term graph sync second.
* Use `self_improvement=False` if you want to skip the follow-up improvement pass.

## What happens under the hood

### Permanent memory

1. **Ingest** — data is normalized and attached to a named dataset.
2. **Build the graph** — documents are chunked, entities and relationships are extracted, and embeddings are created.
3. **Enrich** — by default, Cognee runs a follow-up [Improve](/core-concepts/main-operations/improve) pass that adds derived retrieval structures to the graph.

### Session memory

1. **Store in session cache** — the content is written to the cache keyed by user and session.
2. **Return immediately** — the call completes quickly with a session-stored result.
3. **Optional background bridge** — when `self_improvement=True` (default), Cognee immediately starts [Improve](/core-concepts/main-operations/improve) in the background to sync that session into the permanent graph.
4. **Manual bridge when disabled** — when `self_improvement=False`, nothing is bridged automatically; the content stays in session cache until you explicitly run `cognee.improve(dataset=..., session_ids=[...])`.
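
The session flow above can be pictured with a toy in-memory model. This is a hypothetical illustration, not Cognee's cache code: `ToySessionCache`, its methods, and the plain-list "graph" are invented for the sketch. It models two documented behaviors: entries are keyed by user *and* session, and with `self_improvement=False` nothing reaches the permanent store until an explicit improve call.

```python
from collections import defaultdict


class ToySessionCache:
    """Hypothetical stand-in for Cognee's session cache (illustration only)."""

    def __init__(self) -> None:
        # Entries are keyed by (user_id, session_id), never by session_id alone.
        self.entries: dict[tuple[str, str], list[str]] = defaultdict(list)
        # Content stored in a session but not yet bridged into the "graph".
        self.pending: dict[tuple[str, str], list[str]] = defaultdict(list)
        # Stand-in for the permanent graph: a flat list of bridged texts.
        self.graph: list[str] = []

    def remember(self, text: str, *, user_id: str, session_id: str,
                 self_improvement: bool = True) -> dict:
        key = (user_id, session_id)
        # Step 1: store in the session cache.
        self.entries[key].append(text)
        self.pending[key].append(text)
        if self_improvement:
            # Step 3: real Cognee starts Improve in the background; the toy bridges inline.
            self.improve(user_id=user_id, session_ids=[session_id])
        # Step 2: return a session-stored result.
        return {"status": "session_stored", "session_ids": [session_id]}

    def improve(self, *, user_id: str, session_ids: list[str]) -> None:
        # Step 4: explicit bridge that moves pending session content into the "graph".
        for session_id in session_ids:
            self.graph.extend(self.pending.pop((user_id, session_id), []))
```

The two users' `chat_1` sessions in a quick run never mix, which mirrors the per-user scoping described under *Users and ownership*.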

## After remember finishes

* **Permanent memory**: your data is stored in a dataset, chunked, turned into graph structure, and embedded for retrieval. With `self_improvement=True` (the default), it is also enriched by a follow-up improve pass.
* **Session memory**: your data is available quickly through the session cache; it only becomes permanent graph memory if the improvement bridge runs.

## Examples and details

<Accordion title="Accepted inputs">
  `remember()` accepts these input types (any of these, or a list mixing them):

  * **Raw text strings** — e.g. `"Einstein was born in Ulm."`
  * **Local file paths as strings** — absolute (`"/path/to/doc.pdf"`), `file://` URLs, or relative paths that resolve to an existing file
  * **HTTP/HTTPS URLs as strings** — fetched and ingested in permanent mode
  * **S3 paths as strings** — `"s3://bucket/key"` (requires `s3fs`)
  * **`DataItem` objects** — a lightweight wrapper for attaching metadata to any of the above (see below)
  * **HTTP upload objects with `.file` and `.filename` attributes** — e.g. FastAPI `UploadFile`. This is the supported path for "file-like" inputs in permanent-memory mode; it is what the HTTP API uses internally

  <Note>
    In the default blocking mode, plain Python file handles such as `open("doc.pdf", "rb")` (i.e. `BufferedReader`) are **not** accepted and will raise `IngestionError: Data type not supported`. Pass the file path as a string instead, wrap the bytes in an object that exposes `.file` and `.filename`, or use `run_in_background=True` so Cognee can materialize stream-like inputs before scheduling the pipeline.
  </Note>
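
If your bytes come from somewhere other than a web framework, a minimal wrapper can expose the two attributes the note mentions. This is a sketch that assumes Cognee only duck-types on `.file` and `.filename` (the exact check is not documented on this page); `UploadLike` and `from_bytes` are hypothetical names, not part of Cognee.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO


@dataclass
class UploadLike:
    """Hypothetical upload-style wrapper exposing `.file` and `.filename`."""

    file: BinaryIO
    filename: str

    @classmethod
    def from_bytes(cls, payload: bytes, filename: str) -> "UploadLike":
        # Keep the bytes in an in-memory buffer so no OS file handle is involved.
        return cls(file=io.BytesIO(payload), filename=filename)
```

Under that assumption, `await cognee.remember(UploadLike.from_bytes(pdf_bytes, "report.pdf"))` would be handled like an `UploadFile`. When the data already lives on disk, passing the path string is simpler.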

  In permanent-memory mode:

  * Plain text is stored directly as memory content.
  * File paths are ingested and normalized into text before graph building.
  * HTTP/HTTPS URLs are fetched and passed through the ingestion pipeline.
  * Dataset scoping still applies through `dataset_name`.

  <Note>
    In session-memory mode, a URL string is stored in the session cache as text. It is not fetched or scraped. Permanent mode is required for URL ingestion to work end-to-end.
  </Note>

  **What is `DataItem`?**

  `DataItem` is a small dataclass exported from `cognee.tasks.ingestion.data_item` that lets you attach metadata to a single input alongside its content. Use it when you want a stable ID or extra labels travelling with the data through the ingestion pipeline.

  ```python theme={null}
  from cognee.tasks.ingestion.data_item import DataItem
  import cognee

  await cognee.remember(
      DataItem(
          data="Einstein was born in Ulm.",   # any accepted input above
          label="biography-note",             # optional free-form label
          external_metadata={"source": "wiki"},  # optional dict
          data_id=None,                       # optional UUID to pin an ID
      )
  )
  ```

  The `data` field of a `DataItem` accepts the same types as `remember()` itself (string text, file path, URL, etc.). You can also pass a list of `DataItem` objects to ingest several items in one call.
</Accordion>

<Accordion title="Supported formats">
  In permanent-memory mode, `remember()` supports the following file formats out of the box:

  | Loader                  | Extensions                                                                                                                          | Install extra                  |
  | ----------------------- | ----------------------------------------------------------------------------------------------------------------------------------- | ------------------------------ |
  | **TextLoader**          | `.txt` `.md` `.json` `.xml` `.yaml` `.yml` `.log`                                                                                   | built-in                       |
  | **CsvLoader**           | `.csv`                                                                                                                              | built-in                       |
  | **PyPdfLoader**         | `.pdf`                                                                                                                              | built-in                       |
  | **ImageLoader**         | `.png` `.jpg` `.jpeg` `.gif` `.webp` `.bmp` `.tif` `.tiff` `.heic` `.avif` `.ico` `.psd` `.apng` `.cr2` `.dwg` `.xcf` `.jxr` `.jpx` | built-in                       |
  | **AudioLoader**         | `.mp3` `.wav` `.aac` `.flac` `.ogg` `.m4a` `.mid` `.amr` `.aiff`                                                                    | built-in                       |
  | **UnstructuredLoader**  | `.docx` `.doc` `.odt` `.xlsx` `.xls` `.ods` `.pptx` `.ppt` `.odp` `.rtf` `.html` `.htm` `.eml` `.msg` `.epub`                       | `pip install cognee[docs]`     |
  | **AdvancedPdfLoader**   | `.pdf` with layout-aware extraction                                                                                                 | `pip install cognee[docs]`     |
  | **BeautifulSoupLoader** | `.html`                                                                                                                             | `pip install cognee[scraping]` |
  | **DoclingDocument**     | pre-converted `DoclingDocument` objects                                                                                             | `pip install cognee[docling]`  |

  <Note>
    This formats table applies only to permanent-memory `remember()`. In session-memory mode, `remember()` does not parse files or fetch URLs; it stores the provided content as session text, so plain text inputs are the recommended form. Non-text inputs become text representations rather than parsed source content.
  </Note>
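
Before a large permanent ingestion, it can help to check which files the table covers and which need an install extra. The helper below is a hypothetical pre-flight check that simply mirrors the table; it is not how Cognee resolves loaders at runtime (the `preferred_loaders` parameter influences that), and lowercasing the extension is a convenience assumption of this sketch.

```python
from pathlib import Path

# (loader, install extra, extensions), copied from the table above.
# DoclingDocument is omitted because it takes objects, not file extensions.
LOADER_TABLE: list[tuple[str, str, set[str]]] = [
    ("TextLoader", "built-in", {".txt", ".md", ".json", ".xml", ".yaml", ".yml", ".log"}),
    ("CsvLoader", "built-in", {".csv"}),
    ("PyPdfLoader", "built-in", {".pdf"}),
    ("ImageLoader", "built-in", {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".tif",
                                 ".tiff", ".heic", ".avif", ".ico", ".psd", ".apng",
                                 ".cr2", ".dwg", ".xcf", ".jxr", ".jpx"}),
    ("AudioLoader", "built-in", {".mp3", ".wav", ".aac", ".flac", ".ogg", ".m4a",
                                 ".mid", ".amr", ".aiff"}),
    ("UnstructuredLoader", "cognee[docs]", {".docx", ".doc", ".odt", ".xlsx", ".xls",
                                            ".ods", ".pptx", ".ppt", ".odp", ".rtf",
                                            ".html", ".htm", ".eml", ".msg", ".epub"}),
    ("AdvancedPdfLoader", "cognee[docs]", {".pdf"}),
    ("BeautifulSoupLoader", "cognee[scraping]", {".html"}),
]


def candidate_loaders(path: str) -> list[tuple[str, str]]:
    """Return (loader, install extra) pairs whose table row lists this file's extension."""
    suffix = Path(path).suffix.lower()
    return [(name, extra) for name, extra, exts in LOADER_TABLE if suffix in exts]
```

An empty list means the extension does not appear in the table, so that file probably needs a different input form or a custom loader.
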
</Accordion>

<Accordion title="Users and ownership">
  * Permanent-memory `remember()` writes data into a dataset owned by a user.
  * If you do not pass a user explicitly, Cognee resolves or creates the default user context.
  * Ownership affects who can later read, write, share, improve, or forget that dataset.
  * In session-memory mode, entries are still scoped to the current user in the session cache, so the same `session_id` under different users does not mean shared memory.

  If you need the deeper dataset-permissions model, see [Datasets](/core-concepts/further-concepts/datasets) and the multi-user docs.
</Accordion>

<Accordion title="What permanent remember produces">
  When `remember()` runs without `session_id`, the result is not just stored text. It produces the full retrieval-ready memory stack:

  * **Dataset records** in the relational store
  * **Normalized source content** attached to that dataset
  * **Chunks** created from the ingested content
  * **Graph nodes and edges** extracted from those chunks
  * **Embeddings and summaries** used for retrieval
  * **Improvement artifacts** from the follow-up improve pass when `self_improvement=True`, depending on the active enrichment configuration

  In practice, this means the dataset becomes immediately usable by [Recall](/core-concepts/main-operations/recall).
</Accordion>

<Accordion title="What session remember produces">
  When `remember()` runs with `session_id`, it behaves differently:

  * The content is written to the session cache for that user and session.
  * The call returns quickly with a session-stored result.
  * The content is available for session-aware [Recall](/core-concepts/main-operations/recall) immediately.
  * No permanent graph write happens unless the improvement bridge runs.

  So session memory is best thought of as **fast short-term memory first**, with optional long-term graph persistence second.
</Accordion>

<Accordion title="Session availability and configuration">
  Session memory is **enabled by default** in the current Cognee 1.0 configuration.

  * `session_id` relies on the cache layer that backs session storage.
  * The default cache backend is a filesystem-based cache.
  * If you want to disable session caching entirely, set `CACHING=false`.
  * If you want a shared or production-oriented backend, set `CACHE_BACKEND=redis`.

  ```dotenv theme={null}
  # Disable session caching
  CACHING=false

  # Or keep sessions enabled and use Redis
  CACHING=true
  CACHE_BACKEND=redis
  ```

  If session caching is disabled or unavailable, `remember(..., session_id=...)` cannot reliably persist session memory. For broader cache behavior and adapters, see [Sessions and Caching](/core-concepts/sessions-and-caching).

  <Note>
    If the session cache is unavailable, `remember(..., session_id=...)` can still return a session-style result object, but the session content may not actually have been stored. If session memory matters for your workflow, make sure caching is enabled and available.
  </Note>
</Accordion>

<Accordion title="Latency and cost">
  The two remember modes have very different cost and latency profiles:

  * **Session memory** is the fast path. It writes to session cache and returns quickly.
  * **Permanent memory** is the heavier path. It runs ingestion, chunking, graph extraction, and embedding, plus an improvement pass unless `self_improvement=False`.

  That means permanent `remember()` is the right choice when you want durable graph memory, but it costs more time and more model work than session-only storage.

  The biggest cost drivers are:

  * How much data you ingest
  * How many chunks the content becomes
  * The graph extraction and summarization work during graph building
  * Whether `self_improvement=True` adds a follow-up improvement pass

  If you want more control over chunking or prompt behavior, use `chunk_size`, `chunker`, and `custom_prompt`. If you want the lightest permanent ingestion path, set `self_improvement=False`.
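
To get a rough feel for the chunk-count cost driver before ingesting, you can estimate chunks from a word count. This back-of-the-envelope sketch assumes `chunk_size` counts tokens, that one whitespace-separated word is roughly one token, and that chunks do not overlap. Cognee's chunkers use their own tokenization, so treat the result as an order-of-magnitude guide only. `estimate_chunks` is a hypothetical helper.

```python
import math


def estimate_chunks(text: str, chunk_size: int) -> int:
    """Rough chunk count using whitespace words as a proxy for tokens (assumption)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = len(text.split())
    # Non-overlapping chunks: any non-empty text yields at least one chunk.
    return math.ceil(words / chunk_size)
```

Comparing estimates for two candidate `chunk_size` values shows how much extraction work each choice implies, since every chunk goes through graph extraction.
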
</Accordion>

<Accordion title="Background execution and result object">
  * `remember()` returns a promise-like `RememberResult`.
  * In blocking mode, it finishes only after the permanent pipeline completes.
  * With `run_in_background=True`, it returns immediately and you can `await` the result later.
  * Useful fields include status, dataset name, elapsed time, and the raw pipeline result.

  <Note>
    With `run_in_background=True`, file-like inputs (HTTP upload objects exposing `.file`/`.filename`, or stream-like objects with `.read()`) are read up-front and copied into owned in-memory buffers before the background task is scheduled. This lets background runs outlive the originating request or stream — you can pass an upload object or stream and let the call return immediately without the underlying stream being closed out from under the pipeline. Plain text strings, file paths, and URLs are unaffected.
  </Note>

  ```python theme={null}
  result = await cognee.remember(
      "Einstein was born in Ulm.",
      run_in_background=True,
  )

  print(result)   # status='running'
  await result
  print(result)   # status='completed'
  ```
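
Because the result is awaitable, you can start several background runs and wait for all of them together. This is a sketch rather than a documented recipe: `remember_per_dataset` is a hypothetical helper that starts one background `remember()` per dataset (passing each dataset its own list, since `remember()` accepts lists) and gathers the results. It takes the `remember` callable as a parameter so it can be exercised without Cognee installed, and it assumes `RememberResult` can be passed to `asyncio.gather` like any other awaitable.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def remember_per_dataset(
    remember: Callable[..., Awaitable[Any]],
    batches: dict[str, list[str]],
) -> list[Any]:
    """Start one background remember() call per dataset, then wait for all of them."""
    results = []
    for dataset_name, items in batches.items():
        # With run_in_background=True the call returns a result that can be awaited later.
        result = await remember(items, dataset_name=dataset_name, run_in_background=True)
        results.append(result)
    # Wait for every background pipeline to finish.
    await asyncio.gather(*results)
    return results
```

With Cognee installed, the call would look like `await remember_per_dataset(cognee.remember, {"docs": [...], "notes": [...]})`. Keeping a single run per dataset avoids two pipelines writing to the same dataset at once, a situation this page does not describe.
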
</Accordion>

<Accordion title="Concurrent recall while indexing is running">
  You can call [recall](/core-concepts/main-operations/recall) while a permanent-memory `remember()` run is still indexing in the background. Cognee does not place a global lock over ingestion and retrieval.

  * Existing indexed data remains searchable while new data is being processed.
  * Newly ingested content only becomes available to graph-backed `recall()` once its indexing work has completed.
  * In practice, that means a background `remember()` run may produce **partial retrieval coverage** until the dataset finishes processing.

  For multi-process or multi-instance production setups, prefer external graph and relational stores rather than local file-based defaults. See [Graph Stores](/setup-configuration/graph-stores), [Relational Databases](/setup-configuration/relational-databases), and the [deployment guides](/how-to-guides/cognee-sdk/deployment).
</Accordion>

<Accordion title="Checking indexing status">
  When `remember()` writes to permanent memory, it runs a multi-step indexing pipeline. If you started that pipeline with `run_in_background=True`, or are ingesting a large dataset, you may want to verify that indexing has completed before calling [recall](/core-concepts/main-operations/recall).

  <Tabs>
    <Tab title="Python SDK">
      ```python theme={null}
      import cognee

      result = await cognee.remember(
          "Cognee turns documents into AI memory.",
          dataset_name="docs",
          run_in_background=True,
      )

      dataset_id = result.dataset_id
      status = await cognee.datasets.get_status([dataset_id])

      if status.get(str(dataset_id)) == "DATASET_PROCESSING_COMPLETED":
          answers = await cognee.recall(
              query_text="What does Cognee do?",
              datasets=["docs"],
          )
      ```
    </Tab>

    <Tab title="HTTP API">
      ```http theme={null}
      GET /api/v1/datasets/status?dataset=<dataset-uuid>
      GET /api/v1/activity/pipeline-runs?dataset_id=<dataset-uuid>
      ```

      * `GET /api/v1/datasets/status` returns the current indexing status for one or more datasets
      * `GET /api/v1/activity/pipeline-runs` returns recent pipeline runs with timestamps and run IDs
    </Tab>
  </Tabs>

  Possible status values:

  | Status                         | Meaning                          |
  | ------------------------------ | -------------------------------- |
  | `DATASET_PROCESSING_INITIATED` | Pipeline queued, not yet started |
  | `DATASET_PROCESSING_STARTED`   | Pipeline actively processing     |
  | `DATASET_PROCESSING_COMPLETED` | Indexing finished successfully   |
  | `DATASET_PROCESSING_ERRORED`   | Pipeline encountered an error    |
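
If you prefer to wait rather than check once, a small polling loop over these statuses works. This is a sketch, not a Cognee API: `wait_for_dataset` is hypothetical, and it assumes, as the Python SDK example above does, that `get_status([dataset_id])` returns a mapping keyed by the dataset ID as a string, with values that compare equal to the status names in the table. The status function is injected so the helper can be tested without Cognee.

```python
import asyncio
from typing import Any, Awaitable, Callable

TERMINAL_STATUSES = ("DATASET_PROCESSING_COMPLETED", "DATASET_PROCESSING_ERRORED")


async def wait_for_dataset(
    get_status: Callable[[list[Any]], Awaitable[dict[str, Any]]],
    dataset_id: Any,
    *,
    interval: float = 2.0,
    timeout: float = 300.0,
) -> Any:
    """Poll get_status until the dataset reaches a terminal status or the timeout expires."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        statuses = await get_status([dataset_id])
        status = statuses.get(str(dataset_id))
        if status in TERMINAL_STATUSES:
            return status
        if loop.time() >= deadline:
            raise TimeoutError(f"dataset {dataset_id} still {status} after {timeout}s")
        await asyncio.sleep(interval)
```

Under those assumptions, `await wait_for_dataset(cognee.datasets.get_status, result.dataset_id)` returns either the completed or the errored status, so the caller can decide whether to run `recall()`.
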

  For the low-level API reference, see [cognee.datasets.get\_status()](/python-api/datasets#datasetsget_status).
</Accordion>

<Accordion title="RememberResult format">
  `remember()` returns a `RememberResult` object: a small status object you can print, inspect, or await.

  **Common fields**

  | Field             | What it means                                                                                                  |
  | ----------------- | -------------------------------------------------------------------------------------------------------------- |
  | `status`          | Current state of the operation. Common values include `running`, `completed`, `errored`, and `session_stored`. |
  | `dataset_name`    | The target dataset name used for the operation.                                                                |
  | `dataset_id`      | Dataset UUID when available.                                                                                   |
  | `session_ids`     | Session IDs associated with the result when session bridging is involved.                                      |
  | `pipeline_run_id` | Pipeline run UUID when the permanent pipeline has produced one.                                                |
  | `elapsed_seconds` | Wall-clock time from start to completion.                                                                      |
  | `items_processed` | Number of processed items when available.                                                                      |
  | `content_hash`    | Content hash of the first processed item when available.                                                       |
  | `items`           | Per-item metadata such as IDs, names, token counts, MIME type, or content hashes when available.               |
  | `raw_result`      | The raw pipeline result payload for advanced inspection.                                                       |
  | `error`           | Error text when the operation fails.                                                                           |

  **Typical shapes**

  <AccordionGroup>
    <Accordion title="Permanent memory completed">
      ```python theme={null}
      {
        "status": "completed",
        "dataset_name": "main_dataset",
        "dataset_id": "...",
        "pipeline_run_id": "...",
        "items_processed": 1,
        "elapsed_seconds": 4.2
      }
      ```
    </Accordion>

    <Accordion title="Session memory stored in cache">
      ```python theme={null}
      {
        "status": "session_stored",
        "dataset_name": "main_dataset",
        "session_ids": ["chat_1"],
        "elapsed_seconds": 0.02
      }
      ```
    </Accordion>

    <Accordion title="Background execution running">
      ```python theme={null}
      {
        "status": "running",
        "dataset_name": "main_dataset"
      }
      ```
    </Accordion>
  </AccordionGroup>
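
In application code it is often enough to branch on `status` and surface `error`. The helper below is a hypothetical sketch built only on the fields in the table above. It assumes `status` compares equal to the plain strings shown and reads fields with `getattr`, so it also works on stand-in objects.

```python
from typing import Any


def check_remember_result(result: Any) -> str:
    """Classify a RememberResult-like object using the documented status values."""
    status = getattr(result, "status", None)
    if status == "errored":
        # Surface the error text the result carries on failure.
        raise RuntimeError(getattr(result, "error", None) or "remember() failed")
    if status == "completed":
        return "permanent"
    if status == "session_stored":
        return "session"
    if status == "running":
        return "pending"
    # The table lists only common values, so anything else is left unclassified.
    return "unknown"
```

A `pending` result from `run_in_background=True` can be awaited and then checked again.
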
</Accordion>

<Accordion title="Parameters">
  <Tabs>
    <Tab title="Basic Parameters">
      | Option                   | What it does                                                                                                                                                                                                                                                                  |
      | ------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
      | `data`                   | The content to store. Can be a string (text, local path, `file://`, `http(s)://`, or `s3://` URL), a `DataItem`, an HTTP upload object exposing `.file`/`.filename`, or a list mixing any of these. Plain `open(..., "rb")` handles are not accepted — see "Accepted inputs". |
      | `dataset_name`           | Chooses the target permanent dataset. Defaults to `main_dataset`.                                                                                                                                                                                                             |
      | `session_id`             | Switches `remember()` into session-memory mode.                                                                                                                                                                                                                               |
      | `self_improvement`       | Controls whether [Improve](/core-concepts/main-operations/improve) starts automatically after storage. In session mode, this is what enables or disables automatic bridging into the permanent graph. Defaults to `True`.                                                     |
      | `run_in_background`      | Starts the permanent pipeline asynchronously.                                                                                                                                                                                                                                 |
      | `chunk_size` / `chunker` | Customizes chunking for the permanent pipeline.                                                                                                                                                                                                                               |
      | `custom_prompt`          | Overrides the graph-extraction prompt used during graph building.                                                                                                                                                                                                             |
      | `session_ids`            | Syncs new permanent graph knowledge back into specific sessions during the self-improvement pass.                                                                                                                                                                             |
    </Tab>

    <Tab title="Advanced Parameters">
      | Option                                 | What it does                                                                                                                                                                                                                                                                                |
      | -------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
      | `dataset_id`                           | Targets a specific existing dataset by UUID instead of resolving only by name.                                                                                                                                                                                                              |
      | `graph_model`                          | Overrides the graph schema/model used during graph building. Defaults to `KnowledgeGraph`; pass a `DataPoint` subclass to constrain LLM extraction to your own fields and relationships. See [Custom Graph Model](/guides/custom-graph-model).                                              |
      | `node_set`                             | Attaches one or more node-set tags during permanent ingestion.                                                                                                                                                                                                                              |
      | `preferred_loaders`                    | Controls which loaders are preferred when ingesting source files.                                                                                                                                                                                                                           |
      | `incremental_loading`                  | Controls whether only new or changed data should be processed when reusing a dataset.                                                                                                                                                                                                       |
      | `data_per_batch`                       | Tunes batching for ingestion and pipeline processing in the current `remember()` API. See [Cognify](/core-concepts/main-operations/legacy-operations/cognify) for lower-level batching details.                                                                                             |
      | `chunks_per_batch`                     | Tunes batching during chunk processing in the graph-building stage in the current `remember()` API. See [Cognify](/core-concepts/main-operations/legacy-operations/cognify) for lower-level batching details.                                                                               |
      | `user`                                 | Runs `remember()` under a specific user context instead of the default user. This affects dataset ownership, permissions, and session-cache scoping.                                                                                                                                        |
      | `vector_db_config` / `graph_db_config` | Overrides database backend configuration for the vector or graph stores.                                                                                                                                                                                                                    |
      | `primary_key`                          | dlt structured-ingestion option — names the column used for upsert/dedup behavior when `write_disposition="merge"` is used with CSV files, database connection strings, or dlt resources. Forwarded to the underlying ingestion step. See [dlt integration](/integrations/dlt-integration). |
      | `write_disposition`                    | dlt structured-ingestion option — `replace` (default), `merge`, or `append`. Forwarded to the underlying ingestion step. See [dlt integration](/integrations/dlt-integration).                                                                                                              |
      | `query`                                | dlt structured-ingestion option — SQL `WHERE`-style filter applied when ingesting from a database connection string. Forwarded to the underlying ingestion step.                                                                                                                            |
      | `max_rows_per_table`                   | dlt structured-ingestion option — caps the number of rows pulled per table for a single call, overriding `DLT_MAX_ROWS_PER_TABLE` for that call.                                                                                                                                            |
    </Tab>
  </Tabs>
</Accordion>

<Accordion title="Under the hood — legacy operations">
  `remember()` runs [Add](/core-concepts/main-operations/legacy-operations/add) → [Cognify](/core-concepts/main-operations/legacy-operations/cognify) → [Improve](/core-concepts/main-operations/improve) under the hood.

  Use the legacy operations directly when you need explicit control over each step — for example, to inspect intermediate results, tune pipeline parameters independently, or integrate ingestion and graph-building into a more complex workflow.
</Accordion>

<Accordion title="Inspect what you've stored">
  After calling `remember()`, you can list datasets and browse their contents using `cognee.datasets`:

  ```python theme={null}
  import cognee

  # List all datasets
  datasets = await cognee.datasets.list_datasets()
  for ds in datasets:
      print(ds.name, ds.id)

      # List the data items inside each dataset
      items = await cognee.datasets.list_data(dataset_id=ds.id)
      for item in items:
          print("  ", item.id, item.name)
  ```

  See [datasets API reference](/python-api/datasets) for the full set of listing and management methods.
</Accordion>

<Columns cols={2}>
  <Card title="Recall" icon="search" href="/core-concepts/main-operations/recall">
    Query memory with auto-routing and session awareness
  </Card>

  <Card title="Improve" icon="sparkles" href="/core-concepts/main-operations/improve">
    Enrich the graph and bridge session memory
  </Card>
</Columns>