
# dlt (Data Load Tool)

> Ingest structured data into Cognee with dlt.

Ingest structured relational data — databases, CSV files, and [dlt](https://dlthub.com/) resources — directly into cognee's knowledge graph. Foreign keys become graph edges, tables become schema nodes, and each row becomes a searchable document, all built deterministically from the schema without LLM extraction.

## Why Use This Integration

* **Schema-Aware Graphs**: Foreign key relationships are preserved as first-class edges in the knowledge graph
* **Deterministic Graph Construction**: Structured data bypasses LLM entity extraction — no hallucination risk
* **Mixed Ingestion**: Combine structured (dlt) and unstructured (text, PDF) data in the same dataset
* **Multiple Input Modes**: Pass explicit dlt resources, CSV file paths, or database connection strings
* **Write Dispositions**: Control how data is synced — merge (upsert), append, or replace

## Installation

```bash theme={null}
pip install 'cognee[dlt]'
```

Or with uv:

```bash theme={null}
uv pip install 'cognee[dlt]'
```

## Quick Start

### 1. Ingest a dlt Resource

Define a dlt resource and pass it to `cognee.remember()`. The dlt-specific options (`primary_key`, `write_disposition`, `query`, and `max_rows_per_table`) are accepted by `cognee.remember()` and forwarded to the underlying ingestion step. After ingestion, use `cognee.recall(...)` to query the graph.

```python theme={null}
import dlt
import cognee
import asyncio

@dlt.resource()
def users_and_pets():
    yield [
        {
            "id": 1,
            "name": "Alice",
            "pets": [
                {"id": 1, "name": "Fluffy", "type": "cat"},
                {"id": 2, "name": "Spot", "type": "dog"},
            ],
        },
        {
            "id": 2,
            "name": "Bob",
            "pets": [{"id": 3, "name": "Fido", "type": "dog"}],
        },
    ]

async def main():
    await cognee.remember(
        users_and_pets,
        dataset_name="users_and_pets",
        primary_key="id",
    )
    results = await cognee.recall(
        query_text="Which pet does Alice have?",
        datasets=["users_and_pets"],
    )
    print(results)

asyncio.run(main())
```

dlt automatically detects nested structures (like `pets` inside each user) and creates separate tables with foreign key relationships.
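To make the nested-table idea concrete, here is a minimal pure-Python sketch of the kind of flattening involved. It is an illustration, not dlt's normalizer: `flatten_nested` and the `_parent_id` column are hypothetical names, and the child table name only imitates dlt's double-underscore convention. dlt itself links child rows through its own generated key columns rather than the `id` field used here.

```python
from typing import Any


def flatten_nested(table: str, rows: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Split nested lists of dicts into child tables that point back at the parent row."""
    tables: dict[str, list[dict[str, Any]]] = {table: []}
    for row in rows:
        parent: dict[str, Any] = {}
        for key, value in row.items():
            if isinstance(value, list) and all(isinstance(v, dict) for v in value):
                # Nested list of records: move it into its own child table.
                child_table = f"{table}__{key}"
                for child in value:
                    child_row = dict(child)
                    child_row["_parent_id"] = row["id"]  # hypothetical foreign key column
                    tables.setdefault(child_table, []).append(child_row)
            else:
                parent[key] = value
        tables[table].append(parent)
    return tables


users = [
    {
        "id": 1,
        "name": "Alice",
        "pets": [
            {"id": 1, "name": "Fluffy", "type": "cat"},
            {"id": 2, "name": "Spot", "type": "dog"},
        ],
    },
    {"id": 2, "name": "Bob", "pets": [{"id": 3, "name": "Fido", "type": "dog"}]},
]

tables = flatten_nested("users_and_pets", users)
for name, table_rows in tables.items():
    print(name, table_rows)
```

The parent table keeps only scalar columns, and each pet row carries a reference to its owner. That reference is what later becomes an edge in the graph.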

<Note>
  The lower-level `cognee.add(...)` + `cognee.cognify(...)` pair still accepts the same dlt kwargs and remains useful when you need to run ingestion and graph building as separate steps. For the runnable end-to-end version of this walkthrough, see [`examples/demos/dlt_ingestion_example.py`](https://github.com/topoteretes/cognee/blob/main/examples/demos/dlt_ingestion_example.py).
</Note>

### 2. Build and Query the Graph

`remember()` both ingests the data and builds the graph, so no separate build step is required. Once it returns, query the graph with `cognee.recall(...)`, as in the example above.

## Other Input Modes

### CSV Auto-Detection

Pass a `.csv` file path and cognee creates a dlt source automatically:

```python theme={null}
await cognee.remember(
    "/path/to/employees.csv",
    dataset_name="employees",
    primary_key="id",
)
```

### Database Connection String

Ingest tables directly from an existing database:

```python theme={null}
await cognee.remember(
    "postgresql://user:pass@host/db",
    dataset_name="company_db",
    primary_key="id",
)
```

Supported databases via auto-detection: SQLite, PostgreSQL, MySQL, MSSQL, Oracle. Amazon Redshift is also compatible since it speaks the PostgreSQL wire protocol — use a standard `postgresql://` connection string pointing to your Redshift endpoint.

For Snowflake and Google BigQuery, construct a dlt source directly and pass it to `cognee.remember()` (see the [Cloud Data Warehouses](#cloud-data-warehouses) accordion below).

To ingest only a subset of rows, pass a SQL statement through the `query` parameter, for example a `SELECT` with a `WHERE` clause:

```python theme={null}
await cognee.remember(
    "postgresql://user:pass@host/db",
    dataset_name="engineering_team",
    primary_key="id",
    query="SELECT * FROM employees WHERE department = 'Engineering'",
)
```
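To experiment with connection-string ingestion locally, one option is a throwaway SQLite file built with the standard library. The sketch below creates two tables linked by a foreign key and previews a filter query before it is handed to cognee. The table names, the sample rows, and the SQLAlchemy-style `sqlite:///` URL format are assumptions made for illustration.

```python
import os
import sqlite3
import tempfile

# Create a small database file with a foreign key between two tables.
db_path = os.path.join(tempfile.mkdtemp(), "company.db")
conn = sqlite3.connect(db_path)
conn.executescript(
    """
    CREATE TABLE departments (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department_id INTEGER REFERENCES departments(id)
    );
    INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Sales');
    INSERT INTO employees VALUES (1, 'Alice', 1), (2, 'Bob', 1), (3, 'Carol', 2);
    """
)
conn.commit()

# Preview the rows a filter query would select before passing it as `query`.
query = "SELECT * FROM employees WHERE department_id = 1"
preview = conn.execute(query).fetchall()
conn.close()

# Assumed URL format for a local SQLite file.
connection_string = f"sqlite:///{db_path}"
print(preview)
print(connection_string)
```

The resulting `connection_string` and `query` can then be passed to `cognee.remember()` in the same way as the PostgreSQL examples in this section.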

### Mixed Structured + Unstructured

Combine dlt resources with unstructured text in a single dataset:

```python theme={null}
text = """Alice has two pets: a cat named Fluffy and a dog named Spot.
Bob has a dog named Fido, who is friendly with both Fluffy and Spot."""

await cognee.remember(
    [text, users_and_pets],
    dataset_name="users_and_pets_with_text",
    primary_key="id",
)
```

<Info>
  Structured data creates deterministic graph nodes from the schema, while unstructured text goes through LLM-based entity extraction. Both are combined in the same knowledge graph.
</Info>

## Write Dispositions

Control how data is synced on repeated runs using the `write_disposition` parameter:

* **`replace`** (default): Drop and recreate tables on each run. Use for full snapshot refreshes.
* **`merge`**: Upsert by primary key — updates existing rows, inserts new ones. Best for data that changes over time.
* **`append`**: Always insert without deduplication. Use for time-series data and event logs.

```python theme={null}
# Append mode — every call adds new rows, no dedup
await cognee.remember(
    event_resource,
    dataset_name="events",
    primary_key="id",
    write_disposition="append",
)
```
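The three dispositions can be compared side by side on plain lists of rows. This is a sketch of the semantics described above, not of how dlt or cognee implement them; `apply_disposition` is a hypothetical helper.

```python
from typing import Any

Row = dict[str, Any]


def apply_disposition(
    existing: list[Row],
    incoming: list[Row],
    disposition: str,
    primary_key: str = "id",
) -> list[Row]:
    """Return the table contents after loading `incoming` on top of `existing`."""
    if disposition == "replace":
        # Full snapshot refresh: previous rows are discarded.
        return list(incoming)
    if disposition == "append":
        # No deduplication: every incoming row is added.
        return existing + incoming
    if disposition == "merge":
        # Upsert: rows with a known primary key are updated, new keys are inserted.
        merged = {row[primary_key]: row for row in existing}
        for row in incoming:
            merged[row[primary_key]] = row
        return list(merged.values())
    raise ValueError(f"unknown write disposition: {disposition!r}")


existing = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
incoming = [{"id": 2, "name": "Robert"}, {"id": 3, "name": "Carol"}]

replaced = apply_disposition(existing, incoming, "replace")
merged = apply_disposition(existing, incoming, "merge")
appended = apply_disposition(existing, incoming, "append")

print(len(replaced), len(merged), len(appended))
```

With these inputs, `replace` keeps only the two incoming rows, `merge` ends with three rows (row 2 updated in place), and `append` ends with four rows, including both versions of row 2.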

## How It Works

1. **Source Detection**: cognee identifies dlt resources, CSV files, and connection strings in the input
2. **Pipeline Execution**: A dlt pipeline loads data into a per-dataset staging database
3. **Schema Extraction**: Table schemas, primary keys, and foreign keys are extracted
4. **Graph Construction**: Each row becomes a document node; foreign keys become edges between nodes
5. **LLM Bypass**: Structured rows skip chunking, entity extraction, and summarization — the graph is built entirely from schema metadata
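The source-detection step can be pictured as a small routing function. The sketch below is illustrative only: `classify_input`, the scheme list, and the use of `callable()` as a stand-in for recognizing a dlt resource are assumptions, not cognee's actual detection code.

```python
from urllib.parse import urlparse

# Schemes for the databases listed as auto-detected in this page (assumed spellings).
DB_SCHEMES = {"sqlite", "postgresql", "mysql", "mssql", "oracle"}


def classify_input(item: object) -> str:
    """Route one input item to the ingestion path it would plausibly take."""
    if callable(item):
        # Stand-in check: a dlt resource is a callable object.
        return "dlt_resource"
    if isinstance(item, str):
        if item.lower().endswith(".csv"):
            return "csv"
        # Strip an optional driver suffix such as "postgresql+psycopg2".
        scheme = urlparse(item).scheme.split("+")[0].lower()
        if scheme in DB_SCHEMES:
            return "database"
    return "unstructured"


def users_and_pets():
    # Plain generator used here in place of a real @dlt.resource.
    yield [{"id": 1, "name": "Alice"}]


inputs = [
    users_and_pets,
    "/path/to/employees.csv",
    "postgresql://user:pass@host/db",
    "Alice has two pets: a cat named Fluffy and a dog named Spot.",
]
kinds = [classify_input(item) for item in inputs]
print(kinds)
```

Only items classified as structured would follow the deterministic path in steps 2 to 5; plain text falls through to the regular LLM-based pipeline.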

<Info>
  The `primary_key` parameter controls upsert behavior when you use `write_disposition="merge"`. If it is not specified, cognee auto-detects the key from an `id` column or falls back to the first column. To change the per-table row cap for a single call, pass the `max_rows_per_table` kwarg to `remember()` / `add()`. To change the process-wide default, set the `DLT_MAX_ROWS_PER_TABLE` environment variable (default: `50`).
</Info>
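The precedence between the per-call kwarg, the environment variable, and the built-in default can be sketched as follows. `resolve_row_cap` is a hypothetical helper; how cognee reads these values internally is an assumption.

```python
import os
from typing import Mapping, Optional

DEFAULT_MAX_ROWS = 50  # default value stated above


def resolve_row_cap(
    max_rows_per_table: Optional[int] = None,
    env: Mapping[str, str] = os.environ,
) -> int:
    """Per-call kwarg wins, then the environment variable, then the default."""
    if max_rows_per_table is not None:
        return max_rows_per_table
    return int(env.get("DLT_MAX_ROWS_PER_TABLE", DEFAULT_MAX_ROWS))


# A plain dict stands in for the process environment in this example.
default_cap = resolve_row_cap(env={})
env_cap = resolve_row_cap(env={"DLT_MAX_ROWS_PER_TABLE": "500"})
call_cap = resolve_row_cap(10_000, env={"DLT_MAX_ROWS_PER_TABLE": "500"})

print(default_cap, env_cap, call_cap)
```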

## Foreign Key Resolution

A foreign key becomes a graph edge only when **both** the source row and the target row are loaded in the same ingestion run. Two edge cases are worth knowing about. cognee logs a warning for each, so they can be diagnosed rather than passing silently:

* **Target row not loaded**: if a foreign key points at a row that wasn't ingested — most commonly because the target table hit the `max_rows_per_table` cap — the reference is dropped and no edge is created. The warning identifies the dropped references as `source_table.column -> ref_table:value`. If you see missing edges, raise `max_rows_per_table` so the referenced rows are included.
* **Duplicate primary keys within a table**: if multiple rows in a table share the same primary key, foreign key edges that target that key resolve to the **last** such row loaded; earlier rows with the same key are shadowed for FK targeting. The warning names the affected `table` and `pk`.
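Both edge cases can be reproduced in a few lines of plain Python. The sketch below mirrors the behavior described above (dropped references and last-row-wins on duplicate keys); the function names, the logger name, and the exact warning wording are hypothetical and do not reproduce cognee's code.

```python
import logging
from typing import Any

logger = logging.getLogger("fk_sketch")  # hypothetical logger name

Row = dict[str, Any]


def index_by_primary_key(table: str, rows: list[Row], pk: str = "id") -> dict[Any, Row]:
    """Index rows by primary key; on duplicates the last row loaded wins."""
    index: dict[Any, Row] = {}
    for row in rows:
        if row[pk] in index:
            logger.warning("duplicate primary key: table=%s pk=%s", table, row[pk])
        index[row[pk]] = row
    return index


def resolve_edges(
    source_table: str,
    source_rows: list[Row],
    fk_column: str,
    ref_table: str,
    ref_rows: list[Row],
) -> tuple[list[tuple[Any, Any]], list[str]]:
    """Return (edges, dropped references) for one foreign key column."""
    targets = index_by_primary_key(ref_table, ref_rows)
    edges: list[tuple[Any, Any]] = []
    dropped: list[str] = []
    for row in source_rows:
        value = row.get(fk_column)
        if value is None:
            continue
        if value in targets:
            edges.append((row["id"], value))
        else:
            # Target row was not loaded, so no edge can be created.
            dropped.append(f"{source_table}.{fk_column} -> {ref_table}:{value}")
    if dropped:
        logger.warning("dropped foreign key references: %s", ", ".join(dropped))
    return edges, dropped


customers = [{"id": 1, "name": "Alice"}, {"id": 1, "name": "Alice (newer)"}]
orders = [{"id": 10, "customer_id": 1}, {"id": 11, "customer_id": 2}]

edges, dropped = resolve_edges("orders", orders, "customer_id", "customers", customers)
target_name = index_by_primary_key("customers", customers)[1]["name"]
print(edges, dropped, target_name)
```

Order 11 points at customer 2, which was never loaded, so its reference is dropped. Order 10 resolves to the second of the two rows that share primary key 1.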

## Use Cases

<AccordionGroup>
  <Accordion title="CRM and Relational Data">
    Load customer, order, and product tables from a database. Foreign keys between tables (e.g., `order.customer_id → customer.id`) become graph edges, enabling cross-table queries like "Which customers ordered product X?"
  </Accordion>

  <Accordion title="CSV Analytics Pipeline">
    Point cognee at CSV exports from analytics tools. Each row becomes a searchable node in the graph, and you can combine them with unstructured reports in the same dataset.
  </Accordion>

  <Accordion title="Event Log Ingestion">
    Use `write_disposition="append"` to stream event batches into cognee without deduplication. Query across the full event history with natural language.
  </Accordion>

  <Accordion title="Database Mirroring">
    Use `write_disposition="merge"` to keep cognee's graph in sync with a live database. Rows that are removed upstream are cleaned up best-effort; any orphaned rows that fail to delete are logged and retried on the next ingest.
  </Accordion>

  <Accordion title="Cloud Data Warehouses (Snowflake, Redshift, BigQuery)" id="cloud-data-warehouses">
    **Amazon Redshift** speaks the PostgreSQL wire protocol, so the standard connection string auto-detection works:

    ```python theme={null}
    await cognee.remember(
        "postgresql://user:pass@my-cluster.us-east-1.redshift.amazonaws.com:5439/mydb",
        dataset_name="redshift_data",
        primary_key="id",
    )
    ```

    **Snowflake** requires constructing a dlt `sql_database` source manually (install `snowflake-sqlalchemy` first):

    ```bash theme={null}
    pip install 'cognee[dlt]' snowflake-sqlalchemy
    ```

    ```python theme={null}
    import asyncio

    import cognee
    from dlt.sources.sql_database import sql_database

    # Build a dlt source for the Snowflake tables to ingest
    source = sql_database(
        credentials="snowflake://user:password@account_identifier/database/schema?warehouse=MY_WH",
        table_names=["orders", "customers"],
    )

    async def main():
        await cognee.remember(source, dataset_name="snowflake_data", primary_key="id")

    asyncio.run(main())
    ```

    The `account_identifier` is the part before `.snowflakecomputing.com` in your Snowflake URL (e.g. `myorg-myaccount`). Omit `table_names` to ingest all tables in the schema.

    **Google BigQuery** works the same way using dlt's BigQuery connector — construct the source and pass it directly to `cognee.remember()`. See the [dlt sql\_database docs](https://dlthub.com/docs/dlt-ecosystem/verified-sources/sql_database) for connector-specific setup.
  </Accordion>
</AccordionGroup>

***

<CardGroup cols={2}>
  <Card title="Remember Operation" icon="brain" href="/core-concepts/main-operations/remember">
    Learn more about data ingestion in cognee
  </Card>

  <Card title="dlt Documentation" icon="book" href="https://dlthub.com/docs">
    Official dlt documentation and guides
  </Card>
</CardGroup>