
# dlt (Data Load Tool)

Ingest structured relational data — databases, CSV files, and [dlt](https://dlthub.com/) resources — directly into cognee's knowledge graph. Foreign keys become graph edges, tables become schema nodes, and each row becomes a searchable document, all built deterministically from the schema without LLM extraction.

## Why Use This Integration

* **Schema-Aware Graphs**: Foreign key relationships are preserved as first-class edges in the knowledge graph
* **Deterministic Graph Construction**: Structured data bypasses LLM entity extraction — no hallucination risk
* **Mixed Ingestion**: Combine structured (dlt) and unstructured (text, PDF) data in the same dataset
* **Multiple Input Modes**: Pass explicit dlt resources, CSV file paths, or database connection strings
* **Write Dispositions**: Control how data is synced — merge (upsert), append, or replace

## Installation

```bash theme={null}
pip install 'cognee[dlt]'
```

Or with uv:

```bash theme={null}
uv pip install 'cognee[dlt]'
```

## Quick Start

### 1. Ingest a dlt Resource

Define a dlt resource and pass it to `cognee.add()`.

This integration uses `cognee.add()` and `cognee.cognify()` rather than `cognee.remember()`. The dlt-specific structured-ingestion options (`primary_key`, `write_disposition`, and the SQL `query` parameter) are supported on the `add()` + `cognify()` path, whereas `remember()` is a higher-level wrapper that does not expose the full dlt ingestion surface. After ingestion, use `cognee.recall(...)` to query the graph.

```python theme={null}
import dlt
import cognee
import asyncio

@dlt.resource()
def users_and_pets():
    yield [
        {
            "id": 1,
            "name": "Alice",
            "pets": [
                {"id": 1, "name": "Fluffy", "type": "cat"},
                {"id": 2, "name": "Spot", "type": "dog"},
            ],
        },
        {
            "id": 2,
            "name": "Bob",
            "pets": [{"id": 3, "name": "Fido", "type": "dog"}],
        },
    ]

async def main():
    await cognee.add(
        users_and_pets,
        dataset_name="users_and_pets",
        primary_key="id",
    )
    await cognee.cognify()
    results = await cognee.recall(
        query_text="Which pet does Alice have?",
        datasets=["users_and_pets"],
    )
    print(results)

asyncio.run(main())
```

dlt automatically detects nested structures (like `pets` inside each user) and creates separate tables with foreign key relationships.
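
To make the nested-table behavior concrete, here is a minimal plain-Python sketch of the idea. It is an illustration only, not dlt's implementation: the `normalize` helper and the `_parent_id` column are hypothetical names chosen for readability, and the sketch assumes every record carries an `id` field. dlt itself names child tables with the same `parent__field` pattern but links rows through generated `_dlt_id` and `_dlt_parent_id` columns rather than your own keys.

```python
from typing import Any

Row = dict[str, Any]


def normalize(table: str, records: list[Row]) -> dict[str, list[Row]]:
    """Split nested lists of dicts into a parent table and child tables.

    Hypothetical helper: child rows get a `_parent_id` column that points
    back at the parent's `id`, which plays the role of a foreign key.
    """
    tables: dict[str, list[Row]] = {table: []}
    for record in records:
        row: Row = {}
        for key, value in record.items():
            if isinstance(value, list):
                # Nested list -> child table named "<parent>__<field>".
                child_table = f"{table}__{key}"
                children = [dict(child, _parent_id=record["id"]) for child in value]
                for name, rows in normalize(child_table, children).items():
                    tables.setdefault(name, []).extend(rows)
            else:
                # Scalar values stay on the parent row.
                row[key] = value
        tables[table].append(row)
    return tables


records = [
    {"id": 1, "name": "Alice", "pets": [
        {"id": 1, "name": "Fluffy", "type": "cat"},
        {"id": 2, "name": "Spot", "type": "dog"},
    ]},
    {"id": 2, "name": "Bob", "pets": [{"id": 3, "name": "Fido", "type": "dog"}]},
]

tables = normalize("users_and_pets", records)
for name, rows in tables.items():
    print(name, rows)
```

The parent table keeps only `id` and `name`, while each pet row lands in `users_and_pets__pets` with a pointer to its owner. That parent-child link is what later becomes a graph edge.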

### 2. Build and Query the Graph

The example above already covers this step: `cognify()` builds the graph from the ingested tables, and once it completes, `cognee.recall(...)` queries that graph.

## Other Input Modes

### CSV Auto-Detection

Pass a `.csv` file path and cognee creates a dlt source automatically:

```python theme={null}
await cognee.add(
    "/path/to/employees.csv",
    dataset_name="employees",
    primary_key="id",
)
await cognee.cognify()
```

### Database Connection String

Ingest tables directly from an existing database:

```python theme={null}
await cognee.add(
    "postgresql://user:pass@host/db",
    dataset_name="company_db",
    primary_key="id",
)
await cognee.cognify()
```

Supported databases via auto-detection: SQLite, PostgreSQL, MySQL, MSSQL, Oracle. Amazon Redshift is also compatible since it speaks the PostgreSQL wire protocol — use a standard `postgresql://` connection string pointing to your Redshift endpoint.

For Snowflake and Google BigQuery, construct a dlt source directly and pass it to `cognee.add()` (see the [Cloud Data Warehouses](#cloud-data-warehouses) accordion below).

You can optionally restrict the ingested rows by passing a SQL statement through the `query` parameter, for example a `SELECT` with a `WHERE` clause:

```python theme={null}
await cognee.add(
    "postgresql://user:pass@host/db",
    dataset_name="engineering_team",
    primary_key="id",
    query="SELECT * FROM employees WHERE department = 'Engineering'",
)
```

### Mixed Structured + Unstructured

Combine dlt resources with unstructured text in a single dataset:

```python theme={null}
# Reuses the `users_and_pets` resource defined in the Quick Start.
text = """Alice has two pets: a cat named Fluffy and a dog named Spot.
Bob has a dog named Fido, who is friendly with both Fluffy and Spot."""

await cognee.add(
    [text, users_and_pets],
    dataset_name="users_and_pets_with_text",
    primary_key="id",
)
await cognee.cognify()
```

<Info>
  Structured data creates deterministic graph nodes from the schema, while unstructured text goes through LLM-based entity extraction. Both are combined in the same knowledge graph.
</Info>

## Write Dispositions

Control how data is synced on repeated runs using the `write_disposition` parameter:

* **`merge`** (default): Upsert by primary key — updates existing rows, inserts new ones. Best for data that changes over time.
* **`append`**: Always insert without deduplication. Use for time-series data and event logs.
* **`replace`**: Drop and recreate tables on each run. Use for full snapshot refreshes.

```python theme={null}
# Append mode: every call adds new rows, no dedup.
# `event_resource` stands for any dlt resource you have defined.
await cognee.add(
    event_resource,
    dataset_name="events",
    primary_key="id",
    write_disposition="append",
)
```
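
The difference between the three dispositions is easiest to see on a toy table. The following sketch simulates them on in-memory lists of rows; `apply_load` is a hypothetical helper written for this illustration and does not reflect how dlt or cognee implement loading internally.

```python
from typing import Any

Row = dict[str, Any]


def apply_load(
    existing: list[Row],
    incoming: list[Row],
    write_disposition: str = "merge",
    primary_key: str = "id",
) -> list[Row]:
    """Return the table contents after one load (illustrative only)."""
    if write_disposition == "replace":
        # Full snapshot: previous rows are discarded.
        return list(incoming)
    if write_disposition == "append":
        # No deduplication: incoming rows are added as-is.
        return list(existing) + list(incoming)
    if write_disposition == "merge":
        # Upsert: rows with a known key are overwritten, new keys are inserted.
        by_key = {row[primary_key]: row for row in existing}
        for row in incoming:
            by_key[row[primary_key]] = row
        return list(by_key.values())
    raise ValueError(f"unknown write_disposition: {write_disposition!r}")


existing = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
incoming = [{"id": 2, "status": "shipped"}, {"id": 3, "status": "new"}]

merged = apply_load(existing, incoming, "merge")
appended = apply_load(existing, incoming, "append")
replaced = apply_load(existing, incoming, "replace")

print(len(merged), len(appended), len(replaced))
```

With `merge`, row `2` is updated in place and row `3` is inserted, giving three rows. `append` keeps both versions of row `2`, giving four rows, and `replace` keeps only the two incoming rows.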

## How It Works

1. **Source Detection**: cognee identifies dlt resources, CSV files, and connection strings in the input
2. **Pipeline Execution**: A dlt pipeline loads data into a per-dataset staging database
3. **Schema Extraction**: Table schemas, primary keys, and foreign keys are extracted
4. **Graph Construction**: Each row becomes a document node; foreign keys become edges between nodes
5. **LLM Bypass**: Structured rows skip chunking, entity extraction, and summarization — the graph is built entirely from schema metadata
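
Steps 3 and 4 can be sketched in a few lines of plain Python. This is a simplified model under stated assumptions, not cognee's internal representation: the `ForeignKey` class, the `build_graph` function, the `table:key` node ID scheme, and the edge tuples are all hypothetical, and every foreign key is assumed to reference the primary key of the target table.

```python
from dataclasses import dataclass
from typing import Any

Row = dict[str, Any]


@dataclass(frozen=True)
class ForeignKey:
    table: str       # table that holds the referencing column
    column: str      # referencing column, e.g. "customer_id"
    ref_table: str   # referenced table (assumed to be referenced by its primary key)


def build_graph(
    tables: dict[str, list[Row]],
    primary_keys: dict[str, str],
    foreign_keys: list[ForeignKey],
) -> tuple[dict[str, Row], list[tuple[str, str, str]]]:
    """Turn rows into nodes and foreign keys into edges, with no LLM involved."""
    nodes: dict[str, Row] = {}
    edges: list[tuple[str, str, str]] = []

    # Every row becomes one node, identified by "<table>:<primary key value>".
    for table, rows in tables.items():
        pk = primary_keys[table]
        for row in rows:
            nodes[f"{table}:{row[pk]}"] = row

    # Every non-null foreign key value becomes one edge between two nodes.
    for fk in foreign_keys:
        pk = primary_keys[fk.table]
        for row in tables[fk.table]:
            target = row.get(fk.column)
            if target is None:
                continue
            edges.append((f"{fk.table}:{row[pk]}", fk.column, f"{fk.ref_table}:{target}"))

    return nodes, edges


tables = {
    "customers": [{"id": 1, "name": "Alice"}],
    "orders": [
        {"id": 10, "customer_id": 1, "total": 42.0},
        {"id": 11, "customer_id": None, "total": 5.0},
    ],
}
nodes, edges = build_graph(
    tables,
    primary_keys={"customers": "id", "orders": "id"},
    foreign_keys=[ForeignKey("orders", "customer_id", "customers")],
)
print(sorted(nodes))
print(edges)
```

Because the graph is derived purely from schema metadata, the same input always yields the same nodes and edges, which is what makes this path deterministic.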

<Info>
  The `primary_key` parameter controls upsert behavior. If it is not specified, cognee auto-detects the key from an `id` column or falls back to the first column. Set `DLT_MAX_ROWS_PER_TABLE` (default: `50`) to control the maximum number of rows ingested per table.
</Info>
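
The fallback rules in the note above can be written down as a short sketch. Both helpers are hypothetical and only mirror the documented behavior; in particular, reading `DLT_MAX_ROWS_PER_TABLE` as an integer from the environment is an assumption about how the setting is consumed, not a description of cognee's code.

```python
import os
from collections.abc import Mapping, Sequence
from typing import Optional


def resolve_primary_key(columns: Sequence[str], primary_key: Optional[str] = None) -> str:
    """Pick the key column: explicit value, then `id`, then the first column."""
    if primary_key is not None:
        return primary_key
    if "id" in columns:
        return "id"
    return columns[0]


def max_rows_per_table(env: Optional[Mapping[str, str]] = None) -> int:
    """Read the per-table row cap, defaulting to the documented value of 50."""
    env = os.environ if env is None else env
    return int(env.get("DLT_MAX_ROWS_PER_TABLE", "50"))


explicit = resolve_primary_key(["sku", "id", "name"], primary_key="sku")
detected = resolve_primary_key(["name", "id", "email"])
fallback = resolve_primary_key(["email", "name"])
default_cap = max_rows_per_table(env={})
custom_cap = max_rows_per_table(env={"DLT_MAX_ROWS_PER_TABLE": "500"})

print(explicit, detected, fallback, default_cap, custom_cap)
```

Passing `primary_key` explicitly is the safer choice whenever a table has no `id` column, since the first-column fallback may pick a column that is not unique.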

## Use Cases

<AccordionGroup>
  <Accordion title="CRM and Relational Data">
    Load customer, order, and product tables from a database. Foreign keys between tables (e.g., `order.customer_id → customer.id`) become graph edges, enabling cross-table queries like "Which customers ordered product X?"
  </Accordion>

  <Accordion title="CSV Analytics Pipeline">
    Point cognee at CSV exports from analytics tools. Each row becomes a searchable node in the graph, and you can combine them with unstructured reports in the same dataset.
  </Accordion>

  <Accordion title="Event Log Ingestion">
    Use `write_disposition="append"` to stream event batches into cognee without deduplication. Query across the full event history with natural language.
  </Accordion>

  <Accordion title="Database Mirroring">
    Use `write_disposition="merge"` to keep cognee's graph in sync with a live database. Rows that are removed upstream are automatically cleaned up.
  </Accordion>

  <Accordion title="Cloud Data Warehouses (Snowflake, Redshift, BigQuery)" id="cloud-data-warehouses">
    **Amazon Redshift** speaks the PostgreSQL wire protocol, so the standard connection string auto-detection works:

    ```python theme={null}
    await cognee.add(
        "postgresql://user:pass@my-cluster.us-east-1.redshift.amazonaws.com:5439/mydb",
        dataset_name="redshift_data",
        primary_key="id",
    )
    ```

    **Snowflake** requires constructing a dlt `sql_database` source manually (install `snowflake-sqlalchemy` first):

    ```bash theme={null}
    pip install 'cognee[dlt]' snowflake-sqlalchemy
    ```

    ```python theme={null}
    from dlt.sources.sql_database import sql_database
    import cognee

    source = sql_database(
        credentials="snowflake://user:password@account_identifier/database/schema?warehouse=MY_WH",
        table_names=["orders", "customers"],
    )

    await cognee.add(source, dataset_name="snowflake_data", primary_key="id")
    await cognee.cognify()
    ```

    The `account_identifier` is the part before `.snowflakecomputing.com` in your Snowflake URL (e.g. `myorg-myaccount`). Omit `table_names` to ingest all tables in the schema.

    **Google BigQuery** works the same way using dlt's BigQuery connector — construct the source and pass it directly to `cognee.add()`. See the [dlt sql\_database docs](https://dlthub.com/docs/dlt-ecosystem/verified-sources/sql_database) for connector-specific setup.
  </Accordion>
</AccordionGroup>

***

<CardGroup cols={2}>
  <Card title="Add Operation" icon="plus" href="/core-concepts/main-operations/legacy-operations/add">
    Learn more about data ingestion in cognee
  </Card>

  <Card title="dlt Documentation" icon="book" href="https://dlthub.com/docs">
    Official dlt documentation and guides
  </Card>
</CardGroup>
