Ingest structured relational data — databases, CSV files, and dlt resources — directly into cognee’s knowledge graph. Foreign keys become graph edges, tables become schema nodes, and each row becomes a searchable document, all built deterministically from the schema without LLM extraction.

Why Use This Integration

  • Schema-Aware Graphs: Foreign key relationships are preserved as first-class edges in the knowledge graph
  • Deterministic Graph Construction: Structured data bypasses LLM entity extraction — no hallucination risk
  • Mixed Ingestion: Combine structured (dlt) and unstructured (text, PDF) data in the same dataset
  • Multiple Input Modes: Pass explicit dlt resources, CSV file paths, or database connection strings
  • Write Dispositions: Control how data is synced — merge (upsert), append, or replace

Installation

pip install 'cognee[dlt]'
Or with uv:
uv pip install 'cognee[dlt]'

Quick Start

1. Ingest a dlt Resource

Define a dlt resource and pass it to cognee.remember(). The dlt-specific structured-ingestion options primary_key, write_disposition, query, and max_rows_per_table are accepted by cognee.remember() and forwarded to the underlying ingestion step. After ingestion, use cognee.recall(...) to query the graph.
import dlt
import cognee
import asyncio

@dlt.resource()
def users_and_pets():
    yield [
        {
            "id": 1,
            "name": "Alice",
            "pets": [
                {"id": 1, "name": "Fluffy", "type": "cat"},
                {"id": 2, "name": "Spot", "type": "dog"},
            ],
        },
        {
            "id": 2,
            "name": "Bob",
            "pets": [{"id": 3, "name": "Fido", "type": "dog"}],
        },
    ]

async def main():
    await cognee.remember(
        users_and_pets,
        dataset_name="users_and_pets",
        primary_key="id",
    )
    results = await cognee.recall(
        query_text="Which pet does Alice have?",
        datasets=["users_and_pets"],
    )
    print(results)

asyncio.run(main())
dlt automatically detects nested structures (like pets inside each user) and creates separate tables with foreign key relationships.
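To make the nested-table behaviour concrete, here is a minimal pure-Python sketch of the kind of normalization involved. It does not call dlt. The helper flatten_nested is hypothetical, and the child table name users_and_pets__pets and the _dlt_id / _dlt_parent_id link columns follow dlt's usual naming convention in simplified form (real dlt generates hashed row identifiers and extra bookkeeping columns rather than the readable ids used here).

```python
def flatten_nested(table_name, rows, nested_field):
    """Split rows with a nested list field into a parent and a child table.

    Illustrative only: readable ids stand in for dlt's generated row ids.
    """
    parent_rows, child_rows = [], []
    for row_idx, row in enumerate(rows):
        parent_id = f"{table_name}-{row_idx}"  # stand-in for dlt's _dlt_id
        parent = {k: v for k, v in row.items() if k != nested_field}
        parent["_dlt_id"] = parent_id
        parent_rows.append(parent)
        for child in row.get(nested_field, []):
            # Each child row points back at the parent row it came from.
            child_rows.append({**child, "_dlt_parent_id": parent_id})
    return {
        table_name: parent_rows,
        f"{table_name}__{nested_field}": child_rows,
    }


users = [
    {"id": 1, "name": "Alice", "pets": [
        {"id": 1, "name": "Fluffy", "type": "cat"},
        {"id": 2, "name": "Spot", "type": "dog"},
    ]},
    {"id": 2, "name": "Bob", "pets": [{"id": 3, "name": "Fido", "type": "dog"}]},
]

tables = flatten_nested("users_and_pets", users, "pets")
for name, table_rows in tables.items():
    print(name, len(table_rows))
```

The parent-to-child link produced this way is what later shows up as a foreign key relationship, and therefore as an edge, in the graph.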
The lower-level cognee.add(...) + cognee.cognify(...) pair still accepts the same dlt kwargs and remains useful when you need to run ingestion and graph building as separate steps. For the runnable end-to-end version of this walkthrough, see examples/demos/dlt_ingestion_example.py.

2. Build and Query the Graph

Once remember() finishes ingesting and building the graph, use cognee.recall(...) to query it.

Other Input Modes

CSV Auto-Detection

Pass a .csv file path and cognee creates a dlt source automatically:
await cognee.remember(
    "/path/to/employees.csv",
    dataset_name="employees",
    primary_key="id",
)

Database Connection String

Ingest tables directly from an existing database:
await cognee.remember(
    "postgresql://user:pass@host/db",
    dataset_name="company_db",
    primary_key="id",
)
Supported databases via auto-detection: SQLite, PostgreSQL, MySQL, MSSQL, Oracle. Amazon Redshift is also compatible because it speaks the PostgreSQL wire protocol: use a standard postgresql:// connection string pointing to your Redshift endpoint. For Snowflake and Google BigQuery, construct a dlt source directly and pass it to cognee.remember() (see Cloud Data Warehouses below).

To restrict what is ingested, pass a SQL query through the query parameter, for example a SELECT with a WHERE clause:
await cognee.remember(
    "postgresql://user:pass@host/db",
    dataset_name="engineering_team",
    primary_key="id",
    query="SELECT * FROM employees WHERE department = 'Engineering'",
)
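Before handing a filter query to cognee, it can help to preview which rows it selects. The sketch below uses only Python's built-in sqlite3 module with an in-memory database. The employees table and its rows are made-up sample data, and nothing here calls cognee.

```python
import sqlite3

# Hypothetical sample data, used only to preview the filter.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT)"
)
conn.executemany(
    "INSERT INTO employees (id, name, department) VALUES (?, ?, ?)",
    [
        (1, "Alice", "Engineering"),
        (2, "Bob", "Sales"),
        (3, "Carol", "Engineering"),
    ],
)

# The same query string that would be passed as query=... to remember().
query = "SELECT * FROM employees WHERE department = 'Engineering'"
selected = conn.execute(query).fetchall()
conn.close()

for row in selected:
    print(row)
```

Only the rows returned by the query are candidates for ingestion, so rows in other departments never reach the graph.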

Mixed Structured + Unstructured

Combine dlt resources with unstructured text in a single dataset:
text = """Alice has two pets: a cat named Fluffy and a dog named Spot.
Bob has a dog named Fido, who is friendly with both Fluffy and Spot."""

await cognee.remember(
    [text, users_and_pets],
    dataset_name="users_and_pets_with_text",
    primary_key="id",
)
Structured data creates deterministic graph nodes from the schema, while unstructured text goes through LLM-based entity extraction. Both are combined in the same knowledge graph.

Write Dispositions

Control how data is synced on repeated runs using the write_disposition parameter:
  • replace (default): Drop and recreate tables on each run. Use for full snapshot refreshes.
  • merge: Upsert by primary key — updates existing rows, inserts new ones. Best for data that changes over time.
  • append: Always insert without deduplication. Use for time-series data and event logs.
# Append mode — every call adds new rows, no dedup
await cognee.remember(
    event_resource,
    dataset_name="events",
    primary_key="id",
    write_disposition="append",
)
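The difference between the three modes is easiest to see on a tiny table. The following is a simplified in-memory model of their semantics, not cognee's or dlt's actual implementation. The function apply_disposition and the sample rows are hypothetical.

```python
def apply_disposition(existing, incoming, write_disposition, primary_key="id"):
    """Return the table contents after loading `incoming` rows."""
    if write_disposition == "replace":
        # Previous contents are discarded entirely.
        return list(incoming)
    if write_disposition == "append":
        # Rows are added as-is, duplicates included.
        return list(existing) + list(incoming)
    if write_disposition == "merge":
        # Upsert: an incoming row overwrites an existing row with the same key.
        by_key = {row[primary_key]: row for row in existing}
        for row in incoming:
            by_key[row[primary_key]] = row
        return list(by_key.values())
    raise ValueError(f"unknown write_disposition: {write_disposition!r}")


first_run = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
second_run = [{"id": 2, "status": "shipped"}, {"id": 3, "status": "new"}]

for mode in ("replace", "merge", "append"):
    result = apply_disposition(first_run, second_run, mode)
    print(mode, [(r["id"], r["status"]) for r in result])
```

In this model, replace keeps only the second batch, merge keeps one row per id with the latest values, and append keeps every row from both batches, including both versions of id 2.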

How It Works

  1. Source Detection: cognee identifies dlt resources, CSV files, and connection strings in the input
  2. Pipeline Execution: A dlt pipeline loads data into a per-dataset staging database
  3. Schema Extraction: Table schemas, primary keys, and foreign keys are extracted
  4. Graph Construction: Each row becomes a document node; foreign keys become edges between nodes
  5. LLM Bypass: Structured rows skip chunking, entity extraction, and summarization — the graph is built entirely from schema metadata
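As an illustration of step 1, the sketch below shows one plausible way to route inputs by type. It is an assumption about the general approach, not cognee's actual detection logic: the helper classify_input, the DB_SCHEMES set, and the rule that every non-string input counts as a dlt resource are all hypothetical simplifications.

```python
from urllib.parse import urlparse

# Schemes assumed for this sketch, based on the databases listed earlier.
DB_SCHEMES = {"sqlite", "postgresql", "mysql", "mssql", "oracle"}


def classify_input(item):
    """Guess how an input item would be routed (hypothetical helper)."""
    if isinstance(item, str):
        if item.lower().endswith(".csv"):
            return "csv"
        # SQLAlchemy-style URLs may carry a driver suffix, e.g. postgresql+psycopg2.
        scheme = urlparse(item).scheme.split("+")[0].lower()
        if scheme in DB_SCHEMES:
            return "connection_string"
        return "text"
    # Non-string inputs are treated as candidate dlt resources in this sketch.
    return "dlt_resource"


def users_and_pets():  # stand-in for a real dlt resource
    yield []


inputs = [
    "/path/to/employees.csv",
    "postgresql://user:pass@host/db",
    "Alice has two pets: a cat named Fluffy and a dog named Spot.",
    users_and_pets,
]
for item in inputs:
    print(classify_input(item))
```

Routing per item rather than per call is what would let a single list mix structured sources with plain text, as in the mixed ingestion example.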
The primary_key parameter controls upsert behavior when you use write_disposition="merge". If it is not specified, cognee auto-detects the key from an id column or falls back to the first column.

To change the per-table row cap, pass the max_rows_per_table kwarg to remember() / add() to override it for a single call, or set the DLT_MAX_ROWS_PER_TABLE environment variable (default: 50) to change the process-wide default.
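The precedence rules described above can be sketched in a few lines. This mirrors the documented behaviour only. The helpers detect_primary_key and resolve_row_cap are hypothetical, and details such as case-sensitive matching of the id column are assumptions.

```python
import os


def detect_primary_key(columns, primary_key=None):
    """Explicit key wins; otherwise use an `id` column, else the first column."""
    if primary_key is not None:
        return primary_key
    if "id" in columns:
        return "id"
    return columns[0]


def resolve_row_cap(max_rows_per_table=None, environ=os.environ):
    """Per-call value wins over DLT_MAX_ROWS_PER_TABLE, which defaults to 50."""
    if max_rows_per_table is not None:
        return max_rows_per_table
    return int(environ.get("DLT_MAX_ROWS_PER_TABLE", "50"))


print(detect_primary_key(["name", "id", "email"]))
print(detect_primary_key(["sku", "title"]))
print(resolve_row_cap(environ={}))
print(resolve_row_cap(500, environ={"DLT_MAX_ROWS_PER_TABLE": "10"}))
```

Passing the environment mapping as an argument keeps the sketch easy to test without touching the real process environment.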

Foreign Key Resolution

A foreign key becomes a graph edge only when both the source row and the target row are loaded in the same ingestion run. Two edge cases are worth knowing about — cognee now logs a warning in each so they are diagnosable rather than silent:
  • Target row not loaded: if a foreign key points at a row that wasn’t ingested — most commonly because the target table hit the max_rows_per_table cap — the reference is dropped and no edge is created. The warning identifies the dropped references as source_table.column -> ref_table:value. If you see missing edges, raise max_rows_per_table so the referenced rows are included.
  • Duplicate primary keys within a table: if multiple rows in a table share the same primary key, foreign key edges that target that key resolve to the last such row loaded; earlier rows with the same key are shadowed for FK targeting. The warning names the affected table and primary key.
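Both edge cases can be reproduced with a small in-memory model. The sketch below is not cognee's implementation: resolve_edges, its argument shapes, and the logger name are hypothetical. Only the source_table.column -> ref_table:value format for dropped references is taken from the description above.

```python
import logging

logger = logging.getLogger("fk_sketch")


def resolve_edges(tables, primary_keys, foreign_keys):
    """Build edges from loaded rows and collect dropped references.

    tables:       {table_name: [row_dict, ...]}
    primary_keys: {table_name: pk_column}
    foreign_keys: [(source_table, column, ref_table), ...]
    """
    # Index rows by primary key; a repeated key overwrites the earlier row.
    index = {}
    for table, rows in tables.items():
        pk = primary_keys[table]
        seen = {}
        for row in rows:
            if row[pk] in seen:
                logger.warning("duplicate primary key in %s: %s", table, row[pk])
            seen[row[pk]] = row
        index[table] = seen

    edges, dropped = [], []
    for source_table, column, ref_table in foreign_keys:
        src_pk = primary_keys[source_table]
        for row in tables[source_table]:
            value = row.get(column)
            if value is None:
                continue
            if value in index[ref_table]:
                edges.append(((source_table, row[src_pk]), (ref_table, value)))
            else:
                # Target row was not loaded, so no edge can be created.
                dropped.append(f"{source_table}.{column} -> {ref_table}:{value}")
    if dropped:
        logger.warning("dropped foreign key references: %s", ", ".join(dropped))
    return edges, dropped


tables = {
    "customers": [{"id": 1, "name": "Alice"}, {"id": 1, "name": "Alice (dup)"}],
    "orders": [{"id": 10, "customer_id": 1}, {"id": 11, "customer_id": 2}],
}
edges, dropped = resolve_edges(
    tables,
    primary_keys={"customers": "id", "orders": "id"},
    foreign_keys=[("orders", "customer_id", "customers")],
)
print(edges)
print(dropped)
```

Order 11 references customer 2, which is absent from the loaded rows, so its reference is dropped. Order 10 resolves to customer 1, where the second row with that key shadows the first.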

Use Cases

Load customer, order, and product tables from a database. Foreign keys between tables (e.g., order.customer_id → customer.id) become graph edges, enabling cross-table queries like “Which customers ordered product X?”
Point cognee at CSV exports from analytics tools. Each row becomes a searchable node in the graph, and you can combine them with unstructured reports in the same dataset.
Use write_disposition="append" to stream event batches into cognee without deduplication. Query across the full event history with natural language.
Use write_disposition="merge" to keep cognee’s graph in sync with a live database. Rows that are removed upstream are cleaned up best-effort; any orphaned rows that fail to delete are logged and retried on the next ingest.
Cloud Data Warehouses

Amazon Redshift speaks the PostgreSQL wire protocol, so the standard connection string auto-detection works:
await cognee.remember(
    "postgresql://user:pass@my-cluster.us-east-1.redshift.amazonaws.com:5439/mydb",
    dataset_name="redshift_data",
    primary_key="id",
)
Snowflake requires constructing a dlt sql_database source manually (install snowflake-sqlalchemy first):
pip install 'cognee[dlt]' snowflake-sqlalchemy
import asyncio

import cognee
from dlt.sources.sql_database import sql_database

# Build a dlt source for the Snowflake tables to ingest.
source = sql_database(
    credentials="snowflake://user:password@account_identifier/database/schema?warehouse=MY_WH",
    table_names=["orders", "customers"],
)

async def main():
    await cognee.remember(source, dataset_name="snowflake_data", primary_key="id")

asyncio.run(main())
The account_identifier is the part before .snowflakecomputing.com in your Snowflake URL (e.g. myorg-myaccount). Omit table_names to ingest all tables in the schema.

Google BigQuery works the same way using dlt’s BigQuery connector: construct the source and pass it directly to cognee.remember(). See the dlt sql_database docs for connector-specific setup.
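When the Snowflake credentials contain characters such as @ or /, the connection URL needs them percent-encoded. The sketch below assembles a URL in the format shown above using only the standard library. The helper snowflake_url and the sample values are hypothetical, and it covers only the warehouse query parameter.

```python
from urllib.parse import quote, urlencode


def snowflake_url(user, password, account_identifier, database, schema, warehouse):
    """Assemble a Snowflake connection URL (hypothetical helper)."""
    # Percent-encode credentials so characters like '@' or '/' stay unambiguous.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    query = urlencode({"warehouse": warehouse})
    return (
        f"snowflake://{credentials}@{account_identifier}"
        f"/{database}/{schema}?{query}"
    )


url = snowflake_url(
    user="user",
    password="p@ss/word",  # sample value with characters that need encoding
    account_identifier="myorg-myaccount",
    database="database",
    schema="schema",
    warehouse="MY_WH",
)
print(url)
```

The resulting string can be passed as the credentials argument when constructing the sql_database source.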

Remember Operation

Learn more about data ingestion in cognee

dlt Documentation

Official dlt documentation and guides