Why Use This Integration
- Schema-Aware Graphs: Foreign key relationships are preserved as first-class edges in the knowledge graph
- Deterministic Graph Construction: Structured data bypasses LLM entity extraction — no hallucination risk
- Mixed Ingestion: Combine structured (dlt) and unstructured (text, PDF) data in the same dataset
- Multiple Input Modes: Pass explicit dlt resources, CSV file paths, or database connection strings
- Write Dispositions: Control how data is synced — merge (upsert), append, or replace
Installation
Quick Start
1. Ingest a dlt Resource
Define a dlt resource and pass it to cognee.remember(). The dlt-specific structured-ingestion options primary_key, write_disposition, SQL query, and max_rows_per_table are accepted by cognee.remember() and forwarded to the underlying ingestion step. After ingestion, use cognee.recall(...) to query the graph.
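A minimal sketch of this flow, assuming `dlt` and `cognee` are installed. The sample rows and the question are made up for illustration, and the call shapes of `remember()` and `recall()` (coroutines that take the data or the query text as the first positional argument) are assumptions to check against your installed version.

```python
import asyncio

# Illustrative rows; the nested "pets" list is the kind of structure dlt unpacks.
USERS = [
    {"id": 1, "name": "Alice", "pets": [{"id": 10, "name": "Rex", "species": "dog"}]},
    {"id": 2, "name": "Bob", "pets": [{"id": 11, "name": "Tom", "species": "cat"}]},
]


def users():
    """Plain generator that a dlt resource can wrap."""
    yield from USERS


async def main():
    # Third-party imports stay local so the data definitions above
    # can be imported without cognee or dlt being installed.
    import cognee
    import dlt

    resource = dlt.resource(users, name="users", primary_key="id")

    # These kwargs are the structured-ingestion options named above.
    await cognee.remember(
        resource,
        primary_key="id",
        write_disposition="merge",
        max_rows_per_table=100,
    )

    # Assumed call shape: a natural-language question as the first argument.
    results = await cognee.recall("Which pets does Alice own?")
    print(results)


# Run with: asyncio.run(main())
```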
dlt normalizes nested data (for example, a list of pets inside each user) and creates separate tables with foreign key relationships.
The lower-level cognee.add(...) + cognee.cognify(...) pair still accepts the same dlt kwargs and remains useful when you need to run ingestion and graph building as separate steps. For the runnable end-to-end version of this walkthrough, see examples/demos/dlt_ingestion_example.py.
2. Build and Query the Graph
Once remember() finishes ingesting and building the graph, use cognee.recall(...) to query it.
Other Input Modes
CSV Auto-Detection
Pass a .csv file path and cognee creates a dlt source automatically:
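As a rough sketch of the CSV mode, the snippet below writes a small file with the standard library and then hands its path to cognee. The file name and columns are hypothetical, and passing the path as a plain string to `remember()` is an assumption about the accepted input type.

```python
import csv
import tempfile
from pathlib import Path


def write_products_csv(directory: str) -> Path:
    """Write a tiny illustrative CSV and return its path."""
    path = Path(directory) / "products.csv"
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "name", "price"])
        writer.writeheader()
        writer.writerows(
            [
                {"id": 1, "name": "Keyboard", "price": "49.90"},
                {"id": 2, "name": "Mouse", "price": "19.90"},
            ]
        )
    return path


async def ingest_csv(path: Path) -> None:
    # Local import so the helper above works without cognee installed.
    import cognee

    # Assumption: the .csv suffix is what triggers auto-detection.
    await cognee.remember(str(path))


# Usage: create the file and inspect its header row.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = write_products_csv(tmp)
    header = csv_path.read_text(encoding="utf-8").splitlines()[0]
```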
Database Connection String
Ingest tables directly from an existing database:
For Amazon Redshift, use a postgresql:// connection string pointing to your Redshift endpoint.
For Snowflake and Google BigQuery, construct a dlt source directly and pass it to cognee.remember() (see the Cloud Data Warehouses accordion below).
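For the Redshift case, a connection string can be assembled as sketched below. The helper name, the example endpoint, and the credentials are hypothetical; 5439 is Redshift's default port. Passing the resulting string to `remember()` as the first argument is an assumption about the call shape.

```python
from urllib.parse import quote, urlsplit


def redshift_connection_string(
    user: str, password: str, host: str, database: str, port: int = 5439
) -> str:
    """Build a postgresql:// URL, percent-encoding the credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


async def ingest_database(connection_string: str) -> None:
    # Local import so the helper above works without cognee installed.
    import cognee

    await cognee.remember(connection_string)


# Usage with a made-up endpoint and credentials.
url = redshift_connection_string(
    "analyst",
    "p@ss/word",
    "example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    "dev",
)
parts = urlsplit(url)
```

Percent-encoding matters when a password contains characters such as `@` or `/`, which would otherwise be read as URL delimiters.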
You can optionally filter with a SQL WHERE clause:
Mixed Structured + Unstructured
Combine dlt resources with unstructured text in a single dataset:
Structured data creates deterministic graph nodes from the schema, while unstructured text goes through LLM-based entity extraction. Both are combined in the same knowledge graph.
Write Dispositions
Control how data is synced on repeated runs using the write_disposition parameter:
- replace (default): Drop and recreate tables on each run. Use for full snapshot refreshes.
- merge: Upsert by primary key, updating existing rows and inserting new ones. Best for data that changes over time.
- append: Always insert without deduplication. Use for time-series data and event logs.
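The three dispositions can be modeled on plain lists of rows. This is a conceptual sketch of the semantics described above, not cognee's or dlt's implementation; the function name and sample rows are hypothetical.

```python
def apply_write_disposition(existing, incoming, disposition, primary_key="id"):
    """Return the table contents after one sync run (conceptual model only)."""
    if disposition == "replace":
        # Drop everything and keep only the new snapshot.
        return list(incoming)
    if disposition == "append":
        # Keep every row ever seen, duplicates included.
        return list(existing) + list(incoming)
    if disposition == "merge":
        # Upsert: rows with a known key are overwritten, new keys are added.
        merged = {row[primary_key]: row for row in existing}
        for row in incoming:
            merged[row[primary_key]] = row
        return list(merged.values())
    raise ValueError(f"unknown write disposition: {disposition!r}")


existing = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
incoming = [{"id": 2, "status": "paid"}, {"id": 3, "status": "new"}]

replaced = apply_write_disposition(existing, incoming, "replace")
appended = apply_write_disposition(existing, incoming, "append")
merged = apply_write_disposition(existing, incoming, "merge")
```

With these inputs, replace keeps only ids 2 and 3, append keeps all four rows (id 2 twice), and merge keeps ids 1, 2, 3 with id 2 updated to "paid".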
How It Works
- Source Detection: cognee identifies dlt resources, CSV files, and connection strings in the input
- Pipeline Execution: A dlt pipeline loads data into a per-dataset staging database
- Schema Extraction: Table schemas, primary keys, and foreign keys are extracted
- Graph Construction: Each row becomes a document node; foreign keys become edges between nodes
- LLM Bypass: Structured rows skip chunking, entity extraction, and summarization — the graph is built entirely from schema metadata
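To make the schema extraction and graph construction steps concrete, here is a self-contained sketch that uses an in-memory SQLite database in place of the staging database. It illustrates the idea only and is not cognee's code; the `build_graph` helper, the table names, and the assumption that every foreign key targets the referenced table's primary key are all hypothetical.

```python
import sqlite3


def build_graph(conn):
    """Turn every row into a node and every resolvable foreign key into an edge."""
    conn.row_factory = sqlite3.Row
    tables = [
        r["name"]
        for r in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]

    # Schema extraction: primary key per table, falling back to the first column.
    pk_of = {}
    for table in tables:
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        pk_of[table] = next((c["name"] for c in cols if c["pk"]), cols[0]["name"])

    nodes, candidate_edges = {}, []
    for table in tables:
        fks = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        for row in conn.execute(f'SELECT * FROM "{table}"'):
            node_id = (table, row[pk_of[table]])
            nodes[node_id] = dict(row)
            for fk in fks:
                # Assumes the foreign key references the target's primary key.
                target_id = (fk["table"], row[fk["from"]])
                candidate_edges.append((node_id, fk["from"], target_id))

    # Keep only edges whose target row was actually loaded.
    edges = [edge for edge in candidate_edges if edge[2] in nodes]
    return nodes, edges


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (100, 1, 25.0), (101, 2, 40.0), (102, 1, 5.0);
    """
)
nodes, edges = build_graph(conn)
```

No model call appears anywhere in this path: nodes and edges follow directly from the rows and the schema metadata.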
The primary_key parameter controls upsert behavior when you use write_disposition="merge". If not specified, cognee auto-detects from an id column or falls back to the first column. Use the max_rows_per_table kwarg on remember() / add() to override the per-table row cap for a single call, or set the DLT_MAX_ROWS_PER_TABLE environment variable (default: 50) to change the process-wide default.
Foreign Key Resolution
A foreign key becomes a graph edge only when both the source row and the target row are loaded in the same ingestion run. Two edge cases are worth knowing about; cognee now logs a warning in each so they are diagnosable rather than silent:
- Target row not loaded: if a foreign key points at a row that wasn’t ingested (most commonly because the target table hit the max_rows_per_table cap), the reference is dropped and no edge is created. The warning identifies the dropped references as source_table.column -> ref_table:value. If you see missing edges, raise max_rows_per_table so the referenced rows are included.
- Duplicate primary keys within a table: if multiple rows in a table share the same primary key, foreign key edges that target that key resolve to the last such row loaded; earlier rows with the same key are shadowed for FK targeting. The warning names the affected table and pk.
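Both edge cases can be reproduced with a small model. This is a hedged sketch of the behavior described above, not cognee's resolver: the function name, the logger name, and the exact warning wording are hypothetical, while the dropped-reference format and the default cap of 50 come from the text.

```python
import logging

logger = logging.getLogger("fk_resolution_sketch")


def resolve_foreign_keys(
    source_table, source_rows, fk_column,
    ref_table, ref_rows, ref_pk="id", max_rows_per_table=50,
):
    """Return (edges, dropped) for one foreign key under a per-table row cap."""
    # Only rows within the cap count as loaded.
    index = {}
    for row in ref_rows[:max_rows_per_table]:
        key = row[ref_pk]
        if key in index:
            logger.warning("duplicate primary key in %s: %s=%r", ref_table, ref_pk, key)
        index[key] = row  # the last row loaded wins; earlier ones are shadowed

    edges, dropped = [], []
    for row in source_rows[:max_rows_per_table]:
        value = row[fk_column]
        target = index.get(value)
        if target is None:
            # Target row was not loaded, so no edge is created.
            dropped.append(f"{source_table}.{fk_column} -> {ref_table}:{value}")
            continue
        edges.append((row, target))

    if dropped:
        logger.warning("dropped foreign key references: %s", ", ".join(dropped))
    return edges, dropped


customers = [
    {"id": 1, "name": "Alice"},
    {"id": 1, "name": "Alice (duplicate)"},
    {"id": 2, "name": "Bob"},
]
orders = [{"id": 100, "customer_id": 1}, {"id": 101, "customer_id": 2}]

# With a cap of 2, customer 2 is never loaded and the duplicate shadows the first row.
capped_edges, capped_dropped = resolve_foreign_keys(
    "orders", orders, "customer_id", "customers", customers, max_rows_per_table=2
)
# Raising the cap brings the referenced row back in.
full_edges, full_dropped = resolve_foreign_keys(
    "orders", orders, "customer_id", "customers", customers
)
```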
Use Cases
CRM and Relational Data
Load customer, order, and product tables from a database. Foreign keys between tables (e.g., order.customer_id → customer.id) become graph edges, enabling cross-table queries like “Which customers ordered product X?”
CSV Analytics Pipeline
Point cognee at CSV exports from analytics tools. Each row becomes a searchable node in the graph, and you can combine them with unstructured reports in the same dataset.
Event Log Ingestion
Use write_disposition="append" to stream event batches into cognee without deduplication. Query across the full event history with natural language.
Database Mirroring
Use write_disposition="merge" to keep cognee’s graph in sync with a live database. Rows that are removed upstream are cleaned up best-effort; any orphaned rows that fail to delete are logged and retried on the next ingest.
Cloud Data Warehouses (Snowflake, Redshift, BigQuery)
Amazon Redshift speaks the PostgreSQL wire protocol, so the standard connection string auto-detection works:
Snowflake requires constructing a dlt sql_database source manually (install snowflake-sqlalchemy first):
The account_identifier is the part before .snowflakecomputing.com in your Snowflake URL (e.g. myorg-myaccount). Omit table_names to ingest all tables in the schema.
Google BigQuery works the same way using dlt’s BigQuery connector: construct the source and pass it directly to cognee.remember(). See the dlt sql_database docs for connector-specific setup.
Remember Operation
Learn more about data ingestion in cognee
dlt Documentation
Official dlt documentation and guides