Ingest structured relational data — databases, CSV files, and dlt resources — directly into cognee’s knowledge graph. Foreign keys become graph edges, tables become schema nodes, and each row becomes a searchable document, all built deterministically from the schema without LLM extraction.

Why Use This Integration

  • Schema-Aware Graphs: Foreign key relationships are preserved as first-class edges in the knowledge graph
  • Deterministic Graph Construction: Structured data bypasses LLM entity extraction — no hallucination risk
  • Mixed Ingestion: Combine structured (dlt) and unstructured (text, PDF) data in the same dataset
  • Multiple Input Modes: Pass explicit dlt resources, CSV file paths, or database connection strings
  • Write Dispositions: Control how data is synced — merge (upsert), append, or replace

Installation

pip install 'cognee[dlt]'
Or with uv:
uv pip install 'cognee[dlt]'

Quick Start

1. Ingest a dlt Resource

Define a dlt resource and pass it to cognee.add():
import dlt
import cognee
import asyncio

@dlt.resource()
def users_and_pets():
    yield [
        {
            "id": 1,
            "name": "Alice",
            "pets": [
                {"id": 1, "name": "Fluffy", "type": "cat"},
                {"id": 2, "name": "Spot", "type": "dog"},
            ],
        },
        {
            "id": 2,
            "name": "Bob",
            "pets": [{"id": 3, "name": "Fido", "type": "dog"}],
        },
    ]

async def main():
    await cognee.add(
        users_and_pets,
        dataset_name="users_and_pets",
        primary_key="id",
    )
    await cognee.cognify()
    results = await cognee.search("Which pet does Alice have?")
    print(results)

asyncio.run(main())
dlt automatically detects nested structures (like pets inside each user) and creates separate tables with foreign key relationships.
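As an illustration (not dlt's actual code), the normalization described above can be sketched in plain Python. Real dlt names the child table users_and_pets__pets and links rows through internal _dlt_id/_dlt_parent_id columns; the plain user_id column below is a simplification for readability:

```python
# Illustrative sketch of how the nested "pets" list is split into a
# separate child table with a link back to its parent row.
def normalize(users):
    parents, children = [], []
    for user in users:
        pets = user.get("pets", [])
        # Parent row keeps everything except the nested list
        parents.append({k: v for k, v in user.items() if k != "pets"})
        # Each nested item becomes a child row linked to its parent
        for pet in pets:
            children.append({**pet, "user_id": user["id"]})
    return {"users_and_pets": parents, "users_and_pets__pets": children}

tables = normalize([
    {"id": 1, "name": "Alice", "pets": [{"id": 1, "name": "Fluffy", "type": "cat"}]},
])
```

Because the link column survives into the staging database as a foreign key, cognee can later turn it into a graph edge between the user node and each pet node.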

2. Build and Query the Graph

Once cognify completes, use cognee.search to query the graph. See Search for all available search types.

Other Input Modes

CSV Auto-Detection

Pass a .csv file path and cognee creates a dlt source automatically:
await cognee.add(
    "/path/to/employees.csv",
    dataset_name="employees",
    primary_key="id",
)
await cognee.cognify()

Database Connection String

Ingest tables directly from an existing database:
await cognee.add(
    "postgresql://user:pass@host/db",
    dataset_name="company_db",
    primary_key="id",
)
await cognee.cognify()
Supported databases: SQLite, PostgreSQL, MySQL, MSSQL, Oracle. You can optionally restrict which rows are ingested by passing a SQL query:
await cognee.add(
    "postgresql://user:pass@host/db",
    dataset_name="engineering_team",
    primary_key="id",
    query="SELECT * FROM employees WHERE department = 'Engineering'",
)

Mixed Structured + Unstructured

Combine dlt resources with unstructured text in a single dataset:
text = """Alice has two pets: a cat named Fluffy and a dog named Spot.
Bob has a dog named Fido, who is friendly with both Fluffy and Spot."""

await cognee.add(
    [text, users_and_pets],
    dataset_name="users_and_pets_with_text",
    primary_key="id",
)
await cognee.cognify()
Structured data creates deterministic graph nodes from the schema, while unstructured text goes through LLM-based entity extraction. Both are combined in the same knowledge graph.

Write Dispositions

Control how data is synced on repeated runs using the write_disposition parameter:
  • merge (default): Upsert by primary key — updates existing rows, inserts new ones. Best for data that changes over time.
  • append: Always insert without deduplication. Use for time-series data and event logs.
  • replace: Drop and recreate tables on each run. Use for full snapshot refreshes.
# Append mode — every call adds new rows, no dedup
await cognee.add(
    event_resource,
    dataset_name="events",
    primary_key="id",
    write_disposition="append",
)
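The row-level behavior of the three dispositions can be modeled in a few lines of plain Python (illustrative only; cognee delegates the real sync to dlt):

```python
# Model of merge / append / replace semantics over a keyed table
def sync(existing, incoming, disposition, pk="id"):
    if disposition == "replace":      # drop and recreate the table
        return list(incoming)
    if disposition == "append":       # insert-only, no deduplication
        return existing + list(incoming)
    # merge (default): upsert by primary key
    by_key = {row[pk]: row for row in existing}
    for row in incoming:
        by_key[row[pk]] = row         # update existing or insert new
    return list(by_key.values())

old = [{"id": 1, "name": "Alice"}]
new = [{"id": 1, "name": "Alicia"}, {"id": 2, "name": "Bob"}]
```

Running sync(old, new, "merge") updates Alice's row in place and adds Bob's, while "append" keeps all three rows and "replace" discards the old table entirely.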

How It Works

  1. Source Detection: cognee identifies dlt resources, CSV files, and connection strings in the input
  2. Pipeline Execution: A dlt pipeline loads data into a per-dataset staging database
  3. Schema Extraction: Table schemas, primary keys, and foreign keys are extracted
  4. Graph Construction: Each row becomes a document node; foreign keys become edges between nodes
  5. LLM Bypass: Structured rows skip chunking, entity extraction, and summarization — the graph is built entirely from schema metadata
The primary_key parameter controls upsert behavior. If not specified, cognee auto-detects from an id column or falls back to the first column. Set DLT_MAX_ROWS_PER_TABLE (default: 50) to control the maximum rows ingested per table.
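The fallback order for primary-key detection can be sketched as follows (an illustration of the rule stated above, not cognee's actual implementation):

```python
# Resolution order: explicit primary_key → an "id" column → first column
def resolve_primary_key(columns, primary_key=None):
    if primary_key is not None:
        return primary_key
    if "id" in columns:
        return "id"
    return columns[0]
```

So a table with columns ["name", "id"] upserts on id, while one with ["email", "name"] falls back to email unless you pass primary_key explicitly.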

Use Cases

Load customer, order, and product tables from a database. Foreign keys between tables (e.g., order.customer_id → customer.id) become graph edges, enabling cross-table queries like “Which customers ordered product X?”
Point cognee at CSV exports from analytics tools. Each row becomes a searchable node in the graph, and you can combine them with unstructured reports in the same dataset.
Use write_disposition="append" to stream event batches into cognee without deduplication. Query across the full event history with natural language.
Use write_disposition="merge" to keep cognee’s graph in sync with a live database. Rows that are removed upstream are automatically cleaned up.

Add Operation

Learn more about data ingestion in cognee

dlt Documentation

Official dlt documentation and guides