Load your data

Difficulty: Easy

Overview

This tutorial demonstrates how to transform raw text into a structured knowledge graph using Cognee. You’ll learn how to:
  • Set up your Cognee development environment
  • Load data from various sources using the dlt integration
  • Process content through Cognee’s pipeline to extract entities and relationships
  • Visualize the generated knowledge graph in an interactive format
  • Search your knowledge graph with natural language queries
By the end of this tutorial, you’ll have transformed unstructured text into a rich, interconnected knowledge graph that enhances LLM reasoning capabilities.

The Transformation

Cognee helps you move from large amounts of unstructured text to structured, interconnected data. Here’s what this transformation looks like:
  • Before: raw, unstructured text
  • After: an interconnected knowledge graph

Prerequisites

Before starting this tutorial, ensure you have:
  • Python 3.9 to 3.12 installed
  • Git installed on your system
  • An OpenAI API key (or alternative LLM provider)
  • Basic familiarity with Python and command line

Step 1: Environment Setup

Clone the repositories

First, clone both the main Cognee repository and the starter examples:
# Clone the main Cognee repository
git clone https://github.com/topoteretes/cognee.git

# Clone the getting started examples
git clone https://github.com/topoteretes/cognee-starter.git
These repositories contain all the necessary code and examples for this tutorial.

Configure your API key

Set up your environment variables in a .env file:
echo 'LLM_API_KEY="your_openai_api_key_here"' > .env
This enables Cognee to use your LLM provider for entity extraction and relationship building.
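Cognee picks this key up from your environment (typically loaded from .env via python-dotenv). If you want to sanity-check that your .env file parses the way you expect without any extra dependencies, here is a minimal stdlib sketch; load_env_file is an illustrative helper, not part of the Cognee API:

```python
import os
import tempfile
from pathlib import Path

def load_env_file(path: str) -> None:
    """Minimal .env reader: copy KEY="value" lines into os.environ.
    Illustrative helper only - Cognee loads .env for you."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ[key.strip()] = value.strip().strip('"').strip("'")

# Demonstrate on a throwaway file shaped like the .env created above
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text('LLM_API_KEY="your_openai_api_key_here"\n')
    load_env_file(str(env_path))

print("LLM_API_KEY set:", "LLM_API_KEY" in os.environ)
```

If the final line prints False, check the file for typos before moving on: a missing key here is the most common cause of failures later in the pipeline.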

Step 2: Install Cognee

Navigate to the Cognee repository and install with all dependencies:
cd cognee
uv sync --dev --all-extras --reinstall
This command installs Cognee with all optional dependencies, including support for various data sources through dlt.

Step 3: Prepare Your Data Pipeline

Create a new Python file for your data loading example:
touch load_data_example.py
Copy the following pipeline code into your file:
import asyncio
import webbrowser
import os
from cognee.api.v1.add import add
from cognee.api.v1.cognify import cognify
from cognee.api.v1.search import search, SearchType
from cognee.api.v1.visualize.visualize import visualize_graph

async def main():
    # Sample data to process
    sample_text = """
    Artificial Intelligence (AI) is revolutionizing healthcare through 
    machine learning algorithms that can analyze medical images, predict 
    patient outcomes, and assist in drug discovery. Deep learning models 
    are particularly effective at pattern recognition in radiology, 
    helping doctors detect early signs of cancer and other diseases.
    
    Natural Language Processing (NLP), a subset of AI, enables computers 
    to understand and process human language. This technology powers 
    chatbots, translation services, and sentiment analysis tools used 
    across various industries.
    
    Computer Vision, another AI domain, allows machines to interpret 
    visual information from the world around them. Applications include 
    autonomous vehicles, facial recognition systems, and quality control 
    in manufacturing.
    """
    
    print("🔄 Adding data to Cognee...")
    await add(sample_text)
    
    print("🧠 Processing data through Cognee pipeline...")
    await cognify()
    
    print("🔍 Searching the knowledge graph...")
    results = await search(
        query_text="How is AI being used in healthcare?",
        query_type=SearchType.GRAPH_COMPLETION
    )
    
    print("📊 Search Results:")
    for result in results:
        print(f"- {result}")
    
    print("📈 Generating visualization...")
    await visualize_graph()
    
    # Open the generated visualization
    home_dir = os.path.expanduser("~")
    html_file = os.path.join(home_dir, "graph_visualization.html")
    
    print(f"🌐 Opening visualization at: {html_file}")
    webbrowser.open(f"file://{html_file}")
    
    print("✅ Tutorial completed successfully!")

if __name__ == '__main__':
    asyncio.run(main())
This script demonstrates the complete Cognee workflow: adding data, processing it, searching the knowledge graph, and generating a visualization.

Step 4: Run Your Data Pipeline

Activate the virtual environment and execute your data loading pipeline:
source .venv/bin/activate && python load_data_example.py
This command will process your text through Cognee’s pipeline, extracting entities like “Artificial Intelligence,” “machine learning,” and “healthcare,” then building relationships between them. You should see output similar to:
🔄 Adding data to Cognee...
🧠 Processing data through Cognee pipeline...
🔍 Searching the knowledge graph...
📊 Search Results:
- AI is revolutionizing healthcare through machine learning algorithms...
📈 Generating visualization...
🌐 Opening visualization at: /Users/yourname/graph_visualization.html
✅ Tutorial completed successfully!

Step 5: Explore Your Knowledge Graph

The pipeline generates an interactive HTML visualization that opens automatically in your browser. In this visualization, you can:
  • Navigate through connected entities and relationships
  • Click on nodes to see their properties and connections
  • Zoom and pan to explore different parts of the graph
  • Hover over edges to see relationship types

Understanding the Graph Structure

Your knowledge graph will contain:
  • Entities: Key concepts like “Artificial Intelligence,” “Machine Learning,” “Healthcare”
  • Relationships: Connections showing how concepts relate to each other
  • Attributes: Properties and descriptions for each entity
  • Context: Links back to the original text chunks
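Conceptually, the graph is a set of nodes plus typed edges between them. The toy sketch below shows the shape of what gets extracted; the field names and relationship labels are illustrative, not Cognee's internal schema:

```python
# Toy model of a knowledge graph: entities as nodes, relationships as
# typed edges. Names and labels are illustrative, not Cognee's schema.
entities = {
    "ai": {"name": "Artificial Intelligence", "type": "Field"},
    "ml": {"name": "Machine Learning", "type": "Technique"},
    "healthcare": {"name": "Healthcare", "type": "Domain"},
}

relationships = [
    ("ml", "is_subset_of", "ai"),
    ("ai", "is_applied_in", "healthcare"),
]

# Simple traversal: which edges touch the "ai" node?
neighbors = [
    (src, rel, dst)
    for (src, rel, dst) in relationships
    if src == "ai" or dst == "ai"
]
print(neighbors)
```

Graph search works by walking exactly these kinds of edges outward from the entities matched by your query, which is why related concepts surface even when they never co-occur in one sentence.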

Step 6: Advanced Data Loading

Cognee supports loading data from 30+ sources through its dlt integration. Whether you are adding a new set of data to your existing Cognee memory after cognification or starting fresh, all you need to do is run cognee.add and cognee.cognify again. Here are some examples for different inputs:

Loading from Files

import cognee

# Load from various file types
await cognee.add(["document.pdf", "data.csv", "content.txt"])
await cognee.cognify()

Loading from Databases

# Connect to relational databases
from cognee.infrastructure.databases.graph import get_graph_engine
from cognee.infrastructure.databases.relational import (
    get_migration_relational_engine,
)
from cognee.tasks.ingestion import migrate_relational_database

engine = get_migration_relational_engine()
# Configure your database connection
# Then load and process data
schema = await engine.extract_schema()
graph_engine = await get_graph_engine()

await migrate_relational_database(graph_engine, schema=schema)

For detailed database integration, see our Migrate Relational DB to Cognee tutorial.

Step 7: Customizing Your Pipeline

You can customize various aspects of the data processing:

Chunk Size Configuration

import cognee

# Configure chunking strategy
cognee.config.chunk_size = 1024  # Adjust chunk size
cognee.config.chunk_overlap = 128  # Set overlap between chunks
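To see what these two settings control, here is a deliberately simple chunker in plain Python. This is a sketch of the idea only: a fixed character window where each chunk re-reads the tail of the previous one, whereas Cognee's actual chunking is token- and structure-aware:

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows; each chunk repeats the last
    `chunk_overlap` characters of the previous chunk for context."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # → ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Larger chunks preserve more context per LLM call; larger overlap reduces the chance that an entity or relationship is cut in half at a chunk boundary, at the cost of some duplicated processing.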

Entity Extraction Settings

# Customize entity extraction
cognee.config.entity_extraction_prompt = "Extract key technical concepts and their relationships"

Search Configuration

# Try different search types
results_chunks = await search(query_text="your query", query_type=SearchType.CHUNKS)
results_insights = await search(query_text="your query", query_type=SearchType.INSIGHTS)

Next Steps

Now that you’ve successfully loaded your first data into Cognee, you can:
  1. Explore other tutorials
  2. Learn about core concepts
  3. Try advanced features

Video Tutorial

If you prefer video learning, watch this introduction by our engineer Igor:

Join the Conversation!

Have questions about loading your data or want to share your knowledge graph visualizations? Join our community to connect with other developers and get support!