Load your data

Difficulty: Easy

Overview

This tutorial demonstrates how to transform raw text into a structured knowledge graph using Cognee. You’ll learn how to:
  • Set up your Cognee development environment
  • Load data from various sources using the dlt integration
  • Process content through Cognee’s pipeline to extract entities and relationships
  • Visualize the generated knowledge graph in an interactive format
  • Search your knowledge graph with natural language queries
By the end of this tutorial, you’ll have transformed unstructured text into a rich, interconnected knowledge graph that enhances LLM reasoning capabilities.

The Transformation

Cognee helps you move from large amounts of unstructured text to structured, interconnected data. Here’s what this transformation looks like:
  • Before: raw, unstructured text
  • After: an interconnected knowledge graph

Prerequisites

Before starting this tutorial, ensure you have:
  • Python 3.9 to 3.12 installed
  • Git installed on your system
  • An OpenAI API key (or alternative LLM provider)
  • Basic familiarity with Python and command line

Step 1: Environment Setup

Clone the repositories

First, clone both the main Cognee repository and the starter examples:
# Clone the main Cognee repository
git clone https://github.com/topoteretes/cognee.git

# Clone the getting started examples
git clone https://github.com/topoteretes/cognee-starter.git
These repositories contain all the necessary code and examples for this tutorial.

Configure your API key

Set up your environment variables in a .env file:
echo 'LLM_API_KEY="your_openai_api_key_here"' > .env
This enables Cognee to use your LLM provider for entity extraction and relationship building.
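Cognee picks this key up from your environment (typically loaded from .env via python-dotenv). If you want to sanity-check that your .env file parses the way you expect without any extra dependencies, here is a minimal stdlib sketch; load_env_file is an illustrative helper, not part of the Cognee API:

```python
import os
import tempfile
from pathlib import Path

def load_env_file(path: str) -> None:
    """Minimal .env reader: copy KEY="value" lines into os.environ.
    Illustrative helper only - Cognee loads .env for you."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ[key.strip()] = value.strip().strip('"').strip("'")

# Demonstrate on a throwaway file shaped like the .env created above
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text('LLM_API_KEY="your_openai_api_key_here"\n')
    load_env_file(str(env_path))

print("LLM_API_KEY set:", "LLM_API_KEY" in os.environ)
```

If the final line prints False, check the file for typos before moving on: a missing key here is the most common cause of failures later in the pipeline.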

Step 2: Install Cognee

Navigate to the Cognee repository and install with all dependencies:
cd cognee
uv sync --dev --all-extras --reinstall
This command installs Cognee with all optional dependencies, including support for various data sources through dlt.

Step 3: Prepare Your Data Pipeline

Create a new Python file for your data loading example:
touch load_data_example.py
Copy the following pipeline code into your file:
import asyncio
import webbrowser
import os
from cognee.api.v1.add import add
from cognee.api.v1.cognify import cognify
from cognee.api.v1.search import search, SearchType
from cognee.api.v1.visualize.visualize import visualize_graph

async def main():
    # Sample data to process
    sample_text = """
    Artificial Intelligence (AI) is revolutionizing healthcare through 
    machine learning algorithms that can analyze medical images, predict 
    patient outcomes, and assist in drug discovery. Deep learning models 
    are particularly effective at pattern recognition in radiology, 
    helping doctors detect early signs of cancer and other diseases.
    
    Natural Language Processing (NLP), a subset of AI, enables computers 
    to understand and process human language. This technology powers 
    chatbots, translation services, and sentiment analysis tools used 
    across various industries.
    
    Computer Vision, another AI domain, allows machines to interpret 
    visual information from the world around them. Applications include 
    autonomous vehicles, facial recognition systems, and quality control 
    in manufacturing.
    """
    
    print("🔄 Adding data to Cognee...")
    await add(sample_text)
    
    print("🧠 Processing data through Cognee pipeline...")
    await cognify()
    
    print("🔍 Searching the knowledge graph...")
    results = await search(
        query_text="How is AI being used in healthcare?",
        query_type=SearchType.GRAPH_COMPLETION
    )
    
    print("📊 Search Results:")
    for result in results:
        print(f"- {result}")
    
    print("📈 Generating visualization...")
    await visualize_graph()
    
    # Open the generated visualization
    home_dir = os.path.expanduser("~")
    html_file = os.path.join(home_dir, "graph_visualization.html")
    
    print(f"🌐 Opening visualization at: {html_file}")
    webbrowser.open(f"file://{html_file}")
    
    print("✅ Tutorial completed successfully!")

if __name__ == '__main__':
    asyncio.run(main())
This script demonstrates the complete Cognee workflow: adding data, processing it, searching the knowledge graph, and generating a visualization.

Step 4: Run Your Data Pipeline

Activate the virtual environment and execute your data loading pipeline:
source .venv/bin/activate && python load_data_example.py
This command will process your text through Cognee’s pipeline, extracting entities like “Artificial Intelligence,” “machine learning,” and “healthcare,” then building relationships between them. You should see output similar to:
🔄 Adding data to Cognee...
🧠 Processing data through Cognee pipeline...
🔍 Searching the knowledge graph...
📊 Search Results:
- AI is revolutionizing healthcare through machine learning algorithms...
📈 Generating visualization...
🌐 Opening visualization at: /Users/yourname/graph_visualization.html
✅ Tutorial completed successfully!

Step 5: Explore Your Knowledge Graph

The pipeline generates an interactive HTML visualization that opens automatically in your browser. In this visualization, you can:
  • Navigate through connected entities and relationships
  • Click on nodes to see their properties and connections
  • Zoom and pan to explore different parts of the graph
  • Hover over edges to see relationship types

Understanding the Graph Structure

Your knowledge graph will contain:
  • Entities: Key concepts like “Artificial Intelligence,” “Machine Learning,” “Healthcare”
  • Relationships: Connections showing how concepts relate to each other
  • Attributes: Properties and descriptions for each entity
  • Context: Links back to the original text chunks
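Conceptually, the graph is a set of nodes plus typed edges between them. The toy sketch below shows the shape of what gets extracted; the field names and relationship labels are illustrative, not Cognee's internal schema:

```python
# Toy model of a knowledge graph: entities as nodes, relationships as
# typed edges. Names and labels are illustrative, not Cognee's schema.
entities = {
    "ai": {"name": "Artificial Intelligence", "type": "Field"},
    "ml": {"name": "Machine Learning", "type": "Technique"},
    "healthcare": {"name": "Healthcare", "type": "Domain"},
}

relationships = [
    ("ml", "is_subset_of", "ai"),
    ("ai", "is_applied_in", "healthcare"),
]

# Simple traversal: which edges touch the "ai" node?
neighbors = [
    (src, rel, dst)
    for (src, rel, dst) in relationships
    if src == "ai" or dst == "ai"
]
print(neighbors)
```

Graph search works by walking exactly these kinds of edges outward from the entities matched by your query, which is why related concepts surface even when they never co-occur in one sentence.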

Step 6: Advanced Data Loading

Cognee supports loading data from 30+ sources through its dlt integration. Whether you are adding a new set of data to your existing Cognee memory after cognification or starting fresh, all you need to do is run cognee.add and cognee.cognify again. Here are some examples for different inputs:

Loading from Files

import cognee

# Load from various file types
await cognee.add(["document.pdf", "data.csv", "content.txt"])
await cognee.cognify()

Loading from Databases

# Connect to relational databases
from cognee.infrastructure.databases.graph import get_graph_engine
from cognee.infrastructure.databases.relational import (
    get_migration_relational_engine,
)
from cognee.tasks.ingestion import migrate_relational_database

engine = get_migration_relational_engine()
# Configure your database connection
# Then load and process data
schema = await engine.extract_schema()
graph_engine = await get_graph_engine()

await migrate_relational_database(graph_engine, schema=schema)

For detailed database integration, see our Migrate Relational DB to Cognee tutorial.

Step 7: Customizing Your Pipeline

You can customize various aspects of the data processing:

Chunk Size Configuration

import cognee

# Configure chunking strategy
cognee.config.chunk_size = 1024  # Adjust chunk size
cognee.config.chunk_overlap = 128  # Set overlap between chunks
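To see what these two settings control, here is a deliberately simple chunker in plain Python. This is a sketch of the idea only: a fixed character window where each chunk re-reads the tail of the previous one, whereas Cognee's actual chunking is token- and structure-aware:

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows; each chunk repeats the last
    `chunk_overlap` characters of the previous chunk for context."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # → ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Larger chunks preserve more context per LLM call; larger overlap reduces the chance that an entity or relationship is cut in half at a chunk boundary, at the cost of some duplicated processing.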

Entity Extraction Settings

# Customize entity extraction
cognee.config.entity_extraction_prompt = "Extract key technical concepts and their relationships"

Search Configuration

# Try different search types
results_chunks = await search(query_text="your query", query_type=SearchType.CHUNKS)
results_insights = await search(query_text="your query", query_type=SearchType.INSIGHTS)

Next Steps

Now that you’ve successfully loaded your first data into Cognee, you can:
  1. Explore other tutorials
  2. Learn about core concepts
  3. Try advanced features

Video Tutorial

If you prefer video learning, watch this introduction by our engineer Igor:

Join the Conversation!

Have questions about loading your data or want to share your knowledge graph visualizations? Join our community to connect with other developers and get support!