
Build Custom Knowledge Graphs

Difficulty: Advanced

Overview

This tutorial demonstrates how to build custom knowledge graphs from scratch using Cognee’s low-level API. You’ll learn how to:

  • Define custom DataPoint classes for your domain
  • Create structured relationships between entities
  • Build custom data ingestion pipelines
  • Process data through Cognee’s low-level pipeline system
  • Visualize and query your custom knowledge graph

By the end of this tutorial, you’ll have created a complete organizational knowledge graph with companies, departments, and employees, demonstrating how to model complex real-world relationships.

What You’ll Build

We’ll create a knowledge graph representing organizational structures with:

  • Companies with multiple departments
  • Departments with employee lists
  • People working in specific departments
  • Company types for categorization
  • Rich relationships connecting all entities

Prerequisites

Before starting this tutorial, ensure you have:

  • Completed the Load Your Data tutorial
  • Python 3.9 to 3.12 installed
  • Cognee installed with development dependencies
  • Basic understanding of Python classes and async programming
  • Familiarity with JSON data structures

Step 1: Project Setup

Create your project structure

In the same environment you used for the Load Your Data tutorial, set up a new directory for your custom graph project:

mkdir custom-graph-tutorial
cd custom-graph-tutorial

Create the necessary directories and files:

mkdir data
mkdir .artifacts
touch build_graph.py
touch data/companies.json
touch data/people.json

This structure separates your data, code, and output artifacts for better organization.
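
After these commands (plus the .env file created in the next step), the project layout should look like this:

custom-graph-tutorial/
├── .env
├── build_graph.py
├── data/
│   ├── companies.json
│   └── people.json
└── .artifacts/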

Configure your environment

Create a .env file with your API credentials:

echo 'LLM_API_KEY="your_openai_api_key_here"' > .env

The low-level API still requires LLM access for certain graph operations and search functionality.
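
Cognee reads the key from the environment. If your setup does not pick up the .env file automatically, you can load it explicitly before importing cognee — a minimal sketch, assuming the python-dotenv package is installed (an extra dependency, not required by the tutorial itself):

from dotenv import load_dotenv

# Reads LLM_API_KEY from .env into the process environment.
load_dotenv()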


Step 2: Prepare Sample Data

Create company data

Add the following content to data/companies.json:

[ { "name": "TechCorp Solutions", "departments": ["Engineering", "Marketing", "Sales"] }, { "name": "GreenFuture Solutions", "departments": ["Research", "Engineering", "Operations"] }, { "name": "DataFlow Analytics", "departments": ["Data Science", "Engineering", "Customer Success"] } ]

Create employee data

Add the following content to data/people.json:

[ {"name": "Alice Johnson", "department": "Engineering"}, {"name": "Bob Smith", "department": "Engineering"}, {"name": "Carol Davis", "department": "Marketing"}, {"name": "David Wilson", "department": "Sales"}, {"name": "Eve Brown", "department": "Research"}, {"name": "Frank Miller", "department": "Operations"}, {"name": "Grace Lee", "department": "Data Science"}, {"name": "Henry Chen", "department": "Customer Success"}, {"name": "Ivy Rodriguez", "department": "Engineering"}, {"name": "Jack Thompson", "department": "Marketing"} ]

This sample data creates a realistic organizational structure with overlapping departments across companies.
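
Before wiring these files into the pipeline, a quick standalone sanity check (illustrative, not part of the tutorial script) can confirm that every person's department appears in at least one company:

import json

with open("data/companies.json") as f:
    companies = json.load(f)
with open("data/people.json") as f:
    people = json.load(f)

# Every department referenced by a person should exist in some company.
company_departments = {d for c in companies for d in c["departments"]}
for person in people:
    assert person["department"] in company_departments, f"Unknown department: {person['department']}"

print(f"{len(people)} people across {len(company_departments)} departments — data is consistent")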


Step 3: Define Custom DataPoint Classes

Create your build_graph.py file with custom entity definitions:

import os
import uuid
import json
import asyncio
import pathlib

from cognee import config, prune, search, SearchType, visualize_graph
from cognee.low_level import setup, DataPoint
from cognee.pipelines import run_tasks, Task
from cognee.tasks.storage import add_data_points
from cognee.tasks.storage.index_graph_edges import index_graph_edges
from cognee.modules.users.methods import get_default_user


class Person(DataPoint):
    """Represents an individual employee"""
    name: str
    metadata: dict = {"index_fields": ["name"]}


class Department(DataPoint):
    """Represents a company department with employees"""
    name: str
    employees: list[Person]
    metadata: dict = {"index_fields": ["name"]}


class CompanyType(DataPoint):
    """Represents the type/category of companies"""
    name: str = "Company"


class Company(DataPoint):
    """Represents a company with departments and type classification"""
    name: str
    departments: list[Department]
    is_type: CompanyType
    metadata: dict = {"index_fields": ["name"]}

These custom DataPoint classes define the structure of your knowledge graph. The metadata field with index_fields makes entities searchable by specific attributes.
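
To get a feel for the nesting these classes encode, you can instantiate them directly — a throwaway example, not part of the script:

# Hypothetical standalone example of the Company -> Department -> Person nesting.
alice = Person(name="Alice Johnson")
engineering = Department(name="Engineering", employees=[alice])
techcorp = Company(
    name="TechCorp Solutions",
    departments=[engineering],
    is_type=CompanyType(),
)

print(techcorp.name, "->", [d.name for d in techcorp.departments])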


Step 4: Create Data Ingestion Logic

Add the data ingestion function to your script:

def ingest_files():
    """Load and process JSON data into DataPoint instances"""
    # Load company data
    companies_file_path = os.path.join(os.path.dirname(__file__), "data/companies.json")
    companies = json.loads(open(companies_file_path, "r").read())

    # Load people data
    people_file_path = os.path.join(os.path.dirname(__file__), "data/people.json")
    people = json.loads(open(people_file_path, "r").read())

    # Create person DataPoints and organize by department
    people_data_points = {}
    departments_data_points = {}

    print("🔄 Processing employee data...")
    for person in people:
        new_person = Person(name=person["name"])
        people_data_points[person["name"]] = new_person

        # Group employees by department
        if person["department"] not in departments_data_points:
            departments_data_points[person["department"]] = Department(
                name=person["department"], employees=[new_person]
            )
        else:
            departments_data_points[person["department"]].employees.append(new_person)

    # Create company DataPoints
    companies_data_points = {}

    # Create a single CompanyType node for all companies
    print("🏢 Creating company type classification...")
    companyType = CompanyType()

    print("🔄 Processing company data...")
    for company in companies:
        new_company = Company(name=company["name"], departments=[], is_type=companyType)
        companies_data_points[company["name"]] = new_company

        # Link departments to companies
        for department_name in company["departments"]:
            if department_name not in departments_data_points:
                departments_data_points[department_name] = Department(
                    name=department_name, employees=[]
                )
            new_company.departments.append(departments_data_points[department_name])

    print(f"✅ Created {len(companies_data_points)} companies with {len(departments_data_points)} departments")

    return companies_data_points.values()

This function demonstrates how to build complex relationships between entities. Notice how the relationships are assembled bottom-up: people are grouped into departments, and departments are then attached to companies.
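
You can exercise the function in isolation before wiring it into the pipeline — an illustrative standalone check that prints the company → department → employee tree:

# Illustrative only: inspect the in-memory DataPoints from ingest_files().
for company in ingest_files():
    print(company.name)
    for department in company.departments:
        print(f"  {department.name}: {[person.name for person in department.employees]}")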


Step 5: Build the Main Pipeline

Add the main execution logic to your script:

async def main():
    """Main pipeline for building and querying the custom knowledge graph"""
    # Setup Cognee system directory
    cognee_directory_path = str(
        pathlib.Path(os.path.join(pathlib.Path(__file__).parent, ".cognee_system")).resolve()
    )
    config.system_root_directory(cognee_directory_path)

    print("🧹 Cleaning up previous runs...")
    # Prune system metadata for fresh state
    await prune.prune_system(metadata=True)

    print("⚙️ Setting up Cognee system...")
    await setup()

    # Generate unique dataset ID for this run
    dataset_id = uuid.uuid4()
    user = await get_default_user()

    print("🚀 Running custom data pipeline...")
    # Create and run custom pipeline
    pipeline = run_tasks(
        [
            Task(ingest_files),      # Load and process data
            Task(add_data_points),   # Add to Cognee storage
        ],
        dataset_id,
        None,
        user,
        "custom_graph_pipeline",
    )

    # Monitor pipeline execution
    async for status in pipeline:
        print(f"📊 Pipeline status: {status}")

    print("🔗 Indexing graph relationships...")
    # Index the graph edges for efficient querying
    await index_graph_edges()

    print("📈 Generating graph visualization...")
    # Create visualization
    graph_file_path = str(
        os.path.join(os.path.dirname(__file__), ".artifacts/graph_visualization.html")
    )
    await visualize_graph(graph_file_path)

    print("🔍 Testing graph queries...")
    # Test different types of queries
    queries = [
        "Who works for GreenFuture Solutions?",
        "Which departments does TechCorp Solutions have?",
        "List all employees in the Engineering department",
        "What companies have Research departments?",
    ]

    for query in queries:
        print(f"\n🤔 Query: {query}")
        completion = await search(
            query_text=query,
            query_type=SearchType.GRAPH_COMPLETION,
        )
        print(f"💡 Answer: {completion}")

    print(f"🌐 Graph visualization saved to: {graph_file_path}")
    print("✅ Custom knowledge graph pipeline completed successfully!")


if __name__ == "__main__":
    asyncio.run(main())

This main function orchestrates the entire process: data ingestion, storage, indexing, visualization, and querying.


Step 6: Run Your Custom Graph Pipeline

Execute your custom knowledge graph builder:

python build_graph.py

This will process your organizational data and create a rich, interconnected knowledge graph.

You should see output similar to:

🧹 Cleaning up previous runs...
⚙️ Setting up Cognee system...
🔄 Processing employee data...
🏢 Creating company type classification...
🔄 Processing company data...
✅ Created 3 companies with 7 departments
🚀 Running custom data pipeline...
📊 Pipeline status: Task completed successfully
🔗 Indexing graph relationships...
📈 Generating graph visualization...
🔍 Testing graph queries...

🤔 Query: Who works for GreenFuture Solutions?
💡 Answer: GreenFuture Solutions has employees in Research, Engineering, and Operations departments...

🌐 Graph visualization saved to: .artifacts/graph_visualization.html
✅ Custom knowledge graph pipeline completed successfully!

Step 7: Explore Your Custom Graph

Interactive Visualization

Open the generated HTML file to explore your knowledge graph:

open .artifacts/graph_visualization.html

In the visualization, you’ll see:

  • Company nodes connected to their departments
  • Department nodes linked to their employees
  • Employee nodes showing individual contributors
  • Type classification connecting all companies to the CompanyType

Graph Structure Analysis

Your custom graph demonstrates several important patterns (verified in the sketch after this list):

  • Hierarchical relationships: Companies → Departments → People
  • Shared entities: Departments that exist across multiple companies
  • Type classification: All companies connected to a shared type
  • Bidirectional traversal: Navigate up and down the hierarchy
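
A quick way to check the shared-entity and type-classification patterns on the in-memory DataPoints from ingest_files() — illustrative only; against the stored graph you would use search or the graph engine instead:

from collections import Counter

companies = list(ingest_files())

# Shared entities: departments that appear under more than one company.
usage = Counter(d.name for c in companies for d in c.departments)
print([name for name, count in usage.items() if count > 1])  # e.g. ['Engineering']

# Type classification: every company points at the same CompanyType node.
print(len({id(c.is_type) for c in companies}) == 1)  # True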

Step 8: Advanced Customization

Adding More Complex Relationships

Extend your DataPoint classes with additional relationships based on your data:

class Project(DataPoint):
    """Represents a project within a company"""
    name: str
    metadata: dict = {"index_fields": ["name"]}


class Skill(DataPoint):
    """Represents a skill that people can have"""
    name: str
    category: str
    metadata: dict = {"index_fields": ["name", "category"]}


class Person(DataPoint):
    """Enhanced person with skills and projects"""
    name: str
    skills: list[Skill] = []
    current_projects: list[Project] = []
    metadata: dict = {"index_fields": ["name"]}

Custom Search Types

Implement domain-specific search functionality:

# Search for people with specific skills
skill_query = await search(
    query_text="Find all engineers with Python skills",
    query_type=SearchType.GRAPH_COMPLETION,
)

# Search for project collaborations
collaboration_query = await search(
    query_text="Which people work together on projects?",
    query_type=SearchType.INSIGHTS,
)

Batch Data Processing

Handle larger datasets efficiently:

async def batch_ingest_employees(employee_data_batch):
    """Process employee data in batches for better performance"""
    batch_size = 100
    for i in range(0, len(employee_data_batch), batch_size):
        batch = employee_data_batch[i:i + batch_size]
        # Process batch
        yield batch
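
Since this is an async generator, a caller consumes it with async for. One hypothetical consumer that converts each raw record into a Person DataPoint before storage (the record shape is an assumption mirroring people.json):

async def ingest_all(raw_employees):
    """Consume batches and convert raw records to Person DataPoints."""
    data_points = []
    async for batch in batch_ingest_employees(raw_employees):
        data_points.extend(Person(name=record["name"]) for record in batch)
    return data_points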

Step 9: Integration with External Systems

Database Integration

Connect your custom graph to external databases:

import sqlalchemy
from cognee.infrastructure.databases.relational import get_relational_engine


async def load_from_database():
    """Load organizational data from existing database"""
    engine = get_relational_engine()

    # Query your existing HR database
    query = """
        SELECT e.name, e.department, c.company_name
        FROM employees e
        JOIN companies c ON e.company_id = c.id
    """

    # Convert to DataPoints
    # ... processing logic
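
The elided conversion step might look something like this — a purely hypothetical sketch that groups the query rows into the DataPoint classes from Step 3 (the row shape matches the SELECT above):

def rows_to_data_points(rows):
    """Group (name, department, company_name) rows into Company DataPoints."""
    departments, companies = {}, {}
    company_type = CompanyType()  # shared type node, as in Step 4
    for person_name, department_name, company_name in rows:
        department = departments.setdefault(
            department_name, Department(name=department_name, employees=[])
        )
        department.employees.append(Person(name=person_name))
        company = companies.setdefault(
            company_name,
            Company(name=company_name, departments=[], is_type=company_type),
        )
        if department not in company.departments:
            company.departments.append(department)
    return companies.values()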

API Integration

Fetch data from external APIs:

import aiohttp


async def load_from_api():
    """Load organizational data from HR API"""
    async with aiohttp.ClientSession() as session:
        async with session.get('https://api.your-hr-system.com/employees') as response:
            employee_data = await response.json()
            # Convert to DataPoints
            return process_employee_data(employee_data)
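
process_employee_data is left to you; one hypothetical implementation, reusing the Person class from Step 3 and assuming each API record has a "name" field:

def process_employee_data(employee_data):
    """Convert raw API records into Person DataPoints (illustrative)."""
    return [Person(name=record["name"]) for record in employee_data]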

Step 10: Testing and Validation

Validate Graph Structure

Add validation to ensure data integrity:

def validate_graph_structure(companies):
    """Validate the created graph structure"""
    print("🔍 Validating graph structure...")

    for company in companies:
        assert company.name, "Company must have a name"
        assert company.departments, "Company must have departments"
        assert company.is_type, "Company must have a type"

        for dept in company.departments:
            assert dept.name, "Department must have a name"
            # Further validation logic...

    print("✅ Graph structure validation passed")
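
You might run this on the output of ingest_files() before the pipeline stores anything (illustrative usage):

validate_graph_structure(list(ingest_files()))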

Performance Testing

Monitor pipeline performance:

import time


async def timed_pipeline():
    """Run pipeline with performance monitoring"""
    start_time = time.time()

    # Run your pipeline
    await main()

    end_time = time.time()
    print(f"⏱️ Pipeline completed in {end_time - start_time:.2f} seconds")

Next Steps

Now that you’ve built your first custom knowledge graph, you can:

  1. Expand your domain model:

    • Add more entity types (Projects, Skills, Locations)
    • Create more complex relationships
    • Implement inheritance hierarchies
  2. Integrate with production systems:

    • Connect to your organization’s databases
    • Set up automated data synchronization
    • Implement real-time updates
  3. Explore advanced features:

  4. Build applications:

    • Create org chart visualizations
    • Build employee search systems
    • Develop recommendation engines

Join the Conversation!

Built something amazing with custom knowledge graphs? Share your creations and get help from the community!