A minimal guide to creating custom data models and inserting them directly into the knowledge graph with add_data_points.

Before you start:
  • Complete Quickstart to understand basic operations
  • Ensure you have LLM Providers configured
  • Have some structured data you want to model

What Custom Data Models Do

  • Define your own Pydantic models that inherit from DataPoint
  • Insert structured data directly into the knowledge graph without cognify
  • Create relationships between data points programmatically
  • Control exactly what gets indexed and how

Full Working Example

import asyncio
from typing import Any
from pydantic import SkipValidation

import cognee
from cognee.infrastructure.engine import DataPoint
from cognee.infrastructure.engine.models import Edge
from cognee.tasks.storage import add_data_points

class Person(DataPoint):
    name: str
    # Keep it simple for forward refs / mixed values
    knows: SkipValidation[Any] = None  # single Person or list[Person]
    # Recommended: specify which fields to index for search
    metadata: dict = {"index_fields": ["name"]}

async def main():
    # Start clean (optional in your app)
    await cognee.prune.prune_data()
    await cognee.prune.prune_system(metadata=True)

    alice = Person(name="Alice")
    bob = Person(name="Bob")
    charlie = Person(name="Charlie")

    # Create relationships - field name becomes edge label
    alice.knows = bob
    # You can also do lists: alice.knows = [bob, charlie]
    
    # Optional: add weights and custom relationship types
    bob.knows = (Edge(weight=0.9, relationship_type="friend_of"), charlie)

    await add_data_points([alice, bob, charlie])

asyncio.run(main())
This example shows the complete workflow: a model with index metadata, relationships set by field assignment, and an optional weighted edge. The same pattern extends to nested models with several relationship fields.

What Just Happened

Step 1: Define Your Data Model

class Person(DataPoint):
    name: str
    knows: SkipValidation[Any] = None
    # Recommended: specify which fields to index for search
    metadata: dict = {"index_fields": ["name"]}
Create a Pydantic model that inherits from DataPoint. Use SkipValidation[Any] for fields that will hold other DataPoints; this skips Pydantic validation on the field and avoids forward-reference issues when a model refers to itself. The metadata field is recommended: its index_fields list tells Cognee which fields to embed and store in the vector database for search.

Step 2: Create Data Instances

alice = Person(name="Alice")
bob = Person(name="Bob")
charlie = Person(name="Charlie")
Instantiate your models with the data you want to store. Each instance becomes a node in the knowledge graph.

Step 3: Create Relationships

alice.knows = bob
# Optional: add weights and custom relationship types
bob.knows = (Edge(weight=0.9, relationship_type="friend_of"), charlie)
Assign DataPoint instances to fields to create edges. The field name becomes the relationship label by default. To attach a weight, a custom relationship type, or other metadata, assign a tuple of (Edge(...), target) instead of the bare instance.
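To make the field-to-edge mapping concrete, here is a stdlib-only toy sketch. It is not Cognee's implementation: ToyEdge, ToyPerson, and extract_edges are hypothetical stand-ins, and the rule that relationship_type overrides the field name is an assumption based on the description above.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class ToyEdge:
    """Hypothetical stand-in for Edge: optional weight and label override."""
    weight: Optional[float] = None
    relationship_type: Optional[str] = None


@dataclass
class ToyPerson:
    """Hypothetical stand-in for a DataPoint subclass."""
    name: str
    knows: Any = None  # ToyPerson, list of ToyPerson, or (ToyEdge, target)


def extract_edges(node):
    """Return (source, label, target, weight) tuples for one node's fields."""
    edges = []
    for f in fields(node):
        value = getattr(node, f.name)
        meta = ToyEdge()
        # Tuple form: (edge metadata, target or list of targets)
        if isinstance(value, tuple) and len(value) == 2 and isinstance(value[0], ToyEdge):
            meta, value = value
        targets = value if isinstance(value, list) else [value]
        # Assumed rule: the field name is the label unless the edge overrides it
        label = meta.relationship_type or f.name
        for target in targets:
            if isinstance(target, ToyPerson):  # plain values such as name are skipped
                edges.append((node.name, label, target.name, meta.weight))
    return edges


alice, bob, charlie = ToyPerson("Alice"), ToyPerson("Bob"), ToyPerson("Charlie")
alice.knows = [bob, charlie]
bob.knows = (ToyEdge(weight=0.9, relationship_type="friend_of"), charlie)

for node in (alice, bob, charlie):
    print(extract_edges(node))
```

In this toy, alice yields two "knows" edges, bob yields one "friend_of" edge with weight 0.9, and charlie yields none.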

Step 4: Insert into Graph

await add_data_points([alice, bob, charlie])
add_data_points converts your DataPoint instances into nodes and edges in the knowledge graph and handles indexing. Because name is listed in index_fields, it is embedded and stored in the vector database for search.
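As a rough illustration of what index_fields selects, the stdlib-only sketch below picks the listed text fields from a record. select_index_payload and the dict-based record are hypothetical; Cognee's actual indexing and embedding logic is not shown here and may differ.

```python
def select_index_payload(record: dict, metadata: dict) -> dict:
    """Pick the non-empty text fields listed under metadata['index_fields']."""
    payload = {}
    for field_name in metadata.get("index_fields", []):
        value = record.get(field_name)
        if isinstance(value, str) and value:
            payload[field_name] = value
    return payload


# Hypothetical flattened view of a Person instance
person = {"name": "Alice", "knows": None}
metadata = {"index_fields": ["name"]}

payload = select_index_payload(person, metadata)
print(payload)
```

Only name ends up in the payload; fields not listed in index_fields, such as knows, are left out of this toy index.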

Use in Custom Tasks and Pipelines

This approach is particularly useful when creating custom tasks and pipelines where you need to:
  • Insert structured data programmatically
  • Define specific relationships between known entities
  • Control exactly what gets indexed and how
  • Integrate with external data sources or APIs
You can combine this with cognify to extract knowledge from unstructured text, then add your own structured data on top.