DataPoints: The Building Blocks of Knowledge Graphs

Overview

In cognee, a DataPoint is the core building block of a knowledge graph. Each DataPoint type defines both nodes (entities) and their connections (relationships), providing a structured way to model real-world information.

Think of a DataPoint as a container that represents an entity and tracks its relationships with other entities. Each DataPoint carries its own metadata, relationships, and embedding configuration, which makes it possible to store, query, and expand knowledge dynamically without relying on a rigid database schema.

Core DataPoint Structure

Core Concept

Nodes (Entities): Each DataPoint instance becomes a node in the knowledge graph
Edges (Relationships): The fields within DataPoints define connections to other DataPoints
Automatic Graph Formation: When you create DataPoint instances, they automatically form an interconnected graph

Base DataPoint Structure

Every DataPoint in cognee is built on a common foundation:


from typing import List, Optional
from uuid import UUID

from pydantic import BaseModel

# MetaData is a type defined by cognee (not shown here)
class DataPoint(BaseModel):
    id: UUID                                    # Unique identifier
    created_at: int                            # Creation timestamp
    updated_at: int                            # Last update timestamp
    ontology_valid: bool = False               # Validation status via ontology
    version: int = 1                           # Version number for tracking changes
    topological_rank: Optional[int] = 0        # Graph ordering rank
    metadata: Optional[MetaData] = {"index_fields": []}  # Indexing and embedding configuration
    type: str                                  # Automatically set to class name
    belongs_to_set: Optional[List["DataPoint"]] = None  # Set membership

Key Properties

Unique Identity: Every DataPoint has a UUID for precise identification
Versioning: Built-in version tracking with automatic timestamp updates
Self-Describing: The type field automatically reflects the DataPoint’s class name
Metadata-Driven: The metadata field controls how the DataPoint is indexed and embedded
Graph-Aware: Can be part of larger node sets and maintain topological relationships
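
The properties above can be illustrated with a small stand-in built on the standard library. This is a sketch only: ToyDataPoint, now_ms, and bump_version are hypothetical names, the millisecond unit is an assumption, and none of this is cognee's actual implementation.

```python
import time
import uuid
from dataclasses import dataclass, field


def now_ms() -> int:
    """Current time as integer milliseconds (an assumed unit for this sketch)."""
    return int(time.time() * 1000)


@dataclass
class ToyDataPoint:
    """Hypothetical stand-in for DataPoint; not cognee's implementation."""
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    created_at: int = field(default_factory=now_ms)
    updated_at: int = field(default_factory=now_ms)
    version: int = 1
    type: str = field(init=False, default="")

    def __post_init__(self) -> None:
        # Self-describing: record the concrete class name
        self.type = type(self).__name__

    def bump_version(self) -> None:
        # Versioning: increment the counter and refresh the update timestamp
        self.version += 1
        self.updated_at = now_ms()


@dataclass
class Person(ToyDataPoint):
    name: str = ""


alice = Person(name="Alice")
alice.bump_version()
print(alice.type, alice.version)
```

The subclass never sets type itself; the base class fills it in, which mirrors the "self-describing" property described above.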

How DataPoints Work

1. Creating Custom DataPoints

DataPoints are designed to be extended for specific use cases:


from cognee.low_level import DataPoint
 
class Person(DataPoint):
    name: str
    age: int
    metadata: dict = {"index_fields": ["name"]}  # Name will be embedded/indexed
 
class Company(DataPoint):
    name: str
    employees: list[Person]  # Relationships to other DataPoints
    metadata: dict = {"index_fields": ["name"]}

2. Embedding and Indexing

The metadata["index_fields"] configuration determines which fields get embedded for semantic search:

Embeddable Data: Fields listed in index_fields are processed for vector embeddings
Automatic Indexing: When DataPoints are stored, specified fields are automatically indexed
Search Optimization: Indexed fields enable fast semantic and vector-based retrieval
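
To make the role of index_fields concrete, here is a minimal sketch of how the fields to embed might be selected from an instance. ToyProduct and embeddable_texts are hypothetical names used for illustration; the real embedding step in cognee is not shown and may work differently.

```python
from dataclasses import dataclass, field


@dataclass
class ToyProduct:
    """Hypothetical stand-in for a DataPoint subclass."""
    name: str
    description: str
    price: float
    metadata: dict = field(
        default_factory=lambda: {"index_fields": ["name", "description"]}
    )


def embeddable_texts(point) -> dict:
    """Return only the fields named in metadata["index_fields"], as strings."""
    return {name: str(getattr(point, name)) for name in point.metadata["index_fields"]}


lamp = ToyProduct(name="Desk lamp", description="Adjustable LED lamp", price=29.5)
texts = embeddable_texts(lamp)
print(texts)
```

Fields left out of index_fields, such as price here, stay on the node as ordinary properties but are not passed to the embedding step in this sketch.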

3. Relationships and Graph Structure

DataPoints can reference other DataPoints, creating a rich knowledge graph:


# Person is the DataPoint subclass defined in the previous example
class Department(DataPoint):
    name: str
    employees: list[Person]  # List of Person DataPoints

class Company(DataPoint):
    name: str
    departments: list[Department]  # Nested DataPoint relationships

Role in the cognee System

1. Data Ingestion Pipeline


Raw Data → DataPoint Creation → Graph Conversion → Storage → Indexing

Input Processing: Raw data (documents, code, JSON) is converted into specific DataPoint types
Graph Generation: DataPoints and their relationships are converted to nodes and edges
Storage: DataPoints are stored in the graph database
Vector Indexing: Embeddable fields are indexed in the vector database for search
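
The input-processing stage can be sketched as follows, assuming the raw data is a JSON string with a known shape. ToyPerson, ToyCompany, and company_from_json are hypothetical stand-ins; cognee's own ingestion tasks are not shown here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToyPerson:
    name: str
    age: int


@dataclass
class ToyCompany:
    name: str
    employees: list = field(default_factory=list)


def company_from_json(raw: str) -> ToyCompany:
    """Input processing: turn a raw JSON string into typed instances."""
    record = json.loads(raw)
    people = [
        ToyPerson(name=p["name"], age=p["age"])
        for p in record.get("employees", [])
    ]
    return ToyCompany(name=record["name"], employees=people)


raw = '{"name": "Acme", "employees": [{"name": "Alice", "age": 30}]}'
company = company_from_json(raw)
print(company)
```

Once the data is in typed instances, the later stages (graph conversion, storage, indexing) can work from field names and types instead of raw text.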

2. Knowledge Graph Foundation

DataPoints serve as the nodes in cognee’s knowledge graph:

Each DataPoint becomes a node with its properties
Relationships between DataPoints become edges
The graph structure enables complex queries and reasoning

3. Search and Retrieval

DataPoints enable multiple search strategies:

Semantic Search: Through embedded index_fields
Graph Traversal: Following relationships between DataPoints
Hybrid Queries: Combining vector similarity with graph structure
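
As a rough illustration of a hybrid query, the sketch below ranks nodes by cosine similarity and then expands the best match along its edges. The vectors, edges, and hybrid_search function are toy assumptions for this example and do not reflect cognee's search API.

```python
import math

# Toy vectors standing in for embeddings of each node's index_fields
vectors = {
    "alice": [1.0, 0.0],
    "bob": [0.0, 1.0],
    "acme": [0.8, 0.6],
}
# Toy adjacency list standing in for graph edges
edges = {"acme": ["alice", "bob"], "alice": [], "bob": []}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def hybrid_search(query_vec, top_k=1):
    """Vector step picks the closest nodes; graph step adds their neighbours."""
    ranked = sorted(vectors, key=lambda n: cosine(query_vec, vectors[n]), reverse=True)
    seeds = ranked[:top_k]
    expanded = list(seeds)
    for seed in seeds:
        for neighbour in edges[seed]:
            if neighbour not in expanded:
                expanded.append(neighbour)
    return expanded


result = hybrid_search([0.7, 0.7])
print(result)
```

Here the query is closest to "acme", and the graph step then pulls in the two people connected to it, even though neither would have ranked first on similarity alone.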

Adding DataPoints to the Graph

The add_data_points function transforms structured data into a fully indexed, queryable knowledge graph:


import cognee

# DataPoint instances created earlier (definitions not shown)
data_points = [alice, book, park, purchase_event, reading_event]

# Add to the knowledge graph; await must run inside an async function
await cognee.add_data_points(data_points)

What Happens Under the Hood

When you call add_data_points, cognee performs the following steps:

Recursive Extraction: Traverses all connected DataPoints, extracting nodes and edges using the ontology-like structure that the DataPoint definitions provide
Deduplication: Removes duplicate nodes and edges to keep the graph clean
Graph Storage: Adds cleaned nodes and edges to cognee’s graph engine
Indexing: Creates indexes for fast lookups and efficient graph traversal
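
The extraction and deduplication steps can be sketched with plain Python. This is a toy model under stated assumptions: ToyNode and its subclasses, the string ids, and extract_graph are hypothetical, and cognee's internal traversal may differ.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ToyNode:
    """Hypothetical stand-in for DataPoint."""
    id: str


@dataclass
class ToyPerson(ToyNode):
    name: str = ""


@dataclass
class ToyDepartment(ToyNode):
    name: str = ""
    employees: list = field(default_factory=list)


@dataclass
class ToyCompany(ToyNode):
    name: str = ""
    departments: list = field(default_factory=list)


def extract_graph(roots):
    """Walk relationship fields recursively; dedupe nodes by id and edges by triple."""
    nodes, edges = {}, set()
    stack = list(roots)
    while stack:
        point = stack.pop()
        if point.id in nodes:
            continue  # already visited, so skip the duplicate
        nodes[point.id] = point
        for f in fields(point):
            value = getattr(point, f.name)
            children = value if isinstance(value, list) else [value]
            for child in children:
                if isinstance(child, ToyNode):
                    # The field name becomes the edge label
                    edges.add((point.id, f.name, child.id))
                    stack.append(child)
    return nodes, edges


alice = ToyPerson(id="p1", name="Alice")
eng = ToyDepartment(id="d1", name="Engineering", employees=[alice])
ops = ToyDepartment(id="d2", name="Operations", employees=[alice])  # Alice is shared
acme = ToyCompany(id="c1", name="Acme", departments=[eng, ops])

nodes, edges = extract_graph([acme, alice])
print(sorted(nodes), len(edges))
```

Alice is reachable three ways (directly and through both departments) but ends up as a single node, while each department keeps its own edge to her.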

Built-in DataPoint Types

Cognee includes several specialized DataPoint subclasses:

Entity: Named concepts with descriptions and relationships
DocumentChunk: Text fragments linked to their source documents
Document: Source document metadata and content
NodeSet: Collections of related data points for organization

Best Practices

Design Guidelines

Single Responsibility: Each DataPoint should represent one clear concept
Clear Relationships: Use descriptive field names for relationships (by, of, at, etc.)
Consistent Metadata: Always include proper indexing metadata
Optional Fields: Use Optional types for fields that may not always be present

Performance Considerations

Index Strategy: Only index fields you’ll frequently query
Relationship Depth: Consider the complexity of nested relationships
Batch Operations: Process multiple DataPoints together when possible
Deduplication: Design DataPoints to avoid unnecessary duplicates
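
One possible way to design for deduplication, offered as an assumption and not as a cognee requirement, is to derive each id deterministically from a natural key so that the same entity always maps to the same UUID. The stable_id helper below is hypothetical.

```python
import uuid


def stable_id(kind: str, natural_key: str) -> uuid.UUID:
    """Derive the same UUID every time for the same (kind, key) pair."""
    normalized = natural_key.strip().lower()
    return uuid.uuid5(uuid.NAMESPACE_OID, f"{kind}:{normalized}")


first = stable_id("Person", "Alice")
second = stable_id("Person", " alice ")   # same person, messier input
other = stable_id("Company", "Alice")     # different kind, different id

print(first == second, first == other)
```

Including the kind in the hashed string keeps a person and a company with the same name from colliding.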

Data Modeling

Start Simple: Begin with basic entity types and add complexity gradually
Think in Graphs: Consider how your DataPoints will connect to each other
Version Control: Use the built-in versioning for data evolution
Validation: Leverage Pydantic’s validation capabilities

Examples

1. Define Clear Index Fields


# Good: Specify which fields should be searchable
class Product(DataPoint):
    name: str
    description: str
    price: float
    metadata: dict = {"index_fields": ["name", "description"]}

2. Model Relationships Explicitly


# Good: Use typed relationships
class Author(DataPoint):
    name: str
 
class Book(DataPoint):
    title: str
    author: Author  # Clear relationship

3. Use Descriptive Names


# Good: Clear, descriptive class names (bodies elided)
class GitRepository(DataPoint): ...
class PythonFunction(DataPoint): ...
class CustomerOrder(DataPoint): ...

Next Steps

Learn more about DataPoints from our blog. You can start with The Building Blocks of Knowledge Graphs, then read a worked example in which we built DataPoints.
Understand how to build complex knowledge graphs with multiple DataPoint types
Discover tasks and pipelines

DataPoints are the foundation that makes cognee’s knowledge graphs interconnected and searchable. With this concept in hand, you’re ready to build relationship-aware knowledge systems.