
DataPoints: The Building Blocks of Knowledge Graphs

Overview

In cognee, a DataPoint is the core building block of a knowledge graph. Each DataPoint type defines both nodes (entities) and their connections (relationships), providing a structured way to model real-world information.

Think of a DataPoint as a smart container that represents an entity and automatically handles its relationships with other entities. Each DataPoint carries its own metadata, relationships, and embedding configuration. This makes it easy to store, query, and expand knowledge dynamically without relying on rigid database schemas.

Core DataPoint Structure

Core Concept

  • Nodes (Entities): Each DataPoint instance becomes a node in the knowledge graph
  • Edges (Relationships): Fields that reference other DataPoints define the connections between nodes
  • Automatic Graph Formation: When you create DataPoint instances, they automatically form an interconnected graph

Base DataPoint Structure

Every DataPoint in cognee is built on a common foundation:

class DataPoint(BaseModel):
    id: UUID  # Unique identifier
    created_at: int  # Creation timestamp
    updated_at: int  # Last update timestamp
    ontology_valid: bool = False  # Validation status via ontology
    version: int = 1  # Version number for tracking changes
    topological_rank: Optional[int] = 0  # Graph ordering rank
    metadata: Optional[MetaData] = {"index_fields": []}  # Indexing and embedding configuration
    type: str  # Automatically set to class name
    belongs_to_set: Optional[List["DataPoint"]] = None  # Set membership

Key Properties

  • Unique Identity: Every DataPoint has a UUID for precise identification
  • Versioning: Built-in version tracking with automatic timestamp updates
  • Self-Describing: The type field automatically reflects the DataPoint’s class name
  • Metadata-Driven: The metadata field controls how the DataPoint is indexed and embedded
  • Graph-Aware: Can be part of larger node sets and maintain topological relationships
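The properties above can be illustrated with a small standard-library sketch. It is not cognee's implementation: `SketchPoint`, `now_ms`, and `touch` are hypothetical names, the millisecond timestamp unit is an assumption, and a plain dataclass stands in for the Pydantic model.

```python
import time
from dataclasses import dataclass, field
from uuid import UUID, uuid4


def now_ms() -> int:
    """Current time as integer milliseconds since the epoch (assumed unit)."""
    return int(time.time() * 1000)


@dataclass
class SketchPoint:
    """Hypothetical stand-in for DataPoint, for illustration only."""
    id: UUID = field(default_factory=uuid4)          # unique identity
    created_at: int = field(default_factory=now_ms)  # creation timestamp
    updated_at: int = field(default_factory=now_ms)  # last update timestamp
    version: int = 1                                 # version counter
    type: str = field(init=False, default="")        # filled in below

    def __post_init__(self) -> None:
        # Self-describing: record the concrete class name.
        self.type = self.__class__.__name__

    def touch(self) -> None:
        """Hypothetical helper: bump the version and refresh updated_at."""
        self.version += 1
        self.updated_at = now_ms()


@dataclass
class Person(SketchPoint):
    name: str = ""


alice = Person(name="Alice")
alice.touch()
print(alice.type, alice.version)  # Person 2
```

Each instance gets its own UUID, and the subclass reports `Person` as its type without declaring it.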

How DataPoints Work

1. Creating Custom DataPoints

DataPoints are designed to be extended for specific use cases:

from cognee.low_level import DataPoint


class Person(DataPoint):
    name: str
    age: int
    metadata: dict = {"index_fields": ["name"]}  # Name will be embedded/indexed


class Company(DataPoint):
    name: str
    employees: list[Person]  # Relationships to other DataPoints
    metadata: dict = {"index_fields": ["name"]}

2. Embedding and Indexing

The metadata["index_fields"] configuration determines which fields get embedded for semantic search:

  • Embeddable Data: Fields listed in index_fields are processed for vector embeddings
  • Automatic Indexing: When DataPoints are stored, specified fields are automatically indexed
  • Search Optimization: Indexed fields enable fast semantic and vector-based retrieval
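As a rough illustration of how `index_fields` selects what gets embedded, the sketch below collects the text of the listed fields and ignores the rest. The `embeddable_texts` helper is hypothetical, and a plain dataclass stands in for a DataPoint subclass; cognee's actual indexing code may work differently.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    """Hypothetical plain-Python stand-in for a DataPoint subclass."""
    name: str
    description: str
    price: float
    metadata: dict = field(
        default_factory=lambda: {"index_fields": ["name", "description"]}
    )


def embeddable_texts(point) -> dict:
    """Return {field_name: text} for every field listed in index_fields."""
    names = point.metadata.get("index_fields", [])
    return {name: str(getattr(point, name)) for name in names}


lamp = Product(name="Desk lamp", description="Adjustable LED lamp", price=29.5)
print(embeddable_texts(lamp))
# {'name': 'Desk lamp', 'description': 'Adjustable LED lamp'}
```

Here `price` stays a regular node property: it is stored but never sent to the embedding step.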

3. Relationships and Graph Structure

DataPoints can reference other DataPoints, creating a rich knowledge graph:

class Department(DataPoint):
    name: str
    employees: list[Person]  # List of Person DataPoints


class Company(DataPoint):
    name: str
    departments: list[Department]  # Nested DataPoint relationships

Role in the cognee System

1. Data Ingestion Pipeline

Raw Data → DataPoint Creation → Graph Conversion → Storage → Indexing
  1. Input Processing: Raw data (documents, code, JSON) is converted into specific DataPoint types
  2. Graph Generation: DataPoints and their relationships are converted to nodes and edges
  3. Storage: DataPoints are stored in the graph database
  4. Vector Indexing: Embeddable fields are indexed in the vector database for search
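The input-processing step can be pictured as ordinary parsing code that turns raw records into typed objects. The sketch below assumes a JSON payload with `name` and `employees` keys and uses plain dataclasses in place of DataPoint subclasses; `company_from_json` is a hypothetical helper, not part of cognee.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    age: int


@dataclass
class Company:
    name: str
    employees: list = field(default_factory=list)  # list of Person objects


def company_from_json(raw: str) -> Company:
    """Parse one raw JSON record into a Company with nested Person objects."""
    record = json.loads(raw)
    people = [
        Person(name=p["name"], age=p["age"])
        for p in record.get("employees", [])
    ]
    return Company(name=record["name"], employees=people)


raw = (
    '{"name": "Acme", "employees": '
    '[{"name": "Alice", "age": 30}, {"name": "Bob", "age": 41}]}'
)
acme = company_from_json(raw)
print(acme.name, [p.name for p in acme.employees])  # Acme ['Alice', 'Bob']
```

Once the data is in this shape, the nested `employees` list is what later becomes edges in the graph.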

2. Knowledge Graph Foundation

DataPoints serve as the nodes in cognee’s knowledge graph:

  • Each DataPoint becomes a node with its properties
  • Relationships between DataPoints become edges
  • The graph structure enables complex queries and reasoning

3. Search and Retrieval

DataPoints enable multiple search strategies:

  • Semantic Search: Through embedded index_fields
  • Graph Traversal: Following relationships between DataPoints
  • Hybrid Queries: Combining vector similarity with graph structure
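To make the difference between the three strategies concrete, here is a toy sketch that ranks nodes by cosine similarity and then widens the result with graph neighbors. The two-dimensional vectors, the adjacency map, and the function names are all invented for illustration; this is not cognee's search API.

```python
import math

# Hypothetical toy data: hand-written 2-D "embeddings" and an adjacency map.
embeddings = {
    "alice": [1.0, 0.0],
    "book": [0.0, 1.0],
    "park": [0.6, 0.8],
}
neighbors = {
    "alice": ["book"],
    "book": ["alice"],
    "park": [],
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def semantic_search(query_vec, k=1):
    """Return the k node names whose embeddings are closest to the query."""
    ranked = sorted(
        embeddings,
        key=lambda name: cosine(query_vec, embeddings[name]),
        reverse=True,
    )
    return ranked[:k]


def hybrid_search(query_vec, k=1):
    """Vector hits first, then their direct graph neighbors."""
    hits = semantic_search(query_vec, k)
    expanded = list(hits)
    for hit in hits:
        for other in neighbors[hit]:
            if other not in expanded:
                expanded.append(other)
    return expanded


print(semantic_search([1.0, 0.1]))  # ['alice']
print(hybrid_search([1.0, 0.1]))    # ['alice', 'book']
```

The hybrid result includes `book` even though its vector is far from the query, because it is directly connected to the best vector match.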

Adding DataPoints to the Graph

The add_data_points function transforms structured data into a fully indexed, queryable knowledge graph:

import cognee

# Collect previously created DataPoint instances
data_points = [alice, book, park, purchase_event, reading_event]

# Add to the knowledge graph (await must run inside an async function)
await cognee.add_data_points(data_points)

What Happens Under the Hood

When you call add_data_points, cognee:

  1. Recursive Extraction: Traverses all connected DataPoints, extracting nodes and edges using the ontology-like structure
  2. Deduplication: Removes duplicate nodes and edges to keep the graph clean
  3. Graph Storage: Adds cleaned nodes and edges to cognee’s graph engine
  4. Indexing: Creates indexes for fast lookups and efficient graph traversal
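The first two steps, recursive extraction and deduplication, can be sketched with plain Python objects. This is a simplified model under stated assumptions: `Node` and `extract_graph` are hypothetical, string ids replace UUIDs, and edges are labeled with the name of the field that holds the reference. cognee's own traversal may differ in detail.

```python
from dataclasses import dataclass, field, fields


@dataclass(eq=False)
class Node:
    """Hypothetical minimal base class; only carries an id."""
    id: str


@dataclass(eq=False)
class Person(Node):
    name: str = ""


@dataclass(eq=False)
class Department(Node):
    name: str = ""
    employees: list = field(default_factory=list)


@dataclass(eq=False)
class Company(Node):
    name: str = ""
    departments: list = field(default_factory=list)


def extract_graph(roots):
    """Walk all connected points; return (nodes keyed by id, set of edges)."""
    nodes = {}
    edges = set()  # a set drops duplicate (source, label, target) triples
    stack = list(roots)
    while stack:
        point = stack.pop()
        if point.id in nodes:
            continue  # already visited, so this node is stored only once
        nodes[point.id] = point
        for f in fields(point):
            value = getattr(point, f.name)
            children = value if isinstance(value, list) else [value]
            for child in children:
                if isinstance(child, Node):
                    edges.add((point.id, f.name, child.id))
                    stack.append(child)
    return nodes, edges


alice = Person(id="p1", name="Alice")
eng = Department(id="d1", name="Engineering", employees=[alice])
ops = Department(id="d2", name="Operations", employees=[alice])
acme = Company(id="c1", name="Acme", departments=[eng, ops])

nodes, edges = extract_graph([acme, alice])
print(len(nodes), len(edges))  # 4 4
```

`alice` is reachable three ways (directly and through both departments) but appears as a single node, while both `employees` edges pointing at her are kept.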

Built-in DataPoint Types

Cognee includes several specialized DataPoint subclasses:

  • Entity: Named concepts with descriptions and relationships
  • DocumentChunk: Text fragments linked to their source documents
  • Document: Source document metadata and content
  • NodeSet: Collections of related data points for organization

Best Practices

Design Guidelines

  1. Single Responsibility: Each DataPoint should represent one clear concept
  2. Clear Relationships: Use descriptive field names for relationships (by, of, at, etc.)
  3. Consistent Metadata: Always include proper indexing metadata
  4. Optional Fields: Use Optional types for fields that may not always be present

Performance Considerations

  1. Index Strategy: Only index fields you’ll frequently query
  2. Relationship Depth: Consider the complexity of nested relationships
  3. Batch Operations: Process multiple DataPoints together when possible
  4. Deduplication: Design DataPoints to avoid unnecessary duplicates
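One common way to design for deduplication is to derive ids deterministically from a natural key, so the same real-world entity always maps to the same id. The sketch below uses the standard library's `uuid5`; the `stable_id` helper and its normalization rule (trim and lowercase) are assumptions for illustration, not a documented cognee convention.

```python
from uuid import NAMESPACE_OID, UUID, uuid5


def stable_id(kind: str, natural_key: str) -> UUID:
    """Derive a repeatable UUID from an entity kind and a normalized key."""
    normalized = f"{kind}:{natural_key.strip().lower()}"
    return uuid5(NAMESPACE_OID, normalized)


a = stable_id("Person", "Alice Smith")
b = stable_id("Person", "  alice smith ")
c = stable_id("Company", "Alice Smith")
print(a == b, a == c)  # True False
```

Two ingestion runs that see "Alice Smith" would then produce the same id, so the second copy can be recognized as a duplicate instead of becoming a new node.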

Data Modeling

  1. Start Simple: Begin with basic entity types and add complexity gradually
  2. Think in Graphs: Consider how your DataPoints will connect to each other
  3. Version Control: Use the built-in versioning for data evolution
  4. Validation: Leverage Pydantic’s validation capabilities

Examples

1. Define Clear Index Fields

# Good: Specify which fields should be searchable
class Product(DataPoint):
    name: str
    description: str
    price: float
    metadata: dict = {"index_fields": ["name", "description"]}

2. Model Relationships Explicitly

# Good: Use typed relationships
class Author(DataPoint):
    name: str


class Book(DataPoint):
    title: str
    author: Author  # Clear relationship

3. Use Descriptive Names

# Good: Clear, descriptive class names (bodies elided)
class GitRepository(DataPoint):
    ...


class PythonFunction(DataPoint):
    ...


class CustomerOrder(DataPoint):
    ...

Next Steps

DataPoints are the foundation that makes cognee's knowledge graphs intelligent and interconnected. With this concept in hand, you're ready to build relationship-aware knowledge systems.