Project Modules Documentation

This document provides an overview of the modules in the project, their purposes, and the key functionalities they handle.


1. Settings Module

Handles configuration and settings management for the system.

  • save_llm_config.py: Saves language model (LLM) configuration.
  • save_vector_db_config.py: Saves vector database configuration.
  • get_settings.py: Retrieves general settings.
  • get_current_settings.py: Retrieves the currently active settings.
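The following is a minimal sketch of how saving and reading settings could fit together. It assumes a single JSON file with one section per configuration type. The helpers `save_section` and `current_settings`, the placeholder `DEFAULTS`, and the file layout are all hypothetical. They are not the signatures used by `save_llm_config.py` or `get_current_settings.py`.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical layout: one JSON file, one key per settings section.
# The default values are placeholders, not the project's real defaults.
DEFAULTS = {
    "llm": {"provider": "local", "model": "default"},
    "vector_db": {"provider": "local", "url": ""},
}


def save_section(path: Path, section: str, values: dict) -> None:
    """Merge `values` into one section and write the whole file back."""
    data = json.loads(path.read_text()) if path.exists() else {}
    data.setdefault(section, {}).update(values)
    path.write_text(json.dumps(data, indent=2))


def current_settings(path: Path) -> dict:
    """Return the defaults overlaid with whatever has been saved so far."""
    saved = json.loads(path.read_text()) if path.exists() else {}
    return {
        section: {**defaults, **saved.get(section, {})}
        for section, defaults in DEFAULTS.items()
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "settings.json"
        save_section(cfg, "llm", {"model": "large"})
        print(current_settings(cfg)["llm"])  # {'provider': 'local', 'model': 'large'}
```

Keeping the defaults separate from saved values means the "current" settings always contain every key, even for sections that have never been saved.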

2. Ingestion Module

Manages data ingestion, classification, and identification processes.

  • save_data_to_file.py: Saves ingested data to files.
  • classify.py: Classifies datasets during ingestion.
  • discover_directory_datasets.py: Discovers datasets in specified directories.
  • get_matched_datasets.py: Retrieves datasets matching specific criteria.
  • identify.py: Identifies data characteristics.
  • Submodule: data_types:
    • TextData.py: Handles text data ingestion.
    • BinaryData.py: Manages binary data ingestion.
    • IngestionData.py: Defines generic ingestion data structures.
  • Submodule: exceptions:
    • exceptions.py: Defines custom exceptions for ingestion-related issues.
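The classification step, which routes raw input to a text or binary data type, could look roughly like the sketch below. The class fields, the `classify` signature, the `IngestionError` name, and the UTF-8 decoding rule are assumptions made for illustration. The actual `classify.py` and `data_types` classes may use different criteria.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical stand-ins for the data_types submodule; the real
# IngestionData, TextData, and BinaryData classes may differ.
@dataclass
class IngestionData:
    name: str


@dataclass
class TextData(IngestionData):
    text: str


@dataclass
class BinaryData(IngestionData):
    payload: bytes


class IngestionError(Exception):
    """Hypothetical error for input the sketch cannot classify."""


def classify(name: str, raw: Union[str, bytes]) -> IngestionData:
    # Strings are always text; bytes count as text only if they decode as UTF-8.
    if isinstance(raw, str):
        return TextData(name=name, text=raw)
    if isinstance(raw, bytes):
        try:
            return TextData(name=name, text=raw.decode("utf-8"))
        except UnicodeDecodeError:
            return BinaryData(name=name, payload=raw)
    raise IngestionError(f"unsupported input type: {type(raw).__name__}")


if __name__ == "__main__":
    print(type(classify("notes.txt", b"hello")).__name__)      # TextData
    print(type(classify("blob.bin", b"\xff\xfe\x00")).__name__)  # BinaryData
```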

3. Graph Module

Focuses on graph-related operations, including graph creation, manipulation, and utility functions.

  • Submodule: utils:
    • Utility scripts for node and edge handling, such as:
      • convert_node_to_data_point.py
      • deduplicate_nodes_and_edges.py
  • Submodule: cognee_graph:
    • Core graph handling, including:
      • CogneeGraph.py: Main graph class.
      • CogneeAbstractGraph.py: Abstract base class for graph implementations.
  • Submodule: models:
    • EdgeType.py: Defines edge types within the graph.
  • Submodule: exceptions:
    • exceptions.py: Custom exceptions for graph operations.
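Below is a minimal sketch of node and edge deduplication. It assumes that nodes are dicts with an `id` key and that edges are `(source_id, target_id, relationship)` tuples. The function name `dedupe_graph` and these data shapes are hypothetical. `deduplicate_nodes_and_edges.py` may operate on richer objects or use different identity rules.

```python
from typing import Iterable, List, Tuple

# Hypothetical shapes: a node is a dict with an "id" key,
# an edge is a (source_id, target_id, relationship) tuple.
Node = dict
Edge = Tuple[str, str, str]


def dedupe_graph(
    nodes: Iterable[Node], edges: Iterable[Edge]
) -> Tuple[List[Node], List[Edge]]:
    """Keep the first node seen for each id and the first copy of each edge."""
    seen_ids = set()
    unique_nodes: List[Node] = []
    for node in nodes:
        if node["id"] not in seen_ids:
            seen_ids.add(node["id"])
            unique_nodes.append(node)

    seen_edges = set()
    unique_edges: List[Edge] = []
    for edge in edges:
        if edge not in seen_edges:
            seen_edges.add(edge)
            unique_edges.append(edge)

    return unique_nodes, unique_edges
```

The sketch treats edges as directed, so `("a", "b", "knows")` and `("b", "a", "knows")` are kept as two distinct edges. It also preserves insertion order, so later duplicates never overwrite the first occurrence.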

4. Pipelines Module

Runs and manages workflows made up of interconnected tasks. The module provides tools to define tasks, organize them into pipelines, execute them sequentially or in parallel, and monitor their execution status.

  • models/: Defines pipeline and task models.
  • operations/: Contains operations for running tasks (sequentially or in parallel), logging pipeline status, and retrieving pipeline state.
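The task and pipeline split can be sketched with `asyncio`. In this sketch a pipeline runs its tasks in order and passes each task's output to the next one, while independent tasks can run concurrently. `Task`, `Pipeline`, `run_parallel`, and the status strings are hypothetical stand-ins. They are not the classes in `models/` or the functions in `operations/`.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Dict, List


# Hypothetical minimal models, not the module's real Task/Pipeline classes.
@dataclass
class Task:
    name: str
    run: Callable[[Any], Awaitable[Any]]


@dataclass
class Pipeline:
    tasks: List[Task]
    status: Dict[str, str] = field(default_factory=dict)

    async def run_sequential(self, data: Any) -> Any:
        # Each task receives the previous task's output; status is recorded per task.
        for task in self.tasks:
            self.status[task.name] = "running"
            try:
                data = await task.run(data)
            except Exception:
                self.status[task.name] = "failed"
                raise
            self.status[task.name] = "completed"
        return data


async def run_parallel(tasks: List[Task], data: Any) -> List[Any]:
    # Independent tasks get the same input and run concurrently;
    # results come back in the same order as `tasks`.
    return await asyncio.gather(*(task.run(data) for task in tasks))


async def double(x: int) -> int:
    return x * 2


async def increment(x: int) -> int:
    return x + 1


if __name__ == "__main__":
    async def main() -> None:
        pipeline = Pipeline([Task("double", double), Task("increment", increment)])
        print(await pipeline.run_sequential(3))        # 7
        print(await run_parallel(pipeline.tasks, 3))   # [6, 4]

    asyncio.run(main())
```

Chaining outputs keeps sequential steps composable. `asyncio.gather` only suits steps that do not depend on each other's results.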

5. Chunking Module

Splits text into chunks for processing and storage.

  • TextChunker.py: Main chunking logic.
  • models/DocumentChunk.py: Defines the structure of document chunks.
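The sketch below shows one common chunking strategy: it packs whole paragraphs into chunks under a character limit. The `DocumentChunk` fields, the `chunk_text` function, and the paragraph-based rule are assumptions. `TextChunker.py` may split on different boundaries (for example sentences or tokens) or use a different size measure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DocumentChunk:
    # Hypothetical fields; the real models/DocumentChunk.py may differ.
    text: str
    chunk_index: int
    word_count: int


def chunk_text(text: str, max_chars: int = 500) -> List[DocumentChunk]:
    """Pack whole paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars becomes its own oversized
    chunk; this sketch never splits inside a paragraph.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[DocumentChunk] = []
    current: List[str] = []
    size = 0

    def flush() -> None:
        nonlocal current, size
        if current:
            body = "\n\n".join(current)
            chunks.append(DocumentChunk(body, len(chunks), len(body.split())))
            current, size = [], 0

    for para in paragraphs:
        # +2 accounts for the "\n\n" separator added when paragraphs are joined.
        added = len(para) + (2 if current else 0)
        if current and size + added > max_chars:
            flush()
            added = len(para)
        current.append(para)
        size += added
    flush()
    return chunks


if __name__ == "__main__":
    print([c.text for c in chunk_text("aaaa\n\nbbbb\n\ncccc", max_chars=10)])
    # ['aaaa\n\nbbbb', 'cccc']
```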

6. Cognify Module

Handles configuration and initialization of the system.

  • config.py: System configuration settings.

7. Search Module

Manages search functionality, including query and result logging.

  • models/: Defines search-related models like Query and Result.
  • operations/: Includes scripts for handling queries and results.

8. Retrieval Module

Handles retrieval operations, such as mapping descriptions to code and searching graph triplets.

  • description_to_codepart_search.py: Maps descriptions to code parts.
  • brute_force_triplet_search.py: Implements a brute-force approach to triplet searches.
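Brute-force triplet search can be illustrated by scoring every (subject, relation, object) triplet against a query embedding and keeping the top results. This sketch assumes that each triplet already has one embedding vector, and it uses cosine similarity for scoring. `search_triplets`, the vector shapes, and the scoring rule are hypothetical and may differ from what `brute_force_triplet_search.py` actually does.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Triplet = Tuple[str, str, str]  # (subject, relation, object)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_triplets(
    query_vec: Sequence[float],
    triplets: Sequence[Tuple[Triplet, Sequence[float]]],
    top_k: int = 3,
) -> List[Tuple[float, Triplet]]:
    """Score every triplet against the query (brute force) and keep the best top_k."""
    scored = [(cosine(query_vec, vec), triplet) for triplet, vec in triplets]
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    data = [
        (("a", "likes", "b"), [1.0, 0.0]),
        (("c", "hates", "d"), [0.0, 1.0]),
        (("e", "knows", "f"), [1.0, 1.0]),
    ]
    results = search_triplets([1.0, 0.0], data, top_k=2)
    print([t for _, t in results])  # [('a', 'likes', 'b'), ('e', 'knows', 'f')]
```

Brute force compares the query with every stored triplet, so the cost grows linearly with the graph. In exchange, the results are exact, which makes it a useful baseline for checking approximate vector-index search.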

9. Users Module

Manages user-related functionality, including authentication, permissions, and user data.

  • Submodule: methods:
    • Handles user-related operations such as creation, deletion, and authentication.
  • Submodule: models:
    • Defines user-related models, including User, Group, and Permission.
  • Submodule: permissions:
    • Manages permissions on documents and resources.
  • Submodule: exceptions:
    • Custom exceptions for user-related operations.
  • Submodule: authentication:
    • Handles user authentication mechanisms.
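Here is a minimal sketch of a document permission check, in which a user holds a permission either directly or through a group. The `User` and `Group` fields, the `(document_id, permission)` grant tuples, and `PermissionDeniedError` are hypothetical. They are not the definitions used in the models or permissions submodules.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical grant format: (document_id, permission_name), e.g. ("doc-1", "read").
Grant = Tuple[str, str]


@dataclass
class Group:
    name: str
    grants: Set[Grant] = field(default_factory=set)


@dataclass
class User:
    email: str
    groups: List[Group] = field(default_factory=list)
    grants: Set[Grant] = field(default_factory=set)


class PermissionDeniedError(Exception):
    """Hypothetical error raised when a user lacks a permission."""


def has_permission(user: User, document_id: str, permission: str) -> bool:
    # A direct grant on the user or a grant on any of the user's groups is enough.
    grant = (document_id, permission)
    return grant in user.grants or any(grant in g.grants for g in user.groups)


def check_permission(user: User, document_id: str, permission: str) -> None:
    if not has_permission(user, document_id, permission):
        raise PermissionDeniedError(
            f"{user.email} lacks '{permission}' on {document_id}"
        )
```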

10. Data Module

Handles data operations, processing, and management.

  • Submodule: methods:
    • Includes dataset and data management scripts.
  • Submodule: processing:
    • Processes document types like ImageDocument, AudioDocument, and TextDocument.
  • Submodule: operations:
    • Provides operations such as translation, language detection, and metadata handling.
  • Submodule: extraction:
    • Extracts topics, summaries, and categories.
    • Includes knowledge graph extraction utilities.

11. Engine Module

Provides utilities and models for system operations.

  • Submodule: utils:
    • Node and edge generation utilities.
  • Submodule: models:
    • Defines entities and their types.

For further details on each module, refer to the inline documentation and comments within the respective files.