save_data_item_to_storage.py: Saves individual data items to the storage system.
ingest_data_with_metadata.py: Ingests data along with its associated metadata.
save_data_to_storage.py: Saves bulk data to the storage system.
get_dlt_destination.py: Resolves the destination for data load transformations.
resolve_data_directories.py: Resolves directories for data ingestion.
ingest_data.py: Main task for ingesting data into the system.
transform_data.py: Transforms ingested data for further processing.
save_data_item_with_metadata_to_storage.py: Saves individual data items along with metadata to storage.
build_graph_with_temporal_awareness.py: Constructs graphs with temporal context for nodes and edges.
search_graph_with_temporal_awareness.py: Searches graphs considering temporal aspects.
summarize_code.py: Generates summaries for code files.
query_summaries.py: Queries existing summaries for specific contexts.
summarize_text.py: Summarizes text documents.
generate_golden_set.py: Creates a “golden set” of data for benchmarking or testing.
extract_graph_from_code.py: Extracts graph representations from codebases.
infer_data_ontology.py: Infers ontologies and relationships from graph data.
query_graph_connections.py: Queries connections and relationships within graphs.
extract_graph_from_data.py: Extracts graphs from structured and unstructured data.
expand_dependency_graph_checker.py: Expands existing dependency graphs.
get_repo_dependency_graph_checker.py: Retrieves dependency graphs for repositories.
enrich_dependency_graph_checker.py: Enriches dependency graphs with additional context.
get_local_dependencies_checker.py: Identifies local dependencies within code repositories.
query_completion.py: Processes and handles completion requests.
exceptions.py: Defines exceptions related to completion tasks.
chunk_naive_llm_classifier.py: Classifies text chunks using a naive LLM-based approach.
remove_disconnected_chunks.py: Removes text chunks that are disconnected from the main content.
chunk_by_sentence.py: Splits text into chunks by sentences.
chunk_by_word.py: Splits text into chunks by words.
chunk_by_paragraph.py: Splits text into chunks by paragraphs.
query_chunks.py: Queries existing chunks for specific content.
index_data_points.py: Indexes data points for efficient retrieval.
index_graph_edges.py: Indexes edges within a graph structure.
add_data_points.py: Adds new data points to the storage system.
extract_code_parts.py: Extracts parts of code for analysis.
get_repo_file_dependencies.py: Retrieves file-level dependencies in a repository.
top_down_repo_parse.py: Parses repositories from a top-down perspective.
enrich_dependency_graph.py: Enriches dependency graphs with additional data.
get_local_dependencies.py: Identifies local dependencies in repositories.
expand_dependency_graph.py: Expands the scope of dependency graphs.
extract_chunks_from_documents.py: Extracts text chunks from documents.
classify_documents.py: Classifies documents into categories.
check_permissions_on_documents.py
: Verifies permissions for accessing documents.

The chunk classification task takes the following inputs:

data_chunks: list[DocumentChunk]: A list of text chunks to be classified. Each chunk represents a piece of text and includes metadata such as its chunk_id and document_id.

classification_model: Type[BaseModel]
: The model used to classify each chunk of text. This model is expected to output labels that categorize the text.

The task uses asyncio.gather to concurrently classify each chunk of text: extract_categories is called for each chunk, and the results are collected in chunk_classifications.
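This concurrency pattern can be sketched in a minimal, self-contained form. Here DocumentChunk is a simplified stand-in for the real chunk type, and extract_categories is stubbed with a trivial keyword rule in place of the actual LLM call that would use classification_model:

```python
import asyncio
from dataclasses import dataclass

# Simplified stand-in for the real DocumentChunk type.
@dataclass
class DocumentChunk:
    chunk_id: str
    document_id: str
    text: str

async def extract_categories(text: str) -> list[str]:
    # Stub for the LLM classification call: a trivial keyword rule
    # replaces the real model-driven categorization.
    return ["code"] if "def " in text else ["prose"]

async def classify(data_chunks: list[DocumentChunk]) -> list[list[str]]:
    # asyncio.gather runs one classification coroutine per chunk
    # concurrently and returns results in input order.
    chunk_classifications = await asyncio.gather(
        *(extract_categories(chunk.text) for chunk in data_chunks)
    )
    return list(chunk_classifications)

chunks = [
    DocumentChunk("c1", "d1", "def add(a, b): return a + b"),
    DocumentChunk("c2", "d1", "Plain explanatory text."),
]
results = asyncio.run(classify(chunks))
print(results)  # [['code'], ['prose']]
```

Because gather preserves input order, each entry in chunk_classifications lines up with the chunk at the same index in data_chunks.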
The task then creates data_points (representing the classification results) and constructs nodes and edges to represent the relationships between chunks and their classifications. Finally, it returns data_chunks, which can be used further as needed.
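The node-and-edge construction step can be illustrated as follows. The node and edge shapes here are hypothetical, chosen only to show the linking pattern between chunks and their classification labels; the actual data-point structures in the task differ:

```python
# Hypothetical sketch: turn per-chunk classification results into
# graph nodes and edges. Shapes are illustrative, not the real ones.
def build_classification_graph(chunk_classifications: dict[str, list[str]]):
    nodes, edges = [], []
    for chunk_id, labels in chunk_classifications.items():
        nodes.append({"id": chunk_id, "type": "DocumentChunk"})
        for label in labels:
            label_id = f"label:{label}"
            label_node = {"id": label_id, "type": "Classification"}
            # Reuse an existing label node instead of duplicating it.
            if label_node not in nodes:
                nodes.append(label_node)
            # Edge linking the chunk to its classification label.
            edges.append((chunk_id, "is_classified_as", label_id))
    return nodes, edges

nodes, edges = build_classification_graph({"c1": ["code"], "c2": ["prose"]})
print(edges)
# [('c1', 'is_classified_as', 'label:code'),
#  ('c2', 'is_classified_as', 'label:prose')]
```

Deduplicating the label nodes means chunks that share a category point at the same classification node, which is what lets downstream graph queries group related chunks together.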