Pipelines in cognee
Cognee uses tasks grouped into pipelines that efficiently populate graph and vector stores. These tasks process the data in order to create a semantic layer that will improve the quality of answers produced by Large Language Models (LLMs).
You can think of tasks as Python functions and pipelines as groups of Python functions executed in order.
Tasks are managed and executed asynchronously using the `run_tasks` and `run_tasks_parallel` functions.
```python
pipeline = run_tasks(tasks, documents)

async for result in pipeline:
    print(result)
```
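The snippet above must run inside an async context. As a rough mental model only (not cognee's actual implementation), `run_tasks` can be pictured as threading the data through each task in turn and yielding each intermediate result. A minimal, self-contained sketch:

```python
import asyncio

async def run_tasks_sketch(tasks, data):
    """Toy stand-in for cognee's run_tasks: feed data through each
    task in order, yielding each task's output as it completes."""
    result = data
    for task in tasks:
        result = await task(result)
        yield result

# Two toy "tasks": plain async functions operating on a list of texts
async def to_upper(texts):
    return [t.upper() for t in texts]

async def add_prefix(texts):
    return [f"doc: {t}" for t in texts]

async def main():
    pipeline = run_tasks_sketch([to_upper, add_prefix], ["hello", "world"])
    return [r async for r in pipeline]

print(asyncio.run(main()))
# → [['HELLO', 'WORLD'], ['doc: HELLO', 'doc: WORLD']]
```

The real `run_tasks` manages error handling, telemetry, and batching; the point here is only the ordered, streaming execution model.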
Main pipeline: cognify
The main pipeline currently implemented in cognee is designed to process data in a structured way and populate the graph and vector stores.
The pipeline consists of several tasks, each responsible for different parts of the processing:
- `classify_documents`: converts each document into a specific Document type: PdfDocument, AudioDocument, ImageDocument, or TextDocument
- `apply_ontology`: creates an ontology mapper for the graph
- `check_permissions_on_documents`: checks whether the user has the necessary permissions to access the documents
- `extract_chunks_from_documents`: extracts text chunks based on the document type
- `add_data_points`: creates nodes and edges from the chunks and their properties and adds them to the graph engine
- `extract_graph_from_data`: generates knowledge graphs from the document chunks
- `summarize_text`: extracts a summary for each chunk using an LLM
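Conceptually, cognify threads each document through these stages in order. The following sketch uses a few of the task names listed above, but with hypothetical, heavily simplified signatures; the real tasks operate on Document objects, graph and vector engines, and an LLM:

```python
import asyncio

# Hypothetical, simplified stand-ins for three cognify stages.
async def classify_documents(docs):
    # real cognee returns PdfDocument / AudioDocument / ImageDocument / TextDocument
    return [{"type": "TextDocument", "text": d} for d in docs]

async def extract_chunks_from_documents(docs):
    # naive chunking: split each document's text into words
    return [chunk for d in docs for chunk in d["text"].split()]

async def summarize_text(chunks):
    # a real pipeline would call an LLM here
    return {"summary": f"{len(chunks)} chunks", "chunks": chunks}

async def cognify_sketch(docs):
    docs = await classify_documents(docs)
    chunks = await extract_chunks_from_documents(docs)
    return await summarize_text(chunks)

result = asyncio.run(cognify_sketch(["knowledge graphs store facts"]))
print(result["summary"])  # → 4 chunks
```

Each stage consumes the previous stage's output, which is why the task order above matters.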
Cognee also allows you to build your own custom pipelines whenever you need one.
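A custom pipeline follows the same task-chaining pattern. The sketch below uses a toy runner and two made-up tasks (all names hypothetical; see the cognee API for the real `run_tasks` signature):

```python
import asyncio

async def run_tasks_sketch(tasks, data):
    # toy stand-in for cognee's run_tasks
    result = data
    for task in tasks:
        result = await task(result)
        yield result

# A custom task: drop chunks shorter than 3 characters
async def filter_short_chunks(chunks):
    return [c for c in chunks if len(c) >= 3]

# A custom task: deduplicate while preserving order
async def deduplicate(chunks):
    return list(dict.fromkeys(chunks))

async def main():
    pipeline = run_tasks_sketch(
        [filter_short_chunks, deduplicate],
        ["ai", "graph", "graph", "llm", "db"],
    )
    final = None
    async for final in pipeline:
        pass
    return final

print(asyncio.run(main()))  # → ['graph', 'llm']
```

Because each task is just an async function taking the previous task's output, mixing your own tasks with cognee's built-in ones is a matter of putting them in the task list in the right order.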
Join the Conversation!
Have questions? Join our community now to connect with professionals, share insights, and get your questions answered!