Cognify
Transform datasets into structured knowledge graphs through cognitive processing.
This endpoint is the core of Cognee’s intelligence layer, responsible for converting raw text, documents, and data added through the add endpoint into semantic knowledge graphs. It performs deep analysis to extract entities, relationships, and insights from ingested content.
Processing Pipeline
- Document classification and permission validation
- Text chunking and semantic segmentation
- Entity extraction using LLM-powered analysis
- Relationship detection and graph construction
- Vector embeddings generation for semantic search
- Content summarization and indexing
Request Parameters
- datasets (Optional[List[str]]): List of dataset names to process. Dataset names are resolved to datasets owned by the authenticated user.
- dataset_ids (Optional[List[UUID]]): List of existing dataset UUIDs to process. UUIDs allow processing of datasets not owned by the user (if permitted).
- run_in_background (Optional[bool]): Whether to execute processing asynchronously. Defaults to False (blocking).
- custom_prompt (Optional[str]): Custom prompt for entity extraction and graph generation. If provided, this prompt will be used instead of the default prompts for knowledge graph extraction.
- chunk_size (Optional[int]): Maximum tokens per chunk. If omitted, Cognee chooses a size from the configured LLM and embedding limits.
- ontology_key (Optional[List[str]]): Reference to one or more previously uploaded ontology files to use for knowledge graph construction.
- data_per_batch (Optional[int]): Maximum number of data items to process concurrently within a dataset. Defaults to 20.
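The parameters above can be collected on the client side before sending. The following is a minimal sketch, not part of Cognee: `CognifyPayload` and `to_body` are hypothetical names introduced for illustration, and the check for a missing dataset selector simply mirrors the documented 400 condition so the mistake is caught before a request is made.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional
from uuid import UUID


@dataclass
class CognifyPayload:
    """Hypothetical client-side container mirroring the documented parameters."""

    datasets: Optional[List[str]] = None
    dataset_ids: Optional[List[UUID]] = None
    run_in_background: bool = False
    custom_prompt: Optional[str] = None
    chunk_size: Optional[int] = None
    ontology_key: Optional[List[str]] = None
    data_per_batch: Optional[int] = None  # omitted -> server default applies

    def to_body(self) -> Dict[str, Any]:
        # The endpoint documents a 400 when no dataset is named at all,
        # so fail early on the client instead of making the round trip.
        if not self.datasets and not self.dataset_ids:
            raise ValueError("provide datasets or dataset_ids")
        body = {
            "datasets": self.datasets,
            # UUIDs are not JSON-serializable, so send them as strings.
            "dataset_ids": [str(u) for u in self.dataset_ids] if self.dataset_ids else None,
            "run_in_background": self.run_in_background,
            "custom_prompt": self.custom_prompt,
            "chunk_size": self.chunk_size,
            "ontology_key": self.ontology_key,
            "data_per_batch": self.data_per_batch,
        }
        # Drop unset optional fields so server-side defaults are used.
        return {key: value for key, value in body.items() if value is not None}
```

Leaving unset fields out of the body, rather than sending null, keeps the request close to the documented example and lets the server apply its own defaults.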
Response
- Blocking execution: Complete pipeline run information with entity counts, processing duration, and success/failure status
- Background execution: Pipeline run metadata including pipeline_run_id for status monitoring via WebSocket subscription
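For background execution, the caller needs the pipeline_run_id values to monitor progress. This page does not spell out the exact response layout, so the sketch below assumes only that the decoded JSON contains one or more "pipeline_run_id" keys somewhere in its structure. The function name `find_pipeline_run_ids` is hypothetical.

```python
from typing import Any, List


def find_pipeline_run_ids(payload: Any) -> List[str]:
    """Collect every value stored under a "pipeline_run_id" key, at any depth."""
    found: List[str] = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            if key == "pipeline_run_id":
                found.append(str(value))
            else:
                # Keep searching nested objects and arrays.
                found.extend(find_pipeline_run_ids(value))
    elif isinstance(payload, list):
        for item in payload:
            found.extend(find_pipeline_run_ids(item))
    return found
```

Searching by key rather than by a fixed path keeps the helper usable whether the response holds a single run or one run per dataset.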
Error Codes
- 400 Bad Request: When neither datasets nor dataset_ids is provided, or when a specified dataset doesn’t exist
- 409 Conflict: When processing fails due to system errors, missing LLM API keys, database connection failures, or corrupted content
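A client may want to treat the two documented error codes differently: a 400 points to a problem in the request itself, while a 409 reports a failure during processing. The sketch below maps status codes to outcomes. The enum and function names are hypothetical, and any status not listed on this page is reported as unexpected rather than guessed at.

```python
from enum import Enum


class CognifyOutcome(Enum):
    OK = "ok"
    INVALID_REQUEST = "invalid_request"      # 400: fix the request body
    PROCESSING_FAILED = "processing_failed"  # 409: check keys, databases, content
    UNEXPECTED = "unexpected"                # anything this page does not document


def classify_status(status: int) -> CognifyOutcome:
    """Map an HTTP status code to one of the outcomes documented above."""
    if 200 <= status < 300:
        return CognifyOutcome.OK
    if status == 400:
        return CognifyOutcome.INVALID_REQUEST
    if status == 409:
        return CognifyOutcome.PROCESSING_FAILED
    return CognifyOutcome.UNEXPECTED
```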
Example Request
{
  "datasets": ["research_papers", "documentation"],
  "run_in_background": false,
  "custom_prompt": "Extract entities focusing on technical concepts and their relationships. Identify key technologies, methodologies, and their interconnections.",
  "ontology_key": ["medical_ontology_v1"]
}
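The request body above could be sent from Python with the standard library alone. This is a sketch under stated assumptions: the base URL is a placeholder for a local deployment, and the path /api/v1/cognify is inferred from the response type name on this page rather than stated outright, so confirm both against your deployment. `build_cognify_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: adjust to your deployment
COGNIFY_PATH = "/api/v1/cognify"    # assumption: inferred, not confirmed here


def build_cognify_request(token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a JSON body."""
    return urllib.request.Request(
        url=BASE_URL + COGNIFY_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Bearer authentication, as described under Authorizations.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx responses, so the 400 and 409 cases arrive as exceptions, not as return values.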
Notes
When ENABLE_BACKEND_ACCESS_CONTROL is set to True, datasets that the current user does not own but has write permission for can only be cognified by passing their UUIDs in dataset_ids. Names passed in datasets resolve only to datasets the user owns.
Next Steps
After successful processing, use the search endpoints to query the generated knowledge graph for insights, relationships, and semantic search.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body
- custom_prompt: Custom prompt for entity extraction and graph generation.
- chunk_size: Maximum tokens per chunk. Defaults to automatic model-based sizing. Example: 4096
- ontology_key: Reference to one or more previously uploaded ontologies.
- Chunk batch size: Number of chunks to process per task batch in Cognify (overrides default). Example: 36
- data_per_batch: Maximum number of data items to process concurrently within a dataset. Example: 20
Response
Successful Response
The response is an object of type Response Cognify Api V1 Cognify Post.