
Cognify

Transform datasets into structured knowledge graphs through cognitive processing.

This endpoint is the core of Cognee’s intelligence layer. It converts raw text, documents, and other data added through the add endpoint into semantic knowledge graphs, extracting entities, relationships, and summaries from the ingested content.

Processing Pipeline

Document classification and permission validation
Text chunking and semantic segmentation
Entity extraction using LLM-powered analysis
Relationship detection and graph construction
Vector embeddings generation for semantic search
Content summarization and indexing

Request Parameters

datasets (Optional[List[str]]): List of dataset names to process. Dataset names are resolved to datasets owned by the authenticated user.
dataset_ids (Optional[List[UUID]]): List of existing dataset UUIDs to process. UUIDs allow processing of datasets not owned by the user (if permitted).
run_in_background (Optional[bool]): Whether to execute processing asynchronously. Defaults to False (blocking).
custom_prompt (Optional[str]): Custom prompt for entity extraction and graph generation. If provided, this prompt will be used instead of the default prompts for knowledge graph extraction.
chunk_size (Optional[int]): Maximum tokens per chunk. If omitted, Cognee chooses a size from the configured LLM and embedding limits.
ontology_key (Optional[List[str]]): Reference to one or more previously uploaded ontology files to use for knowledge graph construction.
data_per_batch (Optional[int]): Maximum number of data items to process concurrently within a dataset. Defaults to 20.
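The request parameters above map directly onto the JSON body. Here is a minimal sketch that assumes a POSIX shell with sed available. The helper builds a body from dataset names and the run_in_background flag, using the same snake_case keys as the Example Request. build_cognify_body and json_escape are hypothetical names, not part of Cognee.

```shell
#!/bin/sh
# Hypothetical helper, not part of Cognee: print a cognify request body.
# Usage: build_cognify_body true|false DATASET_NAME...

json_escape() {
  # Escape backslashes and double quotes so the value can sit inside a JSON string.
  # Control characters are not handled (assumption: dataset names contain none).
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

build_cognify_body() {
  # At least one dataset name is required; the endpoint answers 400 otherwise.
  if [ "$#" -lt 2 ]; then
    echo "usage: build_cognify_body true|false DATASET_NAME..." >&2
    return 1
  fi
  background=$1
  shift
  case $background in
    true|false) ;;
    *) echo "build_cognify_body: first argument must be true or false" >&2; return 1 ;;
  esac
  list=""
  for name in "$@"; do
    item="\"$(json_escape "$name")\""
    if [ -z "$list" ]; then list=$item; else list="$list, $item"; fi
  done
  printf '{"datasets": [%s], "run_in_background": %s}\n' "$list" "$background"
}
```

Its output can be passed straight to the curl call for this endpoint, for example `--data "$(build_cognify_body false main_dataset)"`.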

Response

Blocking execution: Complete pipeline run information with entity counts, processing duration, and success/failure status
Background execution: Pipeline run metadata including pipeline_run_id for status monitoring via WebSocket subscription
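In background mode the response includes a pipeline_run_id for status monitoring. Its exact position in the response body is not documented here, so this hypothetical helper searches the raw JSON text for a "pipeline_run_id" string field. It is a jq-free sketch, not a general JSON parser. extract_pipeline_run_id is an illustrative name, not part of Cognee.

```shell
#!/bin/sh
# Hypothetical helper, not part of Cognee: read a cognify response (JSON) on stdin
# and print the value of its "pipeline_run_id" string field.
# Assumption: the response describes a single pipeline run. If one line held
# several such fields, sed's greedy match would keep the last one on that line.
extract_pipeline_run_id() {
  # Print only the captured value from matching lines, then keep the first line.
  sed -n 's/.*"pipeline_run_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}
```

The extracted id is what the WebSocket subscription mentioned above needs in order to follow the run.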

Error Codes

400 Bad Request: When neither datasets nor dataset_ids are provided, or when specified datasets don’t exist
409 Conflict: When processing fails due to system errors, missing LLM API keys, database connection failures, or corrupted content
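A client can branch on these status codes. The sketch below relies only on standard curl options, and it assumes that any 2xx status means success. post_cognify, describe_cognify_status, the COGNEE_TOKEN variable, and the cognify_response.json file name are all hypothetical and illustrative.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Cognee.
# COGNEE_TOKEN is an assumed environment variable holding your auth token.

post_cognify() {
  # $1: JSON request body. Saves the response body to cognify_response.json
  # and prints only the HTTP status code (000 if curl could not connect).
  curl --silent --request POST \
    --url https://api.cognee.ai/api/v1/cognify \
    --header "Authorization: Bearer $COGNEE_TOKEN" \
    --header 'Content-Type: application/json' \
    --data "$1" \
    --output cognify_response.json \
    --write-out '%{http_code}'
}

describe_cognify_status() {
  # Map a status code to the meanings listed under Error Codes.
  case $1 in
    2??) echo "ok" ;;
    400) echo "bad request: no datasets given, or a named dataset does not exist" ;;
    409) echo "processing failed: check LLM API keys, database connections, and input content" ;;
    *) echo "unexpected status: $1" ;;
  esac
}
```

For example, run `status=$(post_cognify '{"datasets": ["main_dataset"]}')` and then `describe_cognify_status "$status"`.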

Example Request

{
    "datasets": ["research_papers", "documentation"],
    "run_in_background": false,
    "custom_prompt": "Extract entities focusing on technical concepts and their relationships. Identify key technologies, methodologies, and their interconnections.",
    "ontology_key": ["medical_ontology_v1"]
}

Notes

To cognify datasets that the user does not own but has write permission on, identify them by UUID in dataset_ids rather than by name in datasets. This applies when ENABLE_BACKEND_ACCESS_CONTROL is set to True.
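dataset_ids is typed as a list of UUIDs, so a client can check its input before sending it. This hypothetical helper builds a body that selects datasets by UUID. It refuses any value that is not in the canonical 8-4-4-4-12 hexadecimal form. build_body_by_ids and is_uuid are illustrative names, not part of Cognee.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Cognee: build a cognify body that selects
# datasets by UUID through dataset_ids, rejecting anything that is not a UUID.

is_uuid() {
  # Exactly 36 characters (so no embedded newlines) in 8-4-4-4-12 hex form.
  [ "${#1}" -eq 36 ] &&
    printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

build_body_by_ids() {
  if [ "$#" -eq 0 ]; then
    echo "build_body_by_ids: at least one dataset UUID is required" >&2
    return 1
  fi
  list=""
  for id in "$@"; do
    if ! is_uuid "$id"; then
      # dataset_ids is typed string<uuid>[]; dataset names belong in datasets.
      echo "build_body_by_ids: not a UUID: $id" >&2
      return 1
    fi
    if [ -z "$list" ]; then list="\"$id\""; else list="$list, \"$id\""; fi
  done
  printf '{"dataset_ids": [%s]}\n' "$list"
}
```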

Next Steps

After successful processing, use the search endpoints to query the generated knowledge graph for insights, relationships, and semantic search.

POST /api/v1/cognify

curl --request POST \
  --url https://api.cognee.ai/api/v1/cognify \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "datasets": [
    "main_dataset"
  ]
}
'

Example response:

{}

Authorizations

Authorization (string, header, required): Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
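Here is a minimal sketch of building this header from an environment variable. COGNEE_TOKEN and auth_header are assumed, illustrative names, not part of Cognee.

```shell
#!/bin/sh
# Hypothetical helper, not part of Cognee: print the Authorization header line
# from the COGNEE_TOKEN environment variable (an assumed name), failing if empty.
auth_header() {
  if [ -z "${COGNEE_TOKEN:-}" ]; then
    echo "auth_header: COGNEE_TOKEN is empty or unset" >&2
    return 1
  fi
  printf 'Authorization: Bearer %s\n' "$COGNEE_TOKEN"
}
```

Capture the header first, for example `hdr=$(auth_header) || exit 1`, and then pass `--header "$hdr"` to curl. This way a missing token stops the script instead of sending an unauthenticated request.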

Body (application/json)

datasets (string[] | null)
datasetIds (string<uuid>[] | null). Example: []
runInBackground (boolean | null). Default: false
graphModel (Graphmodel object). Example: {}
customPrompt (string | null): Custom prompt for entity extraction and graph generation. Default: ""
chunkSize (integer | null): Maximum tokens per chunk. Defaults to automatic model-based sizing. Example: 4096
ontologyKey (string[] | null): Reference to one or more previously uploaded ontologies. Example: []
chunksPerBatch (integer | null): Number of chunks to process per task batch in Cognify (overrides the default). Example: 36
dataPerBatch (integer | null): Maximum number of data items to process concurrently within a dataset. Default: 20. Example: 20
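The Body schema spells fields in camelCase, while the Request Parameters list and the Example Request use snake_case. Check which spelling your server version accepts. This hypothetical sketch uses the schema's names and its example values to set chunkSize and dataPerBatch for a single dataset. It assumes the dataset name contains no double quotes or backslashes. build_tuned_body is an illustrative name, not part of Cognee.

```shell
#!/bin/sh
# Hypothetical helper, not part of Cognee: print a cognify body using the Body
# schema's camelCase names. Assumption: DATASET has no double quotes or backslashes.
build_tuned_body() {
  if [ "$#" -ne 3 ]; then
    echo "usage: build_tuned_body DATASET CHUNK_SIZE DATA_PER_BATCH" >&2
    return 1
  fi
  for n in "$2" "$3"; do
    # Reject empty, non-digit, and leading-zero values so each number is a
    # valid positive JSON integer.
    case $n in
      ''|*[!0-9]*|0*) echo "build_tuned_body: not a positive integer: $n" >&2; return 1 ;;
    esac
  done
  printf '{"datasets": ["%s"], "chunkSize": %s, "dataPerBatch": %s}\n' "$1" "$2" "$3"
}
```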

Response

Successful Response: an object of type Response Cognify Api V1 Cognify Post.

