cognee.update()

async def update(
    data_id: UUID,
    data: Union[BinaryIO, list[BinaryIO], str, list[str]],
    dataset_id: UUID,
    user: User = None,
    node_set: Optional[List[str]] = None,
    vector_db_config: dict = None,
    graph_db_config: dict = None,
    preferred_loaders: dict[str, dict[str, Any]] = None,
    incremental_loading: bool = True,
) -> Union[Dict[str, PipelineRunInfo], List[PipelineRunInfo]]

Description

Update existing data in Cognee.

Supported Input Types:

Text strings: Direct text content (str) - any string not starting with "/" or "file://"
File paths: Paths or URLs given as strings in these formats:
- Absolute paths: "/path/to/document.pdf"
- File URLs: "file:///path/to/document.pdf" or "file://relative/path.txt"
- S3 paths: "s3://bucket-name/path/to/file.pdf"
Binary file objects: File handles/streams (BinaryIO)
Lists: Multiple files or text strings in a single call
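The input rules above can be pictured with a small classifier. This is a hypothetical sketch, not cognee's resolver: `classify_input` and `classify_inputs` are illustrative helpers that only apply the prefixes listed here. They assume `s3://` strings count as paths because S3 paths appear under File paths.

```python
import io
from typing import BinaryIO, List, Union


def classify_input(item: Union[str, BinaryIO]) -> str:
    # Hypothetical helper (not part of cognee): label one input item
    # according to the prefix rules listed above.
    if not isinstance(item, str):
        return "binary"  # file handle / stream
    if item.startswith("file://"):
        return "file_url"
    if item.startswith("s3://"):
        return "s3_path"  # assumption: S3 strings are treated as paths
    if item.startswith("/"):
        return "absolute_path"
    return "text"


def classify_inputs(data) -> List[str]:
    # A list is classified item by item; a single value becomes a one-item list.
    items = data if isinstance(data, list) else [data]
    return [classify_input(x) for x in items]


mixed = ["text content", "/path/file.pdf", "file://doc.txt", "s3://bucket/f.pdf", io.BytesIO(b"raw")]
print(classify_inputs(mixed))
# ['text', 'absolute_path', 'file_url', 's3_path', 'binary']
```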

Supported File Formats:

Text files (.txt, .md, .csv)
PDFs (.pdf)
Images (.png, .jpg, .jpeg) - extracted via OCR/vision models
Audio files (.mp3, .wav) - transcribed to text
Code files (.py, .js, .ts, etc.) - parsed for structure and content
Office documents (.docx, .pptx)

Workflow:

Data Resolution: Resolves file paths and validates accessibility
Content Extraction: Extracts text content from various file formats
Dataset Storage: Stores processed content in the specified dataset
Metadata Tracking: Records file metadata, timestamps, and user permissions
Permission Assignment: Grants user read/write/delete/share permissions on dataset
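The format list and the Content Extraction step can be sketched as a lookup from file extension to an extraction category. The `FORMAT_CATEGORIES` table and the `format_category` helper are hypothetical illustrations built only from the extensions named above. Cognee's actual loader selection, including `preferred_loaders`, is not modeled.

```python
from pathlib import PurePosixPath

# Hypothetical table built from the "Supported File Formats" list above;
# it is not cognee's loader registry and omits the unnamed "etc." code types.
FORMAT_CATEGORIES = {
    ".txt": "text", ".md": "text", ".csv": "text",
    ".pdf": "pdf",
    ".png": "image", ".jpg": "image", ".jpeg": "image",
    ".mp3": "audio", ".wav": "audio",
    ".py": "code", ".js": "code", ".ts": "code",
    ".docx": "office", ".pptx": "office",
}


def format_category(path: str) -> str:
    # Strip a file:// prefix, then look up the lower-cased file suffix.
    if path.startswith("file://"):
        path = path[len("file://"):]
    suffix = PurePosixPath(path).suffix.lower()
    return FORMAT_CATEGORIES.get(suffix, "unsupported")


print(format_category("/docs/report.PDF"))  # pdf
print(format_category("archive.zip"))       # unsupported
```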

Args:
data_id: UUID of the existing data item to update.
data: The latest version of the data. Can be:
- Single text string: "Your text content here"
- Absolute file path: "/path/to/document.pdf"
- File URL: "file:///absolute/path/to/document.pdf" or "file://relative/path.txt"
- S3 path: "s3://my-bucket/documents/file.pdf"
- List of mixed types: ["text content", "/path/file.pdf", "file://doc.txt", file_handle]
- Binary file object: open("file.txt", "rb")
dataset_id: UUID of the dataset that contains the data item (required).
user: User object for authentication and permissions. Uses the default user if None. Default user: "default_user@example.com" (created automatically on first use). Users can only access datasets they have permissions for.
node_set: Optional list of node set names for graph organization and access control. Used for grouping related data points in the knowledge graph.
vector_db_config: Optional configuration for the vector database (for custom setups).
graph_db_config: Optional configuration for the graph database (for custom setups).
preferred_loaders: Optional custom loader configuration.
incremental_loading: If True (the default), skip data items that have already been processed successfully.

Returns: Information about the pipeline execution, as a Dict[str, PipelineRunInfo] or a List[PipelineRunInfo], including:

Pipeline run ID for tracking
Dataset ID where data was stored
Processing status and any errors
Execution timestamps and metadata

Parameters

data_id (UUID, required): UUID of the data item to update.

data (Union[BinaryIO, list[BinaryIO], str, list[str]], required): New data to replace the existing data.

dataset_id (UUID, required): UUID of the dataset containing the data.

user (User, default: None): User performing the operation.

node_set (Optional[List[str]], default: None): List of node set names to associate.

vector_db_config (dict, default: None): Override vector database configuration.

graph_db_config (dict, default: None): Override graph database configuration.

preferred_loaders (dict[str, dict[str, Any]], default: None): Custom loader configuration.

incremental_loading (bool, default: True): If True, skip data items that have already been processed successfully.

Returns

Union[Dict[str, PipelineRunInfo], List[PipelineRunInfo]]
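Because the declared return type is either a dict or a list, callers that only want to iterate over runs may find it convenient to normalize it. This sketch is hypothetical: `PipelineRunInfo` here is a minimal stand-in dataclass with assumed fields, and the meaning of the dict keys is not documented on this page.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class PipelineRunInfo:
    # Hypothetical stand-in for cognee's PipelineRunInfo; field names are assumptions.
    pipeline_run_id: str
    status: str


RunResult = Union[Dict[str, PipelineRunInfo], List[PipelineRunInfo]]


def as_run_list(result: RunResult) -> List[PipelineRunInfo]:
    # Accept either documented return shape and always hand back a list.
    if isinstance(result, dict):
        return list(result.values())
    return list(result)


run = PipelineRunInfo(pipeline_run_id="run-1", status="completed")
print(as_run_list({"some_key": run}) == as_run_list([run]))  # True
```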

How it works

update() performs a full delete-then-re-add cycle for the specified data item:

Delete: removes the old data item's graph nodes, edges, and vector embeddings. Entities that are also referenced by other documents in the same dataset are preserved (shared nodes are not deleted).
Add: ingests the new version of the data into the dataset.
Cognify: re-runs the knowledge graph construction pipeline on the dataset, extracting entities and relationships from the updated content.

After update() completes, the graph nodes and relationships that were derived only from the old content are removed and replaced with ones extracted from the new content; nodes shared with other documents remain. Relationships that no longer appear in the updated document are gone (unless another document still produces them), and new relationships found in the updated content are added.
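A toy in-memory model makes the relationship replacement concrete. It is a sketch under simplifying assumptions, not cognee's implementation: the graph is just the union of (source, relation, target) triples per document, and the "extracted" edges for the new version are passed in by hand instead of coming from the cognify pipeline. `graph_edges` and `toy_update` are hypothetical helpers.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy model: each relationship is a (source, relation, target) triple.
Edge = Tuple[str, str, str]


def graph_edges(docs: Dict[str, Set[Edge]]) -> Set[Edge]:
    # The dataset graph is the union of the edges extracted from every document,
    # so an edge survives as long as at least one document still produces it.
    return set().union(*docs.values())


def toy_update(docs: Dict[str, Set[Edge]], doc_id: str, new_edges: Set[Edge]) -> Dict[str, Set[Edge]]:
    # Delete: drop everything extracted from the old version of doc_id.
    updated = {k: v for k, v in docs.items() if k != doc_id}
    # Add + cognify: record the edges extracted from the new version
    # (supplied directly here instead of being extracted by a pipeline).
    updated[doc_id] = set(new_edges)
    return updated


docs = {
    "doc_a": {("Alice", "works_at", "Acme"), ("Acme", "based_in", "Berlin")},
    "doc_b": {("Acme", "based_in", "Berlin")},
}
after = toy_update(docs, "doc_a", {("Alice", "works_at", "Globex")})
print(sorted(graph_edges(after)))
# [('Acme', 'based_in', 'Berlin'), ('Alice', 'works_at', 'Globex')]
```

The shared edge survives because doc_b still produces it, while Alice's old employer disappears with the old version of doc_a.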

Examples

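```python
from uuid import UUID

import cognee


async def update_document(uuid_of_item: UUID, uuid_of_dataset: UUID) -> None:
    # update() is async, so await it from inside a coroutine.
    # uuid_of_item / uuid_of_dataset: an existing data item and the dataset containing it.
    await cognee.update(
        data_id=uuid_of_item,
        data="Updated content for this document.",
        dataset_id=uuid_of_dataset,
    )
```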

Further details

What incremental_loading skips

incremental_loading=True (the default) tells the cognify step to skip data items that have already been processed successfully. Because update() deletes the old data item and adds a new one, the new item has no prior processing record and is always re-cognified. Unchanged documents already in the dataset retain their completed status and are skipped, so only the updated document is re-processed.

Set incremental_loading=False to force a full re-cognify of every document in the dataset. This is useful when you have changed your graph model or extraction prompt and want all content reprocessed.

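```python
from uuid import UUID

import cognee


async def update_and_reprocess_all(uuid_of_item: UUID, uuid_of_dataset: UUID) -> None:
    # Force re-processing of every document in the dataset
    await cognee.update(
        data_id=uuid_of_item,
        data="Updated content.",
        dataset_id=uuid_of_dataset,
        incremental_loading=False,
    )
```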
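The incremental_loading rule can be modeled with a status table. In this hypothetical sketch, `simulate_update` and `items_to_cognify` are illustrative helpers, and the "completed"/"pending" strings stand in for cognee's internal processing records.

```python
from typing import Dict, List


def simulate_update(status: Dict[str, str], old_id: str, new_id: str) -> Dict[str, str]:
    # update() deletes the old data item and adds a new one,
    # so the new item starts without a "completed" processing record.
    status = {k: v for k, v in status.items() if k != old_id}
    status[new_id] = "pending"
    return status


def items_to_cognify(status: Dict[str, str], incremental_loading: bool = True) -> List[str]:
    # With incremental_loading=False every item in the dataset is reprocessed.
    if not incremental_loading:
        return sorted(status)
    # Otherwise, items already marked "completed" are skipped.
    return sorted(item for item, state in status.items() if state != "completed")


status = simulate_update({"doc_a_v1": "completed", "doc_b": "completed"}, "doc_a_v1", "doc_a_v2")
print(items_to_cognify(status))                             # ['doc_a_v2']
print(items_to_cognify(status, incremental_loading=False))  # ['doc_a_v2', 'doc_b']
```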

Shared nodes and relationship cleanup

When the old data item is deleted, Cognee checks every node and edge it owned. Nodes that are also referenced by other documents in the same dataset are preserved, so deleting one document does not break the rest of the graph. Only nodes and edges unique to the deleted data item are removed from both the graph database and the vector store.

After cognify re-runs on the updated content, new entities and relationships are extracted. Any relationship that existed in the old version but is absent from the new version is not re-created, so the graph always reflects the current state of your data.
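The shared-node check amounts to asking which IDs the deleted item references that no other item in the same dataset still references. The `unique_to` helper below is a hypothetical illustration of that rule over plain sets, not cognee's deletion code.

```python
from typing import Dict, Set


def unique_to(owned: Dict[str, Set[str]], deleted_item: str) -> Set[str]:
    # owned maps each data item in the dataset to the node (or edge) IDs it references.
    # Collect everything still referenced by the *other* items.
    still_referenced: Set[str] = set()
    for item, ids in owned.items():
        if item != deleted_item:
            still_referenced |= ids
    # Only IDs that no other item references are safe to delete
    # from the graph database and the vector store.
    return owned.get(deleted_item, set()) - still_referenced


owned = {
    "doc_a": {"Alice", "Acme", "Berlin"},
    "doc_b": {"Acme", "Berlin", "Bob"},
}
print(unique_to(owned, "doc_a"))  # {'Alice'}
```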

Observing graph changes

Use graph visualization or a search query to inspect the knowledge graph before and after an update:

```python
import asyncio

import cognee
from cognee import SearchType


async def main() -> None:
    # Query the graph after updating
    results = await cognee.search(
        "What entities are related to the updated topic?",
        query_type=SearchType.GRAPH_COMPLETION,
    )
    print(results)


asyncio.run(main())
```

For a visual comparison, open the built-in graph explorer with cognee-cli -ui before and after the update and compare the two views.
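If you export the graph's relationships before and after an update, a set difference gives a simple textual diff. How you obtain the snapshots (for example, from a graph database query) is left open here; the triple format and the `diff_snapshots` helper are assumptions for illustration, not a cognee API.

```python
from typing import Dict, Set, Tuple

# Hypothetical snapshot format: a set of (source, relation, target) triples.
Edge = Tuple[str, str, str]


def diff_snapshots(before: Set[Edge], after: Set[Edge]) -> Dict[str, Set[Edge]]:
    # Edges only in the "before" snapshot were removed by the update;
    # edges only in the "after" snapshot were added by re-cognify.
    return {"removed": before - after, "added": after - before}


before = {("Alice", "works_at", "Acme"), ("Acme", "based_in", "Berlin")}
after = {("Alice", "works_at", "Globex"), ("Acme", "based_in", "Berlin")}
changes = diff_snapshots(before, after)
print(changes["removed"])  # {('Alice', 'works_at', 'Acme')}
print(changes["added"])    # {('Alice', 'works_at', 'Globex')}
```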
