
Adding a New Graph Database to cognee

This guide describes how to integrate a new graph database engine into cognee, following the same pattern used for existing engines (e.g. Kuzu, Neo4j, FalkorDB, NetworkX).

Overview

Within cognee, graph databases are interchangeable thanks to a shared GraphDBInterface. To support a new graph engine, you must:

  1. Implement the adapter in cognee/infrastructure/databases/graph/<engine_name>/adapter.py.
  2. Add a branch in cognee/infrastructure/databases/graph/get_graph_engine.py to return your new adapter.
  3. Create a test script that confirms your new adapter’s basic functionality in cognee/tests/test_<engine_name>.py.
  4. Create a test workflow in .github/workflows/test_<engine_name>.yml for CI.
  5. Update pyproject.toml with an extras entry for any new dependencies.

Below are the recommended steps in more detail.


1. Implement the Adapter

File: cognee/infrastructure/databases/graph/<engine_name>/adapter.py

Your adapter must subclass GraphDBInterface, implementing all required CRUD and utility methods (add_node, add_edge, extract_node, and so on). Here is a sample skeleton with placeholders:

"""Adapter for <engine_name> graph database.""" import json import asyncio from typing import Dict, Any, List, Optional, Tuple from contextlib import asynccontextmanager from concurrent.futures import ThreadPoolExecutor from cognee.shared.logging_utils import get_logger from cognee.infrastructure.databases.graph.graph_db_interface import GraphDBInterface logger = get_logger() class <EngineName>Adapter(GraphDBInterface): """Adapter for <engine_name> graph database operations.""" def __init__( self, db_url: str = "localhost", db_port: int = 1234, username: Optional[str] = None, password: Optional[str] = None, ): self.db_url = db_url self.db_port = db_port self.username = username self.password = password self.connection = None self.executor = ThreadPoolExecutor() self._initialize_connection() def _initialize_connection(self) -> None: """Establish connection to <engine_name>.""" try: # Connect using your <engine_name> driver logger.debug(f"Successfully connected to <engine_name> at {self.db_url}:{self.db_port}") except Exception as e: logger.error(f"Failed to initialize <engine_name> connection: {e}") raise @asynccontextmanager async def get_session(self): """Context manager for a session (if applicable).""" try: yield self.connection finally: pass async def query(self, query: str, params: Optional[Dict[str, Any]] = None) -> List[Tuple]: """Execute an async query. If your graph database library provides an async SDK, call it directly here. If it only provides a synchronous client, you can run the call via `loop.run_in_executor()` or a similar technique to avoid blocking the event loop. 
""" loop = asyncio.get_running_loop() params = params or {} def blocking_query(): try: # Example usage with your driver # cursor = self.connection.execute(query, params) # results = cursor.fetchall() return [] except Exception as e: logger.error(f"<engine_name> query execution failed: {e}") raise return await loop.run_in_executor(self.executor, blocking_query) # -- Example: Add a node async def add_node(self, node_data: Any) -> None: """Add a single node to <engine_name>.""" # Implement logic: # 1. Extract relevant fields (id, text, type, properties, etc.). # 2. Construct a CREATE query. # 3. Call self.query(query_str, params). pass # -- Example: Retrieve all nodes/edges async def get_graph_data(self) -> Tuple[List, List]: """Retrieve all nodes and edges from <engine_name>.""" # Return (nodes, edges) where each node is `(node_id, properties_dict)` # and each edge is `(source_id, target_id, relationship_label, properties_dict)`. return ([], []) # -- Additional methods (delete_node, add_edge, etc.) ...

Keep the method signatures consistent with GraphDBInterface. Reference the KuzuAdapter or the Neo4jAdapter for a more comprehensive example.
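As a concrete illustration of the "construct a CREATE query" step in add_node, here is a minimal, hypothetical sketch of a helper that turns a node's id and properties into a parameterized, Cypher-like statement. The label `Node`, the query dialect, and the JSON serialization of non-primitive values are assumptions; adapt them to your engine's query language and driver.

```python
import json
from typing import Any, Dict, Tuple


def build_add_node_query(node_id: str, properties: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Build a Cypher-like CREATE statement plus its parameter map.

    Complex values (dicts, lists) are serialized to JSON strings so they
    survive drivers that only accept primitive parameter types.
    """
    params: Dict[str, Any] = {"id": node_id}
    for key, value in properties.items():
        params[key] = value if isinstance(value, (str, int, float, bool)) else json.dumps(value)

    # One "$name" placeholder per parameter, letting the driver handle escaping.
    placeholders = ", ".join(f"{key}: ${key}" for key in params)
    query = f"CREATE (n:Node {{{placeholders}}})"
    return query, params
```

An adapter's add_node could then pass the result straight to `self.query(query, params)`, keeping all escaping on the driver side rather than via string concatenation.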

2. Register Your Engine in get_graph_engine.py

File: cognee/infrastructure/databases/graph/get_graph_engine.py

Inside the create_graph_engine function, insert an elif branch for your new provider. For example:

```python
@lru_cache
def create_graph_engine(
    graph_database_provider,
    graph_database_url,
    graph_database_username,
    graph_database_password,
    graph_database_port,
    graph_file_path,
):
    """Factory function to create the appropriate graph client based on the graph type."""
    if graph_database_provider == "neo4j":
        ...
    elif graph_database_provider == "falkordb":
        ...
    elif graph_database_provider == "kuzu":
        ...
    elif graph_database_provider == "<engine_name>":
        # Check required credentials/params
        if not (graph_database_url and graph_database_port):
            raise EnvironmentError("Missing required <engine_name> credentials.")

        from .<engine_name>.adapter import <EngineName>Adapter

        return <EngineName>Adapter(
            db_url=graph_database_url,
            db_port=graph_database_port,
            username=graph_database_username,
            password=graph_database_password,
        )

    # Fallback to NetworkX if none matched
    from .networkx.adapter import NetworkXAdapter

    return NetworkXAdapter(filename=graph_file_path)
```

Key points:

  1. Match the graph_database_provider string to the value you plan to use when configuring cognee (e.g. via .env or cognee.config).
  2. Modify the adapter’s initialization parameters to suit your new database.
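For reference, the provider string and connection parameters are typically supplied through environment variables. The following .env fragment is a hypothetical sketch: the variable names mirror create_graph_engine's parameters, but check cognee's config module for the exact names it reads.

```shell
# Hypothetical .env fragment -- verify the exact variable names against
# cognee's configuration code before relying on them.
GRAPH_DATABASE_PROVIDER="<engine_name>"
GRAPH_DATABASE_URL="localhost"
GRAPH_DATABASE_PORT="1234"
GRAPH_DATABASE_USERNAME="user"
GRAPH_DATABASE_PASSWORD="secret"
```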

3. Test with a Dedicated Script

File: cognee/tests/test_<engine_name>.py

Create a script that loads cognee, configures it to use your new <engine_name> provider, and runs basic usage checks (e.g., adding data, searching, pruning). For example:

```python
import os
import shutil
import pathlib

import cognee
from cognee.shared.logging_utils import get_logger

logger = get_logger()


async def main():
    data_directory = pathlib.Path(__file__).parent / ".data_storage" / "test_<engine_name>"
    system_directory = pathlib.Path(__file__).parent / ".cognee_system" / "test_<engine_name>"

    try:
        cognee.config.set_graph_database_provider("<engine_name>")
        cognee.config.data_root_directory(str(data_directory))
        cognee.config.system_root_directory(str(system_directory))

        # Clear old data/system
        await cognee.prune.prune_data()
        await cognee.prune.prune_system(metadata=True)

        # Add something to the new DB
        dataset_name = "example_data"
        text_data = "Hello from <engine_name> integration!"
        await cognee.add([text_data], dataset_name)

        # Cognify and search (optional)
        await cognee.cognify([dataset_name])

        # Confirm data is in the DB
        from cognee.infrastructure.databases.graph import get_graph_engine

        graph_engine = await get_graph_engine()
        nodes, edges = await graph_engine.get_graph_data()
        logger.info(f"Found {len(nodes)} nodes and {len(edges)} edges in <engine_name>.")

        # Prune everything
        await cognee.prune.prune_data()
        await cognee.prune.prune_system(metadata=True)

        nodes, edges = await graph_engine.get_graph_data()
        assert len(nodes) == 0 and len(edges) == 0, "<engine_name> is not empty after pruning!"
    finally:
        # Clean up the directories even if the test fails
        if data_directory.exists():
            shutil.rmtree(data_directory)
        if system_directory.exists():
            shutil.rmtree(system_directory)


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())
```

This script provides a basic end-to-end test, verifying:

  1. Your new adapter can be selected by cognee.
  2. Data can be added to the DB.
  3. The DB is empty after a prune operation.
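The third check hinges on the (nodes, edges) contract of get_graph_data described in the adapter skeleton: each node is `(node_id, properties_dict)` and each edge is `(source_id, target_id, relationship_label, properties_dict)`. The following illustrative sketch (InMemoryGraph is not a cognee API) shows that shape and the prune behavior the test asserts:

```python
# Illustrative stand-in demonstrating the tuple shapes returned by
# get_graph_data and the "empty after pruning" check. Not a real adapter.
import asyncio
from typing import Any, Dict, List, Tuple


class InMemoryGraph:
    def __init__(self) -> None:
        self.nodes: List[Tuple[str, Dict[str, Any]]] = []
        self.edges: List[Tuple[str, str, str, Dict[str, Any]]] = []

    async def add_node(self, node_id: str, properties: Dict[str, Any]) -> None:
        self.nodes.append((node_id, properties))

    async def add_edge(self, source: str, target: str, label: str, properties: Dict[str, Any]) -> None:
        self.edges.append((source, target, label, properties))

    async def get_graph_data(self) -> Tuple[List, List]:
        return (self.nodes, self.edges)

    async def prune(self) -> None:
        self.nodes.clear()
        self.edges.clear()


async def demo() -> Tuple[Tuple[int, int], Tuple[int, int]]:
    graph = InMemoryGraph()
    await graph.add_node("a", {"text": "hello"})
    await graph.add_node("b", {"text": "world"})
    await graph.add_edge("a", "b", "mentions", {})

    nodes, edges = await graph.get_graph_data()
    before = (len(nodes), len(edges))

    await graph.prune()
    nodes, edges = await graph.get_graph_data()
    after = (len(nodes), len(edges))
    return before, after
```

Running `asyncio.run(demo())` returns `((2, 1), (0, 0))`, mirroring what the integration test expects from a real adapter before and after pruning.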

4. Create a Test Workflow

File: .github/workflows/test_<engine_name>.yml

Create a GitHub Actions workflow to run your integration tests. This ensures any pull requests that modify your new engine (or the shared graph code) will be tested automatically. Below is a template:

```yaml
name: test | <engine_name>

on:
  workflow_dispatch:
  pull_request:
    types: [labeled, synchronize]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

env:
  RUNTIME__LOG_LEVEL: ERROR

jobs:
  run_<engine_name>_integration_test:
    name: test
    runs-on: ubuntu-22.04
    defaults:
      run:
        shell: bash
    steps:
      - name: Check out
        uses: actions/checkout@master

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11.x'

      - name: Install Poetry
        uses: snok/install-poetry@v1.4.1
        with:
          virtualenvs-create: true
          virtualenvs-in-project: true
          installer-parallel: true

      - name: Install dependencies
        # If your pyproject.toml has an extra named '<engine_name>', use:
        run: poetry install -E <engine_name> --no-interaction

      - name: Run <engine_name> tests
        env:
          ENV: 'dev'
          LLM_MODEL: ${{ secrets.LLM_MODEL }}
          LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
          LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
          LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
          EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
          EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
          EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
          EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
        run: poetry run python ./cognee/tests/test_<engine_name>.py
```

Tips:

  • Rename <engine_name> appropriately.
  • Ensure your pyproject.toml has an extras entry for any new dependencies.

5. Poetry Extras

If your new graph engine requires a special Python client or system libraries, update:

pyproject.toml:

```toml
[tool.poetry.dependencies]
python = "^3.11"
# Declare the driver as optional so it is only installed with the extra.
my-graph-db-driver = { version = "<version>", optional = true }

[tool.poetry.extras]
<engine_name> = ["my-graph-db-driver"]
```

6. Final Checklist

  1. Implement your <EngineName>Adapter in cognee/infrastructure/databases/graph/<engine_name>/adapter.py.
  2. Add a new branch in get_graph_engine.py to return <EngineName>Adapter.
  3. Create a test script test_<engine_name>.py.
  4. Create a test workflow: .github/workflows/test_<engine_name>.yml.
  5. Add required dependencies to pyproject.toml extras.
  6. Open a PR to verify that your new integration passes CI.

That’s all! This approach keeps cognee’s architecture flexible, allowing you to swap in any graph DB provider with minimal changes to the core codebase. If you need more advanced functionality (e.g., custom indexes, triggers, or advanced queries), simply implement them in your adapter class following the same patterns.