Adding a New Graph Database to cognee
This guide describes how to integrate a new graph database engine into cognee, following the same pattern used for existing engines (e.g. Kuzu, Neo4j, FalkorDB, NetworkX).
Overview
Within cognee, graph databases are interchangeable thanks to a shared `GraphDBInterface`. To support a new graph engine, you must:

- Implement the adapter in `cognee/infrastructure/databases/graph/<engine_name>/adapter.py`.
- Add a branch in `cognee/infrastructure/databases/graph/get_graph_engine.py` to return your new adapter.
- Create a test script that confirms your new adapter's basic functionality in `cognee/tests/test_<engine_name>.py`.
- Create a test workflow in `.github/workflows/test_<engine_name>.yml` for CI.
- Update `pyproject.toml` with any new dependencies.

Below are the recommended steps in more detail.
1. Implement the Adapter
File: `cognee/infrastructure/databases/graph/<engine_name>/adapter.py`
Your adapter must subclass `GraphDBInterface`, implementing all required CRUD and utility methods (e.g., `add_node`, `add_edge`, `extract_node`, etc.). Here is a sample skeleton with placeholders:
"""Adapter for <engine_name> graph database."""
import json
import asyncio
from typing import Dict, Any, List, Optional, Tuple
from contextlib import asynccontextmanager
from concurrent.futures import ThreadPoolExecutor
from cognee.shared.logging_utils import get_logger
from cognee.infrastructure.databases.graph.graph_db_interface import GraphDBInterface
logger = get_logger()
class <EngineName>Adapter(GraphDBInterface):
"""Adapter for <engine_name> graph database operations."""
def __init__(
self,
db_url: str = "localhost",
db_port: int = 1234,
username: Optional[str] = None,
password: Optional[str] = None,
):
self.db_url = db_url
self.db_port = db_port
self.username = username
self.password = password
self.connection = None
self.executor = ThreadPoolExecutor()
self._initialize_connection()
def _initialize_connection(self) -> None:
"""Establish connection to <engine_name>."""
try:
# Connect using your <engine_name> driver
logger.debug(f"Successfully connected to <engine_name> at {self.db_url}:{self.db_port}")
except Exception as e:
logger.error(f"Failed to initialize <engine_name> connection: {e}")
raise
@asynccontextmanager
async def get_session(self):
"""Context manager for a session (if applicable)."""
try:
yield self.connection
finally:
pass
async def query(self, query: str, params: Optional[Dict[str, Any]] = None) -> List[Tuple]:
"""Execute an async query.
If your graph database library provides an async SDK, call it directly here.
If it only provides a synchronous client, you can run the call via
`loop.run_in_executor()` or a similar technique to avoid blocking the event loop.
"""
loop = asyncio.get_running_loop()
params = params or {}
def blocking_query():
try:
# Example usage with your driver
# cursor = self.connection.execute(query, params)
# results = cursor.fetchall()
return []
except Exception as e:
logger.error(f"<engine_name> query execution failed: {e}")
raise
return await loop.run_in_executor(self.executor, blocking_query)
# -- Example: Add a node
async def add_node(self, node_data: Any) -> None:
"""Add a single node to <engine_name>."""
# Implement logic:
# 1. Extract relevant fields (id, text, type, properties, etc.).
# 2. Construct a CREATE query.
# 3. Call self.query(query_str, params).
pass
# -- Example: Retrieve all nodes/edges
async def get_graph_data(self) -> Tuple[List, List]:
"""Retrieve all nodes and edges from <engine_name>."""
# Return (nodes, edges) where each node is `(node_id, properties_dict)`
# and each edge is `(source_id, target_id, relationship_label, properties_dict)`.
return ([], [])
# -- Additional methods (delete_node, add_edge, etc.) ...
Keep the method signatures consistent with `GraphDBInterface`. Refer to the `KuzuAdapter` or the `Neo4jAdapter` for a more comprehensive example.
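To make the `add_node` steps concrete, here is one way a helper could build the query string and parameter map before handing them to `self.query`. This is a sketch only: the Cypher-like `CREATE` syntax, the `$param` placeholder style, and the JSON flattening of nested properties are assumptions that depend entirely on your engine's query language and driver.

```python
import json
from typing import Any, Dict, Tuple


def build_add_node_query(node_id: str, properties: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Build a parameterized CREATE statement plus its parameter map (sketch)."""
    # Serialize nested values to JSON strings: a common workaround for engines
    # that only accept scalar property values. Adjust for your driver.
    flat = {
        key: json.dumps(value) if isinstance(value, (dict, list)) else value
        for key, value in properties.items()
    }
    query = "CREATE (n:Node {id: $node_id, props: $props})"
    return query, {"node_id": node_id, "props": flat}


query, params = build_add_node_query("node-1", {"text": "hello", "meta": {"lang": "en"}})
```

Inside the adapter, `add_node` would then simply `await self.query(query, params)`.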
2. Register Your Engine in get_graph_engine.py
File: `cognee/infrastructure/databases/graph/get_graph_engine.py`

Inside the `create_graph_engine` function, insert an `elif` branch for your new provider. For example:
```python
@lru_cache
def create_graph_engine(
    graph_database_provider,
    graph_database_url,
    graph_database_username,
    graph_database_password,
    graph_database_port,
    graph_file_path,
):
    """Factory function to create the appropriate graph client based on the graph type."""
    if graph_database_provider == "neo4j":
        ...
    elif graph_database_provider == "falkordb":
        ...
    elif graph_database_provider == "kuzu":
        ...
    elif graph_database_provider == "<engine_name>":
        # Check required credentials/params
        if not (graph_database_url and graph_database_port):
            raise EnvironmentError("Missing required <engine_name> credentials.")

        from .<engine_name>.adapter import <EngineName>Adapter

        return <EngineName>Adapter(
            db_url=graph_database_url,
            db_port=graph_database_port,
            username=graph_database_username,
            password=graph_database_password,
        )

    # Fallback to NetworkX if none matched
    from .networkx.adapter import NetworkXAdapter

    return NetworkXAdapter(filename=graph_file_path)
```
Key points:

- Match the `graph_database_provider` string with however you plan to configure cognee (e.g. via `.env` or `cognee.config`).
- Modify the adapter's initialization parameters to suit your new database.
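Note that the factory compares the provider value with plain `==` (and caches on it via `@lru_cache`), so a configured value that differs only in case or surrounding whitespace will silently fall through to the NetworkX fallback. A small validation helper can guard against that; the sketch below is hypothetical (cognee does not ship this function, and the known-provider set is an assumption for illustration):

```python
from functools import lru_cache

# Hypothetical: the set of provider strings your deployment accepts.
KNOWN_PROVIDERS = {"neo4j", "falkordb", "kuzu", "networkx", "<engine_name>"}


@lru_cache
def resolve_provider(raw: str) -> str:
    """Normalize a configured provider string before dispatching on it."""
    provider = raw.strip().lower()
    if provider not in KNOWN_PROVIDERS:
        raise EnvironmentError(f"Unknown graph database provider: {raw!r}")
    return provider
```

With such a guard, `resolve_provider(" Neo4j ")` yields `"neo4j"`, and a typo fails loudly instead of falling back to NetworkX.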
3. Test with a Dedicated Script
File: `cognee/tests/test_<engine_name>.py`

Create a script that loads cognee, configures it to use your new `<engine_name>` provider, and runs basic usage checks (e.g., adding data, searching, pruning). For example:
```python
import os
import shutil
import pathlib

import cognee
from cognee.shared.logging_utils import get_logger

logger = get_logger()


async def main():
    data_directory = pathlib.Path(__file__).parent / ".data_storage" / "test_<engine_name>"
    system_directory = pathlib.Path(__file__).parent / ".cognee_system" / "test_<engine_name>"

    try:
        cognee.config.set_graph_database_provider("<engine_name>")
        cognee.config.data_root_directory(str(data_directory))
        cognee.config.system_root_directory(str(system_directory))

        # Clear old data/system
        await cognee.prune.prune_data()
        await cognee.prune.prune_system(metadata=True)

        # Add something to the new DB
        dataset_name = "example_data"
        text_data = "Hello from <engine_name> integration!"
        await cognee.add([text_data], dataset_name)

        # Cognify and search (optional)
        await cognee.cognify([dataset_name])

        # Confirm data is in the DB
        from cognee.infrastructure.databases.graph import get_graph_engine

        graph_engine = await get_graph_engine()
        nodes, edges = await graph_engine.get_graph_data()
        logger.info(f"Found {len(nodes)} nodes and {len(edges)} edges in <engine_name>.")

        # Prune everything
        await cognee.prune.prune_data()
        await cognee.prune.prune_system(metadata=True)

        nodes, edges = await graph_engine.get_graph_data()
        assert len(nodes) == 0 and len(edges) == 0, "<engine_name> is not empty after pruning!"
    finally:
        # Clean up the directories even if the test fails
        if data_directory.exists():
            shutil.rmtree(data_directory)
        if system_directory.exists():
            shutil.rmtree(system_directory)


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())
```
This script provides a basic end-to-end test, verifying that:
- Your new adapter can be selected by cognee.
- Data can be added to the DB.
- The DB is empty after a prune operation.
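If the data shapes are unclear while debugging, the `get_graph_data()` contract can be mimicked with a throwaway in-memory stand-in (illustrative only; this is not cognee code and not the real NetworkX adapter): nodes come back as `(node_id, properties)` pairs, edges as `(source_id, target_id, relationship_label, properties)` tuples.

```python
from typing import Any, Dict, List, Tuple


class InMemoryGraph:
    """Minimal stand-in that mimics the adapter's return shapes."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, Any]] = {}
        self.edges: List[Tuple[str, str, str, Dict[str, Any]]] = []

    def add_node(self, node_id: str, properties: Dict[str, Any]) -> None:
        self.nodes[node_id] = properties

    def add_edge(self, source: str, target: str, label: str,
                 properties: Dict[str, Any]) -> None:
        self.edges.append((source, target, label, properties))

    def get_graph_data(self) -> Tuple[List, List]:
        # Nodes as (node_id, properties); edges as
        # (source_id, target_id, relationship_label, properties).
        return (list(self.nodes.items()), list(self.edges))


graph = InMemoryGraph()
graph.add_node("a", {"text": "hello"})
graph.add_node("b", {"text": "world"})
graph.add_edge("a", "b", "mentions", {})
nodes, edges = graph.get_graph_data()
```

Your real adapter should produce the same tuple shapes, just backed by queries against `<engine_name>`.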
4. Create a Test Workflow
File: `.github/workflows/test_<engine_name>.yml`
Create a GitHub Actions workflow to run your integration tests. This ensures any pull requests that modify your new engine (or the shared graph code) will be tested automatically. Below is a template:
```yaml
name: test | <engine_name>

on:
  workflow_dispatch:
  pull_request:
    types: [labeled, synchronize]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

env:
  RUNTIME__LOG_LEVEL: ERROR

jobs:
  run_<engine_name>_integration_test:
    name: test
    runs-on: ubuntu-22.04
    defaults:
      run:
        shell: bash
    steps:
      - name: Check out
        uses: actions/checkout@master

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11.x'

      - name: Install Poetry
        uses: snok/install-poetry@v1.4.1
        with:
          virtualenvs-create: true
          virtualenvs-in-project: true
          installer-parallel: true

      - name: Install dependencies
        # If your pyproject.toml has an extra named '<engine_name>', use:
        run: poetry install -E <engine_name> --no-interaction

      - name: Run <engine_name> tests
        env:
          ENV: 'dev'
          LLM_MODEL: ${{ secrets.LLM_MODEL }}
          LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
          LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
          LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
          EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
          EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
          EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
          EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
        run: poetry run python ./cognee/tests/test_<engine_name>.py
```
Tips:

- Rename `<engine_name>` appropriately.
- Ensure your `pyproject.toml` has an extras entry for any new dependencies.
5. Poetry Extras
If your new graph engine requires a special Python client or system libraries, update `pyproject.toml`:
```toml
[tool.poetry.dependencies]
python = "^3.11"
# Mark the driver optional so it is only installed when the extra is requested:
my-graph-db-driver = { version = "<version>", optional = true }

[tool.poetry.extras]
<engine_name> = ["my-graph-db-driver"]
```

Note that entries under `[tool.poetry.extras]` are dependency names, not version pins; the version constraint belongs on the optional dependency itself.
6. Final Checklist
- Implement your `<EngineName>Adapter` in `cognee/infrastructure/databases/graph/<engine_name>/adapter.py`.
- Add a new branch in `get_graph_engine.py` to return `<EngineName>Adapter`.
- Create a test script `test_<engine_name>.py`.
- Create a test workflow: `.github/workflows/test_<engine_name>.yml`.
- Add required dependencies to `pyproject.toml` extras.
- Open a PR to verify that your new integration passes CI.
That’s all! This approach keeps cognee’s architecture flexible, allowing you to swap in any graph DB provider with minimal changes to the core codebase. If you need more advanced functionality (e.g., custom indexes, triggers, or advanced queries), simply implement them in your adapter class following the same patterns.