Experimental. Cognee-RS is a Rust port of the Python cognee SDK, built for on-device AI memory (phone, smartwatch, embedded) and aiming for behavioral parity with Python cognee. The source lives at github.com/topoteretes/cognee-rs. There is no pip install / hosted step — you build the CLI from source with Cargo.
Cognee-RS exposes the same four-verb memory API as Python cognee — remember, recall, improve, forget — composing the add → cognify → search pipeline. The fastest way in is the cognee-cli binary.

Prerequisites

  • A Rust toolchain (edition 2024, MSRV 1.89) — install via rustup.
  • An OpenAI-compatible LLM API key. The CLI hard-fails at startup if no LLM key is configured. A local endpoint (e.g. Ollama) works too — you still pass a dummy key.
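Before building, it can help to confirm that the installed compiler meets the 1.89 minimum. The following is a minimal sketch, not part of cognee-rs: version_ge is a hypothetical helper name, and it assumes a sort that supports -V (GNU coreutils and recent BSD/macOS versions do).

```shell
#!/bin/sh
# Hypothetical helper (not part of cognee-rs): succeeds when version $1 >= version $2.
# Relies on `sort -V` (version sort) placing the smaller version first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="1.89"   # MSRV stated in the prerequisites

if command -v rustc >/dev/null 2>&1; then
  # `rustc --version` prints e.g. "rustc 1.89.0 (<hash> <date>)"; field 2 is the version.
  installed="$(rustc --version | awk '{ print $2 }')"
  if version_ge "$installed" "$required"; then
    echo "rustc $installed is new enough (need >= $required)"
  else
    echo "rustc $installed is too old (need >= $required); try: rustup update"
  fi
else
  echo "rustc not found; install a toolchain via rustup"
fi
```

The script only reports; it does not abort, so it is safe to paste into an interactive shell.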

Build the CLI

git clone https://github.com/topoteretes/cognee-rs
cd cognee-rs

cargo build --release -p cognee-cli   # -> target/release/cognee-cli

# put it on your PATH for the snippets below
export PATH="$PWD/target/release:$PATH"
The default feature set wires a fully embedded, no-external-service stack: SQLite (relational), Ladybug (graph), and an in-memory brute-force vector index. Nothing else to install.

Configure the LLM

A .env file in the working directory is auto-loaded. The only required setting is the LLM API key:
export LLM_API_KEY="sk-..."        # canonical name (OPENAI_TOKEN is an accepted alias)
# optional overrides:
export LLM_MODEL="gpt-4o-mini"     # the compiled default is openai/gpt-5-mini
export LLM_ENDPOINT="https://..."  # alias: OPENAI_URL; empty -> OpenAI's API
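The same settings can live in the auto-loaded .env file instead of the shell environment. A minimal sketch, assuming the loader accepts plain KEY=value lines (the common dotenv convention, not confirmed here for cognee-rs); the key value is the same placeholder as above:

```shell
# Write a starter .env in the current directory unless one already exists.
# Assumption: plain KEY=value lines, as in the common dotenv convention.
if [ ! -e .env ]; then
  cat > .env <<'EOF'
LLM_API_KEY=sk-...
LLM_MODEL=gpt-4o-mini
EOF
  chmod 600 .env   # the file holds a secret; keep it readable by the owner only
fi
```

Since the file contains a credential, it is usually worth keeping it out of version control.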
Embeddings also need a key by default. On desktop/server the default embedding provider is OpenAI (text-embedding-3-small), which reuses LLM_API_KEY / LLM_ENDPOINT, so setting LLM_API_KEY alone is enough for the full pipeline. To run embeddings fully locally, set EMBEDDING_PROVIDER=onnx (or ollama). For example, a fully local setup with Ollama:
ollama serve &
ollama pull llama3.2:3b

export OPENAI_URL=http://localhost:11434/v1
export OPENAI_TOKEN=not-needed      # dummy value still required — startup checks for a non-empty key
export OPENAI_MODEL=llama3.2:3b
export EMBEDDING_PROVIDER=ollama    # or onnx — otherwise embeddings still call OpenAI
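Because the CLI refuses to start without a key, a quick shell-side check can catch a missing variable before a longer script runs. This is a sketch based on the variable names given above (LLM_API_KEY, with OPENAI_TOKEN as its alias); llm_key_configured is a hypothetical helper name, and it only inspects the current shell environment, not a .env file.

```shell
# Hypothetical helper: succeeds if a non-empty key is set under either name.
llm_key_configured() {
  [ -n "${LLM_API_KEY:-}" ] || [ -n "${OPENAI_TOKEN:-}" ]
}

if llm_key_configured; then
  echo "LLM key found in the environment"
else
  echo "no LLM key in the environment: set LLM_API_KEY (a dummy value is fine for a local endpoint)" >&2
fi
```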

Your first memory

# store, then ask — this is the whole loop
cognee-cli remember "Cognee turns raw data into a queryable knowledge graph."
cognee-cli recall   "what does cognee do?"
  • remember ingests the data, builds the knowledge graph, and runs a self-improvement pass (disable with --no-improve).
  • recall auto-routes the search type for you when --query-type is omitted.
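Both behaviors above can be overridden per call. The sketch below wraps the two flags named in the bullets; COGNEE_CLI and the function names are hypothetical shell conveniences, and passing GRAPH_COMPLETION to recall's --query-type is an assumption based on the search subcommand's default rather than something confirmed for recall.

```shell
# Hypothetical wrappers around the two flags mentioned above.
COGNEE_CLI="${COGNEE_CLI:-cognee-cli}"   # set COGNEE_CLI=echo for a dry run

# Store text without the self-improvement pass.
remember_only() {
  "$COGNEE_CLI" remember "$1" --no-improve
}

# Ask with an explicit search type instead of auto-routing.
# Assumption: recall accepts the same type names as `search`.
recall_as() {
  "$COGNEE_CLI" recall "$2" --query-type "$1"
}

# Example:
#   remember_only "Cognee turns raw data into a queryable knowledge graph."
#   recall_as GRAPH_COMPLETION "what does cognee do?"
```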
The default vector index is in-memory and non-persistent — it lives only for the duration of one process. The relational and graph stores persist to disk, but for recall to retrieve vectors across separate CLI invocations, build with the pgvector feature and point VECTOR_DB_PROVIDER=pgvector at a Postgres instance, or drive remember + recall from a single long-lived process via one of the language bindings.
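As a rough sketch of the first option: the commands below assume the Cargo feature is literally named pgvector on the cognee-cli package, and they deliberately leave out the Postgres connection settings, which this page does not specify (check the repository README for those).

```shell
# Run from the cognee-rs checkout. Assumption: the feature flag is named `pgvector`.
if [ -f Cargo.toml ] && command -v cargo >/dev/null 2>&1; then
  cargo build --release -p cognee-cli --features pgvector
else
  echo "run this from the cognee-rs checkout with cargo installed" >&2
fi

# Select the persistent vector store for subsequent cognee-cli invocations.
export VECTOR_DB_PROVIDER=pgvector
```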

Lower-level pipeline

remember / recall wrap the explicit stages, which exist as separate subcommands for fine-grained control:
# 1. Ingest data into a dataset (defaults to "main_dataset")
cognee-cli add ./notes.txt "some inline text" -d my_dataset

# 2. Build the knowledge graph
cognee-cli cognify -d my_dataset

# 3. Query it (--query-type defaults to GRAPH_COMPLETION)
cognee-cli search "Alan Turing" -t GRAPH_COMPLETION -k 10 -d my_dataset
Run cognee-cli <command> --help for the full flag list.
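The three stages chain naturally in a script. A minimal sketch using only the subcommands and flags shown above; run_pipeline and the COGNEE_CLI variable are hypothetical conveniences (setting COGNEE_CLI=echo prints the commands instead of running them).

```shell
# Hypothetical wrapper: add -> cognify -> search against one dataset.
COGNEE_CLI="${COGNEE_CLI:-cognee-cli}"

# usage: run_pipeline <dataset> <file-or-text> <query>
run_pipeline() {
  dataset="$1"
  source_item="$2"
  query="$3"

  # `&&` stops the chain at the first stage that fails.
  "$COGNEE_CLI" add "$source_item" -d "$dataset" &&
    "$COGNEE_CLI" cognify -d "$dataset" &&
    "$COGNEE_CLI" search "$query" -t GRAPH_COMPLETION -k 10 -d "$dataset"
}

# Example:
#   run_pipeline my_dataset ./notes.txt "Alan Turing"
```

Chaining with && means a failed ingest or cognify step is not followed by a search against a half-built graph.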

Language bindings

The ergonomic Cognee class, new(settings) → warm() → add() / cognify() / search() / remember(), is exposed by the bindings, which keep the component graph (and its in-memory vector index) alive across calls in one process:
  • Python (PyO3): from cognee_py import Cognee
  • JavaScript/TypeScript (Neon): import { Cognee } from 'cognee-ts'
  • C (FFI): #include "cognee_sdk.h"

Next Steps

Cognee-RS on GitHub

Full README, architecture docs, and the crate-by-crate workspace breakdown.

Python Quickstart

The same remember / recall loop in the Python SDK.