Why DeepEval?
DeepEval is an open-source evaluation framework that provides ready-made metrics (both traditional and LLM-as-a-judge) for LLM pipelines. Compared with hand-rolled evaluation scripts, DeepEval lets you:
- Track Contextual Relevancy, Contextual Precision/Recall, Coverage and more.
- Swap between automatic string-based metrics (EM/F1) and LLM-based scoring with a single flag.
- Re-use the same metrics across different projects and datasets.
DeepEval stores no data – it simply runs metrics locally or via your preferred LLM. That makes it a perfect drop-in evaluator for Cognee’s pipelines.
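To make the string-based metrics concrete: exact match and token-level F1 can be sketched in a few lines of plain Python. This is a rough illustration of what those scores measure, not DeepEval's actual implementation:

```python
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

EM is strict (all-or-nothing), while token F1 gives partial credit for overlapping words, which is why the two are often reported side by side.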
DeepEval inside Cognee
Cognee ships with a dedicated DeepEvalAdapter. When enabled, every answer produced by your pipeline is scored with the metrics you choose. Below is an example configuration from our evaluation framework:
```python
evaluating_answers: bool = True
evaluating_contexts: bool = True
evaluation_engine: str = "DeepEval"  # Options: 'DeepEval', 'DirectLLM'
evaluation_metrics: list[str] = [
    "correctness",  # LLM-based correctness
    "EM",           # Exact match
    "f1",           # Token-level precision/recall
]
deepeval_model: str = "gpt-4o-mini"  # Any OpenAI-compatible LLM
```
Behind the scenes the adapter:
- Transforms Cognee's Answer objects into DeepEval's LLMTestCase format.
- Runs the selected metrics.
- Stores the raw scores alongside rationales so they appear in Cognee's HTML dashboard.
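An LLMTestCase bundles the question, the generated answer, the expected answer, and the retrieved context. The mapping step can be sketched roughly as follows; the Answer fields shown here are illustrative stand-ins, not Cognee's actual attribute names:

```python
from dataclasses import dataclass, field

# Illustrative stand-in; Cognee's real Answer object may differ.
@dataclass
class Answer:
    question: str
    answer: str
    golden_answer: str
    retrieved_context: list[str] = field(default_factory=list)

def to_test_case(a: Answer) -> dict:
    """Map an Answer onto LLMTestCase-style fields
    (input / actual_output / expected_output / retrieval_context)."""
    return {
        "input": a.question,
        "actual_output": a.answer,
        "expected_output": a.golden_answer,
        "retrieval_context": a.retrieved_context,
    }
```

Once answers are in this shape, any DeepEval metric can score them without knowing anything about Cognee's internals.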
Quick Start
1. Install Cognee (DeepEval is declared in pyproject.toml, so you automatically get the dependency).
2. Set your LLM API key so DeepEval can run LLM-based metrics:

   ```python
   import os
   os.environ["LLM_API_KEY"] = "<YOUR_OPENAI_API_KEY>"
   ```

   You can also export the variable in your shell (export LLM_API_KEY=...).
3. (Optional) Configure the model DeepEval should call:

   ```shell
   export DEEPEVAL_MODEL=gpt-4o
   ```

4. Run a standard Cognee pipeline (add → cognify → search). The evaluation executor will automatically invoke DeepEval.
Useful Links
- DeepEval integration guide – deepeval.com » Cognee
- DeepEval docs – deepeval.com/docs
Join the conversation on Discord and let us know how DeepEval works for you!