Why DeepEval?
DeepEval is an open-source evaluation framework that provides ready-made metrics (both traditional and LLM-as-a-judge) for LLM pipelines. Compared with hand-rolled evaluation scripts, DeepEval lets you:

- Track Contextual Relevancy, Contextual Precision/Recall, Coverage, and more.
- Swap between automatic string-based metrics (EM/F1) and LLM-based scoring with a single flag.
- Re-use the same metrics across different projects and datasets.
DeepEval stores no data – it simply runs metrics locally or via your preferred LLM. That makes it a perfect drop-in evaluator for Cognee’s pipelines.
DeepEval inside Cognee
Cognee ships with a dedicated `DeepEvalAdapter`. When enabled, every answer produced by your pipeline is scored with the metrics you choose. The adapter:

- Transforms Cognee's `Answer` objects into DeepEval's `LLMTestCase` format.
- Runs the selected metrics.
- Stores the raw scores alongside rationales so they appear in Cognee's HTML dashboard.
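The transformation step can be sketched as follows. This is a minimal, self-contained illustration: the `Answer` fields and the `to_test_case` helper are assumptions (local stand-ins, not Cognee's actual classes), while the target field names (`input`, `actual_output`, `expected_output`, `retrieval_context`) mirror DeepEval's `LLMTestCase`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Answer:
    """Hypothetical stand-in for Cognee's Answer object."""
    question: str
    answer: str
    golden_answer: Optional[str] = None
    retrieved_context: List[str] = field(default_factory=list)


@dataclass
class LLMTestCase:
    """Local stand-in mirroring the core fields of
    deepeval.test_case.LLMTestCase."""
    input: str
    actual_output: str
    expected_output: Optional[str] = None
    retrieval_context: Optional[List[str]] = None


def to_test_case(ans: Answer) -> LLMTestCase:
    """Map a pipeline answer onto DeepEval's test-case shape."""
    return LLMTestCase(
        input=ans.question,
        actual_output=ans.answer,
        expected_output=ans.golden_answer,
        retrieval_context=ans.retrieved_context or None,
    )


case = to_test_case(
    Answer(
        "What is Cognee?",
        "A memory engine for LLM apps.",
        retrieved_context=["Cognee builds knowledge graphs."],
    )
)
print(case.input)  # → What is Cognee?
```

Once an answer is in this shape, any retrieval-aware metric (for example Contextual Relevancy) can score it against the `retrieval_context` it carries.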
Quick Start
1. Install Cognee (DeepEval is declared in `pyproject.toml`, so you automatically get the dependency).
2. Set your LLM API key so DeepEval can run LLM-based metrics:

```shell
export LLM_API_KEY=...
```

3. (Optional) Configure the model DeepEval should call.
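For the optional model configuration, something like the following works; the exact variable names are assumptions, so check your Cognee version's settings for the keys it reads.

```shell
# Assumed variable names — verify against your Cognee version's configuration.
export LLM_PROVIDER="openai"
export LLM_MODEL="gpt-4o-mini"
```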
4. Run a standard Cognee pipeline (add → cognify → search). The evaluation executor will automatically invoke DeepEval.
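The add → cognify → search flow might look like the sketch below. Cognee exposes top-level `add`, `cognify`, and `search` coroutines, but the exact `search` signature varies between releases, so treat the argument name as an assumption.

```python
import asyncio


async def run_pipeline(document: str, question: str):
    # Imported lazily so the sketch can be read without cognee installed.
    import cognee

    await cognee.add(document)   # ingest raw text
    await cognee.cognify()       # build the knowledge graph
    # Argument name is an assumption; some releases also take a search type.
    return await cognee.search(query_text=question)


# With an LLM_API_KEY set, the evaluation executor scores the results
# with DeepEval as part of the run:
# results = asyncio.run(
#     run_pipeline("Cognee is a memory engine.", "What is Cognee?")
# )
```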
Useful Links
- DeepEval integration guide – deepeval.com » Cognee
- DeepEval docs – deepeval.com/docs
Join the conversation on Discord and let us know how DeepEval works for you!