Skill quality: 0.45

Regression test LLM apps and agents with metrics, traces, and eval suites using DeepEval

Run repeatable eval suites against prompts, RAG pipelines, and agents so regressions surface before release.

Price: free
Protocol: skill
Verified: no

What it does

Run repeatable eval suites against prompts, RAG pipelines, and agents so regressions surface before release.
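
The idea of a repeatable eval suite that surfaces regressions can be sketched in plain Python. This is a minimal illustration only and does not use DeepEval's API: `EvalCase`, `keyword_score`, `run_suite`, and `fake_app` are hypothetical names, the keyword metric is a toy stand-in for an LLM-judged metric, and the 0.5 threshold is an assumed value.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    """One fixed prompt plus the keywords a good answer should mention (hypothetical schema)."""
    prompt: str
    required_keywords: list


def keyword_score(answer: str, case: EvalCase) -> float:
    """Toy metric: fraction of required keywords found in the answer (case-insensitive)."""
    text = answer.lower()
    hits = sum(1 for kw in case.required_keywords if kw.lower() in text)
    return hits / len(case.required_keywords)


def run_suite(app, cases, threshold=0.5):
    """Run every case through `app`; return (prompt, score) for cases scoring below `threshold`."""
    failures = []
    for case in cases:
        score = keyword_score(app(case.prompt), case)
        if score < threshold:
            failures.append((case.prompt, score))
    return failures


def fake_app(prompt: str) -> str:
    """Stand-in for the LLM app under test; a real suite would call the model here."""
    canned = {
        "What is the refund window?": "Refunds are accepted within 30 days of purchase.",
        "Which plans include SSO?": "Please contact sales.",
    }
    return canned.get(prompt, "")


CASES = [
    EvalCase("What is the refund window?", ["30 days", "refund"]),
    EvalCase("Which plans include SSO?", ["enterprise", "SSO"]),
]

# An empty list means no case regressed; CI could fail the build otherwise.
failures = run_suite(fake_app, CASES)
print(failures)
```

Because the cases and threshold are fixed, rerunning the suite after a prompt or model change shows which cases newly fall below the bar. In this sketch the second case fails, since the canned answer mentions neither keyword.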

Prerequisites

Python or Node.js, API access to an LLM judge or compatible local models, CI optional

Installation

Use the upstream install or setup path that matches your environment:

  • pip install -U deepeval

Requirements and caveats from upstream:

  • DeepEval works with Python 3.9 or later.
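
The Python version requirement can be checked before installing. A minimal sketch, assuming the 3.9 minimum stated above; `MIN_PYTHON` and `python_is_supported` are hypothetical names, not part of DeepEval:

```python
import sys

# Minimum interpreter version, taken from the upstream requirement quoted above.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    print("Python %d.%d or later is required" % MIN_PYTHON)
```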

Basic usage or getting-started notes:

  • DeepEval is a simple-to-use, open-source LLM evaluation framework, for evaluating large-language model systems. It is similar to Pytest but specialized for unit testing LLM apps. DeepEval incorporates the latest r...

  • 📐 Large variety of ready-to-use LLM eval metrics (all with explanations) powered by ANY LLM of your choice, statistical methods, or NLP models that run locally on your machine, covering all use cases.

  • Source: https://github.com/confident-ai/deepeval

  • Extracted from upstream docs: https://raw.githubusercontent.com/confident-ai/deepeval/HEAD/README.md

Capabilities

  • skill
  • source-agentskillexchange
  • skill-regression-test-llm-apps-and-agents-with-metrics-traces-and-eval-suites-using-deepeval
  • topic-agent-skills
  • topic-ai-agents
  • topic-ai-tools
  • topic-awesome-list
  • topic-claude-code
  • topic-codex
  • topic-cursor
  • topic-llm
  • topic-mcp
  • topic-npx-skills
  • topic-openclaw
  • topic-skills-catalog

Quality

0.45 / 1.00

Deterministic score of 0.45 from registry signals: indexed on GitHub topic:agent-skills · 8 GitHub stars · SKILL.md body (1,420 chars)

Provenance

Indexed from: github
Enriched: 2026-05-18 19:12:03Z · deterministic:skill-github:v1 · v1
First seen: 2026-05-18
Last seen: 2026-05-18

Agent access