{"id":"00ddc2a5-d3ee-49f6-8391-27be373dc5c0","shortId":"UkW8tt","kind":"skill","title":"Regression test LLM apps and agents with metrics, traces, and eval suites using DeepEval","tagline":"Run repeatable eval suites against prompts, RAG pipelines, and agents so regressions surface before release.","description":"# Regression test LLM apps and agents with metrics, traces, and eval suites using DeepEval\n\nRun repeatable eval suites against prompts, RAG pipelines, and agents so regressions surface before release.\n\n## Prerequisites\n\nPython or Node.js, API access to an LLM judge or compatible local models, CI optional\n\n## Installation\n\nUse the upstream install or setup path that matches your environment:\n- pip install -U deepeval\n\nRequirements and caveats from upstream:\n- Deepeval works with **Python>=3.9+**.\n- python\n\nBasic usage or getting-started notes:\n- <a href=\"#-quickstart\">Getting Started</a> |\n- **DeepEval** is a simple-to-use, open-source LLM evaluation framework, for evaluating large-language model systems. It is similar to Pytest but specialized for unit testing LLM apps. DeepEval incorporates the latest r...\n- 📐 Large variety of ready-to-use LLM eval metrics (all with explanations) powered by **ANY** LLM of your choice, statistical methods, or NLP models that run **locally on your machine** covering all use cases:\n\n- Source: https://github.com/confident-ai/deepeval\n- Extracted from upstream docs: https://raw.githubusercontent.com/confident-ai/deepeval/HEAD/README.md\n\n## Documentation\n\n- https://docs.confident-ai.com/docs/getting-started\n\n## Source\n\n- [Agent Skill Exchange](https://agentskillexchange.com/skills/regression-test-llm-apps-and-agents-with-metrics-traces-and-eval-suites-using-deepeval/)","tags":["regression","test","llm","apps","and","agents","with","metrics","traces","eval","suites","using"],"capabilities":["skill","source-agentskillexchange","skill-regression-test-llm-apps-and-agents-with-metrics-traces-and-eval-suites-using-deepeval","topic-agent-skills","topic-ai-agents","topic-ai-tools","topic-awesome-list","topic-claude-code","topic-codex","topic-cursor","topic-llm","topic-mcp","topic-npx-skills","topic-openclaw","topic-skills-catalog"],"categories":["skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/agentskillexchange/skills/regression-test-llm-apps-and-agents-with-metrics-traces-and-eval-suites-using-deepeval","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add agentskillexchange/skills","source_repo":"https://github.com/agentskillexchange/skills","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,420 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:12:03.832Z","embedding":null,"createdAt":"2026-05-18T13:18:48.729Z","updatedAt":"2026-05-18T19:12:03.832Z","lastSeenAt":"2026-05-18T19:12:03.832Z","tsv":"'/confident-ai/deepeval':186 '/confident-ai/deepeval/head/readme.md':193 '/docs/getting-started':197 '/skills/regression-test-llm-apps-and-agents-with-metrics-traces-and-eval-suites-using-deepeval/)':204 '3.9':100 'access':64 'agent':6,24,35,53,199 'agentskillexchange.com':203 'agentskillexchange.com/skills/regression-test-llm-apps-and-agents-with-metrics-traces-and-eval-suites-using-deepeval/)':202 'api':63 'app':4,33,142 'basic':102 'case':182 'caveat':93 'choic':167 'ci':73 'compat':70 'cover':179 'deepev':14,43,90,96,111,143 'doc':190 'docs.confident-ai.com':196 'docs.confident-ai.com/docs/getting-started':195 'document':194 'environ':86 'eval':11,17,40,46,156 'evalu':122,125 'exchang':201 'explan':160 'extract':187 'framework':123 'get':106,109 'getting-start':105 'github.com':185 'github.com/confident-ai/deepeval':184 'incorpor':144 'instal':75,79,88 'judg':68 'languag':128 'larg':127,148 'large-languag':126 'latest':146 'llm':3,32,67,121,141,155,164 'local':71,175 'machin':178 'match':84 'method':169 'metric':8,37,157 'model':72,129,172 'nlp':171 'node.js':62 'note':108 'open':119 'open-sourc':118 'option':74 'path':82 'pip':87 'pipelin':22,51 'power':161 'prerequisit':59 'prompt':20,49 'pytest':135 'python':60,99,101 'r':147 'rag':21,50 'raw.githubusercontent.com':192 'raw.githubusercontent.com/confident-ai/deepeval/head/readme.md':191 'readi':152 'ready-to-us':151 'regress':1,26,30,55 'releas':29,58 'repeat':16,45 'requir':91 'run':15,44,174 'setup':81 'similar':133 'simpl':115 'simple-to-us':114 'skill':200 'skill-regression-test-llm-apps-and-agents-with-metrics-traces-and-eval-suites-using-deepeval' 'sourc':120,183,198 'source-agentskillexchange' 'special':137 'start':107,110 'statist':168 'suit':12,18,41,47 'surfac':27,56 'system':130 'test':2,31,140 'topic-agent-skills' 'topic-ai-agents' 'topic-ai-tools' 'topic-awesome-list' 'topic-claude-code' 'topic-codex' 'topic-cursor' 'topic-llm' 'topic-mcp' 'topic-npx-skills' 'topic-openclaw' 'topic-skills-catalog' 'trace':9,38 'u':89 'unit':139 'upstream':78,95,189 'usag':103 'use':13,42,76,117,154,181 'varieti':149 'work':97","prices":[{"id":"ff2cbbe5-7946-46e4-a8bd-0016b82a2741","listingId":"00ddc2a5-d3ee-49f6-8391-27be373dc5c0","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"agentskillexchange","category":"skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:18:48.729Z"}],"sources":[{"listingId":"00ddc2a5-d3ee-49f6-8391-27be373dc5c0","source":"github","sourceId":"agentskillexchange/skills/regression-test-llm-apps-and-agents-with-metrics-traces-and-eval-suites-using-deepeval","sourceUrl":"https://github.com/agentskillexchange/skills/tree/main/skills/regression-test-llm-apps-and-agents-with-metrics-traces-and-eval-suites-using-deepeval","isPrimary":false,"firstSeenAt":"2026-05-18T13:18:48.729Z","lastSeenAt":"2026-05-18T19:12:03.832Z"}],"details":{"listingId":"00ddc2a5-d3ee-49f6-8391-27be373dc5c0","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"agentskillexchange","slug":"regression-test-llm-apps-and-agents-with-metrics-traces-and-eval-suites-using-deepeval","github":{"repo":"agentskillexchange/skills","stars":8,"topics":["agent-skills","ai-agents","ai-tools","awesome-list","claude-code","codex","cursor","llm","mcp","npx-skills","openclaw","skills-catalog"],"license":"mit","html_url":"https://github.com/agentskillexchange/skills","pushed_at":"2026-05-18T19:02:17Z","description":"The open catalog of AI agent skills — 2,000+ security-scanned skills for Claude Code, Cursor, Codex, and more.","skill_md_sha":"132fef737bb2f5127652805141d69312594431e9","skill_md_path":"skills/regression-test-llm-apps-and-agents-with-metrics-traces-and-eval-suites-using-deepeval/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/agentskillexchange/skills/tree/main/skills/regression-test-llm-apps-and-agents-with-metrics-traces-and-eval-suites-using-deepeval"},"layout":"multi","source":"github","category":"skills","frontmatter":{"name":"Regression test LLM apps and agents with metrics, traces, and eval suites using DeepEval","description":"Run repeatable eval suites against prompts, RAG pipelines, and agents so regressions surface before release."},"skills_sh_url":"https://skills.sh/agentskillexchange/skills/regression-test-llm-apps-and-agents-with-metrics-traces-and-eval-suites-using-deepeval"},"updatedAt":"2026-05-18T19:12:03.832Z"}}