Catch silent agent regressions by diffing outputs and tool traces in CI with eval-view
Snapshot agent behavior, compare outputs and tool-call paths, and block releases when a model or prompt change quietly shifts behavior.
What it does
Snapshot agent behavior, compare outputs and tool-call paths, and block releases when a model or prompt change quietly shifts behavior.
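The snapshot-and-compare idea can be sketched in a few lines of Python. This is a minimal illustration and not eval-view's implementation or API: the AgentRun shape and the diff_runs helper are hypothetical names, and only the standard-library difflib module is used.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class AgentRun:
    """One recorded agent run (hypothetical shape): final text plus ordered tool names."""
    output: str
    tool_calls: list[str] = field(default_factory=list)


def diff_runs(baseline: AgentRun, current: AgentRun) -> dict:
    """Compare a current run against a stored baseline snapshot."""
    # Line-level diff of the final output text; empty when the texts match.
    output_diff = list(
        difflib.unified_diff(
            baseline.output.splitlines(),
            current.output.splitlines(),
            fromfile="baseline",
            tofile="current",
            lineterm="",
        )
    )
    # Sequence diff of the tool-call path; keep only the non-equal segments.
    matcher = difflib.SequenceMatcher(a=baseline.tool_calls, b=current.tool_calls)
    tool_changes = [
        (tag, baseline.tool_calls[i1:i2], current.tool_calls[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    return {
        "output_diff": output_diff,
        "tool_changes": tool_changes,
        # Assumption: any change in text or tool path counts as a regression.
        "regressed": bool(output_diff or tool_changes),
    }
```

A CI step could call diff_runs on each stored scenario and exit non-zero when "regressed" is true. Treating every textual difference as a failure is a deliberately strict choice for the sketch; a real check would likely tolerate some variation in wording.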
Prerequisites
Python environment, eval-view installation, repeatable agent scenarios or tests, CI runner or local shell, supported agent stack under test
Installation
Basic usage or getting-started notes:
- The loop closes: detection → investigation → graded verdict → quarantine governance → broadcast. You wake up, run progress, triage with drift, confirm with check --statistical, and the team sees the digest before...
- 📉 DRIFTING: Trend sliding with graded confidence (low/med/high). Run evalview drift <test>.
- 🔎 INVESTIGATE: Verdict layer wants statistical replay. Run evalview check --statistical 5.
- Extracted from upstream docs: https://raw.githubusercontent.com/hidai25/eval-view/HEAD/README.md
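The two follow-up commands quoted in the notes above could be wired into a CI gate roughly as follows. This is a sketch under stated assumptions: follow_up_command and gate are hypothetical helpers, and the idea that evalview exits non-zero when a check fails is an assumption that the excerpt above does not confirm.

```python
import subprocess
from typing import Callable, Sequence

# Follow-up commands exactly as quoted in the notes above; "<test>" is a placeholder.
FOLLOW_UPS = {
    "DRIFTING": ["evalview", "drift", "<test>"],
    "INVESTIGATE": ["evalview", "check", "--statistical", "5"],
}


def follow_up_command(status: str, test: str) -> list[str]:
    """Return the argv list for a status, substituting the test name for "<test>"."""
    return [test if part == "<test>" else part for part in FOLLOW_UPS[status]]


def gate(argv: Sequence[str], runner: Callable = subprocess.run) -> bool:
    """Run a command and report whether it exited with status 0.

    Assumption (not confirmed by the source): a non-zero exit code signals
    a failed check, so the caller should block the release on False.
    """
    result = runner(list(argv), check=False)
    return result.returncode == 0
```

The runner parameter exists so the gate can be exercised without eval-view installed, for example by passing a stub that returns an object with a returncode attribute.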
Quality
Deterministic score 0.45 from registry signals: indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,238 chars)