Eval Runner
Standardized testing harness for running YAML-defined eval fixtures against live or simulated servers with composable...
What it does
Standardized testing harness for running YAML-defined eval fixtures against live or simulated servers with composable assertions and regression reports.
A testing harness for validating server and agent workflows. Test cases are defined as YAML fixtures with steps, expected tool calls, and composable assertions (output matching, schema validation, latency thresholds). Supports live mode that spawns a real server via stdio and simulation mode for CI dry-runs. Includes step output piping between fixture steps, regression reporting that diffs current results against past runs, watch mode for automatic re-execution on file changes, and a CI gate tool for deployment readiness checks.
Capabilities
Server
Quality
deterministic score 0.55 from registry signals: · indexed on pulsemcp · has source repo · registry-generated description present