Score model outputs with reusable evaluator prompts and metrics using autoevals
Apply reusable evaluators to model outputs when you need lightweight scoring, rationale capture, or quick eval loops in code.
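As a rough illustration of the scoring, rationale capture, and quick eval loop described above, here is a minimal standard-library sketch. It does not use the autoevals API: the names Score, exact_match, token_overlap, and run_eval are hypothetical and only show the general shape of a reusable evaluator that returns a score together with a short rationale.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Score:
    """Hypothetical result type: a named score in [0, 1] plus a rationale."""
    name: str
    score: float
    rationale: str


# An evaluator is any callable that maps (output, expected) to a Score.
Evaluator = Callable[[str, str], Score]


def exact_match(output: str, expected: str) -> Score:
    """Score 1.0 when the strings match after trimming and lowercasing."""
    matched = output.strip().lower() == expected.strip().lower()
    rationale = (
        "output equals expected after normalization"
        if matched
        else "output differs from expected"
    )
    return Score("exact_match", 1.0 if matched else 0.0, rationale)


def token_overlap(output: str, expected: str) -> Score:
    """Score the fraction of expected tokens that appear in the output."""
    out_tokens = set(output.lower().split())
    exp_tokens = set(expected.lower().split())
    if not exp_tokens:
        return Score("token_overlap", 0.0, "expected answer is empty")
    shared = out_tokens & exp_tokens
    return Score(
        "token_overlap",
        len(shared) / len(exp_tokens),
        f"{len(shared)} of {len(exp_tokens)} expected tokens found",
    )


def run_eval(
    cases: Iterable[Tuple[str, str]], evaluators: List[Evaluator]
) -> List[Score]:
    """Apply every evaluator to every (output, expected) pair, in order."""
    results = []
    for output, expected in cases:
        for evaluate in evaluators:
            results.append(evaluate(output, expected))
    return results


cases = [("Paris", "paris"), ("The capital is Lyon", "Paris")]
results = run_eval(cases, [exact_match, token_overlap])
for result in results:
    print(f"{result.name}: {result.score:.2f} ({result.rationale})")
```

Because every evaluator shares one call signature, the same list of evaluators can be reused across datasets. A model-graded evaluator would fit the same shape, with the rationale coming from the grading model rather than from a fixed string.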
Prerequisites
Python or Node.js, plus access to an OpenAI-compatible model endpoint or the Braintrust proxy.
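A quick way to confirm that endpoint access is configured before running any model-graded evaluator is to inspect the environment. The sketch below is an assumption-laden example, not upstream code: the variable names OPENAI_API_KEY and OPENAI_BASE_URL and the default URL follow common OpenAI-compatible tooling and should be checked against the upstream docs, and resolve_endpoint is a hypothetical helper.

```python
import os
from typing import Mapping

# Assumed default for an OpenAI-compatible endpoint; a proxy would override it.
DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_endpoint(env: Mapping[str, str]) -> dict:
    """Report which base URL would be used and whether an API key is set.

    The key itself is never returned, so the result is safe to print or log.
    """
    return {
        "base_url": env.get("OPENAI_BASE_URL", DEFAULT_BASE_URL),
        "has_api_key": bool(env.get("OPENAI_API_KEY")),
    }


config = resolve_endpoint(os.environ)
print(f"base URL: {config['base_url']}")
print(f"API key present: {config['has_api_key']}")
```

Passing the environment in as a mapping keeps the helper easy to test with a plain dictionary instead of the real process environment.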
Installation
Use the upstream install or setup command that matches your environment:
- Node.js: npm install autoevals
- Python: pip install autoevals
- To run an eval file with the Braintrust CLI: npx braintrust run example.eval.js
- To work on autoevals itself, install the development dependencies with make develop, then run source env.sh to activate the environment. Create a .env file from the .env.example file and set the environment variables. Run direnv allow to load the...
Requirements and caveats from upstream:
- Python 3.9 or higher
- Compatible with both OpenAI Python SDK v0.x and v1.x
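The Python requirement above can be checked with a short preflight script. This is a minimal sketch using only the standard library; meets_minimum and is_installed are hypothetical helper names, and the check only reports whether a package named autoevals is importable, not which version is installed.

```python
import importlib.util
import sys

# Minimum interpreter version stated in the upstream requirements.
MIN_PYTHON = (3, 9)


def meets_minimum(version, minimum=MIN_PYTHON) -> bool:
    """Compare the (major, minor) part of a version tuple to the minimum."""
    return tuple(version[:2]) >= tuple(minimum)


def is_installed(package: str) -> bool:
    """Return True when a top-level package can be found without importing it."""
    return importlib.util.find_spec(package) is not None


print(f"Python version OK: {meets_minimum(sys.version_info)}")
print(f"autoevals installed: {is_installed('autoevals')}")
```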
Basic usage or getting-started notes:
- The evaluators are implemented so you can flexibly run them on individual examples, tweak the prompts, and debug.
Extracted from upstream docs: https://raw.githubusercontent.com/braintrustdata/autoevals/HEAD/README.md
Quality
Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills · 8 GitHub stars · SKILL.md body (1,408 chars)