Run repeatable agent evaluation suites with trajectory and simulator coverage using Strands Evals
Build repeatable evaluation experiments for agents and LLM apps with output checks, trajectory scoring, simulators, and trace-based review.
What it does
Build repeatable evaluation experiments for agents and LLM apps with output checks, trajectory scoring, simulators, and trace-based review.
Prerequisites
Python 3.10+, pip, optional judge-model access
Installation
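Choose the upstream install command that matches your environment:
- pip install strands-agents-evals (released package from PyPI)
- pip install -e . (editable install from a local clone of the repository)
- pip install -e ".[test]" (editable install with the test extras)
- pip install -e ".[test,dev]" (editable install with the test and dev extras)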
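After installing, a quick sanity check can confirm that the interpreter meets the Python 3.10+ prerequisite and that the distribution is visible to pip's metadata. The sketch below uses only the standard library; the helper name check_environment is hypothetical and is not part of Strands Evals.

```python
import sys
from importlib import metadata


def check_environment(dist_name: str = "strands-agents-evals") -> dict:
    """Report whether the interpreter and the installed package look usable.

    check_environment is a hypothetical helper for illustration only.
    """
    report = {
        # The listing states Python 3.10+ as a prerequisite.
        "python_ok": sys.version_info >= (3, 10),
        # Stays None when the distribution is not installed.
        "installed_version": None,
    }
    try:
        report["installed_version"] = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        pass
    return report


if __name__ == "__main__":
    print(check_environment())
```

If installed_version comes back as None, the package is most likely installed into a different environment than the one running the script.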
Requirements and caveats from upstream:
- Supported Python versions are listed on PyPI for strands-agents-evals.
- Related project: Strands Agents Python SDK (https://github.com/strands-agents/sdk-python)
Basic usage or getting-started notes:
- Multiple evaluation types: output evaluation, trajectory analysis, tool usage assessment, and interaction evaluation
- Upstream Python examples import the agent class with: from strands import Agent
Extracted from upstream docs: https://raw.githubusercontent.com/strands-agents/evals/HEAD/README.md
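To make the difference between output evaluation and trajectory analysis concrete, here is a minimal, library-free sketch. It does not use the Strands Evals API: EvalCase, RunResult, output_score, trajectory_score, run_suite, and fake_agent are all hypothetical names, and the stub agent stands in for a real agent so that the run is repeatable.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EvalCase:
    """Hypothetical test case: a prompt plus what we expect back."""
    name: str
    prompt: str
    expected_output: str
    expected_tools: list[str] = field(default_factory=list)


@dataclass
class RunResult:
    """Hypothetical record of one agent run: final text and tools used."""
    output: str
    tools_called: list[str]


def output_score(case: EvalCase, result: RunResult) -> float:
    """Output check: 1.0 if the expected text appears in the answer."""
    return 1.0 if case.expected_output.lower() in result.output.lower() else 0.0


def trajectory_score(case: EvalCase, result: RunResult) -> float:
    """Trajectory check: fraction of expected tools found, in order."""
    if not case.expected_tools:
        return 1.0
    remaining = iter(result.tools_called)
    # "tool in remaining" advances the iterator, so order is respected.
    matched = sum(1 for tool in case.expected_tools if tool in remaining)
    return matched / len(case.expected_tools)


def run_suite(cases: list[EvalCase], task: Callable[[str], RunResult]) -> list[dict]:
    """Run every case through the task and collect both scores."""
    rows = []
    for case in cases:
        result = task(case.prompt)
        rows.append({
            "case": case.name,
            "output": output_score(case, result),
            "trajectory": trajectory_score(case, result),
        })
    return rows


def fake_agent(prompt: str) -> RunResult:
    """Deterministic stub standing in for a real agent call."""
    if "capital of France" in prompt:
        return RunResult("The capital of France is Paris.", ["search"])
    return RunResult("I am not sure.", [])


CASES = [
    EvalCase("geo", "What is the capital of France?", "Paris", ["search"]),
    EvalCase("math", "What is 2 + 2?", "4", ["calculator"]),
]

if __name__ == "__main__":
    for row in run_suite(CASES, fake_agent):
        print(row)
```

Keeping the two scores separate shows when an agent reaches the right answer through the wrong sequence of tool calls, or the reverse. A judge model, if available, would typically replace the substring match in output_score.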
Quality
Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills · 8 GitHub stars · SKILL.md body (1,346 chars)