Grade agent trajectories and tool-use decisions with AgentEvals
Score whether an agent took a sensible intermediate path, called tools correctly, and reached the outcome without relying only on final-answer checks.
What it does
Grade agent trajectories and tool-use decisions with AgentEvals
Score whether an agent took a sensible intermediate path, called tools correctly, and reached the outcome without relying only on final-answer checks.
Prerequisites
Python or TypeScript runtime, agent run outputs or trajectories, optional LLM judge provider
Installation
Use the upstream install or setup path that matches your environment:
- pip install agentevals
- npm install agentevals @langchain/core
- pip install openai
- npm install openai
Requirements and caveats from upstream:
- <summary>Python</summary>
- python
- Python Async Support
Basic usage or getting-started notes:
-
To get started, install agentevals:
- <details open>
-
bash
-
Extracted from upstream docs: https://raw.githubusercontent.com/langchain-ai/agentevals/HEAD/README.md
Documentation
Source
Capabilities
Install
Quality
deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,116 chars)