{"id":"949d532b-a0f6-42de-92c3-0924894f43c9","shortId":"TfM8Ua","kind":"skill","title":"Run repeatable agent evaluation suites with trajectory and simulator coverage using Strands Evals","tagline":"Build repeatable evaluation experiments for agents and LLM apps with output checks, trajectory scoring, simulators, and trace-based review.","description":"# Run repeatable agent evaluation suites with trajectory and simulator coverage using Strands Evals\n\nBuild repeatable evaluation experiments for agents and LLM apps with output checks, trajectory scoring, simulators, and trace-based review.\n\n## Prerequisites\n\n- Python 3.10+\n- pip\n- Optional: access to a judge model (for model-graded evaluations)\n\n## Installation\n\nUse the upstream install or setup path that matches your environment:\n- `pip install strands-agents-evals` (package install)\n- `pip install -e .` (editable install from a local checkout)\n- `pip install -e \".[test]\"` (editable install with the `test` extra)\n- `pip install -e \".[test,dev]\"` (editable install with the `test` and `dev` extras)\n\nRequirements and caveats from upstream:\n- Supported Python versions: see the upstream badge at https://img.shields.io/pypi/pyversions/strands-agents-evals\n- Related: [Strands Agents Python SDK](https://github.com/strands-agents/sdk-python)\n\nBasic usage and getting-started notes:\n- **Multiple Evaluation Types**: output evaluation, trajectory analysis, tool usage assessment, and interaction evaluation\n- The upstream quick-start Python example imports the agent class with `from strands import Agent`; see the upstream README for the full code\n\n- Source: https://github.com/strands-agents/evals\n- Extracted from upstream docs: https://raw.githubusercontent.com/strands-agents/evals/HEAD/README.md\n\n## Documentation\n\n- https://github.com/strands-agents/evals\n\n## Source\n\n- [Agent Skill 
Exchange](https://agentskillexchange.com/skills/run-repeatable-agent-evaluation-suites-with-trajectory-and-simulator-coverage-using-strands-evals/)","tags":["run","repeatable","agent","evaluation","suites","with","trajectory","and","simulator","coverage","using","strands"],"capabilities":["skill","source-agentskillexchange","skill-run-repeatable-agent-evaluation-suites-with-trajectory-and-simulator-coverage-using-strands-evals","topic-agent-skills","topic-ai-agents","topic-ai-tools","topic-awesome-list","topic-claude-code","topic-codex","topic-cursor","topic-llm","topic-mcp","topic-npx-skills","topic-openclaw","topic-skills-catalog"],"categories":["skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/agentskillexchange/skills/run-repeatable-agent-evaluation-suites-with-trajectory-and-simulator-coverage-using-strands-evals","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add agentskillexchange/skills","source_repo":"https://github.com/agentskillexchange/skills","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,346 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:12:13.755Z","embedding":null,"createdAt":"2026-05-18T13:19:02.036Z","updatedAt":"2026-05-18T19:12:13.755Z","lastSeenAt":"2026-05-18T19:12:13.755Z","tsv":"'/skills/run-repeatable-agent-evaluation-suites-with-trajectory-and-simulator-coverage-using-strands-evals/)':160 '/strands-agents/evals':142,153 '/strands-agents/evals/head/readme.md':149 '3.10':69 'access':75 'agent':3,19,36,52,92,138,155 'agentskillexchange.com':159 
'agentskillexchange.com/skills/run-repeatable-agent-evaluation-suites-with-trajectory-and-simulator-coverage-using-strands-evals/)':158 'analysi':127 'app':22,55 'assess':130 'base':32,65 'bash':134 'basic':114 'build':14,47 'caveat':108 'check':25,58 'coverag':10,43 'dev':105 'doc':146 'document':150 'e':96,99,103 'environ':87 'eval':13,46,93 'evalu':4,16,37,49,122,125,133 'exchang':157 'experi':17,50 'extract':143 'get':118 'getting-start':117 'github.com':141,152 'github.com/strands-agents/evals':140,151 'import':137 'instal':76,80,89,95,98,102 'interact':132 'judg':73 'judge-model':72 'llm':21,54 'match':85 'model':74 'multipl':121 'note':120 'option':71 'output':24,57,124 'path':83 'pip':70,88,94,97,101 'prerequisit':67 'python':68,111,113 'raw.githubusercontent.com':148 'raw.githubusercontent.com/strands-agents/evals/head/readme.md':147 'repeat':2,15,35,48 'requir':106 'review':33,66 'run':1,34 'score':27,60 'sdk':112 'setup':82 'simul':9,28,42,61 'skill':156 'skill-run-repeatable-agent-evaluation-suites-with-trajectory-and-simulator-coverage-using-strands-evals' 'sourc':139,154 'source-agentskillexchange' 'start':119 'strand':12,45,91,136 'strands-agents-ev':90 'suit':5,38 'test':100,104 'tool':128 'topic-agent-skills' 'topic-ai-agents' 'topic-ai-tools' 'topic-awesome-list' 'topic-claude-code' 'topic-codex' 'topic-cursor' 'topic-llm' 'topic-mcp' 'topic-npx-skills' 'topic-openclaw' 'topic-skills-catalog' 'trace':31,64 'trace-bas':30,63 'trajectori':7,26,40,59,126 'type':123 'upstream':79,110,145 'usag':115,129 
'use':11,44,77","prices":[{"id":"84e2b530-2890-4b16-89d2-df783ad1043e","listingId":"949d532b-a0f6-42de-92c3-0924894f43c9","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"agentskillexchange","category":"skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:19:02.036Z"}],"sources":[{"listingId":"949d532b-a0f6-42de-92c3-0924894f43c9","source":"github","sourceId":"agentskillexchange/skills/run-repeatable-agent-evaluation-suites-with-trajectory-and-simulator-coverage-using-strands-evals","sourceUrl":"https://github.com/agentskillexchange/skills/tree/main/skills/run-repeatable-agent-evaluation-suites-with-trajectory-and-simulator-coverage-using-strands-evals","isPrimary":false,"firstSeenAt":"2026-05-18T13:19:02.036Z","lastSeenAt":"2026-05-18T19:12:13.755Z"}],"details":{"listingId":"949d532b-a0f6-42de-92c3-0924894f43c9","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"agentskillexchange","slug":"run-repeatable-agent-evaluation-suites-with-trajectory-and-simulator-coverage-using-strands-evals","github":{"repo":"agentskillexchange/skills","stars":8,"topics":["agent-skills","ai-agents","ai-tools","awesome-list","claude-code","codex","cursor","llm","mcp","npx-skills","openclaw","skills-catalog"],"license":"mit","html_url":"https://github.com/agentskillexchange/skills","pushed_at":"2026-05-18T19:02:17Z","description":"The open catalog of AI agent skills — 2,000+ security-scanned skills for Claude Code, Cursor, Codex, and 
more.","skill_md_sha":"6ab2126296adcec4ad30f3c33d1412222fa6122c","skill_md_path":"skills/run-repeatable-agent-evaluation-suites-with-trajectory-and-simulator-coverage-using-strands-evals/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/agentskillexchange/skills/tree/main/skills/run-repeatable-agent-evaluation-suites-with-trajectory-and-simulator-coverage-using-strands-evals"},"layout":"multi","source":"github","category":"skills","frontmatter":{"name":"Run repeatable agent evaluation suites with trajectory and simulator coverage using Strands Evals","description":"Build repeatable evaluation experiments for agents and LLM apps with output checks, trajectory scoring, simulators, and trace-based review."},"skills_sh_url":"https://skills.sh/agentskillexchange/skills/run-repeatable-agent-evaluation-suites-with-trajectory-and-simulator-coverage-using-strands-evals"},"updatedAt":"2026-05-18T19:12:13.755Z"}}