Skill · quality 0.45

evaluate-trace

Evaluate one or more traces against an existing Truesight live evaluation. Use when a deployed live evaluation already exists and the user wants run outputs with optional handoff to review and promotion.

Price

free

Protocol

skill

Verified

Endpoint

https://skills.sh/Goodeye-Labs/truesight-mcp-skills/evaluate-trace

What it does

Evaluate Trace

Use this skill when the user wants to evaluate traces with an existing live evaluation endpoint.

Interactive Q&A protocol (mandatory)

<HARD-GATE> BEFORE the first scoping question, search for a structured question tool (e.g., `AskUserQuestion` or similar interactive widget) and load it. Use that tool for EVERY scoping question. Fall back to plain-text lettered options ONLY if no such tool exists in the environment. </HARD-GATE>

If context does not make scope clear, ask one question at a time using the structured question tool (loaded per the HARD-GATE above).

Example question structure:

Do you want to evaluate one trace or a batch?
A) One trace now
B) Small batch (up to 25)
C) Full batch loop

Rules:

Ask exactly one clarifying question per message.
Use the structured question tool for every question. Structure each with a short header, 2-4 options with labels and descriptions, and place the recommended option first. Do not add "(Recommended)" or similar annotations to option labels.
Ask a single follow-up if needed, then proceed.
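
As an illustration, the example question and the rules above could be captured as a small data structure before being handed to whatever structured question tool the environment provides. This is only a sketch: the class names, field names, and option descriptions below are hypothetical, and the real schema of a tool such as `AskUserQuestion` may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Option:
    label: str
    description: str


@dataclass
class ScopingQuestion:
    header: str
    question: str
    options: list[Option] = field(default_factory=list)

    def validate(self) -> None:
        # Rules from the skill: 2-4 options, no "(Recommended)" annotations.
        if not 2 <= len(self.options) <= 4:
            raise ValueError("a question needs 2-4 options")
        for opt in self.options:
            if "recommended" in opt.label.lower():
                raise ValueError("do not annotate option labels as recommended")


# The recommended option is placed first, without any annotation.
# Descriptions are illustrative wording, not taken from the skill.
question = ScopingQuestion(
    header="Scope",
    question="Do you want to evaluate one trace or a batch?",
    options=[
        Option("One trace now", "Run a single evaluation and return its result."),
        Option("Small batch (up to 25)", "Evaluate a short list of traces in order."),
        Option("Full batch loop", "Iterate over the full set and summarize failures."),
    ],
)
question.validate()
```

Keeping validation next to the data makes it harder to send a question that breaks the rules, for example one with five options.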

Workflow

1. Identify target live evaluation:
   - If live evaluation id is unknown, call list_live_evaluations.
   - Select public_id and verify required input_columns.
2. Prepare inputs:
   - Ensure inputs keys exactly match input_columns.
   - Include media_url for multimodal evaluations when needed.
3. Execute evaluation:
   - Use the run_eval tool with live_evaluation_id and inputs for each trace.
4. Return useful outputs:
   - run_id
   - per-judgment scores/outcomes
   - brief interpretation for next action
5. Optional handoff:
   - If human judgment is needed, route to review-and-promote-traces.
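
The input check and the run_eval call in the workflow above can be sketched as follows. This is a minimal sketch under stated assumptions: `call_tool` is a hypothetical stand-in for however the host invokes an MCP tool, the response shape returned by the stub is made up, and `media_url` handling is left out because the listing does not say where that field belongs.

```python
from typing import Any, Callable

# Hypothetical signature: (tool_name, arguments) -> response dict.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def check_inputs(inputs: dict[str, Any], input_columns: list[str]) -> None:
    """Raise if the input keys do not exactly match the required columns."""
    missing = sorted(set(input_columns) - set(inputs))
    extra = sorted(set(inputs) - set(input_columns))
    if missing or extra:
        raise ValueError(f"input mismatch: missing={missing}, extra={extra}")


def evaluate_trace(
    call_tool: ToolCaller,
    live_evaluation_id: str,
    input_columns: list[str],
    inputs: dict[str, Any],
) -> dict[str, Any]:
    """Validate one trace's inputs, then run it through run_eval."""
    check_inputs(inputs, input_columns)
    return call_tool(
        "run_eval",
        {"live_evaluation_id": live_evaluation_id, "inputs": inputs},
    )


def fake_call_tool(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    # Demo stub only; a real run_eval response may look different.
    return {"run_id": "run-demo-1", "tool": name, "arguments": arguments}


result = evaluate_trace(
    fake_call_tool,
    live_evaluation_id="example-public-id",  # placeholder id
    input_columns=["question", "answer"],    # placeholder columns
    inputs={"question": "What is 2 + 2?", "answer": "4"},
)
```

Checking the keys before the call surfaces a column mismatch locally instead of as a failed evaluation run.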

Batch mode guidance

Use deterministic trace ordering and log run_id for each input.
If retries are needed, apply them in caller logic and keep a stable idempotency context across attempts.
Summarize failures by category or threshold, then propose review handoff.
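
A batch loop following this guidance might look like the sketch below. It assumes a hypothetical `run_one` callable that wraps run_eval for a single trace, and it uses the trace id as the stable context carried across retries. Whether run_eval itself accepts an idempotency key is not stated in the listing, so none is passed.

```python
from collections import Counter
from typing import Any, Callable


def run_batch(
    traces: dict[str, dict[str, Any]],
    run_one: Callable[[dict[str, Any]], dict[str, Any]],
    max_attempts: int = 3,
) -> list[dict[str, Any]]:
    """Evaluate traces in sorted id order, retrying each up to max_attempts."""
    results = []
    for trace_id in sorted(traces):  # deterministic ordering
        last_error = None
        for attempt in range(1, max_attempts + 1):
            try:
                out = run_one(traces[trace_id])
            except Exception as exc:  # broad catch is for the sketch only
                last_error = exc
                continue
            # Log run_id against the same trace_id on every attempt.
            results.append({"trace_id": trace_id, "run_id": out.get("run_id"),
                            "ok": True, "attempts": attempt})
            break
        else:
            results.append({"trace_id": trace_id, "run_id": None, "ok": False,
                            "attempts": max_attempts,
                            "error": type(last_error).__name__})
    return results


def summarize_failures(results: list[dict[str, Any]]) -> Counter:
    """Count failed traces by error category."""
    return Counter(r["error"] for r in results if not r["ok"])


# Demo stub: t2 fails once and then succeeds, t3 always fails.
calls: Counter = Counter()


def flaky_run_one(inputs: dict[str, Any]) -> dict[str, Any]:
    calls[inputs["id"]] += 1
    if inputs["id"] == "t2" and calls["t2"] == 1:
        raise TimeoutError("transient")
    if inputs["id"] == "t3":
        raise ValueError("bad input")
    return {"run_id": f"run-{inputs['id']}"}


batch_results = run_batch(
    {"t3": {"id": "t3"}, "t1": {"id": "t1"}, "t2": {"id": "t2"}},
    flaky_run_one,
)
failure_summary = summarize_failures(batch_results)
```

A non-empty `failure_summary` is the natural trigger for proposing the review handoff.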

Scopes reference

list_live_evaluations requires live-evaluations:read
run_eval requires live-evaluations:execute

If a scope error occurs, ask the user to create an API key with the missing scope in Truesight Settings.
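
The scope requirements above can be kept as a small lookup so that a missing scope is named explicitly when asking the user for a new API key. This is a sketch: the tool and scope names come from the list above, while the `missing_scopes` helper and the idea that the granted scopes are known locally are assumptions.

```python
# Tool -> required scope, as listed in the scopes reference.
REQUIRED_SCOPES = {
    "list_live_evaluations": "live-evaluations:read",
    "run_eval": "live-evaluations:execute",
}


def missing_scopes(granted: set[str], tools: list[str]) -> list[str]:
    """Return the scopes needed by the given tools that are not granted."""
    needed = {REQUIRED_SCOPES[tool] for tool in tools}
    return sorted(needed - granted)


# Hypothetical key that can list live evaluations but not run them.
gaps = missing_scopes(
    {"live-evaluations:read"},
    ["list_live_evaluations", "run_eval"],
)
```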

Capabilities

skill · source-goodeye-labs · skill-evaluate-trace · topic-agent-skills · topic-ai-evaluation · topic-chatgpt · topic-claude · topic-cursor · topic-llm · topic-mcp · topic-truesight · topic-vscode · topic-windsurf

Install

Install: npx skills add Goodeye-Labs/truesight-mcp-skills

Source: https://github.com/Goodeye-Labs/truesight-mcp-skills/tree/main/skills/evaluate-trace

skills.sh: https://skills.sh/Goodeye-Labs/truesight-mcp-skills/evaluate-trace

Transport: skills-sh

Protocol: skill

Quality

0.45 / 1.00

Deterministic score 0.45 from registry signals: indexed on github topic:agent-skills · 6 github stars · SKILL.md body (2,202 chars)

Provenance

Indexed from: github

Enriched: 2026-05-18 13:22:57Z · deterministic:skill-github:v1 · v1

First seen: 2026-05-18

Last seen: 2026-05-18

Agent access

JSON: https://clawmart.sh/api/listings/KWJdZ5