Skill · quality 0.45
evaluate-trace
Evaluate one or more traces against an existing Truesight live evaluation. Use when a deployed live evaluation already exists and the user wants run outputs with optional handoff to review and promotion.
Price: free
Protocol: skill
Verified: no
What it does
Evaluate Trace
Use this skill when the user wants to evaluate traces with an existing live evaluation endpoint.
Interactive Q&A protocol (mandatory)
<HARD-GATE> BEFORE the first scoping question, search for a structured question tool (e.g., `AskUserQuestion` or similar interactive widget) and load it. Use that tool for EVERY scoping question. Fall back to plain-text lettered options ONLY if no such tool exists in the environment. </HARD-GATE>
If context does not make scope clear, ask one question at a time using the structured question tool (loaded per the HARD-GATE above).
Example question structure:
Do you want to evaluate one trace or a batch?
A) One trace now
B) Small batch (up to 25)
C) Full batch loop
Rules:
- Ask exactly one clarifying question per message.
- Use the structured question tool for every question. Structure each with a short header, 2-4 options with labels and descriptions, and place the recommended option first. Do not add "(Recommended)" or similar annotations to option labels.
- Ask a single follow-up if needed, then proceed.
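The rules above can be sketched as a small data structure. This is a minimal illustration only: the `Option` and `ScopingQuestion` classes, the `header` value, and the option descriptions are hypothetical and are not the schema of `AskUserQuestion` or any other real tool. It shows the 2-4 option limit, the ban on "(Recommended)" annotations, and the plain-text lettered fallback.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    label: str
    description: str


@dataclass(frozen=True)
class ScopingQuestion:
    """Hypothetical shape for one scoping question (not a real tool schema)."""
    header: str
    question: str
    options: tuple[Option, ...]  # by convention, the recommended option is first

    def __post_init__(self) -> None:
        if not 2 <= len(self.options) <= 4:
            raise ValueError("a scoping question needs 2-4 options")
        for opt in self.options:
            if "recommended" in opt.label.lower():
                raise ValueError("do not annotate labels with '(Recommended)'")

    def as_plain_text(self) -> str:
        """Fallback rendering with lettered options, for when no tool exists."""
        lines = [self.question]
        for letter, opt in zip("ABCD", self.options):
            lines.append(f"{letter}) {opt.label}")
        return "\n".join(lines)


batch_question = ScopingQuestion(
    header="Scope",
    question="Do you want to evaluate one trace or a batch?",
    options=(
        Option("One trace now", "Evaluate a single trace immediately."),
        Option("Small batch (up to 25)", "Evaluate a short list of traces."),
        Option("Full batch loop", "Iterate over the whole trace set."),
    ),
)
print(batch_question.as_plain_text())
```

Validating in the constructor keeps a malformed question from reaching the user, whichever rendering path is used.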
Workflow
- Identify target live evaluation:
  - If live evaluation id is unknown, call `list_live_evaluations`.
  - Select `public_id` and verify required `input_columns`.
- Prepare inputs:
  - Ensure `inputs` keys exactly match `input_columns`.
  - Include `media_url` for multimodal evaluations when needed.
- Execute evaluation:
  - Use the `run_eval` tool with `live_evaluation_id` and `inputs` for each trace.
- Return useful outputs:
  - `run_id`
  - per-judgment scores/outcomes
  - brief interpretation for next action
- Optional handoff:
  - If human judgment is needed, route to `review-and-promote-traces`.
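The "prepare inputs" and "execute evaluation" steps can be sketched as follows. This is a hedged illustration, not the Truesight client: `run_eval` is passed in as a callable because the actual tool invocation depends on the MCP host, and `fake_run_eval`, the id `le_example`, the column names, and the result shape are all assumptions. Handling of `media_url` is left out because the source does not say where that field is placed.

```python
from typing import Any, Callable, Mapping, Sequence


def check_inputs(inputs: Mapping[str, Any], input_columns: Sequence[str]) -> None:
    """Raise unless the keys of `inputs` are exactly the required columns."""
    missing = sorted(set(input_columns) - set(inputs))
    extra = sorted(set(inputs) - set(input_columns))
    if missing or extra:
        raise ValueError(
            f"inputs do not match input_columns: missing={missing}, extra={extra}"
        )


def evaluate_trace(
    run_eval: Callable[..., Mapping[str, Any]],
    live_evaluation_id: str,
    input_columns: Sequence[str],
    inputs: Mapping[str, Any],
) -> Mapping[str, Any]:
    """Validate one trace's inputs, then hand them to the injected tool call."""
    check_inputs(inputs, input_columns)
    return run_eval(live_evaluation_id=live_evaluation_id, inputs=dict(inputs))


def fake_run_eval(live_evaluation_id: str, inputs: dict) -> dict:
    # Stand-in for the real `run_eval` tool; the returned shape is an assumption.
    return {"run_id": "run-0001", "live_evaluation_id": live_evaluation_id}


result = evaluate_trace(
    fake_run_eval,
    "le_example",            # hypothetical public_id
    ["question", "answer"],  # hypothetical input_columns
    {"question": "What is 2 + 2?", "answer": "4"},
)
print(result["run_id"])
```

Checking for both missing and extra keys before the call surfaces column mismatches locally instead of as a failed run.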
Batch mode guidance
- Use deterministic trace ordering and log `run_id` for each input.
- Apply retries with stable idempotency context in caller logic if needed.
- Summarize failures by category or threshold, then propose review handoff.
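A batch loop along these lines might look like the sketch below. It is illustrative only: `run_batch`, `summarize_failures`, and the log entry fields are hypothetical, and `evaluate` stands in for whatever wraps the `run_eval` tool call. The trace id is reused unchanged across retries as the caller-side idempotency context; whether `run_eval` itself accepts an idempotency key is not stated in the source.

```python
import time
from collections import Counter
from typing import Any, Callable, Mapping


def run_batch(
    traces: Mapping[str, Mapping[str, Any]],
    evaluate: Callable[[str, Mapping[str, Any]], Mapping[str, Any]],
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> list[dict]:
    """Evaluate traces in sorted-id order and log one entry per trace."""
    log: list[dict] = []
    for trace_id in sorted(traces):  # deterministic ordering
        for attempt in range(1, max_attempts + 1):
            try:
                # The same trace_id is passed on every attempt.
                result = evaluate(trace_id, traces[trace_id])
            except Exception as exc:
                if attempt == max_attempts:
                    log.append({"trace_id": trace_id, "run_id": None,
                                "error": type(exc).__name__, "attempts": attempt})
                else:
                    time.sleep(backoff_seconds * attempt)
            else:
                log.append({"trace_id": trace_id, "run_id": result["run_id"],
                            "error": None, "attempts": attempt})
                break
    return log


def summarize_failures(log: list[dict]) -> Counter:
    """Count failed traces by error category (here, the exception class name)."""
    return Counter(entry["error"] for entry in log if entry["error"] is not None)
```

A non-empty summary is the natural trigger for proposing the review handoff.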
Scopes reference
- `list_live_evaluations` requires `live-evaluations:read`
- `run_eval` requires `live-evaluations:execute`
If a scope error occurs, ask the user to create an API key with the missing scope in Truesight Settings.
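As a rough sketch, the scope requirements can be kept in a lookup table so a missing scope is reported before any tool call is made. The tool and scope names come from the list above; the helper functions and the idea of knowing the granted scopes up front are assumptions, since the source only describes reacting to a scope error.

```python
# Tool name -> scope it requires (taken from the scopes reference above).
REQUIRED_SCOPES = {
    "list_live_evaluations": "live-evaluations:read",
    "run_eval": "live-evaluations:execute",
}


def missing_scopes(tools: list[str], granted: set[str]) -> list[str]:
    """Return the scopes needed by `tools` that are not in `granted`, sorted."""
    needed = {REQUIRED_SCOPES[tool] for tool in tools}
    return sorted(needed - granted)


def scope_error_message(scope: str) -> str:
    """Hypothetical wording for the request to the user."""
    return (
        f"Please create an API key with the `{scope}` scope "
        "in Truesight Settings."
    )


for scope in missing_scopes(["list_live_evaluations", "run_eval"],
                            granted={"live-evaluations:read"}):
    print(scope_error_message(scope))
```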
Capabilities
skill, source-goodeye-labs, skill-evaluate-trace, topic-agent-skills, topic-ai-evaluation, topic-chatgpt, topic-claude, topic-cursor, topic-llm, topic-mcp, topic-truesight, topic-vscode, topic-windsurf
Install
Install: `npx skills add Goodeye-Labs/truesight-mcp-skills`
Transport: skills-sh
Protocol: skill
Quality
0.45 / 1.00
Deterministic score 0.45 from registry signals: indexed on github topic:agent-skills · 6 github stars · SKILL.md body (2,202 chars)
Provenance
Indexed from: github
Enriched: 2026-05-18 13:22:57Z · deterministic:skill-github:v1 · v1
First seen: 2026-05-18
Last seen: 2026-05-18