Skillquality 0.45
review-and-promote-traces
Judge flagged trace outputs and promote judged items back to datasets. Use when an evaluation run requires human judgment or when review queue items need to be judged for promotion into the dataset.
Price
free
Protocol
skill
Verified
no
What it does
Review and Promote Traces
Use this skill for the judgment and promotion loop after trace evaluation.
Interactive Q&A protocol (mandatory)
<HARD-GATE> BEFORE the first scoping question, search for a structured question tool (e.g., `AskUserQuestion` or similar interactive widget) and load it. Use that tool for EVERY scoping question. Fall back to plain-text lettered options ONLY if no such tool exists in the environment. </HARD-GATE>If context is unclear, ask one question at a time using the structured question tool (loaded per the HARD-GATE above).
Example question structure:
What do you want to review first?
A) One specific run
B) Pending queue triage across runs
Rules:
- Ask one question per message.
- Use the structured question tool for every question. Structure each with a short header, 2-4 options with labels and descriptions, and place the recommended option first. Do not add "(Recommended)" or similar annotations to option labels.
- Ask one follow-up when needed, then continue.
Workflow
- If needed, flag evaluated run:
- Use
flag_review_itemwithrun_id.
- Use
- Retrieve review items:
- Use
list_review_itemswith pagination.
- Use
- Collect human judgments:
- Present review items to the user and ask for judgment one item at a time when needed.
- Capture
judgment_valueand optionalnotesfor each item. - Use the structured question tool (loaded per the HARD-GATE above) to present options that match the live evaluation setup:
- Binary example:
- A) Pass
- B) Fail
- Categorical example:
- A) <option 1>
- B) <option 2>
- C) <option 3>
- Continuous example:
- A) Enter numeric value (within configured range)
- B) Skip this item for now
- Binary example:
- If valid options are unclear, look them up before asking:
- Inspect
list_review_itemspayload for item result details and expected value hints. - Use
get_result(run_id)for run-level context and chain outputs. - If evaluation id is available in run metadata, call
get_evaluation(evaluation_id)and read config/judgment criteria to derive valid judgment options.
- Inspect
- When options remain ambiguous after lookup, ask one clarification question using the structured question tool before proceeding.
- Submit judgments:
- For each target item, call
judge_review_item. - Pass the user-provided
judgment_valueand optionalnotesintojudge_review_item. - Include notes when judgment context matters.
- For each target item, call
- Promote judged outputs:
- Use
add_reviewed_items_to_datasetwithrun_id.
- Use
- Report result:
- number of items judged
- promotion status and row counts
- any skipped or blocked items
Queue-wide triage guidance
- Group by run and status first.
- Prioritize high-impact runs or oldest pending runs.
- Keep an audit trail of judgment rationale in notes.
Scopes reference
list_review_itemsrequiresreview:readflag_review_item,judge_review_item, andadd_reviewed_items_to_datasetrequirereview:write
If a scope error occurs, ask the user to create a key with the missing scope in Truesight Settings.
Capabilities
skillsource-goodeye-labsskill-review-and-promote-tracestopic-agent-skillstopic-ai-evaluationtopic-chatgpttopic-claudetopic-cursortopic-llmtopic-mcptopic-truesighttopic-vscodetopic-windsurf
Install
Installnpx skills add Goodeye-Labs/truesight-mcp-skills
Sourcehttps://github.com/Goodeye-Labs/truesight-mcp-skills/tree/main/skills/review-and-promote-traces
Transportskills-sh
Protocolskill
Quality
0.45/ 1.00
deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 6 github stars · SKILL.md body (3,154 chars)
Provenance
Indexed fromgithub
Enriched2026-05-18 13:22:57Z · deterministic:skill-github:v1 · v1
First seen2026-05-18
Last seen2026-05-18