Skill · quality 0.45

error-analysis

Systematically identify and categorize failure modes in evaluated traces using Truesight datasets and error-analysis tools. Use when quality issues are unclear, after major pipeline changes, or when incidents indicate drift.

Price

free

Protocol

skill

Verified

Endpoint

https://skills.sh/Goodeye-Labs/truesight-mcp-skills/error-analysis

What it does

Error Analysis

Guide the user through trace-grounded failure analysis and dataset labeling.

Interactive Q&A protocol (mandatory)

<HARD-GATE> BEFORE the first scoping question, search for a structured question tool (e.g., `AskUserQuestion` or similar interactive widget) and load it. Use that tool for EVERY scoping question. Fall back to plain-text lettered options ONLY if no such tool exists in the environment. </HARD-GATE>

Ask one question at a time using the structured question tool (loaded per the HARD-GATE above).

Example question structure:

Which data source should we analyze first?
A) Existing Truesight dataset
B) New dataset to upload
C) Unsure, list datasets first

Rules:

One question per message during setup.
Use the structured question tool for every question. Structure each with a short header, 2-4 options with labels and descriptions, and place the recommended option first. Do not add "(Recommended)" or similar annotations to option labels.
Ask one follow-up question if the response is ambiguous.
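The question shape these rules describe can be modeled as a small data structure. This is a hypothetical illustration only: the `Option` and `ScopingQuestion` classes and their `validate` and `as_plain_text` helpers are invented for this sketch and are not the schema of `AskUserQuestion` or any other question tool. The sketch checks the 2-4 option rule and the rule against "(Recommended)"-style label annotations, and renders the lettered plain-text fallback used when no structured tool exists.

```python
from dataclasses import dataclass


@dataclass
class Option:
    label: str
    description: str


@dataclass
class ScopingQuestion:
    """Hypothetical model of one scoping question (not a real tool schema)."""
    header: str            # short header shown above the question
    question: str
    options: list          # list of Option; the recommended option goes first

    def validate(self) -> list:
        """Return rule violations; an empty list means the question is well-formed."""
        problems = []
        if not 2 <= len(self.options) <= 4:
            problems.append(f"expected 2-4 options, got {len(self.options)}")
        for opt in self.options:
            # The recommendation is signalled by position, never by label text.
            if "recommended" in opt.label.lower():
                problems.append(f"label mentions recommendation: {opt.label!r}")
        return problems

    def as_plain_text(self) -> str:
        """Lettered fallback for environments without a structured question tool."""
        lines = [self.question]
        for letter, opt in zip("ABCD", self.options):
            lines.append(f"{letter}) {opt.label}")
        return "\n".join(lines)
```

Rendering the data-source question from the example through `as_plain_text` reproduces its A/B/C layout; the descriptions are only shown by a structured tool.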

Core workflow

Select or create dataset:
- If the dataset already exists, use list_datasets to find it.
- If not, use upload_dataset.
Collect representative traces:
- Target approximately 100 traces when possible.
- When trace volume is high, combine random sampling with stratified sampling for coverage.
Analyze row by row:
- Use get_dataset_rows with pagination.
- For each row, call suggest_error_notes.
Persist annotations:
- Save each row's _ts_error_notes and _ts_error_category fields with update_dataset_row.
Consolidate categories:
- Run consolidate_error_categories.
- Review mapping proposals, then apply with apply_category_mappings.
Prioritize fixes:
- Report the most frequent categories first.
- Recommend the next skill based on the failure type:
  - create-evaluation for new evaluation coverage
  - review-and-promote-traces for judgment backlog
  - eval-audit for broader process gaps
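The data flow of this workflow can be sketched in plain Python, under stated assumptions. In practice the agent invokes get_dataset_rows, suggest_error_notes, update_dataset_row, and apply_category_mappings as MCP tools; here they are replaced by caller-supplied functions with hypothetical signatures (`fetch_page(offset, limit)`, `suggest(row)`). The stratified sampler, the `id` row key, and the page size of 25 are illustrative choices, not documented Truesight behavior. `annotate` only builds the per-row payload; writing it back would be one update_dataset_row call per row.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Iterator

# Hypothetical stand-ins for MCP tool calls; the real tool signatures may differ.
FetchPage = Callable[[int, int], list]   # (offset, limit) -> rows, like get_dataset_rows
Suggest = Callable[[dict], tuple]        # row -> (notes, category), like suggest_error_notes


def stratified_sample(traces: list, key: Callable, target: int = 100, seed: int = 0) -> list:
    """Random sample of about `target` traces that gives every stratum some coverage."""
    if len(traces) <= target:
        return list(traces)
    rng = random.Random(seed)
    strata = defaultdict(list)
    for t in traces:
        strata[key(t)].append(t)
    quota = max(1, target // len(strata))
    picked = []
    for group in strata.values():
        picked += rng.sample(group, min(quota, len(group)))
    # Fill any remaining slots with uniformly random traces not yet picked.
    chosen = {id(t) for t in picked}
    rest = [t for t in traces if id(t) not in chosen]
    picked += rng.sample(rest, max(0, min(target - len(picked), len(rest))))
    return picked[:target]


def iter_rows(fetch_page: FetchPage, page_size: int = 25) -> Iterator[dict]:
    """Page through a dataset until a short (or empty) page signals the end."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            return
        offset += page_size


def annotate(fetch_page: FetchPage, suggest: Suggest) -> dict:
    """Map row id -> the two annotation fields to save with update_dataset_row."""
    updates = {}
    for row in iter_rows(fetch_page):
        notes, category = suggest(row)
        updates[row["id"]] = {"_ts_error_notes": notes, "_ts_error_category": category}
    return updates


def apply_mappings(updates: dict, mapping: dict) -> None:
    """Rename raw categories to consolidated ones in place; unmapped names are kept."""
    for fields in updates.values():
        raw = fields["_ts_error_category"]
        fields["_ts_error_category"] = mapping.get(raw, raw)


def rank_categories(updates: dict) -> list:
    """(category, count) pairs, most frequent first."""
    return Counter(f["_ts_error_category"] for f in updates.values()).most_common()
```

Stopping on a short page avoids needing a total row count up front; if the real tool exposes a cursor or total instead, only the loop condition in `iter_rows` would change.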

Analysis heuristics

Focus on the first root failure in each trace, not on every downstream symptom.
Let categories emerge from observed traces, not pre-baked labels.
Revise the categories after the first 20 traces, then relabel the traces reviewed so far for consistency.
Stop when recent traces no longer reveal new failure categories.
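Two of these heuristics are mechanical enough to sketch. The functions below are hypothetical helpers, not Truesight tools: `first_root_failure` assumes each trace is a list of steps carrying an `ok` flag, and `saturation_point` turns "stop when recent traces no longer reveal new categories" into a concrete rule by assuming a window of 20 consecutive traces with no new category. That window size is an illustrative choice; the skill does not specify one.

```python
from typing import Optional


def first_root_failure(steps: list) -> Optional[dict]:
    """Return the earliest failing step; later failures are treated as downstream symptoms."""
    for step in steps:
        if not step["ok"]:
            return step
    return None


def saturation_point(categories: list, window: int = 20) -> Optional[int]:
    """Return the trace count at which `window` consecutive traces added no new category.

    `categories` holds one category per reviewed trace, in review order.
    Returns None if saturation was never reached.
    """
    seen = set()
    since_new = 0
    for count, category in enumerate(categories, start=1):
        if category in seen:
            since_new += 1
        else:
            seen.add(category)
            since_new = 0
        if since_new >= window:
            return count
    return None
```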

Anti-patterns

Defining categories before reading traces.
Treating output quality labels as generic scores without concrete failure modes.
Skipping relabel after category definitions change.
Building new evaluators before fixing obvious prompt/tooling/engineering gaps.

Scopes reference

list_datasets and get_dataset_rows require datasets:read
upload_dataset, update_dataset_row, and apply_category_mappings require datasets:write
suggest_error_notes and consolidate_error_categories require error-analysis:execute
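The scope table can be encoded to check, before a session, which scopes a planned set of tool calls needs. The mapping reproduces the scope lines; the `missing_scopes` helper is a hypothetical pre-flight check, not part of Truesight, and it treats scopes as independent (it does not assume that `datasets:write` implies `datasets:read`).

```python
# Tool -> required scope, copied from the scopes reference.
REQUIRED_SCOPE = {
    "list_datasets": "datasets:read",
    "get_dataset_rows": "datasets:read",
    "upload_dataset": "datasets:write",
    "update_dataset_row": "datasets:write",
    "apply_category_mappings": "datasets:write",
    "suggest_error_notes": "error-analysis:execute",
    "consolidate_error_categories": "error-analysis:execute",
}


def missing_scopes(tools: list, granted: list) -> list:
    """Scopes needed by `tools` that are absent from `granted`, sorted for stable output."""
    unknown = [t for t in tools if t not in REQUIRED_SCOPE]
    if unknown:
        raise KeyError(f"no scope recorded for: {unknown}")
    needed = {REQUIRED_SCOPE[t] for t in tools}
    return sorted(needed - set(granted))
```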

Capabilities

skill · source-goodeye-labs · skill-error-analysis · topic-agent-skills · topic-ai-evaluation · topic-chatgpt · topic-claude · topic-cursor · topic-llm · topic-mcp · topic-truesight · topic-vscode · topic-windsurf

Install

Install: `npx skills add Goodeye-Labs/truesight-mcp-skills`

Source: https://github.com/Goodeye-Labs/truesight-mcp-skills/tree/main/skills/error-analysis

skills.sh: https://skills.sh/Goodeye-Labs/truesight-mcp-skills/error-analysis

Transport: skills-sh


Quality

0.45 / 1.00

Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills · 6 GitHub stars · SKILL.md body (2,792 chars)

Provenance

Indexed from: GitHub

Enriched: 2026-05-18 13:22:57Z · deterministic:skill-github:v1 · v1

First seen: 2026-05-18

Last seen: 2026-05-18

Agent access

JSON: https://clawmart.sh/api/listings/KvEzxv