{"id":"f30087ed-9230-4ea3-8c73-67aa01fda3b8","shortId":"KvEzxv","kind":"skill","title":"error-analysis","tagline":"Systematically identify and categorize failure modes in evaluated traces using Truesight datasets and error-analysis tools. Use when quality issues are unclear, after major pipeline changes, or when incidents indicate drift.","description":"# Error Analysis\n\nGuide the user through trace-grounded failure analysis and dataset labeling.\n\n## Interactive Q&A protocol (mandatory)\n\n<HARD-GATE>\nBEFORE the first scoping question, search for a structured question tool (e.g., `AskUserQuestion` or similar interactive widget) and load it. Use that tool for EVERY scoping question. Fall back to plain-text lettered options ONLY if no such tool exists in the environment.\n</HARD-GATE>\n\nAsk one question at a time using the structured question tool (loaded per the HARD-GATE above).\n\nExample question structure:\n\n```\nWhich data source should we analyze first?\nA) Existing Truesight dataset\nB) New dataset to upload\nC) Unsure, list datasets first\n```\n\nRules:\n- One question per message during setup.\n- Use the structured question tool for every question. Structure each with a short header, 2-4 options with labels and descriptions, and place the recommended option first. Do not add \"(Recommended)\" or similar annotations to option labels.\n- Ask one follow-up if response is ambiguous.\n\n## Core workflow\n\n1. Select or create dataset:\n   - If dataset exists, use `list_datasets`.\n   - If not, use `upload_dataset`.\n2. Collect representative traces:\n   - Target approximately 100 traces when possible.\n   - Use random plus stratified coverage when volume is high.\n3. Analyze row by row:\n   - Use `get_dataset_rows` with pagination.\n   - For each row, call `suggest_error_notes`.\n4. Persist annotations:\n   - Save `_ts_error_notes` and `_ts_error_category` with `update_dataset_row`.\n5. Consolidate categories:\n   - Run `consolidate_error_categories`.\n   - Review mapping proposals, then apply with `apply_category_mappings`.\n6. Prioritize fixes:\n   - Report most frequent categories first.\n   - Recommend next skill based on failure type:\n     - `create-evaluation` for new evaluation coverage\n     - `review-and-promote-traces` for judgment backlog\n     - `eval-audit` for broader process gaps\n\n## Analysis heuristics\n\n- Focus on first root failure in each trace, not every downstream symptom.\n- Let categories emerge from observed traces, not pre-baked labels.\n- Iterate categories after 20 traces, then relabel for consistency.\n- Stop when recent traces no longer reveal new failure categories.\n\n## Anti-patterns\n\n- Defining categories before reading traces.\n- Treating output quality labels as generic scores without concrete failure modes.\n- Skipping relabel after category definitions change.\n- Building new evaluators before fixing obvious prompt/tooling/engineering gaps.\n\n## Scopes reference\n\n- `list_datasets`, `get_dataset_rows` require `datasets:read`\n- `upload_dataset`, `update_dataset_row`, `apply_category_mappings` require `datasets:write`\n- `suggest_error_notes`, `consolidate_error_categories` require `error-analysis:execute`","tags":["error","analysis","truesight","mcp","skills","goodeye-labs","agent-skills","ai-evaluation","chatgpt","claude","cursor","llm"],"capabilities":["skill","source-goodeye-labs","skill-error-analysis","topic-agent-skills","topic-ai-evaluation","topic-chatgpt","topic-claude","topic-cursor","topic-llm","topic-mcp","topic-truesight","topic-vscode","topic-windsurf"],"categories":["truesight-mcp-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/Goodeye-Labs/truesight-mcp-skills/error-analysis","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add Goodeye-Labs/truesight-mcp-skills","source_repo":"https://github.com/Goodeye-Labs/truesight-mcp-skills","install_from":"skills.sh"}},"qualityScore":"0.453","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 6 github stars · SKILL.md body (2,792 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T13:22:57.112Z","embedding":null,"createdAt":"2026-05-18T13:22:57.112Z","updatedAt":"2026-05-18T13:22:57.112Z","lastSeenAt":"2026-05-18T13:22:57.112Z","tsv":"'-4':163 '1':196 '100':218 '2':162,212 '20':345 '3':231 '4':249 '5':264 '6':280 'add':177 'ambigu':193 'analysi':3,19,37,46,317,424 'analyz':125,232 'annot':181,251 'anti':362 'anti-pattern':361 'appli':275,277,409 'approxim':217 'ask':99,185 'askuserquest':67 'audit':312 'b':131 'back':83 'backlog':309 'bake':340 'base':291 'broader':314 'build':386 'c':136 'call':245 'categor':7 'categori':259,266,270,278,286,332,343,360,365,383,410,420 'chang':30,385 'collect':213 'concret':377 'consist':350 'consolid':265,268,418 'core':194 'coverag':226,301 'creat':199,296 'create-evalu':295 'data':121 'dataset':15,48,130,133,139,200,202,206,211,238,262,397,399,402,405,407,413 'defin':364 'definit':384 'descript':168 'downstream':329 'drift':35 'e.g':66 'emerg':333 'environ':98 'error':2,18,36,247,254,258,269,416,419,423 'error-analysi':1,17,422 'eval':311 'eval-audit':310 'evalu':11,297,300,388 'everi':79,154,328 'exampl':117 'execut':425 'exist':95,128,203 'failur':8,45,293,323,359,378 'fall':82 'first':57,126,140,174,287,321 'fix':282,390 'focus':319 'follow':188 'follow-up':187 'frequent':285 'gap':316,393 'gate':115 'generic':374 'get':237,398 'ground':44 'guid':38 'hard':114 'hard-gat':113 'header':161 'heurist':318 'high':230 'identifi':5 'incid':33 'indic':34 'interact':50,70 'issu':24 'iter':342 'judgment':308 'label':49,166,184,341,372 'let':331 'letter':88 'list':138,205,396 'load':73,110 'longer':356 'major':28 'mandatori':54 'map':272,279,411 'messag':145 'mode':9,379 'new':132,299,358,387 'next':289 'note':248,255,417 'observ':335 'obvious':391 'one':100,142,186 'option':89,164,173,183 'output':370 'pagin':241 'pattern':363 'per':111,144 'persist':250 'pipelin':29 'place':170 'plain':86 'plain-text':85 'plus':224 'possibl':221 'pre':339 'pre-bak':338 'priorit':281 'process':315 'promot':305 'prompt/tooling/engineering':392 'propos':273 'protocol':53 'q':51 'qualiti':23,371 'question':59,64,81,101,108,118,143,151,155 'random':223 'read':367,403 'recent':353 'recommend':172,178,288 'refer':395 'relabel':348,381 'report':283 'repres':214 'requir':401,412,421 'respons':191 'reveal':357 'review':271,303 'review-and-promote-trac':302 'root':322 'row':233,235,239,244,263,400,408 'rule':141 'run':267 'save':252 'scope':58,80,394 'score':375 'search':60 'select':197 'setup':147 'short':160 'similar':69,180 'skill':290 'skill-error-analysis' 'skip':380 'sourc':122 'source-goodeye-labs' 'stop':351 'stratifi':225 'structur':63,107,119,150,156 'suggest':246,415 'symptom':330 'systemat':4 'target':216 'text':87 'time':104 'tool':20,65,77,94,109,152 'topic-agent-skills' 'topic-ai-evaluation' 'topic-chatgpt' 'topic-claude' 'topic-cursor' 'topic-llm' 'topic-mcp' 'topic-truesight' 'topic-vscode' 'topic-windsurf' 'trace':12,43,215,219,306,326,336,346,354,368 'trace-ground':42 'treat':369 'truesight':14,129 'ts':253,257 'type':294 'unclear':26 'unsur':137 'updat':261,406 'upload':135,210,404 'use':13,21,75,105,148,204,209,222,236 'user':40 'volum':228 'widget':71 'without':376 'workflow':195 'write':414","prices":[{"id":"ab258a75-4395-42e6-b5c7-bf7b1934a6b8","listingId":"f30087ed-9230-4ea3-8c73-67aa01fda3b8","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"Goodeye-Labs","category":"truesight-mcp-skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:22:57.112Z"}],"sources":[{"listingId":"f30087ed-9230-4ea3-8c73-67aa01fda3b8","source":"github","sourceId":"Goodeye-Labs/truesight-mcp-skills/error-analysis","sourceUrl":"https://github.com/Goodeye-Labs/truesight-mcp-skills/tree/main/skills/error-analysis","isPrimary":false,"firstSeenAt":"2026-05-18T13:22:57.112Z","lastSeenAt":"2026-05-18T13:22:57.112Z"}],"details":{"listingId":"f30087ed-9230-4ea3-8c73-67aa01fda3b8","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"Goodeye-Labs","slug":"error-analysis","github":{"repo":"Goodeye-Labs/truesight-mcp-skills","stars":6,"topics":["agent-skills","ai-evaluation","chatgpt","claude","cursor","llm","mcp","truesight","vscode","windsurf"],"license":"mit","html_url":"https://github.com/Goodeye-Labs/truesight-mcp-skills","pushed_at":"2026-03-26T06:15:56Z","description":"Agent skills for the Truesight MCP. Step-by-step workflow playbooks for scoring inputs, building live evaluations, error analysis, and the review loop. Works with Claude Code, Cursor, ChatGPT, VS Code, Windsurf, and any client that supports the agent skills standard.","skill_md_sha":"0856e6a78d9db3949fc830bd4a6733bb36624978","skill_md_path":"skills/error-analysis/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/Goodeye-Labs/truesight-mcp-skills/tree/main/skills/error-analysis"},"layout":"multi","source":"github","category":"truesight-mcp-skills","frontmatter":{"name":"error-analysis","description":"Systematically identify and categorize failure modes in evaluated traces using Truesight datasets and error-analysis tools. Use when quality issues are unclear, after major pipeline changes, or when incidents indicate drift."},"skills_sh_url":"https://skills.sh/Goodeye-Labs/truesight-mcp-skills/error-analysis"},"updatedAt":"2026-05-18T13:22:57.112Z"}}