{"id":"7f4accb3-0417-4f3a-8159-c3e041730b34","shortId":"ZxNDHZ","kind":"skill","title":"monte-carlo-incident-response","tagline":"Orchestrate incident response — triage, root cause, remediate, prevent recurrence. USE WHEN active alerts, data broken, stale, pipeline failure, or investigate and fix a data incident.","description":"# Monte Carlo Incident Response Workflow\n\nThis workflow orchestrates the full lifecycle of a data incident by sequencing\nexisting Monte Carlo skills. It does not contain investigation or remediation\nlogic itself — each step loads the relevant skill's SKILL.md which has the\nactual instructions.\n\n## When to activate this workflow\n\nActivate when:\n\n- Context detection routes here (active alerts detected + incident intent)\n- User invokes `/mc-incident-response`\n- User asks to \"respond to an incident\", \"handle this alert\", \"triage and fix\"\n- User describes a data quality problem: \"data is broken\", \"table is stale\", \"alert firing\"\n\n## When NOT to activate this workflow\n\n- User wants to create monitors or check coverage without an active incident — use proactive monitoring workflow\n- User is editing a dbt model — defer to `prevent` skill (auto-activates via hooks)\n- User wants to check table health without an incident context — use `asset-health` directly\n- A skill is already active and handling the user's request\n\n---\n\n## Workflow Steps\n\n```\nStep 1 (conditional): Triage — when user has multiple/unknown alerts\nStep 2: Root Cause Analysis — the core investigation\nStep 3: Remediation — fix or escalate\nStep 4 (optional): Prevent Recurrence — add monitoring\n```\n\n### Determine entry point\n\nBefore starting, determine which step to enter based on the user's context:\n\n- **User has no specific alert** (\"I have alerts firing\", \"what's going on?\") → Start at **Step 1: Triage**\n- **User has a specific alert ID or table** (\"alert ABC-123\", \"stg_payments is stale\") → Skip to **Step 2: Root Cause Analysis**\n- **User knows the root cause** (\"the ETL job failed, help me fix it\") → Skip to **Step 3: Remediation**\n- **Ambiguous** → Ask: \"Do you have a specific alert or table you want to investigate, or should I check your recent alerts first?\"\n\n---\n\n### Step 1: Triage (conditional)\n\n**Skill:** Read and follow `../automated-triage/SKILL.md`\n\n**Goal:** Fetch recent alerts, score them by confidence and impact, identify which ones need investigation.\n\n**When to run:** Only when the user doesn't already have a specific alert or incident to investigate. This step helps narrow down \"I have alerts\" into \"these specific alerts need attention.\"\n\n**Scope MCP calls tightly.** On large accounts, broad queries return hundreds of results, overflow the tool-result token limit, spill to disk, and force chunk reads — burning user tokens and exhausting the turn budget. Minimum scoping for tools this workflow touches:\n\n- `get_alerts` → time filter (`created_after`, default last 7 days) + at least one of `warehouse`, `table_names`, `severity`\n- `search` → needed to resolve a table name to its MCON (`get_table` requires MCON). Always pass `limit` (e.g. 5), the table name as `query`, and filter by `warehouse_uuid` or `database`/`schema`. `warehouse_types` alone is too broad. If multiple matches return: (1) auto-pick the match whose `warehouse_display_name` matches the user's named warehouse — do NOT stop to ask; (2) failing that, prefer the `is_key_asset: true` match; (3) only ask the user when none of these resolve it\n- `get_monitors` → filter by `mcons` or `warehouse_uuid`\n\nIf scope is missing, ask the user before calling: \"Which warehouse?\", \"How far back — today, this week?\", \"Any specific severity?\".\n\n**Transition to Step 2:** Once high-priority alert(s) are identified, tell the user:\n\n> \"I've identified [N] high-priority alerts. Let me investigate the root cause of [specific alert/table]. Moving to root cause analysis.\"\n\nThen proceed to Step 2 with the identified alert context.\n\n---\n\n### Step 2: Root Cause Analysis\n\n**Skill:** Read and follow `../analyze-root-cause/SKILL.md`\n\n**Goal:** Investigate why the issue occurred — trace lineage, check ETL changes, analyze query modifications, profile data.\n\n**This is the core step.** Most workflow entries start here.\n\n**Investigate linearly — do not re-call tools.** Walk through the investigation once: (1) find the table, (2) fetch its alerts and freshness, (3) check lineage, (4) check recent queries/ETL. Call each tool at most once per table. If a tool result is insufficient, move to the next signal rather than re-calling with different params — burning turns on redundant calls exhausts the budget before the root cause is reached.\n\n**Transition to Step 3:** When the root cause is identified (or the investigation reaches its limit), summarize findings and tell the user:\n\n> \"Root cause identified: [summary]. Would you like me to help remediate this, or is the investigation sufficient?\"\n\nIf the user wants to proceed, move to Step 3. If they say \"that's enough\", stop.\n\n---\n\n### Step 3: Remediation\n\n**Skill:** Read and follow `../remediation/SKILL.md`\n\n**Goal:** Fix the issue using available tools, or escalate with full context if the fix requires actions outside the agent's capability.\n\n**Transition to Step 4:** After remediation is complete (fix applied or escalation documented), offer prevention:\n\n> \"The issue has been [fixed/escalated]. The root cause was [X]. Want me to help add a monitor to detect this type of issue earlier next time?\"\n\nIf the user says yes, move to Step 4. If no, the workflow is complete.\n\n---\n\n### Step 4: Prevent Recurrence (optional)\n\n**Skill:** Read and follow `../monitoring-advisor/SKILL.md`\n\nWhen loading monitoring-advisor for this step, frame the request as direct monitor creation — not coverage analysis. The user already knows what they want to monitor (the thing that just broke). Example framing:\n\n> \"Based on the incident, I recommend adding a [freshness/volume/validation] monitor on [table]. Let me create the monitor configuration.\"\n\n**Goal:** Add or update a monitor to catch this class of issue in the future.\n\n**Do not force this step.** It is optional — offer it after remediation, and respect if the user declines.\n\n---\n\n## Orchestration Rules\n\n- **Users can enter at any step.** The entry point section above determines where to start.\n- **Each step loads the actual skill's SKILL.md** via relative path. This workflow does not replicate skill logic — it sequences it.\n- **Context carries forward** through conversation naturally. Alert IDs, table names, root cause findings from earlier steps are available to later steps without explicit state passing.\n- **No state tracking or hooks.** This is purely prompt-driven sequencing.\n- **User can exit anytime.** If they say \"that's enough\" or \"stop\", respect it immediately.\n- **Do not skip back.** The workflow moves forward. If the user wants to re-investigate after remediation, they can start a new workflow or invoke a skill directly.","tags":["incident","response","agent","toolkit","monte-carlo-data","agent-observability","agent-skills","ai-agents","claude-code","codex-skills","cursor","data-observability"],"capabilities":["skill","source-monte-carlo-data","skill-incident-response","topic-agent-observability","topic-agent-skills","topic-ai-agents","topic-claude-code","topic-codex-skills","topic-cursor","topic-data-observability","topic-data-quality","topic-mcp","topic-monte-carlo","topic-opencode","topic-skill-md"],"categories":["mc-agent-toolkit"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/monte-carlo-data/mc-agent-toolkit/incident-response","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add monte-carlo-data/mc-agent-toolkit","source_repo":"https://github.com/monte-carlo-data/mc-agent-toolkit","install_from":"skills.sh"}},"qualityScore":"0.488","qualityRationale":"deterministic score 0.49 from registry signals: · indexed on github topic:agent-skills · 76 github stars · SKILL.md body (6,625 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-04-22T06:55:42.277Z","embedding":null,"createdAt":"2026-04-20T19:31:44.917Z","updatedAt":"2026-04-22T06:55:42.277Z","lastSeenAt":"2026-04-22T06:55:42.277Z","tsv":"'-123':259 '/analyze-root-cause/skill.md':595 '/automated-triage/skill.md':319 '/mc-incident-response':92 '/monitoring-advisor/skill.md':844 '/remediation/skill.md':756 '1':186,247,312,469,635 '2':195,267,490,542,580,587,639 '3':203,287,500,645,696,741,750 '4':209,648,782,828,836 '5':445 '7':417 'abc':258 'account':373 'action':773 'activ':17,76,79,85,123,136,154,176 'actual':72,951 'ad':885 'add':213,808,898 'advisor':849 'agent':776 'alert':18,86,102,118,193,235,238,253,257,296,309,323,348,360,364,410,547,561,584,642,974 'alert/table':570 'alon':461 'alreadi':175,344,865 'alway':441 'ambigu':289 'analysi':198,270,575,590,862 'analyz':607 'anytim':1008 'appli':788 'ask':94,290,489,502,523 'asset':169,497 'asset-health':168 'attent':366 'auto':153,471 'auto-activ':152 'auto-pick':470 'avail':762,985 'back':532,1023 'base':225,879 'broad':374,464 'broke':876 'broken':20,114 'budget':401,686 'burn':394,679 'call':369,527,628,652,675,683 'capabl':778 'carlo':3,32,50 'carri':969 'catch':904 'caus':11,197,269,275,567,574,589,690,700,716,801,979 'chang':606 'check':132,160,306,604,646,649 'chunk':392 'class':906 'complet':786,834 'condit':187,314 'confid':327 'configur':896 'contain':55 'context':81,166,230,585,768,968 'convers':972 'core':200,615 'coverag':133,861 'creat':129,413,893 'creation':859 'data':19,29,44,109,112,611 'databas':457 'day':418 'dbt':146 'declin':929 'default':415 'defer':148 'describ':107 'detect':82,87,812 'determin':215,220,943 'differ':677 'direct':171,857,1048 'disk':389 'display':477 'document':791 'doesn':342 'driven':1003 'e.g':444 'earlier':817,982 'edit':144 'enough':747,1014 'enter':224,934 'entri':216,619,939 'escal':207,765,790 'etl':277,605 'exampl':877 'exhaust':398,684 'exist':48 'exit':1007 'explicit':990 'fail':279,491 'failur':23 'far':531 'fetch':321,640 'filter':412,452,513 'find':636,710,980 'fire':119,239 'first':310 'fix':27,105,205,282,758,771,787 'fixed/escalated':798 'follow':318,594,755,843 'forc':391,914 'forward':970,1027 'frame':853,878 'fresh':644 'freshness/volume/validation':887 'full':40,767 'futur':911 'get':409,437,511 'go':242 'goal':320,596,757,897 'handl':100,178 'health':162,170 'help':280,355,724,807 'high':545,559 'high-prior':544,558 'hook':156,997 'hundr':377 'id':254,975 'identifi':330,550,556,583,702,717 'immedi':1019 'impact':329 'incid':4,7,30,33,45,88,99,137,165,350,882 'instruct':73 'insuffici':665 'intent':89 'investig':25,56,201,302,334,352,564,597,622,633,705,730,1035 'invok':91,1045 'issu':600,760,795,816,908 'job':278 'key':496 'know':272,866 'larg':372 'last':416 'later':987 'least':420 'let':562,891 'lifecycl':41 'like':721 'limit':386,443,708 'lineag':603,647 'linear':623 'load':63,846,949 'logic':59,964 'match':467,474,479,499 'mcon':436,440,515 'mcp':368 'minimum':402 'miss':522 'model':147 'modif':609 'monitor':130,140,214,512,810,848,858,871,888,895,902 'monitoring-advisor':847 'mont':2,31,49 'monte-carlo-incident-respons':1 'move':571,666,738,825,1026 'multipl':466 'multiple/unknown':192 'n':557 'name':425,433,448,478,483,977 'narrow':356 'natur':973 'need':333,365,428 'new':1042 'next':669,818 'none':506 'occur':601 'offer':792,920 'one':332,421 'option':210,839,919 'orchestr':6,38,930 'outsid':774 'overflow':380 'param':678 'pass':442,992 'path':957 'payment':261 'per':658 'pick':472 'pipelin':22 'point':217,940 'prefer':493 'prevent':13,150,211,793,837 'prioriti':546,560 'proactiv':139 'problem':111 'proceed':577,737 'profil':610 'prompt':1002 'prompt-driven':1001 'pure':1000 'qualiti':110 'queri':375,450,608 'queries/etl':651 'rather':671 're':627,674,1034 're-cal':626,673 're-investig':1033 'reach':692,706 'read':316,393,592,753,841 'recent':308,322,650 'recommend':884 'recurr':14,212,838 'redund':682 'relat':956 'relev':65 'remedi':12,58,204,288,725,751,784,923,1037 'replic':962 'request':182,855 'requir':439,772 'resolv':430,509 'respect':925,1017 'respond':96 'respons':5,8,34 'result':379,384,663 'return':376,468 'root':10,196,268,274,566,573,588,689,699,715,800,978 'rout':83 'rule':931 'run':337 'say':744,823,1011 'schema':458 'scope':367,403,520 'score':324 'search':427 'section':941 'sequenc':47,966,1004 'sever':426,538 'signal':670 'skill':51,66,151,173,315,591,752,840,952,963,1047 'skill-incident-response' 'skill.md':68,954 'skip':264,284,1022 'source-monte-carlo-data' 'specif':234,252,295,347,363,537,569 'spill':387 'stale':21,117,263 'start':219,244,620,946,1040 'state':991,994 'step':62,184,185,194,202,208,222,246,266,286,311,354,541,579,586,616,695,740,749,781,827,835,852,916,937,948,983,988 'stg':260 'stop':487,748,1016 'suffici':731 'summar':709 'summari':718 'tabl':115,161,256,298,424,432,438,447,638,659,890,976 'tell':551,712 'thing':873 'tight':370 'time':411,819 'today':533 'token':385,396 'tool':383,405,629,654,662,763 'tool-result':382 'topic-agent-observability' 'topic-agent-skills' 'topic-ai-agents' 'topic-claude-code' 'topic-codex-skills' 'topic-cursor' 'topic-data-observability' 'topic-data-quality' 'topic-mcp' 'topic-monte-carlo' 'topic-opencode' 'topic-skill-md' 'touch':408 'trace':602 'track':995 'transit':539,693,779 'triag':9,103,188,248,313 'true':498 'turn':400,680 'type':460,814 'updat':900 'use':15,138,167,761 'user':90,93,106,126,142,157,180,190,228,231,249,271,341,395,481,504,525,553,714,734,822,864,928,932,1005,1030 'uuid':455,518 've':555 'via':155,955 'walk':630 'want':127,158,300,735,804,869,1031 'warehous':423,454,459,476,484,517,529 'week':535 'whose':475 'without':134,163,989 'workflow':35,37,78,125,141,183,407,618,832,959,1025,1043 'would':719 'x':803 'yes':824","prices":[{"id":"7630d88e-4e44-4103-85f6-edde292700a8","listingId":"7f4accb3-0417-4f3a-8159-c3e041730b34","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"monte-carlo-data","category":"mc-agent-toolkit","install_from":"skills.sh"},"createdAt":"2026-04-20T19:31:44.917Z"}],"sources":[{"listingId":"7f4accb3-0417-4f3a-8159-c3e041730b34","source":"github","sourceId":"monte-carlo-data/mc-agent-toolkit/incident-response","sourceUrl":"https://github.com/monte-carlo-data/mc-agent-toolkit/tree/main/skills/incident-response","isPrimary":false,"firstSeenAt":"2026-04-20T19:31:44.917Z","lastSeenAt":"2026-04-22T06:55:42.277Z"}],"details":{"listingId":"7f4accb3-0417-4f3a-8159-c3e041730b34","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"monte-carlo-data","slug":"incident-response","github":{"repo":"monte-carlo-data/mc-agent-toolkit","stars":76,"topics":["agent-observability","agent-skills","ai-agents","claude-code","codex-skills","cursor","data-observability","data-quality","mcp","monte-carlo","opencode","skill-md","skillsmp","vscode"],"license":"apache-2.0","html_url":"https://github.com/monte-carlo-data/mc-agent-toolkit","pushed_at":"2026-04-22T00:57:31Z","description":"Official Monte Carlo toolkit for AI coding agents. Skills and plugins that bring data and agent observability — monitoring, triaging, troubleshooting, health checks  — into Claude Code, Cursor, and more.","skill_md_sha":"c769ee9b5c3c2663983f3a02616ff8e26dd66b53","skill_md_path":"skills/incident-response/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/monte-carlo-data/mc-agent-toolkit/tree/main/skills/incident-response"},"layout":"multi","source":"github","category":"mc-agent-toolkit","frontmatter":{"name":"monte-carlo-incident-response","description":"Orchestrate incident response — triage, root cause, remediate, prevent recurrence. USE WHEN active alerts, data broken, stale, pipeline failure, or investigate and fix a data incident."},"skills_sh_url":"https://skills.sh/monte-carlo-data/mc-agent-toolkit/incident-response"},"updatedAt":"2026-04-22T06:55:42.277Z"}}