{"id":"4d0eef44-df72-4701-a4d2-27c4ed7fd8ba","shortId":"CLFrEF","kind":"skill","title":"monte-carlo-performance-diagnosis","tagline":"Diagnoses pipeline performance issues -- slow jobs, expensive queries,\nlatency trends -- using Monte Carlo's cross-platform observability.\nUses a tiered investigation approach: discover problems, bridge to\naffected tables, then drill into root causes. Activates when a user\nasks a","description":"# Monte Carlo Performance Diagnosis Skill\n\nThis skill helps diagnose data pipeline performance issues using Monte Carlo's cross-platform observability data. It works across Airflow, dbt, Databricks, and warehouse query engines to find bottlenecks, detect regressions, and identify root causes.\n\nReference files live next to this skill file. **Use the Read tool** (not MCP resources) to access them:\n\n- Tiered investigation approach: `references/investigation-tiers.md` (relative to this file)\n- Query analysis patterns: `references/query-analysis.md` (relative to this file)\n\n## When to activate this skill\n\nActivate when the user:\n\n- Asks about slow pipelines, jobs, or queries\n- Wants to find expensive or costly queries\n- Mentions performance regressions or degradation\n- Asks \"why is this pipeline slow?\" or \"what's using the most compute?\"\n- Wants to compare performance over time or find bottleneck tasks\n- Asks about failed or futile query patterns\n\n## When NOT to activate this skill\n\nDo not activate when the user is:\n\n- Investigating data quality issues (use the prevent skill)\n- Looking at storage costs (use the storage-cost-analysis skill)\n- Creating monitors (use the monitoring-advisor skill)\n- Just querying data or exploring table contents\n\n## Prerequisites\n\nThe following MCP tools must be available (connect to Monte Carlo's MCP server):\n\n**Discovery tools (Tier 1):**\n- `get_jobs_performance` -- find slow/failing jobs across Airflow, dbt, Databricks\n- `get_top_slow_queries` -- find slowest query groups by total runtime\n\n**Bridge tool:**\n- `get_tables_for_job` -- convert job MCONs to table MCONs\n\n**Diagnosis tools (Tier 2):**\n- `get_tasks_performance` -- drill into a job's individual tasks\n- `get_change_timeline` -- unified timeline of query changes, volume shifts, Airflow/dbt failures\n- `get_query_rca` -- root cause analysis for failed/futile queries\n- `get_query_latency_distribution` -- latency trend over time\n- `get_asset_lineage` -- trace upstream/downstream impact\n\n**Supporting tools:**\n- `get_warehouses` -- list available warehouses\n\n## Workflow\n\n### Step 1: Identify the scope\n\nDetermine what the user wants to investigate:\n- **Specific job/pipeline**: User mentions a job name or pipeline\n- **Specific table**: User mentions a table that's slow to update\n- **General discovery**: User wants to find what's slow\n\nCall `get_warehouses` to list available warehouses. Match the user's context to a warehouse.\n\n### Step 2: Tier 1 -- Discovery\n\nIf you don't have specific MCONs to investigate, start with discovery:\n\n1. **Find slow jobs**: Call `get_jobs_performance` with optional `integration_type` filter (AIRFLOW, DATABRICKS, DBT) if the user specifies a platform.\n   - Results include: job name, average duration, trend (7-day), run count, failure rate\n   - Look for: high `avgDuration`, negative `runDurationTrend7d`, high failure rates\n\n2. **Find expensive queries**: Call `get_top_slow_queries` with optional `warehouse_id` and `query_type` (\"read\" for SELECTs, \"write\" for INSERT/CREATE/MERGE).\n   - Results include: query hash, total runtime, average runtime, run count\n   - Look for: queries with high total runtime or high individual execution time\n\nPresent the top findings to the user before drilling deeper. A typical investigation needs only 3-7 tool calls.\n\n**If both discovery tools return no results:** Tell the user no performance issues were found in the current time window. Suggest broadening the scope (different warehouse, longer time range, or a different platform filter).\n\n### Step 3: Bridge -- Job to Tables\n\nAfter Tier 1 identifies problematic jobs, convert to table MCONs:\n\nCall `get_tables_for_job(job_mcon=..., integration_type=...)` using the `integration_type` from the job performance results.\n\nThis gives you the table MCONs needed for Tier 2 investigation.\n\n### Step 4: Tier 2 -- Diagnosis\n\nNow drill into root causes using the MCONs from discovery or the bridge:\n\n1. **Task bottleneck**: Call `get_tasks_performance` to find which specific task in a job is the bottleneck.\n\n2. **What changed?** Call `get_change_timeline` -- this is your most powerful tool. It returns a unified timeline of:\n   - Query text changes (schema modifications, new JOINs, filter changes)\n   - Volume shifts (row count spikes/drops)\n   - Airflow task failures\n   - dbt model failures\n   All in one call. Look for correlations: \"query changed on day X, runtime doubled on day X+1.\"\n\n3. **Why are queries failing?** Call `get_query_rca` to get root cause analysis:\n   - **Failed** queries: errors, timeouts, permission issues\n   - **Futile** queries: queries that run but produce no useful output\n   - Patterns are pre-computed -- the tool groups failures by cause\n\n4. **Is latency degrading?** Call `get_query_latency_distribution` to see the trend:\n   - Compare p50 vs p95 -- if p95 >> p50 (>5x), the problem is outlier queries\n   - Look for step-changes in latency (sudden increase = regression)\n\n5. **Trace impact**: Call `get_asset_lineage` with `direction=\"DOWNSTREAM\"` to see what's affected by a slow table, or `direction=\"UPSTREAM\"` to find what feeds it.\n\n### Step 5: Present findings\n\nStructure your response as:\n\n1. **Problem summary**: What's slow and by how much (with exact numbers from tools)\n2. **Root cause**: What changed or what's causing the issue\n3. **Impact**: What downstream systems are affected\n4. **Recommendations**: Specific actions to fix the issue\n\n### Important rules\n\n- **Quote tool numbers exactly.** If a tool returns \"1282 runs, avg 22.5s\", say exactly that. Never round, estimate, or fabricate numbers.\n- **Always compare to baselines.** Use 7-day trend data (`runDurationTrend7d`) to distinguish regressions from normal variance. Flag if trend data has less than 0.1 confidence.\n- **Stop when you have a root cause.** 3-7 tool calls is typical. More than 10 means you're over-investigating.\n- **Read vs write queries**: When the user asks about \"reads\" or \"read queries\", filter with `query_type=\"read\"`. When they ask about \"writes\", use `query_type=\"write\"`. Do NOT mix them.\n- **Never expose MCONs, UUIDs, or internal identifiers** to the user. Use human-readable names.\n- **Cross-platform**: This skill works across Airflow, dbt, and Databricks. Note which platform each finding comes from.","tags":["performance","diagnosis","agent","toolkit","monte-carlo-data","agent-observability","agent-skills","ai-agents","claude-code","codex-skills","cursor","data-observability"],"capabilities":["skill","source-monte-carlo-data","skill-performance-diagnosis","topic-agent-observability","topic-agent-skills","topic-ai-agents","topic-claude-code","topic-codex-skills","topic-cursor","topic-data-observability","topic-data-quality","topic-mcp","topic-monte-carlo","topic-opencode","topic-skill-md"],"categories":["mc-agent-toolkit"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/monte-carlo-data/mc-agent-toolkit/performance-diagnosis","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add monte-carlo-data/mc-agent-toolkit","source_repo":"https://github.com/monte-carlo-data/mc-agent-toolkit","install_from":"skills.sh"}},"qualityScore":"0.489","qualityRationale":"deterministic score 0.49 from registry signals: · indexed on github topic:agent-skills · 78 github stars · SKILL.md body (6,280 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-02T12:55:21.883Z","embedding":null,"createdAt":"2026-04-18T22:12:52.731Z","updatedAt":"2026-05-02T12:55:21.883Z","lastSeenAt":"2026-05-02T12:55:21.883Z","tsv":"'+1':686 '-7':512,897 '0.1':887 '1':244,336,394,408,557,612,799 '10':904 '1282':850 '2':281,392,452,592,597,630,814 '22.5':853 '3':511,550,687,825,896 '4':595,728,832 '5':764,792 '5x':748 '7':437,869 'access':103 'across':70,251,963 'action':835 'activ':40,123,126,182,187 'advisor':217 'affect':33,778,831 'airflow':71,252,421,663,964 'airflow/dbt':302 'alway':864 'analysi':114,209,309,700 'approach':28,107 'ask':44,130,149,172,918,931 'asset':322,769 'avail':233,332,381 'averag':434,480 'avg':852 'avgdur':446 'baselin':867 'bottleneck':80,170,614,629 'bridg':31,266,551,611 'broaden':536 'call':376,412,456,514,565,615,633,672,692,732,767,899 'carlo':3,18,47,61,237 'caus':39,86,308,603,699,727,816,822,895 'chang':293,299,632,635,651,657,677,758,818 'come':973 'compar':164,741,865 'comput':161,721 'confid':888 'connect':234 'content':225 'context':387 'convert':272,561 'correl':675 'cost':142,203,208 'count':440,483,661 'creat':211 'cross':21,64,958 'cross-platform':20,63,957 'current':532 'data':55,67,193,221,872,883 'databrick':73,254,422,967 'day':438,679,684,870 'dbt':72,253,423,666,965 'deeper':505 'degrad':148,731 'detect':81 'determin':340 'diagnos':6,54 'diagnosi':5,49,278,598 'differ':539,546 'direct':772,784 'discov':29 'discoveri':241,368,395,407,517,608 'distinguish':875 'distribut':316,736 'doubl':682 'downstream':773,828 'drill':36,285,504,600 'durat':435 'engin':77 'error':703 'estim':860 'exact':810,845,856 'execut':494 'expens':12,140,454 'explor':223 'expos':943 'fabric':862 'fail':174,691,701 'failed/futile':311 'failur':303,441,450,665,668,725 'feed':789 'file':88,94,112,120 'filter':420,548,656,924 'find':79,139,169,248,259,372,409,453,499,620,787,794,972 'fix':837 'flag':880 'follow':228 'found':529 'futil':176,707 'general':367 'get':245,255,268,282,292,304,313,321,329,377,413,457,566,616,634,693,697,733,768 'give':584 'group':262,724 'hash':477 'help':53 'high':445,449,488,492 'human':954 'human-read':953 'id':464 'identifi':84,337,558,948 'impact':326,766,826 'import':840 'includ':431,475 'increas':762 'individu':290,493 'insert/create/merge':473 'integr':418,572,576 'intern':947 'investig':27,106,192,346,404,508,593,910 'issu':9,58,195,527,706,824,839 'job':11,134,246,250,271,273,288,352,411,414,432,552,560,569,570,580,626 'job/pipeline':348 'join':655 'latenc':14,315,317,730,735,760 'less':885 'lineag':323,770 'list':331,380 'live':89 'longer':541 'look':200,443,484,673,754 'match':383 'mcon':274,277,402,564,571,588,606,944 'mcp':100,229,239 'mean':905 'mention':144,350,359 'mix':940 'model':667 'modif':653 'monitor':212,216 'monitoring-advisor':215 'mont':2,17,46,60,236 'monte-carlo-performance-diagnosi':1 'much':808 'must':231 'name':353,433,956 'need':509,589 'negat':447 'never':858,942 'new':654 'next':90 'normal':878 'note':968 'number':811,844,863 'observ':23,66 'one':671 'option':417,462 'outlier':752 'output':716 'over-investig':908 'p50':742,747 'p95':744,746 'pattern':115,178,717 'perform':4,8,48,57,145,165,247,284,415,526,581,618 'permiss':705 'pipelin':7,56,133,153,355 'platform':22,65,429,547,959,970 'power':641 'pre':720 'pre-comput':719 'prerequisit':226 'present':496,793 'prevent':198 'problem':30,750,800 'problemat':559 'produc':713 'qualiti':194 'queri':13,76,113,136,143,177,220,258,261,298,305,312,314,455,460,466,476,486,649,676,690,694,702,708,709,734,753,914,923,926,935 'quot':842 'rang':543 'rate':442,451 'rca':306,695 're':907 'read':97,468,911,920,922,928 'readabl':955 'recommend':833 'refer':87 'references/investigation-tiers.md':108 'references/query-analysis.md':116 'regress':82,146,763,876 'relat':109,117 'resourc':101 'respons':797 'result':430,474,521,582 'return':519,644,849 'root':38,85,307,602,698,815,894 'round':859 'row':660 'rule':841 'run':439,482,711,851 'rundurationtrend7d':448,873 'runtim':265,479,481,490,681 'say':855 'schema':652 'scope':339,538 'see':738,775 'select':470 'server':240 'shift':301,659 'skill':50,52,93,125,184,199,210,218,961 'skill-performance-diagnosis' 'slow':10,132,154,257,364,375,410,459,781,804 'slow/failing':249 'slowest':260 'source-monte-carlo-data' 'specif':347,356,401,622,834 'specifi':427 'spikes/drops':662 'start':405 'step':335,391,549,594,757,791 'step-chang':756 'stop':889 'storag':202,207 'storage-cost-analysi':206 'structur':795 'sudden':761 'suggest':535 'summari':801 'support':327 'system':829 'tabl':34,224,269,276,357,361,554,563,567,587,782 'task':171,283,291,613,617,623,664 'tell':522 'text':650 'tier':26,105,243,280,393,556,591,596 'time':167,320,495,533,542 'timelin':294,296,636,647 'timeout':704 'tool':98,230,242,267,279,328,513,518,642,723,813,843,848,898 'top':256,458,498 'topic-agent-observability' 'topic-agent-skills' 'topic-ai-agents' 'topic-claude-code' 'topic-codex-skills' 'topic-cursor' 'topic-data-observability' 'topic-data-quality' 'topic-mcp' 'topic-monte-carlo' 'topic-opencode' 'topic-skill-md' 'total':264,478,489 'trace':324,765 'trend':15,318,436,740,871,882 'type':419,467,573,577,927,936 'typic':507,901 'unifi':295,646 'updat':366 'upstream':785 'upstream/downstream':325 'use':16,24,59,95,158,196,204,213,574,604,715,868,934,952 'user':43,129,190,343,349,358,369,385,426,502,524,917,951 'uuid':945 'varianc':879 'volum':300,658 'vs':743,912 'want':137,162,344,370 'warehous':75,330,333,378,382,390,463,540 'window':534 'work':69,962 'workflow':334 'write':471,913,933,937 'x':680,685","prices":[{"id":"a6859359-5d14-416d-9e05-247223d1e55c","listingId":"4d0eef44-df72-4701-a4d2-27c4ed7fd8ba","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"monte-carlo-data","category":"mc-agent-toolkit","install_from":"skills.sh"},"createdAt":"2026-04-18T22:12:52.731Z"}],"sources":[{"listingId":"4d0eef44-df72-4701-a4d2-27c4ed7fd8ba","source":"github","sourceId":"monte-carlo-data/mc-agent-toolkit/performance-diagnosis","sourceUrl":"https://github.com/monte-carlo-data/mc-agent-toolkit/tree/main/skills/performance-diagnosis","isPrimary":false,"firstSeenAt":"2026-04-18T22:12:52.731Z","lastSeenAt":"2026-05-02T12:55:21.883Z"}],"details":{"listingId":"4d0eef44-df72-4701-a4d2-27c4ed7fd8ba","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"monte-carlo-data","slug":"performance-diagnosis","github":{"repo":"monte-carlo-data/mc-agent-toolkit","stars":78,"topics":["agent-observability","agent-skills","ai-agents","claude-code","codex-skills","cursor","data-observability","data-quality","mcp","monte-carlo","opencode","skill-md","skillsmp","vscode"],"license":"apache-2.0","html_url":"https://github.com/monte-carlo-data/mc-agent-toolkit","pushed_at":"2026-04-30T23:25:43Z","description":"Official Monte Carlo toolkit for AI coding agents. Skills and plugins that bring data and agent observability — monitoring, triaging, troubleshooting, health checks  — into Claude Code, Cursor, and more.","skill_md_sha":"be5c125658eb7849f76bcd76c6415021eadc8bcc","skill_md_path":"skills/performance-diagnosis/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/monte-carlo-data/mc-agent-toolkit/tree/main/skills/performance-diagnosis"},"layout":"multi","source":"github","category":"mc-agent-toolkit","frontmatter":{"name":"monte-carlo-performance-diagnosis","description":"Diagnoses pipeline performance issues -- slow jobs, expensive queries,\nlatency trends -- using Monte Carlo's cross-platform observability.\nUses a tiered investigation approach: discover problems, bridge to\naffected tables, then drill into root causes. Activates when a user\nasks about slow pipelines, expensive queries, or performance regressions."},"skills_sh_url":"https://skills.sh/monte-carlo-data/mc-agent-toolkit/performance-diagnosis"},"updatedAt":"2026-05-02T12:55:21.883Z"}}