{"id":"2a0d6787-8af1-453d-90b0-e360d0686634","shortId":"kqhJBA","kind":"skill","title":"monitoring-and-alerting","tagline":"Design and run a monitoring system for a website or web app. Use this skill when setting up uptime checks, defining SLOs, configuring error tracking, choosing what to alert on, designing on-call rotations, or fixing alert fatigue. Triggers on monitoring, alerts, uptime, SLO, SLA,","description":"# Monitoring and Alerting\n\nDecide what to watch, what to alert on, and how to make sure the right person finds out when things break.\n\n---\n\n## When to use\n\n- Setting up monitoring on a new site or service\n- Defining SLOs (service level objectives) and error budgets\n- Choosing which alerts page someone vs which go to a quiet channel\n- Designing or fixing on-call rotation\n- Diagnosing alert fatigue\n- Filling monitoring gaps revealed by an incident\n- Migrating monitoring vendors\n\n## When NOT to use\n\n- Responding to an active incident (use `incident-response`)\n- Writing the post-mortem (use `after-action-report`)\n- Designing analytics dashboards for product metrics (use `analytics-strategy`)\n- Performance optimization itself (use `performance-optimization`)\n\n---\n\n## Required inputs\n\n- The system you're monitoring (URLs, services, dependencies)\n- Existing monitoring tools (uptime, errors, logs, APM)\n- Business hours and team timezone(s)\n- Who is on-call or available for incidents\n- Existing SLOs or success metrics, if any\n\n---\n\n## The framework: 4 layers\n\nMonitoring works in layers. Skip a layer and you'll miss a class of problems.\n\n### Layer 1: Availability\n\nIs the site up? 
The simplest, most important layer.\n\n- HTTP checks from multiple regions (every 1-5 minutes)\n- DNS resolution checks\n- Certificate expiration checks\n- Status code checks (alert on 5xx, not just timeout)\n\nThreshold: page on any sustained downtime (more than 2 consecutive failed checks).\n\n### Layer 2: Correctness\n\nThe site is up, but is it serving the right thing?\n\n- Synthetic checks (a script that loads the homepage, clicks a button, validates expected text)\n- Critical user journeys (signup, checkout, search)\n- Content presence checks (homepage hasn't gone blank)\n- API contract checks (response shape and key fields are present)\n\nThreshold: page on any failure of a critical-path synthetic. Failures of non-critical, page-level synthetics alert during business hours only.\n\n### Layer 3: Performance\n\nThe site is up and correct, but is it fast enough?\n\n- Core Web Vitals (LCP, INP, CLS) from real users (RUM)\n- Synthetic performance (Lighthouse, WebPageTest, custom)\n- API response times (p50, p95, p99)\n- Database query times for slow queries\n- Dependency response times (third-party APIs)\n\nThreshold: regressions from baseline (e.g., p95 doubled in 5 minutes). Don't alert on absolute thresholds without baselines.\n\n### Layer 4: Errors and anomalies\n\nThe site is up, correct, and fast for most, but errors are happening.\n\n- Error rate (% of requests returning 5xx)\n- Client-side error rate (uncaught JS exceptions)\n- Log error volume (unexpected spikes)\n- Anomaly detection (traffic falling off a cliff)\n- Background job failures\n- Queue depth\n\nThreshold: rate-based, not count-based. \"Error rate above 1% for 5 minutes\" beats \"more than 100 errors per minute.\"\n\n---\n\n## SLOs and error budgets\n\nA Service Level Objective is the target for reliability. 
Common form: \"99.9% of homepage requests succeed in under 2 seconds, measured over 30 days.\"\n\nThe components:\n- **The thing you're measuring** (homepage requests)\n- **The success criterion** (returns 2xx in under 2 seconds)\n- **The target** (99.9% of them)\n- **The window** (over 30 days)\n\nThe error budget is the inverse: 0.1% of requests can fail. If you've used the whole budget, slow down on risky changes.\n\n### Picking SLOs\n\nDon't aim for 100%. Don't aim for \"five nines\" (99.999%) unless you really need it. Each nine costs an order of magnitude more.\n\n| SLO | Allowed downtime per month |\n|---|---|\n| 99% | 7 hours, 18 minutes |\n| 99.9% | 43 minutes |\n| 99.95% | 21 minutes |\n| 99.99% | 4 minutes, 22 seconds |\n| 99.999% | 26 seconds |\n\nFor most marketing sites, 99.9% is plenty. For SaaS, 99.95% is reasonable. Anything higher needs significant infrastructure investment.\n\n### Using error budgets\n\nWhen the budget is healthy, ship aggressively. When the budget is half-spent, slow down. When the budget is exhausted, freeze risky changes until reliability recovers.\n\nThis is what makes SLOs useful: they create a feedback loop between reliability and velocity.\n\n---\n\n## Workflow\n\n### Step 1: Inventory what's already monitored\n\nWhat tools are in place? What checks exist? What dashboards? What alerts?\n\nMany teams have a tangle of half-configured tools. The first job is the inventory.\n\n### Step 2: Map the system\n\nDraw the architecture. Front-end, back-end, database, third-party APIs, queues, workers. Each box is a candidate for monitoring.\n\nFor each box, ask:\n- What does \"up\" mean?\n- What does \"correct\" mean?\n- What does \"fast\" mean?\n- What's the most common failure mode?\n\n### Step 3: Define the SLOs\n\nPick 3-5 SLOs. 
They should be:\n- Tied to user-visible behavior (not internal metrics)\n- Achievable with current infrastructure\n- Measured automatically\n- Reviewed at least quarterly\n\n### Step 4: Set up checks across the 4 layers\n\nFor each box, configure checks at each layer. Some boxes won't have all four; that's fine.\n\n| Box | Availability | Correctness | Performance | Errors |\n|---|---|---|---|---|\n| Homepage | HTTP check | Synthetic | LCP/INP | JS errors |\n| Login API | HTTP check | Synthetic flow | p95 latency | 5xx rate |\n\n### Step 5: Decide what pages and what doesn't\n\nThree tiers:\n\n1. **Page (wakes someone up):** site down, critical flow broken, error rate spike, security incident.\n2. **Notify (during business hours):** non-critical synthetic failure, performance regression, slow query, dependency degradation.\n3. **Log (no notification):** anomalies for later review, low-priority warnings, info-level events.\n\nAnything in tier 1 must be:\n- Actionable (the on-call can do something about it)\n- Important (it represents real impact)\n- Rare (less than 1-2 per week is the goal)\n\nIf tier 1 alerts fire frequently, alert fatigue sets in. People stop responding.\n\n### Step 6: Configure routing\n\nWhere do alerts go?\n\n- Tier 1: paging system (e.g., PagerDuty, Opsgenie). Direct to on-call.\n- Tier 2: chat channel (Slack, Teams). Tagged with the area.\n- Tier 3: dashboard or log only.\n\nEach tier should have a documented escalation path. If the on-call doesn't ack within 5-15 minutes, escalate.\n\n### Step 7: Build dashboards\n\nOne dashboard per audience:\n\n- **Real-time ops dashboard:** current health, recent alerts, error rates, throughput\n- **SLO dashboard:** SLO status and error budget consumption\n- **Per-service dashboards:** detail for individual services or pages\n- **Executive dashboard:** uptime over weeks/months, key business metrics\n\nDashboards are different from alerts. 
Alerts say \"look now.\" Dashboards say \"here's what's happening.\"\n\n### Step 8: Run an alert audit\n\nEvery quarter, audit:\n- Which alerts fired? Were they actionable?\n- Which alerts didn't fire when they should have?\n- Are any alerts noisy (more than once a week, low actionability)?\n- Are runbooks up to date?\n- Have SLOs been met? Any consistently breached?\n\nTune the system. Monitoring drifts without active maintenance.\n\n---\n\n## Failure patterns\n\n**Alert on cause, not symptom.** \"CPU is high\" is a cause. \"Users are slow\" is a symptom. Alert on symptoms; investigate causes.\n\n**Alert without a runbook.** If the on-call doesn't know what to do, the alert is useless. Every paging alert needs a runbook (even a one-line one).\n\n**No baselines for \"normal.\"** Alerting on \"more than 100 errors per minute\" sounds reasonable but a busy day might exceed that without anything being wrong. Use rate-based and anomaly-based alerts.\n\n**Single-region monitoring.** Your monitoring service in the same region as your site means you'll miss regional outages and you'll get woken up when monitoring itself has issues.\n\n**Monitoring the monitoring.** Or rather, not. If your alerting platform is down, who tells you? Most paging services offer their own status feeds. Subscribe.\n\n**Too many tiers of severity.** P0/P1/P2/P3/P4 with different SLAs becomes a sorting exercise. Three tiers (page, notify, log) is plenty.\n\n**Synthetics that don't match reality.** A synthetic that hits the homepage every minute tests \"is the homepage up.\" It doesn't test \"is the actual user flow working.\" Build synthetics for the journeys that matter.\n\n**Static thresholds that never get tuned.** Traffic grows, behavior changes, thresholds set last year are wrong. Review thresholds quarterly.\n\n**On-call rotation with no handoffs.** Each new on-call has to figure out the system. Document. 
Run weekly handoff meetings or async updates.\n\n**Pager fatigue.** If on-call is paged more than once or twice a week, something is wrong. Audit the alerts. Reduce, tune, or fix the underlying issues.\n\n---\n\n## Output format\n\nA monitoring plan includes:\n\n- **System map:** what's being monitored\n- **SLOs:** the 3-5 reliability targets\n- **Checks per layer:** availability, correctness, performance, errors\n- **Alert tiering:** what pages, what notifies, what logs\n- **Routing:** where alerts go, escalation paths\n- **Dashboards:** what audiences see\n- **Runbooks:** linked from each paging alert\n- **Audit cadence:** when this gets reviewed\n\n---\n\n## Reference files\n\n- [`references/slo-design-guide.md`](references/slo-design-guide.md): Detailed walkthrough of writing SLOs, error budget policies, and common SLO mistakes for web services.","tags":["monitoring","and","alerting","claude","skills","rampstackco","agent-skills","anthropic","awesome-claude-code","awesome-claude-prompts","awesome-claude-skills","claude-code"],"capabilities":["skill","source-rampstackco","skill-monitoring-and-alerting","topic-agent-skills","topic-anthropic","topic-awesome-claude-code","topic-awesome-claude-prompts","topic-awesome-claude-skills","topic-claude","topic-claude-code","topic-claude-skills","topic-good-first-issue","topic-mcp","topic-product-management","topic-seo"],"categories":["claude-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/rampstackco/claude-skills/monitoring-and-alerting","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add rampstackco/claude-skills","source_repo":"https://github.com/rampstackco/claude-skills","install_from":"skills.sh"}},"qualityScore":"0.540","qualityRationale":"deterministic score 0.54 from registry signals: · indexed on github topic:agent-skills · 181 github stars · SKILL.md body (9,241 
chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T18:55:18.053Z","embedding":null,"createdAt":"2026-04-30T01:01:29.331Z","updatedAt":"2026-05-18T18:55:18.053Z","lastSeenAt":"2026-05-18T18:55:18.053Z","prices":[{"id":"0ab99299-1f90-42b7-a1d1-63c85afb6e53","listingId":"2a0d6787-8af1-453d-90b0-e360d0686634","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"rampstackco","category":"claude-skills","install_from":"skills.sh"},"createdAt":"2026-04-30T01:01:29.331Z"}],"sources":[{"listingId":"2a0d6787-8af1-453d-90b0-e360d0686634","source":"github","sourceId":"rampstackco/claude-skills/monitoring-and-alerting","sourceUrl":"https://github.com/rampstackco/claude-skills/tree/main/skills/monitoring-and-alerting","isPrimary":false,"firstSeenAt":"2026-04-30T01:01:29.331Z","lastSeenAt":"2026-05-18T18:55:18.053Z"}],"details":{"listingId":"2a0d6787-8af1-453d-90b0-e360d0686634","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"rampstackco","slug":"monitoring-and-alerting","github":{"repo":"rampstackco/claude-skills","stars":181,"topics":["agent-skills","anthropic","awesome-claude-code","awesome-claude-prompts","awesome-claude-skills","claude","claude-code","claude-skills","good-first-issue","mcp","product-management","seo","show-hn","showcase","showdev","web-design","web-development"],"license":"mit","html_url":"https://github.com/rampstackco/claude-skills","pushed_at":"2026-05-10T22:40:22Z","description":"Stack-agnostic Claude Skills covering the full website lifecycle: brand, design, content, SEO, dev, ops, growth, and research. 
Build, ship, audit, optimize.","skill_md_sha":"971d9631e9e41af3cbaa20b8b19643eebd21a7c0","skill_md_path":"skills/monitoring-and-alerting/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/rampstackco/claude-skills/tree/main/skills/monitoring-and-alerting"},"layout":"multi","source":"github","category":"claude-skills","frontmatter":{"name":"monitoring-and-alerting","description":"Design and run a monitoring system for a website or web app. Use this skill when setting up uptime checks, defining SLOs, configuring error tracking, choosing what to alert on, designing on-call rotations, or fixing alert fatigue. Triggers on monitoring, alerts, uptime, SLO, SLA, error rate, on-call, pager, alert fatigue, observability, dashboards, what should we monitor. Also triggers when an incident reveals a gap in monitoring."},"skills_sh_url":"https://skills.sh/rampstackco/claude-skills/monitoring-and-alerting"},"updatedAt":"2026-05-18T18:55:18.053Z"}}