{"id":"b085c6e8-f891-4e2c-b36e-5d08abd46e56","shortId":"9Tu2Xn","kind":"skill","title":"prom-query","tagline":"Prometheus Metrics Query & Alert Interpreter — query metrics, interpret timeseries, triage alerts","description":"# prom-query — Prometheus Metrics Query & Alert Interpreter\n\nYou have access to a Prometheus-compatible metrics server. Use this skill to query metrics, check alerts, inspect targets, and explore available metrics. You can query **Prometheus, Thanos, Mimir, and VictoriaMetrics** — they all share the same HTTP API.\n\n## Commands\n\n| Command | Purpose | Example |\n|---------|---------|---------|\n| `query <promql>` | Instant query (current value) | `prom-query query 'up'` |\n| `range <promql> [--start=] [--end=] [--step=]` | Range query (timeseries over time) | `prom-query range 'rate(http_requests_total[5m])' --start=-1h --step=1m` |\n| `alerts [--state=firing\|pending\|inactive]` | List active alerts | `prom-query alerts --state=firing` |\n| `targets [--state=active\|dropped\|any]` | Scrape target health | `prom-query targets` |\n| `explore [pattern]` | Search available metrics by name pattern | `prom-query explore 'http_request'` |\n| `rules [--type=alert\|record]` | Alerting & recording rules | `prom-query rules --type=alert` |\n\n## How to Translate Natural Language to PromQL\n\nWhen the user asks a question about their system, translate it to PromQL using these patterns:\n\n### Error Rate\n```\n# \"What's the error rate for the API?\" (fraction of requests returning 5xx)\nsum(rate(http_requests_total{code=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m]))\n\n# \"Error rate for the payments service\"\nsum(rate(http_requests_total{service=\"payments\", code=~\"5..\"}[5m])) / sum(rate(http_requests_total{service=\"payments\"}[5m]))\n\n# \"4xx and 5xx errors per second\"\nsum(rate(http_requests_total{code=~\"[45]..\"}[5m])) by (code)\n```\n\n### Latency (Histograms)\n```\n# \"P99 latency\"\nhistogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))\n\n# \"P50 latency by service\"\nhistogram_quantile(0.50, 
sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))\n\n# \"Average request duration\"\nrate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])\n```\n\n### CPU Usage\n```\n# \"CPU usage per instance\"\n100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)\n\n# \"CPU usage per pod (Kubernetes)\"\nsum(rate(container_cpu_usage_seconds_total{container!=\"\"}[5m])) by (pod, namespace)\n\n# \"Which pods use the most CPU?\"\ntopk(10, sum(rate(container_cpu_usage_seconds_total{container!=\"\"}[5m])) by (pod, namespace))\n```\n\n### Memory\n```\n# \"Memory usage percentage per instance\"\n(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100\n\n# \"Memory usage per pod (Kubernetes)\"\nsum(container_memory_working_set_bytes{container!=\"\"}) by (pod, namespace)\n\n# \"Pods using more than 1GB RAM\"\nsum(container_memory_working_set_bytes{container!=\"\"}) by (pod, namespace) > 1e9\n```\n\n### Disk\n```\n# \"Disk usage percentage\"\n(1 - (node_filesystem_avail_bytes{mountpoint=\"/\"} / node_filesystem_size_bytes{mountpoint=\"/\"})) * 100\n\n# \"Will the disk be full in 4 hours?\" (linear prediction)\npredict_linear(node_filesystem_avail_bytes{mountpoint=\"/\"}[1h], 4*3600) < 0\n```\n\n### Network\n```\n# \"Network traffic in/out per interface\"\nrate(node_network_receive_bytes_total[5m])\nrate(node_network_transmit_bytes_total[5m])\n```\n\n### Kubernetes-Specific\n```\n# \"How many pods are not ready?\"\nsum(kube_pod_status_ready{condition=\"false\"}) by (namespace)\n\n# \"Pods in CrashLoopBackOff\"\nkube_pod_container_status_waiting_reason{reason=\"CrashLoopBackOff\"}\n\n# \"Deployment replica mismatch\"\nkube_deployment_spec_replicas != kube_deployment_status_available_replicas\n\n# \"Which nodes are not Ready?\"\nkube_node_status_condition{condition=\"Ready\", status=\"true\"} == 0\n```\n\n### General Patterns\n```\n# \"Show me everything about <service>\"\n# First, explore what metrics 
exist:\nprom-query explore '<service_name>'\n\n# \"Is everything up?\"\nprom-query query 'up'\n\n# \"What changed in the last hour?\"\n# Use range query with the relevant metric and look for step changes:\nprom-query range '<metric>' --start=-1h --step=1m\n\n# Rate of any counter:\nrate(<counter_metric>[5m])\n\n# Sum across labels:\nsum(<metric>) by (<label>)\n\n# Top N:\ntopk(10, <metric>)\n```\n\n## How to Interpret Timeseries Data\n\nWhen you get range query results, look for:\n\n1. **Trends:** Is the value going up, down, or flat over time? Compare first vs last values.\n2. **Spikes:** Look at min/max vs average. A large gap suggests spikes or dips.\n3. **Step changes:** Did the value suddenly jump to a new baseline? (deployment, config change)\n4. **Periodicity:** Does the pattern repeat? (daily traffic patterns, cron jobs)\n5. **Correlation:** If querying multiple metrics, do changes happen at the same timestamps?\n\n### Reading the Summary Fields\n\nRange query results include automatic summaries for each series:\n- `min` / `max` / `avg`: Statistical summary of all values\n- `first` / `last`: Start and end values (shows trend direction)\n- `pointCount`: Number of data points\n- `downsampled`: Whether the step was automatically increased to limit data volume\n\n### Smart Context Management\n\nThe script automatically downsamples range queries that would return more than 500 data points. When `downsampled: true`, tell the user the step was adjusted and offer to zoom into a narrower time window for full resolution.\n\n## Incident Triage Workflow\n\nWhen helping with an incident or investigating a problem:\n\n1. **Start with alerts:** `prom-query alerts --state=firing` — see what's actually firing\n2. **Check targets:** `prom-query targets` — are any scrape targets down?\n3. **Query the specific metric** mentioned in the alert\n4. **Range query** to see the trend leading up to the alert\n5. **Explore related metrics** to find correlation\n6. 
**Check rules** to understand alert thresholds\n\n## Alert Interpretation\n\nWhen presenting alerts to the user:\n- Group by severity (critical → warning → info)\n- Highlight how long each alert has been firing (from `activeAt`)\n- Include the summary/description annotation\n- If the alert has a `value`, explain what it means in context\n- Suggest next steps: which metric to query for more detail\n\n## Discord v2 Delivery Mode (OpenClaw v2026.2.14+)\n\nWhen running in a Discord channel:\n\n- Send a compact first summary (firing alerts, top impacted service, suggested next query).\n- Keep the first message under ~1200 characters and avoid wide tables initially.\n- If Discord components are available, include quick actions:\n  - `Show Last 1h Trend`\n  - `List Firing Alerts`\n  - `Explore Related Metrics`\n- If components are unavailable, provide the same options as a numbered list.\n- For long timeseries explanations, send short chunks (<=15 lines per message).\n\n## Important Notes\n\n- All operations are **read-only**. This skill never modifies Prometheus data, rules, or configuration.\n- Large result sets are automatically limited and summarized.\n- The `explore` command uses regex pattern matching (case-insensitive).\n- Time arguments accept: relative (`-1h`, `-30m`, `-2d`), epoch timestamps, or ISO8601 dates.\n- If PROMETHEUS_TOKEN is set, it's sent as a Bearer token. Never include tokens in your responses.\n\n## Error Handling\n\nIf a query fails:\n- **\"Cannot reach Prometheus\"** → Check PROMETHEUS_URL and network connectivity\n- **PromQL parse error** → The query syntax is wrong. Fix and retry.\n- **\"no data\"** → The metric may not exist, or the label selector is too specific. Try `explore` to find the right metric name.\n- **Timeout** → The query is too expensive. 
Add filters, reduce the time range, or use `topk()`.\n\nPowered by Anvil AI 📊","tags":["prom","query","cacheforge","skills","cacheforge-ai","agent-skills","ai-agents","clawhub","devops","discord-v2","kubernetes","openclaw"],"capabilities":["skill","source-cacheforge-ai","skill-prom-query","topic-agent-skills","topic-ai-agents","topic-cacheforge","topic-clawhub","topic-devops","topic-discord-v2","topic-kubernetes","topic-openclaw","topic-prometheus"],"categories":["cacheforge-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/cacheforge-ai/cacheforge-skills/prom-query","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add cacheforge-ai/cacheforge-skills","source_repo":"https://github.com/cacheforge-ai/cacheforge-skills","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (7,348 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:09:04.733Z","embedding":null,"createdAt":"2026-05-18T13:14:38.806Z","updatedAt":"2026-05-18T19:09:04.733Z","lastSeenAt":"2026-05-18T19:09:04.733Z","tsv":"'-1':95,538,966 '-2':970 '-30':968 '0':419,491 '0.50':251 '0.99':234 '1':342,388,570,737 '10':323,556 '100':286,298,351,399 '1200':879 '15':923 '1e9':383 '1gb':371 '1h':416,896 '1m':98,541 '2':587,752 '3':601,764 '3600':418 '4':405,417,616,773 '45':224 '4xx':212 '5':190,210,627,785 '500':700 '5m':93,191,196,211,225,242,259,272,279,297,312,332,432,439,547 '5xx':214 '6':792 'accept':964 'access':25 'across':549 'action':893 'activ':105,115 'activeat':822 'actual':750 'add':1049 'adjust':712 'ai':1061 
'alert':7,14,21,40,99,106,110,141,143,151,740,744,772,784,797,799,803,817,829,867,900 'annot':826 'anvil':1060 'api':61,184 'argument':963 'ask':162 'automat':648,680,691,948 'avail':45,128,391,413,479,890 'averag':263,593 'avg':287,655 'avoid':882 'baselin':612 'bearer':987 'bucket':241,258 'byte':346,350,362,378,392,397,414,430,437 'cannot':1001 'case':960 'case-insensit':959 'chang':516,532,603,615,634 'channel':860 'charact':880 'check':39,753,793,1004 'chunk':922 'code':189,209,223,227 'command':62,63,954 'compact':863 'compar':582 'compat':30 'compon':888,905 'condit':454,482,486,487 'config':614 'configur':943 'connect':1009 'contain':306,311,326,331,358,363,374,379,463 'context':687,838 'correl':628,791 'count':278 'counter':545 'cpu':280,282,292,299,307,321,327 'crashloopbackoff':460,468 'critic':810 'cron':625 'current':69 'd':971 'daili':622 'data':561,673,684,701,940,1022 'date':976 'deliveri':851 'deploy':469,473,477,613 'detail':848 'dip':600 'direct':669 'discord':849,859,887 'disk':384,385,400 'downsampl':675,692,704 'drop':116 'durat':239,256,265,269,276 'end':78,665 'epoch':972 'error':175,180,197,215,995,1012 'everyth':496,508 'exampl':65 'exist':502,1027 'expens':1048 'explain':833 'explan':919 'explor':44,125,136,499,506,786,901,953,1036 'fail':1000 'fals':455 'field':643 'filesystem':390,395,412 'filter':1050 'find':790,1038 'fire':101,112,746,751,820,866,899 'first':498,583,661,864,876 'fix':1018 'flat':579 'full':403,723 'gap':596 'general':492 'get':564 'go':575 'group':807 'h':96,539,967 'handl':996 'happen':635 'health':120 'help':729 'highlight':813 'histogram':229,232,249 'hour':406,520 'http':60,90,137,186,193,204,220,237,254,267,274 'idl':296 'impact':869 'import':927 'in/out':423 'inact':103 'incid':725,732 'includ':647,823,891,990 'increas':681 'info':812 'initi':885 'insensit':961 'inspect':41 'instanc':285,289,341 'instant':67 'interfac':425 'interpret':8,11,22,559,800 'investig':734 'iso8601':975 'job':626 'jump':608 'keep':874 
'kube':450,461,472,476,483 'kubernet':303,356,441 'kubernetes-specif':440 'label':550,1030 'languag':156 'larg':595,944 'last':519,585,662,895 'latenc':228,231,246 'le':244,261 'lead':780 'limit':683,949 'line':924 'linear':407,410 'list':104,898,915 'long':815,917 'look':529,568,589 'm':969 'manag':688 'mani':444 'match':958 'max':654 'may':1025 'mean':836 'memavail':345 'memori':336,337,344,348,352,359,375 'memtot':349 'mention':769 'messag':877,926 'metric':5,10,19,31,38,46,129,501,527,632,768,788,843,903,1024,1041 'mimir':52 'min':653 'min/max':591 'mismatch':471 'mode':295,852 'modifi':938 'mountpoint':393,398,415 'multipl':631 'n':554 'name':131,1042 'namespac':315,335,366,382,457 'narrow':719 'natur':155 'network':420,421,428,435,1008 'never':937,989 'new':611 'next':840,872 'node':291,343,347,389,394,411,427,434,481,484 'note':928 'number':671,914 'offer':714 'openclaw':853 'oper':930 'option':911 'p50':245 'p99':230 'pars':1011 'pattern':126,132,174,493,620,624,957 'payment':201,208 'pend':102 'per':216,284,301,340,354,424,925 'percentag':339,387 'period':617 'pod':302,314,317,334,355,365,367,381,445,451,458,462 'point':674,702 'pointcount':670 'power':1058 'predict':408,409 'present':802 'problem':736 'prom':2,16,72,86,108,122,134,147,504,511,534,742,756 'prom-queri':1,15,71,85,107,121,133,146,503,510,533,741,755 'prometheus':4,18,29,50,939,978,1003,1005 'prometheus-compat':28 'promql':158,171,1010 'provid':908 'purpos':64 'quantil':233,250 'queri':3,6,9,17,20,37,49,66,68,73,74,81,87,109,123,135,148,505,512,513,523,535,566,630,645,694,743,757,765,775,845,873,999,1014,1045 'question':164 'quick':892 'ram':372 'rang':76,80,88,522,536,565,644,693,774,1054 'rate':89,176,181,185,192,198,203,219,236,253,266,273,290,305,325,426,433,542,546 'reach':1002 'read':640,933 'read-on':932 'readi':448,453,488 'reason':466,467 'receiv':429 'record':142,144 'reduc':1051 'regex':956 'relat':787,902,965 'relev':526 'repeat':621 'replica':470,475,480 
'request':91,138,187,194,205,221,238,255,264,268,275 'resolut':724 'respons':994 'result':567,646,945 'retri':1020 'return':697 'right':1040 'rule':139,145,149,794,941 'run':856 'scrape':118,761 'script':690 'search':127 'second':217,240,257,270,277,293,309,329 'see':747,777 'selector':1031 'send':861,920 'sent':984 'seri':652 'server':32 'servic':202,207,248,262,870 'set':361,377,946,981 'sever':809 'share':57 'short':921 'show':494,667,894 'size':396 'skill':35,936 'skill-prom-query' 'smart':686 'source-cacheforge-ai' 'spec':474 'specif':442,767,1034 'spike':588,598 'start':77,94,537,663,738 'state':100,111,114,745 'statist':656 'status':452,464,478,485,489 'step':79,97,531,540,602,678,710,841 'sudden':607 'suggest':597,839,871 'sum':218,235,252,271,304,324,357,373,449,548,551 'summar':951 'summari':642,649,657,865 'summary/description':825 'syntax':1015 'system':167 'tabl':884 'target':42,113,119,124,754,758,762 'tell':706 'thano':51 'threshold':798 'time':84,581,720,962,1053 'timeout':1043 'timeseri':12,82,560,918 'timestamp':639,973 'token':979,988,991 'top':553,868 'topic-agent-skills' 'topic-ai-agents' 'topic-cacheforge' 'topic-clawhub' 'topic-devops' 'topic-discord-v2' 'topic-kubernetes' 'topic-openclaw' 'topic-prometheus' 'topk':322,555,1057 'total':92,188,195,206,222,294,310,330,431,438 'traffic':422,623 'translat':154,168 'transmit':436 'trend':571,668,779,897 'tri':1035 'triag':13,726 'true':490,705 'type':140,150 'unavail':907 'understand':796 'url':1006 'usag':281,283,300,308,328,338,353,386 'use':33,172,318,368,521,955,1056 'user':161,708,806 'v2':850 'v2026.2.14':854 'valu':70,574,586,606,660,666,832 'victoriametr':54 'volum':685 'vs':584,592 'wait':465 'warn':811 'whether':676 'wide':883 'window':721 'work':360,376 'workflow':727 'would':696 'wrong':1017 
'zoom':716","prices":[{"id":"a49b9945-f77a-4c07-b44c-1e2df2b548b5","listingId":"b085c6e8-f891-4e2c-b36e-5d08abd46e56","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"cacheforge-ai","category":"cacheforge-skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:14:38.806Z"}],"sources":[{"listingId":"b085c6e8-f891-4e2c-b36e-5d08abd46e56","source":"github","sourceId":"cacheforge-ai/cacheforge-skills/prom-query","sourceUrl":"https://github.com/cacheforge-ai/cacheforge-skills/tree/main/skills/prom-query","isPrimary":false,"firstSeenAt":"2026-05-18T13:14:38.806Z","lastSeenAt":"2026-05-18T19:09:04.733Z"}],"details":{"listingId":"b085c6e8-f891-4e2c-b36e-5d08abd46e56","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"cacheforge-ai","slug":"prom-query","github":{"repo":"cacheforge-ai/cacheforge-skills","stars":8,"topics":["agent-skills","ai-agents","cacheforge","clawhub","devops","discord-v2","kubernetes","openclaw","prometheus"],"license":"mit","html_url":"https://github.com/cacheforge-ai/cacheforge-skills","pushed_at":"2026-02-22T20:49:48Z","description":"⚡ SOTA agent skills for OpenClaw — observability, security, code quality, incident response, and more. 
Built by Anvil AI.","skill_md_sha":"41e2f397c5d96a819a88753ccb3673a1e30d0578","skill_md_path":"skills/prom-query/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/cacheforge-ai/cacheforge-skills/tree/main/skills/prom-query"},"layout":"multi","source":"github","category":"cacheforge-skills","frontmatter":{"name":"prom-query","license":"MIT","description":"Prometheus Metrics Query & Alert Interpreter — query metrics, interpret timeseries, triage alerts"},"skills_sh_url":"https://skills.sh/cacheforge-ai/cacheforge-skills/prom-query"},"updatedAt":"2026-05-18T19:09:04.733Z"}}