{"id":"6a215dfd-824d-4496-8d35-7893222412c4","shortId":"2Cqnd8","kind":"skill","title":"prometheus","tagline":"Query Prometheus monitoring metrics and alert rules. Use when the user needs to check CPU/memory/disk utilization, service health, audit alert rules, analyze capacity trends, or mentions Prometheus, PromQL, metrics monitoring, or targets.","description":"# prometheus\n\nQuery monitoring metrics, check alerts, and verify target health via the Prometheus HTTP API. API and PromQL syntax are referenced through Context7 MCP; only environment-specific rules are documented here.\n\n## Setup\n\nConfigure your Prometheus endpoint before using this skill:\n\n| Variable | Description | Required |\n|----------|-------------|----------|\n| `PROMETHEUS_URL` | Your Prometheus server URL (e.g. `http://prometheus.internal:9090`) | Yes |\n\nCommon metric prefixes to monitor:\n- `node_*` — Node Exporter (host metrics: CPU, memory, disk, network)\n- `kube_*` — kube-state-metrics (K8s object state: deployments, pods, nodes)\n- `container_*` — cAdvisor (container resource usage)\n- `apiserver_*` — K8s API Server metrics\n- `kubelet_*` — Kubelet metrics\n- `prometheus_*` — Prometheus self-monitoring\n\nIf you have additional exporters (Kafka, Redis, custom applications), add their metric prefixes here:\n\n| Prefix | Source | Description |\n|--------|--------|-------------|\n| `kafka_*` | Kafka Exporter | Broker and consumer group metrics |\n| `fluentbit_*` | Fluent Bit | Log pipeline metrics |\n| *(add your own)* | | |\n\nAuthentication: Configure as needed for your environment (none, basic auth, or bearer token).\n\n> API endpoints and PromQL syntax can be found in the official Prometheus documentation.\n\n## Rules\n\n### Query Considerations\n\n- Confirm whether your Prometheus uses HTTP or HTTPS and configure `PROMETHEUS_URL` accordingly\n- `step` should not be smaller than the scrape interval (typically 15s-60s) to avoid invalid interpolation\n- High-cardinality labels 
(user_id, request_id) **must not** be used in `rate()` / `sum by()` aggregations\n- On macOS, use `date -v-1H +%s` instead of the Linux `date -d '1 hour ago' +%s`\n\n### Job Label Convention\n\nJob labels are the key to locating services. Common naming patterns:\n\n| Pattern | Example | Description |\n|---------|---------|-------------|\n| `{env}-{region}-{service}` | `prod-gateway` | Service by environment and region |\n| `kubernetes-{resource}` | `kubernetes-pods` | Standard K8s metrics |\n| `{component}-exporter` | `kafka-exporter` | Dedicated exporters |\n\n> Configure your own job naming convention here to help the agent locate services correctly.\n\n### Kafka Consumer Lag Monitoring\n\nIf you run Kafka with a Kafka Exporter, this is a common pattern:\n\n```promql\n# Aggregate consumer lag by consumergroup and topic\nsum by (consumergroup, topic) (kafka_consumergroup_lag)\n```\n\nNormal lag range depends on your workload. Sustained growth indicates consumer processing capacity issues.\n\n### Common Workflows\n\n- **Node resource investigation**: `node_cpu_seconds_total` -> `node_memory_MemAvailable_bytes` -> `node_filesystem_avail_bytes` -> locate high-load nodes\n- **Kafka health check**: `kafka_brokers` (broker count) -> `kafka_consumergroup_lag` (consumer lag) -> `kafka_topic_partition_under_replicated_partition` (under-replicated partitions)\n- **Container investigation**: `container_cpu_usage_seconds_total` -> `container_memory_working_set_bytes` -> aggregate by pod/namespace\n- **K8s cluster health**: `kube_node_status_condition` -> `kube_pod_status_phase` -> `kube_deployment_status_replicas_unavailable`\n\n## Examples\n\n### Bad\n\n```bash\n# High-cardinality label aggregation -- will cause Prometheus OOM\ncurl \"$PROMETHEUS_URL/api/v1/query?query=sum by(pod)(rate(container_cpu_usage_seconds_total[5m]))\"\n# pod label cardinality is too high (hundreds of pods); aggregate by namespace or deployment instead\n```\n\n### Good\n\n```bash\n# Check Kafka 
consumer lag\ncurl -s \"$PROMETHEUS_URL/api/v1/query?query=sum%20by%20(consumergroup,topic)(kafka_consumergroup_lag)\" | jq '.data.result[] | {group: .metric.consumergroup, topic: .metric.topic, lag: .value[1]}'\n\n# Check node CPU usage top 10 (averaged across cores per instance)\n# -G with --data-urlencode encodes the braces, brackets, and quotes that curl would otherwise treat as URL globs\ncurl -s -G \"$PROMETHEUS_URL/api/v1/query\" --data-urlencode 'query=topk(10, 100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m]))))' | jq '.data.result[] | {node: .metric.instance, cpu_pct: .value[1]}'\n\n# Disk space prediction (free bytes expected in 24h; a value <= 0 means the disk will be full)\ncurl -s -G \"$PROMETHEUS_URL/api/v1/query\" --data-urlencode 'query=predict_linear(node_filesystem_avail_bytes{mountpoint=\"/\"}[24h], 86400)' | jq '.data.result[] | {instance: .metric.instance, predicted_bytes: .value[1]}'\n```","tags":["prometheus","enterprise","harness","engineering","addxai","agent-skills","ai-agent","ai-engineering","claude-code","code-review","cursor","devops"],"capabilities":["skill","source-addxai","skill-prometheus","topic-agent-skills","topic-ai-agent","topic-ai-engineering","topic-claude-code","topic-code-review","topic-cursor","topic-devops","topic-enterprise","topic-sre","topic-windsurf"],"categories":["enterprise-harness-engineering"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/addxai/enterprise-harness-engineering/prometheus","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add addxai/enterprise-harness-engineering","source_repo":"https://github.com/addxai/enterprise-harness-engineering","install_from":"skills.sh"}},"qualityScore":"0.458","qualityRationale":"deterministic score 0.46 from registry signals: · indexed on github topic:agent-skills · 16 github stars · SKILL.md body (4,207 
chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-04-22T01:02:12.464Z","embedding":null,"createdAt":"2026-04-21T19:04:01.947Z","updatedAt":"2026-04-22T01:02:12.464Z","lastSeenAt":"2026-04-22T01:02:12.464Z","tsv":"'1':254,504,519,535,566 '10':510,517 '100':518 '15s':217 '15s-60s':216 '1h':246 '20':490 '20by':489 '24h':544,557 '5m':461,527 '60s':218 '86400':558 'accord':205 'add':139,161 'addit':133 'agent':311 'aggreg':239,333,417,443,471 'ago':256 'alert':7,21,39 'analyz':23 'api':48,49,119,177 'apiserv':117 'applic':138 'audit':20 'auth':173 'authent':164 'avail':376,554 'avoid':220 'bad':437 'bash':438,478 'basic':172 'bearer':175 'bit':157 'broker':150,387,388 'byte':373,377,416,555,564 'cadvisor':113 'capac':24,359 'cardin':225,441,464 'caus':445 'check':15,38,385,479,505 'cluster':421 'common':87,269,330,361 'compon':294 'condit':426 'configur':67,165,202,301 'confirm':193 'consider':192 'consum':152,316,334,357,393,481 'consumergroup':337,342,345,391,491,494 'contain':112,114,405,407,412,456 'context7':56 'convent':260,306 'correct':314 'count':389 'cpu':97,367,408,457,507,522,532 'cpu/memory/disk':16 'curl':448,483,511,545 'custom':137 'd':253 'data.result':497,529,560 'date':243,252 'dedic':299 'depend':350 'deploy':109,432,475 'descript':76,146,274 'disk':99,536 'document':64,189 'e.g':84 'endpoint':70,178 'env':275 'environ':60,170,283 'environment-specif':59 'exampl':273,436 'export':94,134,149,295,298,300,326 'filesystem':375,553 'fluent':156 'fluentbit':155 'found':184 'full':542 'gateway':280 'good':477 'group':153,498 'growth':355 'health':19,43,384,422 'help':309 'high':224,380,440,467 'high-cardin':223,439 'high-load':379 'host':95 'hour':255 'http':47,198 'https':200 'hundr':468 
'id':228,230 'idl':526 'indic':356 'instanc':561 'instead':248,476 'interpol':222 'interv':214 'invalid':221 'investig':365,406 'issu':360 'job':258,261,304 'jq':496,528,559 'k8s':106,118,292,420 'kafka':135,147,148,297,315,322,325,344,383,386,390,395,480,493 'kafka-export':296 'key':265 'kube':101,103,423,427,431 'kube-state-metr':102 'kubelet':122,123 'kubernet':286,289 'kubernetes-pod':288 'label':226,259,262,442,463 'lag':317,335,346,348,392,394,482,495,502 'linear':551 'linux':251 'load':381 'locat':267,312,378 'log':158 'maco':241 'mcp':57 'memavail':372 'memori':98,371,413 'mention':27 'metric':5,30,37,88,96,105,121,124,141,154,160,293 'metric.consumergroup':499 'metric.instance':531,562 'metric.topic':501 'mode':525 'monitor':4,31,36,91,129,318 'mountpoint':556 'must':231 'name':270,305 'namespac':473 'need':13,167 'network':100 'node':92,93,111,363,366,370,374,382,424,506,521,530,552 'none':171 'normal':347 'object':107 'offici':187 'oom':447 'partit':397,400,404 'pattern':271,272,331 'pct':533 'phase':430 'pipelin':159 'pod':110,290,428,454,462,470 'pod/namespace':419 'predict':538,550,563 'prefix':89,142,144 'process':358 'prod':279 'prod-gateway':278 'prometheus':1,3,28,34,46,69,78,81,125,126,188,196,203,446,449,485,513,547 'prometheus.internal:9090':85 'promql':29,51,180,332 'queri':2,35,191,451,487,515,549 'rang':349 'rate':236,455,520 'redi':136 'referenc':54 'region':276,285 'replic':399,403 'replica':434 'request':229 'requir':77 'resourc':115,287,364 'rule':8,22,62,190 'run':321 'scrape':213 'second':368,410,459,523 'self':128 'self-monitor':127 'server':82,120 'servic':18,268,277,281,313 'set':415 'setup':66 'skill':74 'skill-prometheus' 'smaller':210 'sourc':145 'source-addxai' 'space':537 'specif':61 'standard':291 'state':104,108 'status':425,429,433 'step':206 'sum':237,340,452,488 'sustain':354 'syntax':52,181 'target':33,42 'token':176 'top':509 'topic':339,343,396,492,500 'topic-agent-skills' 'topic-ai-agent' 'topic-ai-engineering' 
'topic-claude-code' 'topic-code-review' 'topic-cursor' 'topic-devops' 'topic-enterprise' 'topic-sre' 'topic-windsurf' 'topk':516 'total':369,411,460,524 'trend':25 'typic':215 'unavail':435 'under-repl':401 'url':79,83,204 'url/api/v1/query':450,486,514,548 'usag':116,409,458,508 'use':9,72,197,234,242 'user':12,227 'util':17 'v':245 'v-1h':244 'valu':503,534,565 'variabl':75 'verifi':41 'via':44 'whether':194 'work':414 'workflow':362 'workload':353 'yes':86","prices":[{"id":"3924f228-48e7-4d25-88ed-c871045c85a4","listingId":"6a215dfd-824d-4496-8d35-7893222412c4","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"addxai","category":"enterprise-harness-engineering","install_from":"skills.sh"},"createdAt":"2026-04-21T19:04:01.947Z"}],"sources":[{"listingId":"6a215dfd-824d-4496-8d35-7893222412c4","source":"github","sourceId":"addxai/enterprise-harness-engineering/prometheus","sourceUrl":"https://github.com/addxai/enterprise-harness-engineering/tree/main/skills/prometheus","isPrimary":false,"firstSeenAt":"2026-04-21T19:04:01.947Z","lastSeenAt":"2026-04-22T01:02:12.464Z"}],"details":{"listingId":"6a215dfd-824d-4496-8d35-7893222412c4","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"addxai","slug":"prometheus","github":{"repo":"addxai/enterprise-harness-engineering","stars":16,"topics":["agent-skills","ai-agent","ai-engineering","claude-code","code-review","cursor","devops","enterprise","sre","windsurf"],"license":"apache-2.0","html_url":"https://github.com/addxai/enterprise-harness-engineering","pushed_at":"2026-04-17T08:57:37Z","description":"Enterprise-grade AI Agent Skills for software development, DevOps, SRE, security, and product teams. 
Compatible with Claude Code, Cursor, Windsurf, Gemini CLI, GitHub Copilot, and 30+ AI coding agents.","skill_md_sha":"290881eac9022b29629a866c9cb90b4fc733512f","skill_md_path":"skills/prometheus/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/addxai/enterprise-harness-engineering/tree/main/skills/prometheus"},"layout":"multi","source":"github","category":"enterprise-harness-engineering","frontmatter":{"name":"prometheus","description":"Query Prometheus monitoring metrics and alert rules. Use when the user needs to check CPU/memory/disk utilization, service health, audit alert rules, analyze capacity trends, or mentions Prometheus, PromQL, metrics monitoring, or targets."},"skills_sh_url":"https://skills.sh/addxai/enterprise-harness-engineering/prometheus"},"updatedAt":"2026-04-22T01:02:12.464Z"}}