Skill quality: 0.46

prometheus

Query Prometheus monitoring metrics and alert rules. Use when the user needs to check CPU/memory/disk utilization, service health, audit alert rules, analyze capacity trends, or mentions Prometheus, PromQL, metrics monitoring, or targets.

Price: free
Protocol: skill
Verified: no

What it does

prometheus

Query monitoring metrics, check alerts, and verify target health via the Prometheus HTTP API. API and PromQL syntax are referenced through Context7 MCP; only environment-specific rules are documented here.
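As a rough sketch of the target-health and alert checks mentioned above, the helpers below call the `/api/v1/targets`, `/api/v1/alerts`, and `/api/v1/rules` endpoints of the Prometheus HTTP API. The function names are hypothetical, and the sketch assumes an unauthenticated server reachable at PROMETHEUS_URL with curl and jq installed.

```shell
# Hypothetical helper: GET an API path relative to PROMETHEUS_URL
# (a trailing slash on PROMETHEUS_URL is tolerated).
prom_get() {
  curl -s "${PROMETHEUS_URL%/}$1"
}

# List active scrape targets whose health is anything other than "up".
prom_unhealthy_targets() {
  prom_get "/api/v1/targets?state=active" \
    | jq -c '.data.activeTargets[] | select(.health != "up")
             | {job: .labels.job, instance: .labels.instance, error: .lastError}'
}

# List alerts that are currently firing.
prom_firing_alerts() {
  prom_get "/api/v1/alerts" \
    | jq -c '.data.alerts[] | select(.state == "firing")
             | {alert: .labels.alertname, since: .activeAt}'
}

# List the names of all configured alerting rules (useful when auditing rules).
prom_alert_rule_names() {
  prom_get "/api/v1/rules" \
    | jq -r '.data.groups[].rules[] | select(.type == "alerting") | .name'
}
```

Nothing runs until a function is called, for example `prom_unhealthy_targets`, which prints one JSON object per unhealthy target and nothing when all targets are up.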

Setup

Configure your Prometheus endpoint before using this skill:

Variable | Description | Required
PROMETHEUS_URL | Your Prometheus server URL (e.g. http://prometheus.internal:9090) | Yes

Common metric prefixes to monitor:

  • node_* — Node Exporter (host metrics: CPU, memory, disk, network)
  • kube_* — kube-state-metrics (K8s object state: deployments, pods, nodes)
  • container_* — cAdvisor (container resource usage)
  • apiserver_* — K8s API Server metrics
  • kubelet_* — Kubelet metrics
  • prometheus_* — Prometheus self-monitoring

If you have additional exporters (Kafka, Redis, custom applications), add their metric prefixes here:

Prefix | Source | Description
kafka_* | Kafka Exporter | Broker and consumer group metrics
fluentbit_* | Fluent Bit | Log pipeline metrics
(add your own)

Authentication: Configure as needed for your environment (none, basic auth, or bearer token).
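One possible way to wire the three authentication modes into a single query helper is sketched below. The variable names PROMETHEUS_TOKEN, PROMETHEUS_USER, and PROMETHEUS_PASSWORD and the function name `prom_query` are assumptions made for illustration; they are not defined by this skill.

```shell
# Hypothetical helper: run an instant query with whichever auth mode is configured.
# Assumed variables: PROMETHEUS_TOKEN (bearer), or PROMETHEUS_USER and
# PROMETHEUS_PASSWORD (basic auth). With none set, no auth is sent.
prom_query() {
  _prom_expr=$1
  set --   # reuse the positional parameters to hold the extra curl arguments
  if [ -n "${PROMETHEUS_TOKEN:-}" ]; then
    set -- -H "Authorization: Bearer ${PROMETHEUS_TOKEN}"
  elif [ -n "${PROMETHEUS_USER:-}" ]; then
    # Note: credentials passed with -u can be visible in the process list.
    set -- -u "${PROMETHEUS_USER}:${PROMETHEUS_PASSWORD:-}"
  fi
  # -G with --data-urlencode sends the expression as an encoded GET parameter.
  curl -s -G "$@" "$PROMETHEUS_URL/api/v1/query" \
    --data-urlencode "query=${_prom_expr}"
}
```

A call such as `prom_query 'up'` then works unchanged across environments; only the exported variables differ.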

API endpoints and PromQL syntax can be found in the official Prometheus documentation.

Rules

Query Considerations

  • Confirm whether your Prometheus uses HTTP or HTTPS and configure PROMETHEUS_URL accordingly
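  • step should not be smaller than the scrape interval (typically 15s-60s); a smaller step only repeats the most recent sample and adds no real detail
  • High-cardinality labels (user_id, request_id) must not be used as grouping labels in sum by() aggregations over rate() results, because each distinct label value produces its own output series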
  • On macOS, use date -v-1H +%s instead of the Linux date -d '1 hour ago' +%s
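The step and timestamp rules above can be sketched as a small range-query helper. The function names and the SCRAPE_INTERVAL variable are hypothetical, and the 15-second fallback is only a guess at a typical scrape interval. Computing the start time with shell arithmetic on `date +%s` avoids the Linux/macOS flag difference altogether.

```shell
# Print the larger of the requested step and the scrape interval (both in seconds).
clamp_step() {
  if [ "$1" -lt "$2" ]; then echo "$2"; else echo "$1"; fi
}

# Print the Unix timestamp N seconds ago. Plain arithmetic on `date +%s`
# behaves the same on Linux and macOS, so neither date -d nor date -v is needed.
epoch_ago() {
  echo $(( $(date +%s) - $1 ))
}

# Hypothetical helper: range query over the last hour.
# $1 = PromQL expression, $2 = requested step in seconds.
# SCRAPE_INTERVAL is an assumed variable; 15 is only a fallback guess.
prom_range_last_hour() {
  curl -s -G "$PROMETHEUS_URL/api/v1/query_range" \
    --data-urlencode "query=$1" \
    --data-urlencode "start=$(epoch_ago 3600)" \
    --data-urlencode "end=$(date +%s)" \
    --data-urlencode "step=$(clamp_step "$2" "${SCRAPE_INTERVAL:-15}")"
}
```

For example, `SCRAPE_INTERVAL=30 prom_range_last_hour 'up' 5` would send step=30 rather than the requested 5.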

Job Label Convention

Job labels are the key to locating services. Common naming patterns:

Pattern | Example | Description
{env}-{region}-{service} | prod-gateway | Service by environment and region (this example omits the region segment)
kubernetes-{resource} | kubernetes-pods | Standard K8s metrics
{component}-exporter | kafka-exporter | Dedicated exporters

Configure your own job naming convention here to help the agent locate services correctly.
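To see which job names actually exist before writing a convention down, the label-values endpoint of the HTTP API can be queried. The sketch below uses hypothetical function names and assumes jobs follow an environment-first pattern like the one in the table above.

```shell
# List every value of the job label known to Prometheus, one per line.
prom_jobs() {
  curl -s "$PROMETHEUS_URL/api/v1/label/job/values" | jq -r '.data[]'
}

# Keep only job names (read from stdin) whose first segment is the given
# environment, e.g. "prod" keeps prod-gateway but drops kubernetes-pods.
jobs_for_env() {
  grep "^$1-"
}
```

Combined as `prom_jobs | jobs_for_env prod`, this prints the production jobs only.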

Kafka Consumer Lag Monitoring

If you run Kafka with a Kafka Exporter, this is a common pattern:

# Aggregate consumer lag by consumergroup and topic
sum by (consumergroup, topic) (kafka_consumergroup_lag)

Normal lag range depends on your workload. Sustained growth indicates consumer processing capacity issues.
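One rough way to spot growing lag, rather than merely high lag, is to look at how the gauge changed over a window with `delta()`. This is only a proxy for "sustained growth", and the 30m default window and the function names are assumptions chosen for illustration.

```shell
# Print a PromQL expression selecting consumer groups whose lag grew over the window.
# $1 = window such as 30m (the default is an arbitrary example value).
lag_growth_query() {
  printf 'sum by (consumergroup, topic) (delta(kafka_consumergroup_lag[%s])) > 0\n' "${1:-30m}"
}

# Run the expression as an instant query and print group, topic, and lag change.
prom_lag_growth() {
  curl -s -G "$PROMETHEUS_URL/api/v1/query" \
    --data-urlencode "query=$(lag_growth_query "$1")" \
    | jq -c '.data.result[] | {group: .metric.consumergroup, topic: .metric.topic, growth: .value[1]}'
}
```

Running `prom_lag_growth 1h` lists only the group/topic pairs whose lag is higher now than an hour ago; an empty result suggests consumers are keeping up.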

Common Workflows

  • Node resource investigation: node_cpu_seconds_total -> node_memory_MemAvailable_bytes -> node_filesystem_avail_bytes -> locate high-load nodes
  • Kafka health check: kafka_brokers (broker count) -> kafka_consumergroup_lag (consumer lag) -> kafka_topic_partition_under_replicated_partition (under-replicated partitions)
  • Container investigation: container_cpu_usage_seconds_total -> container_memory_working_set_bytes -> aggregate by pod/namespace
  • K8s cluster health: kube_node_status_condition -> kube_pod_status_phase -> kube_deployment_status_replicas_unavailable
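The node resource investigation workflow could be scripted roughly as follows. The function names are hypothetical, and the expressions additionally assume the standard Node Exporter metrics `node_memory_MemTotal_bytes` and `node_filesystem_size_bytes`, which are not listed above, to turn raw byte counts into percentages.

```shell
# Print "name|PromQL" pairs for the three node checks, one per line.
node_checks() {
  cat <<'EOF'
cpu_busy_pct|100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))
mem_available_pct|100 * node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes
disk_available_pct|100 * node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}
EOF
}

# Run each check as an instant query and print "check instance value" lines.
run_node_checks() {
  node_checks | while IFS='|' read -r name query; do
    curl -s -G "$PROMETHEUS_URL/api/v1/query" --data-urlencode "query=$query" \
      | jq -r --arg check "$name" '.data.result[] | "\($check) \(.metric.instance) \(.value[1])"'
  done
}
```

Piping the output through `sort` groups the lines by check, which makes high-load nodes easier to pick out.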

Examples

Bad

# High-cardinality aggregation -- returns one series per pod and can strain Prometheus memory on large clusters
curl -s -G "$PROMETHEUS_URL/api/v1/query" --data-urlencode 'query=sum by (pod) (rate(container_cpu_usage_seconds_total[5m]))'
# Unfiltered, the pod label can take hundreds or thousands of values; aggregate by namespace first, then drill into pods within a single namespace

Good

# Check Kafka consumer lag
# -G with --data-urlencode encodes the expression and avoids curl's URL globbing of {} and []
curl -s -G "$PROMETHEUS_URL/api/v1/query" \
  --data-urlencode 'query=sum by (consumergroup, topic) (kafka_consumergroup_lag)' \
  | jq '.data.result[] | {group: .metric.consumergroup, topic: .metric.topic, lag: .value[1]}'

# Top 10 nodes by CPU usage (averaged across cores per instance)
curl -s -G "$PROMETHEUS_URL/api/v1/query" \
  --data-urlencode 'query=topk(10, 100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))))' \
  | jq '.data.result[] | {node: .metric.instance, cpu_pct: .value[1]}'

# Disk space prediction: projected free bytes on / in 24h (a value <= 0 means the filesystem is projected to fill)
curl -s -G "$PROMETHEUS_URL/api/v1/query" \
  --data-urlencode 'query=predict_linear(node_filesystem_avail_bytes{mountpoint="/"}[24h], 86400)' \
  | jq '.data.result[] | {instance: .metric.instance, predicted_bytes: .value[1]}'

Capabilities

skill, source-addxai, skill-prometheus, topic-agent-skills, topic-ai-agent, topic-ai-engineering, topic-claude-code, topic-code-review, topic-cursor, topic-devops, topic-enterprise, topic-sre, topic-windsurf

Install

Install: npx skills add addxai/enterprise-harness-engineering
Transport: skills-sh
Protocol: skill

Quality

0.46 / 1.00

Deterministic score of 0.46 from registry signals: indexed on GitHub topic:agent-skills · 16 GitHub stars · SKILL.md body (4,207 chars)

Provenance

Indexed from: github
Enriched: 2026-04-22 01:02:12Z · deterministic:skill-github:v1 · v1
First seen: 2026-04-21
Last seen: 2026-04-22

Agent access