{"id":"ba64f3bc-1c7b-4763-aae7-749b0476c27b","shortId":"7XueHC","kind":"skill","title":"prometheus","tagline":"Prometheus monitoring and alerting for cloud-native observability. Use when implementing metrics collection, PromQL queries, alerting rules, service discovery, recording rules, and scrape config.","description":"# Prometheus Monitoring and Alerting\n\n## Overview\n\nPrometheus is a powerful open-source monitoring and alerting system designed for reliability and scalability in cloud-native environments.\n\n### Architecture Components\n\n- **Prometheus Server**: Core component that scrapes and stores time-series data\n- **Alertmanager**: Handles alerts, deduplication, grouping, routing, and notifications\n- **Pushgateway**: Allows ephemeral jobs to push metrics (use sparingly)\n- **Exporters**: Convert metrics from third-party systems to Prometheus format\n- **Client Libraries**: Instrument application code (Go, Java, Python, etc.)\n- **Prometheus Operator**: Kubernetes-native deployment and management\n\n### Data Model\n\n- **Metrics**: Time-series data identified by metric name and key-value labels\n- **Metric Types**:\n  - Counter: Monotonically increasing value (requests, errors)\n  - Gauge: Value that can go up/down (temperature, memory usage)\n  - Histogram: Observations in configurable buckets (latency, request size)\n  - Summary: Similar to histogram but calculates quantiles client-side\n\n## Setup and Configuration\n\n### Basic Prometheus Server Configuration\n\n```yaml\n# prometheus.yml\nglobal:\n  scrape_interval: 15s\n  scrape_timeout: 10s\n  evaluation_interval: 15s\n  external_labels:\n    cluster: \"production\"\n    region: \"us-east-1\"\n\n# Alertmanager configuration\nalerting:\n  alertmanagers:\n    - static_configs:\n        - targets:\n            - alertmanager:9093\n\n# Load rules files\nrule_files:\n  - \"alerts/*.yml\"\n  - \"rules/*.yml\"\n\n# Scrape configurations\nscrape_configs:\n  # Prometheus itself\n  - job_name: 
\"prometheus\"\n    static_configs:\n      - targets: [\"localhost:9090\"]\n\n  # Application services\n  - job_name: \"application\"\n    metrics_path: \"/metrics\"\n    static_configs:\n      - targets:\n          - \"app-1:8080\"\n          - \"app-2:8080\"\n        labels:\n          env: \"production\"\n          team: \"backend\"\n\n  # Kubernetes service discovery\n  - job_name: \"kubernetes-pods\"\n    kubernetes_sd_configs:\n      - role: pod\n    relabel_configs:\n      # Only scrape pods with prometheus.io/scrape annotation\n      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]\n        action: keep\n        regex: true\n      # Use custom metrics path if specified\n      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]\n        action: replace\n        target_label: __metrics_path__\n        regex: (.+)\n      # Use custom port if specified\n      - source_labels:\n          [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]\n        action: replace\n        regex: ([^:]+)(?::\\d+)?;(\\d+)\n        replacement: $1:$2\n        target_label: __address__\n      # Add namespace label\n      - source_labels: [__meta_kubernetes_namespace]\n        action: replace\n        target_label: kubernetes_namespace\n      # Add pod name label\n      - source_labels: [__meta_kubernetes_pod_name]\n        action: replace\n        target_label: kubernetes_pod_name\n      # Add service name label\n      - source_labels: [__meta_kubernetes_pod_label_app]\n        action: replace\n        target_label: app\n\n  # Node Exporter for host metrics\n  - job_name: \"node-exporter\"\n    static_configs:\n      - targets:\n          - \"node-exporter:9100\"\n```\n\n### Alertmanager Configuration\n\n```yaml\n# alertmanager.yml\nglobal:\n  resolve_timeout: 5m\n  slack_api_url: \"https://hooks.slack.com/services/YOUR/WEBHOOK/URL\"\n  pagerduty_url: \"https://events.pagerduty.com/v2/enqueue\"\n\n# Template files for custom 
notifications\ntemplates:\n  - \"/etc/alertmanager/templates/*.tmpl\"\n\n# Route alerts to appropriate receivers\nroute:\n  group_by: [\"alertname\", \"cluster\", \"service\"]\n  group_wait: 10s\n  group_interval: 10s\n  repeat_interval: 12h\n  receiver: \"default\"\n\n  routes:\n    # Critical alerts go to PagerDuty\n    - match:\n        severity: critical\n      receiver: \"pagerduty\"\n      continue: true\n\n    # Database alerts to DBA team\n    - match:\n        team: database\n      receiver: \"dba-team\"\n      group_by: [\"alertname\", \"instance\"]\n\n    # Development environment alerts\n    - match:\n        env: development\n      receiver: \"slack-dev\"\n      group_wait: 5m\n      repeat_interval: 4h\n\n# Inhibition rules (suppress alerts)\ninhibit_rules:\n  # Suppress warning alerts if a critical alert is firing\n  - source_match:\n      severity: \"critical\"\n    target_match:\n      severity: \"warning\"\n    equal: [\"alertname\", \"instance\"]\n\n  # Suppress instance alerts if the entire service is down\n  - source_match:\n      alertname: \"ServiceDown\"\n    target_match_re:\n      alertname: \".*\"\n    equal: [\"service\"]\n\nreceivers:\n  - name: \"default\"\n    slack_configs:\n      - channel: \"#alerts\"\n        title: \"Alert: {{ .GroupLabels.alertname }}\"\n        text: \"{{ range .Alerts }}{{ .Annotations.description }}{{ end }}\"\n\n  - name: \"pagerduty\"\n    pagerduty_configs:\n      # Events API v2 integration key (pairs with the v2 pagerduty_url set above)\n      - routing_key: \"YOUR_PAGERDUTY_ROUTING_KEY\"\n        description: \"{{ .GroupLabels.alertname }}\"\n\n  - name: \"dba-team\"\n    slack_configs:\n      - channel: \"#database-alerts\"\n    email_configs:\n      - to: \"dba-team@example.com\"\n        headers:\n          Subject: \"Database Alert: {{ .GroupLabels.alertname }}\"\n\n  - name: \"slack-dev\"\n    slack_configs:\n      - channel: \"#dev-alerts\"\n        send_resolved: true\n```\n\n## Best Practices\n\n### Metric Naming Conventions\n\nFollow these naming patterns for 
consistency:\n\n```text\n# Format: <namespace>_<subsystem>_<metric>_<unit>\n\n# Counters (always use _total suffix)\nhttp_requests_total\nhttp_request_errors_total\ncache_hits_total\n\n# Gauges\nmemory_usage_bytes\nactive_connections\nqueue_size\n\n# Histograms (client libraries add the _bucket, _sum and _count suffixes automatically)\nhttp_request_duration_seconds\nresponse_size_bytes\ndb_query_duration_seconds\n\n# Use consistent base units\n- seconds for duration (not milliseconds)\n- bytes for size (not kilobytes)\n- ratio for percentages (0.0-1.0, not 0-100)\n```\n\n### Label Cardinality Management\n\n#### DO\n\n```text\n# Good: Bounded cardinality\nhttp_requests_total{method=\"GET\", status=\"200\", endpoint=\"/api/users\"}\n\n# Good: Reasonable number of label values\ndb_queries_total{table=\"users\", operation=\"select\"}\n```\n\n#### DON'T\n\n```text\n# Bad: Unbounded cardinality (user IDs, email addresses, timestamps)\nhttp_requests_total{user_id=\"12345\"}\nhttp_requests_total{email=\"user@example.com\"}\nhttp_requests_total{timestamp=\"1234567890\"}\n\n# Bad: High cardinality (full URLs, IP addresses)\nhttp_requests_total{url=\"/api/users/12345/profile\"}\nhttp_requests_total{client_ip=\"192.168.1.100\"}\n```\n\n#### Guidelines\n\n- Ideally keep each label to fewer than 10 distinct values\n- Keep total unique time series per metric below 10,000\n- Use recording rules to pre-aggregate high-cardinality metrics\n- Avoid labels with unbounded values (IDs, timestamps, user input)\n\n### Recording Rules for Performance\n\nUse recording rules to pre-compute expensive queries:\n\n```yaml\n# rules/recording_rules.yml\ngroups:\n  - name: performance_rules\n    interval: 30s\n    rules:\n      # Pre-calculate request rates\n      - record: job:http_requests:rate5m\n        expr: sum(rate(http_requests_total[5m])) by (job)\n\n      # Pre-calculate error rates\n      - record: job:http_request_errors:rate5m\n        expr: sum(rate(http_request_errors_total[5m])) by (job)\n\n      # Pre-calculate 
error ratio\n      - record: job:http_request_error_ratio:rate5m\n        expr: |\n          job:http_request_errors:rate5m\n          /\n          job:http_requests:rate5m\n\n      # Pre-aggregate latency percentiles\n      - record: job:http_request_duration_seconds:p95\n        expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (job, le))\n\n      - record: job:http_request_duration_seconds:p99\n        expr: histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (job, le))\n\n  - name: aggregation_rules\n    interval: 1m\n    rules:\n      # Multi-level aggregation for dashboards\n      - record: instance:node_cpu_utilization:ratio\n        expr: |\n          1 - avg(rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) by (instance)\n\n      - record: cluster:node_cpu_utilization:ratio\n        expr: avg(instance:node_cpu_utilization:ratio)\n\n      # Memory aggregation\n      - record: instance:node_memory_utilization:ratio\n        expr: |\n          1 - (\n            node_memory_MemAvailable_bytes\n            /\n            node_memory_MemTotal_bytes\n          )\n```\n\n### Alert Design (Symptoms vs Causes)\n\n#### Alert on symptoms (user-facing impact), not causes\n\n```yaml\n# alerts/symptom_based.yml\ngroups:\n  - name: symptom_alerts\n    rules:\n      # GOOD: Alert on user-facing symptoms\n      - alert: HighErrorRate\n        expr: |\n          (\n            sum(rate(http_requests_total{status=~\"5..\"}[5m]))\n            /\n            sum(rate(http_requests_total[5m]))\n          ) > 0.05\n        for: 5m\n        labels:\n          severity: critical\n          team: backend\n        annotations:\n          summary: \"High error rate detected\"\n          description: \"Error rate is {{ $value | humanizePercentage }} (threshold: 5%)\"\n          runbook: \"https://wiki.example.com/runbooks/high-error-rate\"\n\n      - alert: HighLatency\n        expr: |\n          histogram_quantile(0.95,\n 
           sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)\n          ) > 1\n        for: 5m\n        labels:\n          severity: warning\n          team: backend\n        annotations:\n          summary: \"High latency on {{ $labels.service }}\"\n          description: \"P95 latency is {{ $value }}s (threshold: 1s)\"\n          impact: \"Users experiencing slow page loads\"\n\n      # GOOD: SLO-based alerting\n      - alert: SLOBudgetBurnRate\n        expr: |\n          (\n            1 - (\n              sum(rate(http_requests_total{status!~\"5..\"}[1h]))\n              /\n              sum(rate(http_requests_total[1h]))\n            )\n          ) > (14.4 * (1 - 0.999))  # 14.4x burn rate for 99.9% SLO\n        for: 5m\n        labels:\n          severity: critical\n          team: sre\n        annotations:\n          summary: \"SLO budget burning too fast\"\n          description: \"1h error ratio is {{ $value | humanizePercentage }}, above 14.4x the 99.9% SLO budget rate; at this pace a 30-day error budget is exhausted in about 2 days\"\n```\n\n#### Cause-based alerts (use for debugging, not paging)\n\n```yaml\n# alerts/cause_based.yml\ngroups:\n  - name: infrastructure_alerts\n    rules:\n      # Lower severity for infrastructure issues\n      - alert: HighMemoryUsage\n        expr: |\n          (\n            node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes\n          ) / node_memory_MemTotal_bytes > 0.9\n        for: 10m\n        labels:\n          severity: warning # Not critical unless symptoms appear\n          team: infrastructure\n        annotations:\n          summary: \"High memory usage on {{ $labels.instance }}\"\n          description: \"Memory usage is {{ $value | humanizePercentage }}\"\n\n      - alert: DiskSpaceLow\n        expr: |\n          (\n            node_filesystem_avail_bytes{mountpoint=\"/\"}\n            /\n            node_filesystem_size_bytes{mountpoint=\"/\"}\n          ) < 0.1\n        for: 5m\n        labels:\n          severity: warning\n          team: 
infrastructure\n        annotations:\n          summary: \"Low disk space on {{ $labels.instance }}\"\n          description: \"Only {{ $value | humanizePercentage }} disk space remaining\"\n          action: \"Clean up logs or expand disk\"\n```\n\n### Alert Best Practices\n\n1. **For duration**: Use `for` clause to avoid flapping\n2. **Meaningful annotations**: Include summary, description, runbook URL, impact\n3. **Proper severity levels**: critical (page immediately), warning (ticket), info (log)\n4. **Actionable alerts**: Every alert should require human action\n5. **Include context**: Add labels for team ownership, service, environment\n\n## PromQL Examples\n\n### Rate Calculations\n\n```promql\n# Request rate (requests per second)\nrate(http_requests_total[5m])\n\n# Sum by service\nsum(rate(http_requests_total[5m])) by (service)\n\n# Increase over time window (total count)\nincrease(http_requests_total[1h])\n```\n\n### Error Ratios\n\n```promql\n# Error rate ratio\nsum(rate(http_requests_total{status=~\"5..\"}[5m]))\n/\nsum(rate(http_requests_total[5m]))\n\n# Success rate\nsum(rate(http_requests_total{status=~\"2..\"}[5m]))\n/\nsum(rate(http_requests_total[5m]))\n```\n\n### Histogram Queries\n\n```promql\n# P95 latency\nhistogram_quantile(0.95,\n  sum(rate(http_request_duration_seconds_bucket[5m])) by (le)\n)\n\n# P50, P95, P99 latency by service\nhistogram_quantile(0.50, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))\nhistogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))\nhistogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))\n\n# Average request duration\nsum(rate(http_request_duration_seconds_sum[5m])) by (service)\n/\nsum(rate(http_request_duration_seconds_count[5m])) by (service)\n```\n\n### Aggregation Operations\n\n```promql\n# Sum across all instances\nsum(node_memory_MemTotal_bytes) by (cluster)\n\n# Average CPU 
usage\n1 - avg(rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) by (instance)\n\n# Maximum value of a gauge\nmax(active_connections) by (service)\n\n# Minimum value\nmin(node_filesystem_avail_bytes) by (instance)\n\n# Count number of instances\ncount(up == 1) by (job)\n\n# Standard deviation of a gauge\nstddev(active_connections) by (service)\n```\n\n### Advanced Queries\n\n```promql\n# Top 5 services by request rate\ntopk(5, sum(rate(http_requests_total[5m])) by (service))\n\n# Bottom 3 instances by available memory\nbottomk(3, node_memory_MemAvailable_bytes)\n\n# Filesystems predicted to run out of space within 4 hours (linear regression over the last 1h)\npredict_linear(node_filesystem_avail_bytes{mountpoint=\"/\"}[1h], 4 * 3600) < 0\n\n# Compare the current request rate with the same time 1 day ago\nrate(http_requests_total[5m]) - rate(http_requests_total[5m] offset 1d)\n\n# Rate of change (derivative)\nderiv(node_memory_MemAvailable_bytes[5m])\n\n# Absent metric detection\nabsent(up{job=\"critical-service\"})\n```\n\n### Complex Aggregations\n\n```promql\n# Calculate Apdex score (Application Performance Index)\n# Satisfied: <= 0.1s, tolerating: <= 0.5s. Buckets are cumulative, so le=\"0.5\" already\n# includes le=\"0.1\"; (satisfied + tolerating_cumulative) / 2 = satisfied + tolerating / 2\n(\n  sum(rate(http_request_duration_seconds_bucket{le=\"0.1\"}[5m]))\n  +\n  sum(rate(http_request_duration_seconds_bucket{le=\"0.5\"}[5m]))\n) / 2\n/\nsum(rate(http_request_duration_seconds_count[5m]))\n\n# Multi-window multi-burn-rate SLO\n(\n  sum(rate(http_requests_total{status=~\"5..\"}[1h]))\n  /\n  sum(rate(http_requests_total[1h]))\n  > 0.001 * 14.4\n)\nand\n(\n  sum(rate(http_requests_total{status=~\"5..\"}[5m]))\n  /\n  sum(rate(http_requests_total[5m]))\n  > 0.001 * 14.4\n)\n```\n\n## Kubernetes Integration\n\n### ServiceMonitor for Prometheus Operator\n\n```yaml\n# servicemonitor.yaml\napiVersion: monitoring.coreos.com/v1\nkind: ServiceMonitor\nmetadata:\n  name: app-metrics\n  namespace: monitoring\n  labels:\n    app: myapp\n    release: prometheus\nspec:\n  # Select services to monitor\n  selector:\n    matchLabels:\n      app: myapp\n\n  # Define namespaces to search\n  namespaceSelector:\n    matchNames:\n      - production\n      - staging\n\n  
# Endpoint configuration\n  endpoints:\n    - port: metrics # Service port name\n      path: /metrics\n      interval: 30s\n      scrapeTimeout: 10s\n\n      # Relabeling\n      relabelings:\n        - sourceLabels: [__meta_kubernetes_pod_name]\n          targetLabel: pod\n        - sourceLabels: [__meta_kubernetes_namespace]\n          targetLabel: namespace\n\n      # Metric relabeling (filter/modify metrics)\n      metricRelabelings:\n        - sourceLabels: [__name__]\n          regex: \"go_.*\"\n          action: drop # Drop Go runtime metrics\n        - sourceLabels: [status]\n          regex: \"[45]..\"\n          targetLabel: error\n          replacement: \"true\"\n\n  # Optional: TLS configuration\n  # tlsConfig:\n  #   insecureSkipVerify: false\n  #   ca:\n  #     secret:\n  #       name: prometheus-tls\n  #       key: ca.crt\n```\n\n### PodMonitor for Direct Pod Scraping\n\n```yaml\n# podmonitor.yaml\napiVersion: monitoring.coreos.com/v1\nkind: PodMonitor\nmetadata:\n  name: app-pods\n  namespace: monitoring\n  labels:\n    release: prometheus\nspec:\n  # Select pods to monitor\n  selector:\n    matchLabels:\n      app: myapp\n\n  # Namespace selection\n  namespaceSelector:\n    matchNames:\n      - production\n\n  # Pod metrics endpoints\n  podMetricsEndpoints:\n    - port: metrics\n      path: /metrics\n      interval: 15s\n\n      # Relabeling\n      relabelings:\n        - sourceLabels: [__meta_kubernetes_pod_label_version]\n          targetLabel: version\n        - sourceLabels: [__meta_kubernetes_pod_node_name]\n          targetLabel: node\n```\n\n### PrometheusRule for Alerts and Recording Rules\n\n```yaml\n# prometheusrule.yaml\napiVersion: monitoring.coreos.com/v1\nkind: PrometheusRule\nmetadata:\n  name: app-rules\n  namespace: monitoring\n  labels:\n    release: prometheus\n    role: alert-rules\nspec:\n  groups:\n    - name: app_alerts\n      interval: 30s\n      rules:\n        - alert: HighErrorRate\n          expr: |\n            (\n              sum(rate(http_requests_total{status=~\"5..\", app=\"myapp\"}[5m])) by (namespace, pod)\n              /\n              sum(rate(http_requests_total{app=\"myapp\"}[5m])) by (namespace, pod)\n            ) > 0.05\n          for: 5m\n          labels:\n            severity: critical\n            team: backend\n          annotations:\n            summary: \"High error rate on {{ $labels.namespace }}/{{ $labels.pod }}\"\n            description: \"Error rate is {{ $value | humanizePercentage }}\"\n            dashboard: \"https://grafana.example.com/d/app-overview\"\n            runbook: \"https://wiki.example.com/runbooks/high-error-rate\"\n\n        - alert: PodCrashLooping\n          expr: |\n            increase(kube_pod_container_status_restarts_total[15m]) > 0\n          for: 5m\n          labels:\n            severity: warning\n          annotations:\n            summary: \"Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping\"\n            description: \"Container {{ $labels.container }} has restarted {{ $value | humanize }} times in the last 15m\"\n\n    - name: app_recording_rules\n      interval: 30s\n      rules:\n        - record: app:http_requests:rate5m\n          expr: sum(rate(http_requests_total{app=\"myapp\"}[5m])) by (namespace, pod, method, status)\n\n        - record: app:http_request_duration_seconds:p95\n          expr: |\n            histogram_quantile(0.95,\n              sum(rate(http_request_duration_seconds_bucket{app=\"myapp\"}[5m])) by (le, namespace, pod)\n            )\n```\n\n### Prometheus Custom Resource\n\n```yaml\n# prometheus.yaml\napiVersion: monitoring.coreos.com/v1\nkind: Prometheus\nmetadata:\n  name: prometheus\n  namespace: monitoring\nspec:\n  replicas: 2\n  version: v2.45.0\n\n  # Service account for Kubernetes API access\n  serviceAccountName: prometheus\n\n  # Select ServiceMonitors\n  serviceMonitorSelector:\n    matchLabels:\n      release: prometheus\n\n  # Select PodMonitors\n  podMonitorSelector:\n    matchLabels:\n      release: prometheus\n\n  # Select 
PrometheusRules\n  ruleSelector:\n    matchLabels:\n      release: prometheus\n      role: alert-rules\n\n  # Resource limits\n  resources:\n    requests:\n      memory: 2Gi\n      cpu: 1000m\n    limits:\n      memory: 4Gi\n      cpu: 2000m\n\n  # Storage\n  storage:\n    volumeClaimTemplate:\n      spec:\n        accessModes:\n          - ReadWriteOnce\n        resources:\n          requests:\n            storage: 50Gi\n        storageClassName: fast-ssd\n\n  # Retention\n  retention: 30d\n  retentionSize: 45GB\n\n  # Alertmanager configuration\n  alerting:\n    alertmanagers:\n      - namespace: monitoring\n        name: alertmanager\n        port: web\n\n  # External labels\n  externalLabels:\n    cluster: production\n    region: us-east-1\n\n  # Security context\n  securityContext:\n    fsGroup: 2000\n    runAsNonRoot: true\n    runAsUser: 1000\n\n  # Admin API (snapshots, series deletion) stays disabled; enable only when needed\n  enableAdminAPI: false\n\n  # Additional scrape configs (from Secret)\n  additionalScrapeConfigs:\n    name: additional-scrape-configs\n    key: prometheus-additional.yaml\n```\n\n## Application Instrumentation Examples\n\n### Go Application\n\n```go\n// main.go\npackage main\n\nimport (\n    \"net/http\"\n    \"strconv\"\n    \"time\"\n\n    \"github.com/prometheus/client_golang/prometheus\"\n    \"github.com/prometheus/client_golang/prometheus/promauto\"\n    \"github.com/prometheus/client_golang/prometheus/promhttp\"\n)\n\nvar (\n    // Counter for total requests\n    httpRequestsTotal = promauto.NewCounterVec(\n        prometheus.CounterOpts{\n            Name: \"http_requests_total\",\n            Help: \"Total number of HTTP requests\",\n        },\n        []string{\"method\", \"endpoint\", \"status\"},\n    )\n\n    // Histogram for request duration\n    httpRequestDuration = promauto.NewHistogramVec(\n        prometheus.HistogramOpts{\n            Name:    \"http_request_duration_seconds\",\n            Help:    \"HTTP request duration in seconds\",\n            Buckets: []float64{.001, .005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10},\n        },\n        []string{\"method\", \"endpoint\"},\n    )\n\n    // Gauge for active connections\n    activeConnections = promauto.NewGauge(\n        prometheus.GaugeOpts{\n            Name: \"active_connections\",\n            Help: \"Number of active connections\",\n        },\n    )\n\n    // Summary for response sizes\n    responseSizeBytes = promauto.NewSummaryVec(\n        prometheus.SummaryOpts{\n            Name:       \"http_response_size_bytes\",\n            Help:       \"HTTP response size in bytes\",\n            Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},\n        },\n        []string{\"endpoint\"},\n    )\n)\n\n// Middleware to instrument HTTP handlers\nfunc instrumentHandler(endpoint string, handler http.HandlerFunc) http.HandlerFunc {\n    return func(w http.ResponseWriter, r *http.Request) {\n        start := time.Now()\n        activeConnections.Inc()\n        defer activeConnections.Dec()\n\n        // Wrap response writer to capture status code and response size\n        wrapped := &responseWriter{ResponseWriter: w, statusCode: http.StatusOK}\n\n        handler(wrapped, r)\n\n        duration := time.Since(start).Seconds()\n        httpRequestDuration.WithLabelValues(r.Method, endpoint).Observe(duration)\n        // Numeric status code label, so PromQL matchers like status=~\"5..\" work\n        httpRequestsTotal.WithLabelValues(r.Method, endpoint,\n            strconv.Itoa(wrapped.statusCode)).Inc()\n        responseSizeBytes.WithLabelValues(endpoint).Observe(float64(wrapped.bytesWritten))\n    }\n}\n\ntype responseWriter struct {\n    http.ResponseWriter\n    statusCode   int\n    bytesWritten int\n}\n\nfunc (rw *responseWriter) WriteHeader(code int) {\n    rw.statusCode = code\n    rw.ResponseWriter.WriteHeader(code)\n}\n\n// Write counts body bytes for the response size summary\nfunc (rw *responseWriter) Write(b []byte) (int, error) {\n    n, err := rw.ResponseWriter.Write(b)\n    rw.bytesWritten += n\n    return n, err\n}\n\nfunc handleUsers(w http.ResponseWriter, r *http.Request) {\n    w.Header().Set(\"Content-Type\", \"application/json\")\n    w.Write([]byte(`{\"users\": []}`))\n}\n\nfunc main() {\n    // Register handlers\n    http.HandleFunc(\"/api/users\", instrumentHandler(\"/api/users\", handleUsers))\n    http.Handle(\"/metrics\", promhttp.Handler())\n\n    // 
Start server\n    http.ListenAndServe(\":8080\", nil)\n}\n```\n\n### Python Application (Flask)\n\n```python\n# app.py\nfrom flask import Flask, Response, g, request\nfrom prometheus_client import CONTENT_TYPE_LATEST, Counter, Gauge, Histogram, generate_latest\nimport time\n\napp = Flask(__name__)\n\n# Define metrics\nrequest_count = Counter(\n    'http_requests_total',\n    'Total HTTP requests',\n    ['method', 'endpoint', 'status']\n)\n\nrequest_duration = Histogram(\n    'http_request_duration_seconds',\n    'HTTP request duration in seconds',\n    ['method', 'endpoint'],\n    buckets=[.001, .005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10]\n)\n\nactive_requests = Gauge(\n    'active_requests',\n    'Number of active requests'\n)\n\n# Middleware for instrumentation\n@app.before_request\ndef before_request():\n    active_requests.inc()\n    # Per-request storage; perf_counter is monotonic, suited to measuring durations\n    g.start_time = time.perf_counter()\n\n@app.after_request\ndef after_request(response):\n    duration = time.perf_counter() - g.start_time\n    request_duration.labels(\n        method=request.method,\n        endpoint=request.endpoint or 'unknown'\n    ).observe(duration)\n\n    request_count.labels(\n        method=request.method,\n        endpoint=request.endpoint or 'unknown',\n        status=response.status_code\n    ).inc()\n\n    return response\n\n@app.teardown_request\ndef teardown_request(exc):\n    # Runs even when a view raises, so the gauge cannot drift upward\n    active_requests.dec()\n\n@app.route('/metrics')\ndef metrics():\n    # Serve the text exposition format with the content type Prometheus expects\n    return Response(generate_latest(), content_type=CONTENT_TYPE_LATEST)\n\n@app.route('/api/users')\ndef users():\n    return {'users': []}\n\nif __name__ == '__main__':\n    app.run(host='0.0.0.0', port=8080)\n```\n\n## Production Deployment Checklist\n\n- [ ] Set appropriate retention period (balance storage vs history needs)\n- [ ] Configure persistent storage with adequate size\n- [ ] Enable high availability (multiple Prometheus replicas or federation)\n- [ ] Set up remote storage for long-term retention (Thanos, Cortex, Mimir)\n- [ ] Configure service discovery for dynamic environments\n- [ ] Implement recording rules for frequently-used queries\n- [ ] Create symptom-based alerts 
with proper annotations\n- [ ] Set up Alertmanager with appropriate routing and receivers\n- [ ] Configure inhibition rules to reduce alert noise\n- [ ] Add runbook URLs to all critical alerts\n- [ ] Implement proper label hygiene (avoid high cardinality)\n- [ ] Monitor Prometheus itself (meta-monitoring)\n- [ ] Set up authentication and authorization\n- [ ] Enable TLS for scrape targets and remote storage\n- [ ] Configure query limits (`--query.timeout`, `--query.max-samples`, `--query.max-concurrency`)\n- [ ] Test alert and recording rule validity (`promtool check rules`)\n- [ ] Implement backup and disaster recovery procedures\n- [ ] Document metric naming conventions for the team\n- [ ] Create dashboards in Grafana for common queries\n- [ ] Set up log aggregation alongside metrics (Loki)\n\n## Troubleshooting Commands\n\n```bash\n# Check Prometheus configuration syntax\npromtool check config prometheus.yml\n\n# Check rules file syntax\npromtool check rules alerts/*.yml\n\n# Test PromQL queries\npromtool query instant http://localhost:9090 'up'\n\n# Check which targets are up\ncurl http://localhost:9090/api/v1/targets\n\n# Query current metric values\ncurl 'http://localhost:9090/api/v1/query?query=up'\n\n# Check service discovery (targets dropped during relabeling)\ncurl 'http://localhost:9090/api/v1/targets?state=dropped'\n\n# View TSDB stats\ncurl http://localhost:9090/api/v1/status/tsdb\n\n# Check runtime information\ncurl http://localhost:9090/api/v1/status/runtimeinfo\n```\n\n## Additional Resources\n\n- [Prometheus Documentation](https://prometheus.io/docs/)\n- [PromQL Basics](https://prometheus.io/docs/prometheus/latest/querying/basics/)\n- [Best Practices](https://prometheus.io/docs/practices/)\n- [Alerting Rules](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/)\n- [Recording Rules](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/)\n- [Prometheus 
Operator](https://github.com/prometheus-operator/prometheus-operator)","tags":["prometheus","atlasclaw","providers","cloudchef","agent-skills","agentic-workflow","ai-integration","openclaw"],"capabilities":["skill","source-cloudchef","skill-prometheus","topic-agent-skills","topic-agentic-workflow","topic-ai-integration","topic-openclaw"],"categories":["atlasclaw-providers"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/CloudChef/atlasclaw-providers/prometheus","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add CloudChef/atlasclaw-providers","source_repo":"https://github.com/CloudChef/atlasclaw-providers","install_from":"skills.sh"}},"qualityScore":"0.455","qualityRationale":"deterministic score 0.46 from registry signals: · indexed on github topic:agent-skills · 10 github stars · SKILL.md body (26,250 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:08:23.564Z","embedding":null,"createdAt":"2026-05-09T01:05:33.621Z","updatedAt":"2026-05-18T19:08:23.564Z","lastSeenAt":"2026-05-18T19:08:23.564Z","tsv":"'-1':235 '-1.0':661 '-100':664 '-2':238 '/api/users':681,2383,2385,2522 '/api/users/12345/profile':733 '/d/app-overview':1947 '/docs/)':2754 '/docs/practices/)':2764 '/docs/prometheus/latest/configuration/alerting_rules/)':2769 '/docs/prometheus/latest/configuration/recording_rules/)':2774 '/docs/prometheus/latest/querying/basics/)':2759 '/etc/alertmanager/templates':418 '/metrics':230,1743,1844,2388,2515 '/prometheus-operator/prometheus-operator)':2779 '/prometheus/client_golang/prometheus':2184 '/prometheus/client_golang/prometheus/promauto':2187 '/prometheus/client_golang/prometheus/promhttp':2190 
'/runbooks/high-error-rate':1045,1951 '/scrape':266 '/services/your/webhook/url':406 '/v1':1702,1810,1876,2045 '/v2/enqueue':411 '0':663,1579,1963 '0.0':660 '0.0.0.0':2532 '0.001':1672,1689,2290 '0.01':2288 '0.05':1020,1922,2286 '0.1':1226,1629 '0.5':1639,1641,2285 '0.50':1405 '0.9':1187,2287 '0.95':879,1051,1386,1419,2022 '0.99':901,1433,2289 '0.999':1116 '000':759 '001':2233,2448 '005':2234,2449 '01':2235,2450 '025':2236,2451 '05':2237,2452 '1':190,324,932,966,1063,1099,1115,1258,1520,1582,2139,2238,2241,2453,2456 '10':745,758,2244,2459 '1000':2148 '1000m':2095 '10m':1189 '10s':178,433,436,1747 '12345':711 '1234567890':721 '12h':439 '14.4':1114,1117,1673,1690 '15m':1962,1985 '15s':175,181,1846 '192.168.1.100':739 '1d':1592 '1h':1107,1113,1342,1576,1665,1671 '1m':917 '1s':1084 '2':325,1267,1371,2055 '2.5':2242,2457 '200':679,2328 '2000':2144 '2000m':2100 '25':2239,2454 '2gi':2093 '3':1276,1552,1558 '30d':2117 '30s':800,1745,1899,1991 '3600':1578 '4':1287,1577 '45':1781 '45gb':2119 '4gi':2098 '4h':486 '5':1012,1041,1106,1296,1355,1536,1542,1664,1681,1910,2240,2243,2455,2458 '50gi':2110 '5m':400,483,818,839,887,909,941,1013,1019,1022,1059,1065,1125,1228,1320,1329,1356,1362,1372,1378,1394,1413,1427,1441,1455,1465,1493,1548,1602,1630,1640,1649,1682,1688,1913,1921,1924,1965,2006,2032 '8080':236,239,2393,2534 '9090':222,2711 '9090/api/v1/query':2727 '9090/api/v1/status/runtimeinfo':2747 '9090/api/v1/status/tsdb':2741 '9090/api/v1/targets':2720 '9090/api/v1/targets/metadata':2735 '9093':199 '9100':392 '99.9':1122 'absent':1603,1606 'access':2063 'accessmod':2105 'account':2059 'across':1472 'action':277,296,318,337,353,371,1248,1288,1295,1772 'activ':621,2250,2256,2261,2460,2463,2467 'active_requests.dec':2487 'active_requests.inc':2477 'activeconnect':2252 'activeconnections.dec':2315 'activeconnections.inc':2313 'add':329,343,360,1299,2610 'addit':2157,2165,2748 'additional-scrape-config':2164 'additionalscrapeconfig':2162 'address':310,328,704,728 'adequ':2551 
'admin':2150 'advanc':1532 'aggreg':766,866,914,922,958,1468,1613,2680 'ago':1584 'alert':5,18,30,41,69,193,205,421,444,456,473,490,495,498,514,536,538,542,566,574,585,975,980,994,997,1003,1046,1095,1096,1154,1165,1172,1213,1255,1289,1291,1867,1891,1897,1901,1952,2086,2122,2591,2608,2616,2649,2702,2765 'alert-rul':1890,2085 'alertmanag':67,191,194,198,393,2120,2123,2127,2597 'alertmanager.yml':396 'alertnam':428,469,510,522,527 'alerts/cause_based.yml':1161 'alerts/symptom_based.yml':990 'allow':76 'alongsid':2681 'alway':603 'annot':267,273,292,314,1028,1071,1131,1200,1234,1269,1930,1969,2594 'annotations.description':543 'apdex':1616 'api':402,2062,2151 'apivers':1699,1807,1873,2042 'app':234,237,370,375,1708,1713,1724,1816,1830,1882,1896,1911,1919,1987,1994,2004,2013,2030,2416 'app-metr':1707 'app-pod':1815 'app-rul':1881 'app.after':2481 'app.before':2472 'app.py':2399 'app.route':2514,2521 'app.run':2530 'appear':1197 'applic':98,223,227,1618,2170,2174,2396 'application/json':2374 'appropri':423,2539,2599 'architectur':53 'authent':2632 'author':2634 'automat':631 'avail':1218,1510,1555,1573,2555 'averag':1445,1482 'avg':933,951,1485 'avoid':771,1265,2621 'backend':244,1027,1070,1929 'backup':2658 'bad':698,722 'balanc':2542 'base':645,1094,1153,2590 'bash':2686 'basic':166,2756 'best':589,1256,2760 'bottom':1551 'bottomk':1557 'bound':671 'bucket':149,627,886,908,1058,1393,1412,1426,1440,1627,1637,2029,2231,2447 'budget':1134,1144 'burn':1119,1135,1655 'byte':620,638,652,970,974,1178,1182,1186,1219,1224,1479,1511,1562,1574,1601,2274,2280,2376 'ca':1792 'ca.crt':1799 'cach':614 'calcul':158,804,823,844,1309,1615 'captur':2320 'cardin':666,672,700,724,769,2623 'caus':979,988,1152 'cause-bas':1151 'chang':1595 'channel':535,563,582 'check':2655,2687,2692,2695,2700,2713,2730,2742 'checklist':2537 'claus':1263 'clean':1249 'client':95,161,737,2407 'client-sid':160 'cloud':8,50 'cloud-nat':7,49 'cluster':184,429,945,1481,2133 'code':99,2322,2357,2360,2362,2510 
'collect':15 'command':2685 'common':2675 'compar':1580 'complex':1612 'compon':54,58 'comput':790 'config':26,196,212,219,232,255,259,387,534,548,562,568,581,2159,2167,2693 'configur':148,165,169,192,210,394,1735,1788,2121,2547,2573,2603,2643,2689 'connect':622,2251,2257,2262 'consist':599,644 'contain':1958,1978 'content':2372 'content-typ':2371 'context':1298,2141 'continu':453 'convent':593,2666 'convert':85 'core':57 'cortex':2571 'count':629,1337,1464,1514,1518,1648,2422 'counter':130,602,2192,2409,2423 'cpu':928,936,947,954,1483,1488,2094,2099 'crash':1975 'creat':2587,2670 'critic':443,450,497,504,1025,1128,1194,1280,1610,1927,2615 'critical-servic':1609 'curl':2718,2725,2733,2739,2745 'current':1140,2722 'custom':282,304,415,2038 'd':321,322 'dashboard':924,1944,2671 'data':66,112,118 'databas':455,462,565,573 'database-alert':564 'day':1583 'db':639,688 'dba':458,465,559 'dba-team':464,558 'dba-team@example.com':570 'debug':1157 'dedupl':70 'def':2474,2483,2516,2523 'default':441,532 'defer':2314 'defin':1726,2419 'deploy':109,2536 'deriv':1596,1597 'descript':555,1034,1077,1138,1207,1241,1272,1938,1977 'design':43,976 'detect':1033,1605 'dev':480,579,584 'dev-alert':583 'develop':471,476 'deviat':1524 'direct':1802 'disast':2660 'discoveri':21,247,2575,2732 'disk':1237,1245,1254,1564 'diskspacelow':1214 'document':2663,2751 'drop':1773,1774 'durat':634,641,649,873,884,895,906,1056,1260,1391,1410,1424,1438,1447,1452,1462,1501,1528,1625,1635,1646,2016,2027,2216,2223,2228,2332,2340,2434,2438,2442,2488,2500 'dynam':2577 'east':189,2138 'email':567,703,715 'enabl':2149,2553,2635 'enableadminapi':2155 'end':544 'endpoint':680,1734,1736,1839,2211,2247,2292,2300,2338,2343,2431,2446,2495,2504 'entir':516 'env':241,475 'environ':52,472,1305,2578 'ephemer':77 'equal':509,528 'error':135,612,824,830,837,845,851,858,1031,1035,1143,1343,1346,1783,1933,1939 'etc':103 'evalu':179 'events.pagerduty.com':410 'events.pagerduty.com/v2/enqueue':409 'everi':1290 
'exampl':1307,2172 'exhaust':1147 'expand':1253 'expens':791 'experienc':1087 'export':84,377,385,391 'expr':812,832,854,876,898,931,950,965,1005,1048,1098,1174,1215,1903,1954,1998,2019 'extern':182,2130 'externallabel':2132 'face':985,1001 'fals':2156 'fast':1137,2113 'fast-ssd':2112 'feder':2560 'file':202,204,413,2697 'filesystem':1217,1222,1509,1572 'filter/modify':1765 'fire':500 'flap':1266 'flask':2397,2401,2403,2417 'float64':2232,2283,2284 'follow':594 'format':94,601 'frequent':2584 'frequently-us':2583 'fsgroup':2143 'full':725,1565 'func':2298,2306,2353,2363,2378 'gaug':136,617,2248,2411,2462 'generat':2412,2519 'get':677 'github.com':2183,2186,2189,2778 'github.com/prometheus-operator/prometheus-operator)':2777 'github.com/prometheus/client_golang/prometheus':2182 'github.com/prometheus/client_golang/prometheus/promauto':2185 'github.com/prometheus/client_golang/prometheus/promhttp':2188 'global':172,397 'go':100,140,445,1771,1775,2173,2175 'good':670,682,996,1091 'grafana':2673 'grafana.example.com':1946 'grafana.example.com/d/app-overview':1945 'group':71,426,431,434,467,481,795,991,1162,1894 'grouplabels.alertname':539,556,575 'guidelin':740 'handl':68 'handler':2297,2302,2329,2381 'handleus':2364,2386 'header':571 'help':2203,2225,2258,2275 'high':723,768,1030,1073,1202,1932,2554,2622 'high-cardin':767 'higherrorr':1004,1902 'highlat':1047 'highmemoryusag':1173 'histogram':145,156,625,877,899,1049,1379,1384,1403,1417,1431,2020,2213,2410,2435 'histori':2545 'hit':615 'hooks.slack.com':405 'hooks.slack.com/services/your/webhook/url':404 'host':379,2531 'http':607,610,632,673,706,712,717,729,734,809,815,828,835,849,856,861,871,882,893,904,1008,1016,1054,1102,1110,1317,1326,1339,1351,1359,1367,1375,1389,1408,1422,1436,1450,1460,1499,1526,1545,1585,1588,1623,1633,1644,1660,1668,1677,1685,1906,1916,1995,2001,2014,2025,2200,2207,2221,2226,2271,2276,2296,2424,2428,2436,2440 'http.handle':2387 'http.handlefunc':2382 'http.handlerfunc':2303,2304 
'http.listenandserve':2392 'http.request':2310,2368 'http.responsewriter':2308,2350,2366 'http.statustext':2344 'httprequestdur':2217 'httprequestduration.withlabelvalues':2336 'httprequeststot':2196 'httprequeststotal.withlabelvalues':2341 'human':1294 'humanizedur':1150 'humanizepercentag':1039,1212,1244,1943 'hygien':2620 'id':702,710,776 'ideal':748 'identifi':119 'idl':940,1492 'immedi':1282 'impact':986,1085,1275 'implement':13,2579,2617,2657 'import':2179,2402,2408,2414 'inc':2346,2511 'includ':1270,1297 'increas':132,1332,1338 'index':1620 'info':1285 'inform':2744 'infrastructur':1164,1170,1199,1233 'inhibit':487,491,2604 'input':779 'insecureskipverifi':1790 'instanc':470,511,513,926,943,952,960,1474,1495,1513,1517,1553 'instant':2709 'instrument':97,2171,2295,2471 'instrumenthandl':2299,2384 'int':2352,2358 'integr':1692 'interv':174,180,435,438,485,799,916,1744,1845,1898,1990 'io':275,294,316 'ip':727,738 'issu':1171 'java':101 'job':78,215,225,248,381,808,820,827,841,848,855,860,870,889,892,911,1522,1608 'keep':278,741 'key':125,550,554,1798,2168 'key-valu':124 'kilobyt':656 'kind':1703,1811,1877,2046 'kube':1956 'kubernet':107,245,251,253,271,290,312,335,341,350,357,367,1691,1752,1759,1851,1859,2061 'kubernetes-n':106 'kubernetes-pod':250 'label':127,183,240,269,288,299,309,327,331,333,340,346,348,356,363,365,369,374,665,686,742,747,772,1023,1066,1126,1190,1229,1300,1712,1820,1853,1886,1925,1966,2131,2619 'labels.container':1979 'labels.instance':1206,1240 'labels.namespace':1936,1972 'labels.pod':1937,1973 'labels.service':1076 'latenc':150,867,1074,1079,1383,1400 'latest':2413,2520 'le':890,912,1061,1396,1415,1429,1443,1628,1638,2034 'level':921,1279 'librari':96 'limit':2089,2096,2645 'linear':1567,1570 'load':200,1090 'localhost':221,2710,2719,2726,2734,2740,2746 'log':1251,1286,2679 'loki':2683 'long':2567 'long-term':2566 'loop':1976 'low':1236 'lower':1167 'main':2178,2379,2529 'main.go':2176 'manag':111,667,2153 'map':2282 
'match':448,460,474,502,506,521,525 'matchlabel':1723,1829,2069,2075,2081 'matchnam':1731,1835 'max':1498 'maximum':1496 'meaning':1268 'memavail':969,1181,1561,1600 'memori':143,618,957,962,968,972,1176,1180,1184,1203,1208,1477,1556,1560,1599,2092,2097 'memtot':973,1177,1185,1478 'meta':270,289,311,334,349,366,1751,1758,1850,1858,2628 'meta-monitor':2627 'metadata':1705,1813,1879,2048 'method':676,2010,2210,2246,2430,2445,2493,2502 'metric':14,81,86,114,121,128,228,283,300,380,591,755,770,1604,1709,1738,1763,1766,1777,1838,1842,2420,2517,2664,2682,2723 'metricrelabel':1767 'middlewar':2293,2469 'millisecond':651 'mimir':2572 'min':1507 'minimum':1505 'mode':939,1491 'model':113 'monitor':3,28,39,1711,1721,1819,1827,1885,2052,2125,2624,2629 'monitoring.coreos.com':1701,1809,1875,2044 'monitoring.coreos.com/v1':1700,1808,1874,2043 'monoton':131 'month':1142 'mountpoint':1220,1225,1575 'multi':920,1651,1654 'multi-burn-r':1653 'multi-level':919 'multi-window':1650 'multipl':2556 'myapp':1714,1725,1831,1912,1920,2005,2031 'name':122,216,226,249,345,352,359,362,382,531,545,557,576,592,596,796,913,992,1163,1706,1741,1754,1769,1794,1814,1862,1880,1895,1986,2049,2126,2163,2199,2220,2255,2270,2418,2528,2665 'namespac':330,336,342,1710,1727,1760,1762,1818,1832,1884,2008,2035,2051,2124 'namespaceselector':1730,1834 'nativ':9,51,108 'need':2546 'net/http':2180 'nil':2394 'node':376,384,390,927,935,946,953,961,967,971,1175,1179,1183,1216,1221,1476,1487,1508,1559,1571,1598,1861,1864 'node-export':383,389 'nois':2609 'notif':74,416 'number':684,1515,2205,2259,2465 'object':2281 'observ':10,146,2339,2499 'offset':1591 'open':37 'open-sourc':36 'oper':105,693,1469,1696,2154,2776 'option':1786 'overview':31 'ownership':1303 'p50':1397 'p95':875,1078,1382,1398,2018 'p99':897,1399 'packag':2177 'page':1089,1159,1281 'pagerduti':407,447,452,546,547,552 'parti':90 'path':229,284,295,301,1742,1843 'pattern':597 'per':746,754,1314 'percentag':659 'percentil':868 'perform':783,797,1619 
'period':2541 'persist':2548 'pod':252,257,262,272,291,313,344,351,358,368,1753,1756,1803,1817,1825,1837,1852,1860,1957,1971,2009,2036 'podcrashloop':1953 'podmetricsendpoint':1840 'podmonitor':1800,1812,2073 'podmonitor.yaml':1806 'podmonitorselector':2074 'port':305,317,1737,1740,1841,2128,2533 'power':35 'practic':590,1257,2761 'pre':765,789,803,822,843,865 'pre-aggreg':764,864 'pre-calcul':802,821,842 'pre-comput':788 'predict':1563,1569 'procedur':2662 'product':185,242,1732,1836,2134,2535 'promauto.newcountervec':2197 'promauto.newgauge':2253 'promauto.newhistogramvec':2218 'promauto.newsummaryvec':2268 'prometheus':1,2,27,32,55,93,104,167,213,217,274,293,315,1695,1716,1796,1822,1888,2037,2047,2050,2065,2071,2077,2083,2406,2557,2625,2688,2750,2775 'prometheus-additional.yaml':2169 'prometheus-tl':1795 'prometheus.counteropts':2198 'prometheus.gaugeopts':2254 'prometheus.histogramopts':2219 'prometheus.io':265,2753,2758,2763,2768,2773 'prometheus.io/docs/)':2752 'prometheus.io/docs/practices/)':2762 'prometheus.io/docs/prometheus/latest/configuration/alerting_rules/)':2767 'prometheus.io/docs/prometheus/latest/configuration/recording_rules/)':2772 'prometheus.io/docs/prometheus/latest/querying/basics/)':2757 'prometheus.io/scrape':264 'prometheus.summaryopts':2269 'prometheus.yaml':2041 'prometheus.yml':171,2694 'prometheusrul':1865,1878,2079 'prometheusrule.yaml':1872 'promhttp.handler':2389 'promql':16,1306,1310,1345,1381,1470,1534,1614,2705,2755 'promtool':2654,2691,2699,2707 'proper':1277,2593,2618 'push':80 'pushgateway':75 'python':102,2395,2398 'quantil':159,878,900,1050,1385,1404,1418,1432,2021 'queri':17,640,689,792,1380,1533,2586,2647,2676,2706,2708,2721,2728 'queue':623 'r':2309,2331,2367 'r.method':2337,2342 'rang':541 
'rate':806,814,825,834,881,903,934,1007,1015,1032,1036,1053,1101,1109,1120,1141,1308,1312,1316,1325,1347,1350,1358,1364,1366,1374,1388,1407,1421,1435,1449,1459,1486,1540,1544,1593,1622,1632,1643,1656,1659,1667,1676,1684,1905,1915,1934,1940,1955,2000,2024,2644 'rate5m':811,831,853,859,863,1997 'ratio':657,846,852,930,949,956,964,1344,1348 're':526 'readwriteonc':2106 'reason':683 'receiv':424,440,451,463,477,530,2602 'record':22,761,780,785,807,826,847,869,891,925,944,959,1869,1988,1993,2012,2580,2651,2770 'recoveri':2661 'reduc':2607 'regex':279,302,320,1770,1780 'region':186,2135 'regist':2380 'regress':1568 'relabel':258,1748,1749,1764,1847,1848 'releas':1715,1821,1887,2070,2076,2082 'reliabl':45 'remain':1247 'remot':2563,2641 'repeat':437,484 'replac':297,319,323,338,354,372,1784 'replica':2054,2558 'request':134,151,608,611,633,674,707,713,718,730,735,805,810,816,829,836,850,857,862,872,883,894,905,1009,1017,1055,1103,1111,1311,1313,1318,1327,1340,1352,1360,1368,1376,1390,1409,1423,1437,1446,1451,1461,1500,1527,1539,1546,1586,1589,1624,1634,1645,1661,1669,1678,1686,1907,1917,1996,2002,2015,2026,2091,2108,2195,2201,2208,2215,2222,2227,2404,2421,2425,2429,2433,2437,2441,2461,2464,2468,2473,2476,2482,2485 'request.endpoint':2496,2505 'request.method':2494,2503 'request.start':2478,2490 'request_count.labels':2501 'request_duration.labels':2492 'requir':1293 'resolv':398,587 'resourc':2039,2088,2090,2107,2749 'respons':636,2265,2272,2277,2317,2486,2513 'response.status':2509 'responsesizebyt':2267 'responsewrit':2324,2325,2348,2355 'restart':1960,1981 'retent':2115,2116,2540,2569 'retentions':2118 'return':2305,2512,2518,2525 'role':256,1889,2084 'rout':72,420,425,442,2600 'rule':19,23,201,203,207,488,492,762,781,786,798,801,915,918,995,1166,1870,1883,1892,1900,1989,1992,2087,2581,2605,2652,2656,2696,2701,2766,2771 'rules/recording_rules.yml':794 'ruleselector':2080 'runasnonroot':2145 'runasus':2147 'runbook':1042,1273,1948,2611 'runtim':1776,2743 'rw':2354 
'rw.responsewriter.writeheader':2361 'rw.statuscode':2359 'scalabl':47 'score':1617 'scrape':25,60,173,176,209,211,261,276,1804,2158,2166,2638 'scrapetimeout':1746 'sd':254 'search':1729 'second':635,642,647,874,885,896,907,937,1057,1315,1392,1411,1425,1439,1453,1463,1489,1502,1529,1626,1636,1647,2017,2028,2224,2230,2335,2439,2444 'secret':1793,2161 'secur':2140 'securitycontext':2142 'select':694,1718,1824,1833,2066,2072,2078 'selector':1722,1828 'send':586 'seri':65,117,753 'server':56,168,2391 'servic':20,224,246,361,430,517,529,549,553,1062,1304,1323,1331,1402,1416,1430,1444,1457,1467,1504,1531,1537,1550,1611,1719,1739,2058,2574,2731 'serviceaccountnam':2064 'servicedown':523 'servicemonitor':1693,1704,2067 'servicemonitor.yaml':1698 'servicemonitorselector':2068 'set':2370,2538,2561,2595,2630,2677 'setup':163 'sever':449,503,507,1024,1067,1127,1168,1191,1230,1278,1926,1967 'side':162 'similar':154 'size':152,624,637,654,1223,2266,2273,2278,2552 'skill' 'skill-prometheus' 'slack':401,479,533,561,578,580 'slack-dev':478,577 'slo':1093,1123,1133,1657 'slo-bas':1092 'slobudgetburnr':1097 'slow':1088 'sourc':38,268,287,308,332,347,364,501,520 'source-cloudchef' 'sourcelabel':1750,1757,1768,1778,1849,1857 'space':1238,1246 'spare':83 'spec':1717,1823,1893,2053,2104 'specifi':286,307 'sre':1130 'ssd':2114 'stage':1733 'standard':1523 'start':2311,2334,2390 'stat':2738 'static':195,218,231,386 'status':678,1011,1105,1354,1370,1663,1680,1779,1909,1959,2011,2212,2321,2432,2508 'statuscod':2327,2351 'stddev':1525 'storag':2101,2102,2109,2543,2549,2564,2642 'storageclassnam':2111 'store':62 'string':2209,2245,2291,2301 'struct':2349 'subject':572 'success':1363 'suffix':606,630 'sum':628,813,833,880,902,1006,1014,1052,1100,1108,1321,1324,1349,1357,1365,1373,1387,1406,1420,1434,1448,1454,1458,1471,1475,1543,1621,1631,1642,1658,1666,1675,1683,1904,1914,1999,2023 'summari':153,1029,1072,1132,1201,1235,1271,1931,1970,2263 'suppress':489,493,512 
'symptom':977,982,993,1002,1196,2589 'symptom-bas':2588 'syntax':2690,2698 'system':42,91 'tabl':691 'target':197,220,233,298,326,339,355,373,388,505,524,2639,2715 'targetlabel':1755,1761,1782,1855,1863 'team':243,459,461,466,560,1026,1069,1129,1198,1232,1302,1928,2669 'temperatur':142 'templat':412,417 'term':2568 'test':2648,2704 'text':540,600 'thano':2570 'third':89 'third-parti':88 'threshold':1040,1083 'ticket':1284 'time':64,116,752,1334,1566,1983,2181,2415,2479,2491 'time-seri':63,115,751 'time.now':2312 'time.since':2333 'time.time':2480,2489 'timeout':177,399 'timestamp':705,720,777 'titl':537 'tls':1787,1797,2636 'tlsconfig':1789 'tmpl':419 'top':1535 'topic-agent-skills' 'topic-agentic-workflow' 'topic-ai-integration' 'topic-openclaw' 'topk':1541 'total':605,609,613,616,675,690,708,714,719,731,736,749,817,838,938,1010,1018,1104,1112,1319,1328,1336,1341,1353,1361,1369,1377,1490,1547,1587,1590,1662,1670,1679,1687,1908,1918,1961,2003,2194,2202,2204,2426,2427 'troubleshoot':2684 'true':280,454,588,1785,1791,2146 'tsdb':2737 'type':129,2347,2373 'unbound':699,774 'uniqu':750 'unit':646 'unknown':2498,2507 'unless':1195 'up/down':141 'url':403,408,726,732,1274,2612 'us':188,2137 'us-east':187,2136 'usag':144,619,1204,1209,1484 'use':11,82,281,303,604,626,643,760,784,1155,1261,2585 'user':692,701,709,778,984,1000,1086,2377,2524,2526 'user-fac':983,999 'user@example.com':716 'util':929,948,955,963 'v2.45.0':2057 'valid':2653 'valu':126,133,137,687,743,775,1038,1081,1149,1211,1243,1497,1506,1942,1982,2724 'var':2191 'version':1854,1856,2056 'view':2736 'volumeclaimtempl':2103 'vs':978,2544 'w':2307,2326,2365 'w.header':2369 'w.write':2375 'wait':432,482 'warn':494,508,1068,1192,1231,1283,1968 'web':2129 'wiki.example.com':1044,1950 'wiki.example.com/runbooks/high-error-rate':1043,1949 'window':1335,1652 'wrap':2316,2323,2330 'wrapped.statuscode':2345 'writehead':2356 'writer':2318 'x':1118 'yaml':170,395,669,697,793,989,1160,1697,1805,1871,2040 
'yml':206,208,2703","prices":[{"id":"a87baf4f-bf06-438d-8fbb-f46b973ff48e","listingId":"ba64f3bc-1c7b-4763-aae7-749b0476c27b","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"CloudChef","category":"atlasclaw-providers","install_from":"skills.sh"},"createdAt":"2026-05-09T01:05:33.621Z"}],"sources":[{"listingId":"ba64f3bc-1c7b-4763-aae7-749b0476c27b","source":"github","sourceId":"CloudChef/atlasclaw-providers/prometheus","sourceUrl":"https://github.com/CloudChef/atlasclaw-providers/tree/main/skills/prometheus","isPrimary":false,"firstSeenAt":"2026-05-09T01:05:33.621Z","lastSeenAt":"2026-05-18T19:08:23.564Z"}],"details":{"listingId":"ba64f3bc-1c7b-4763-aae7-749b0476c27b","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"CloudChef","slug":"prometheus","github":{"repo":"CloudChef/atlasclaw-providers","stars":10,"topics":["agent-skills","agentic-workflow","ai-integration","openclaw"],"license":"apache-2.0","html_url":"https://github.com/CloudChef/atlasclaw-providers","pushed_at":"2026-05-18T03:15:37Z","description":"atlasclaw-providers are the integration with enterprise systems through skills and webhook.","skill_md_sha":"7bf6961b2071522fecd56f0997b205eb1e3e9ce6","skill_md_path":"skills/prometheus/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/CloudChef/atlasclaw-providers/tree/main/skills/prometheus"},"layout":"multi","source":"github","category":"atlasclaw-providers","frontmatter":{"name":"prometheus","description":"Prometheus monitoring and alerting for cloud-native observability. 
Use when implementing metrics collection, PromQL queries, alerting rules, service discovery, recording rules, and scrape config."},"skills_sh_url":"https://skills.sh/CloudChef/atlasclaw-providers/prometheus"},"updatedAt":"2026-05-18T19:08:23.564Z"}}