Skillquality 0.45

pilot-heartbeat-monitor

Detect agent failures and trigger automatic task redistribution or re-election. Use this skill when: 1. You need to detect when swarm members become unreachable 2. You want to trigger failover actions on agent failure 3. You need health monitoring for load balancing or leader el

Price
free
Protocol
skill
Verified
no

What it does

pilot-heartbeat-monitor

Monitor agent health via periodic heartbeats and detect failures using timeout-based detection.

Commands

Send heartbeat to peers

pilotctl --json publish "registry-hostname" "heartbeat:$SWARM_NAME" \
  --data "{\"agent\":\"$AGENT_ID\",\"timestamp\":\"$(date -u +%s)\",\"status\":\"alive\"}"

Detect failed agents

TIMEOUT=30
CURRENT_TIME=$(date -u +%s)

FAILED_AGENTS=$(pilotctl --json inbox \
  | jq --arg now "$CURRENT_TIME" --arg timeout "$TIMEOUT" \
    '[.messages[] | select(.topic == "heartbeat:'$SWARM_NAME'") | {agent: .payload.agent, last_seen: .payload.timestamp}]
    | group_by(.agent)
    | map(select(($now | tonumber) - (map(.last_seen) | max) > ($timeout | tonumber)))
    | .[].agent')

Verify failure with direct ping

for agent in $FAILED_AGENTS; do
  AGENT_ADDR=$(pilotctl --json peers | jq -r '.[] | select(.node_id == "'$agent'") | .address')

  PING_RESULT=$(pilotctl --json ping "$AGENT_ADDR" --count 3 --timeout 2s)
  LOSS=$(echo "$PING_RESULT" | jq -r '.packet_loss_pct')

  if [ "$LOSS" = "100" ]; then
    echo "Agent $agent CONFIRMED DOWN"
  fi
done

Workflow Example

Health monitor for worker pool with automatic task redistribution:

#!/bin/bash
SWARM_NAME="worker-pool"
HEARTBEAT_INTERVAL=5
FAILURE_TIMEOUT=15
REGISTRY_HOST="registry.example.com"

# Background: Send own heartbeats
(
  while true; do
    pilotctl --json publish "$REGISTRY_HOST" "heartbeat:$SWARM_NAME" \
      --data "{\"agent\":\"$AGENT_ID\",\"timestamp\":\"$(date -u +%s)\"}"
    sleep $HEARTBEAT_INTERVAL
  done
) &

# Monitor peer heartbeats
while true; do
  CURRENT_TIME=$(date -u +%s)

  # Detect timeouts and trigger recovery
  # ...

  sleep $HEARTBEAT_INTERVAL
done

Dependencies

Requires pilot-protocol skill, jq, and bc.

Capabilities

skillsource-teoslayerskill-pilot-heartbeat-monitortopic-agent-skillstopic-ai-agentstopic-clawhubtopic-networkingtopic-openclawtopic-overlay-networktopic-p2ptopic-pilot-protocol

Install

Quality

0.45/ 1.00

deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 6 github stars · SKILL.md body (1,829 chars)

Provenance

Indexed fromgithub
Enriched2026-05-18 19:14:56Z · deterministic:skill-github:v1 · v1
First seen2026-05-18
Last seen2026-05-18

Agent access