{"id":"792d838c-ddc6-4dc5-a570-28c0ea569c85","shortId":"TyrANL","kind":"skill","title":"swarm-self-heal","tagline":"Swarm reliability watchdog for OpenClaw — validates gateway/channel and every lane, performs bounded recovery, and emits auditable receipts.","description":"## When to use this skill\n\nUse this skill when the user wants to:\n- Diagnose why a multi-agent swarm feels \"stuck\" or partially offline\n- Check gateway + channel + lane liveness in one run\n- Perform bounded auto-recovery (restart + retry only)\n- Capture auditable receipts for incident timelines\n- Keep a primary watchdog lane plus a backup lane in place\n\n## Commands\n\n```bash\n# Install/refresh watchdog scripts + cron wiring\nbash skills/swarm-self-heal/scripts/setup.sh\n\n# Run an immediate canary check\nbash skills/swarm-self-heal/scripts/check.sh\n\n# Run watchdog directly (uses deployed workspace path)\nbash ~/.openclaw/workspace-studio/scripts/anvil_watchdog.sh\n\n# Optional: increase lane ping timeout for slower providers\nPING_TIMEOUT_SECONDS=180 bash ~/.openclaw/workspace-studio/scripts/anvil_watchdog.sh\n```\n\n## What it checks\n\n- **Gateway health** via `openclaw health`\n- **Channel readiness** via `openclaw channels status --json --probe`\n- **Passive lane recency** via `openclaw status --json` (latest OpenClaw-compatible)\n- **Active lane probe only when stale** for `main`, `builder-1`, `builder-2`, `reviewer`, `designer`\n- **Bounded recovery** with a single restart pass + targeted re-probe of infra failures\n\n## Output contract\n\nThe watchdog output includes:\n- `timestamp`\n- `targets`\n- `ok_agents`\n- `failed_agents`\n- `actions`\n- `VERDICT`\n- `RECEIPT`\n\n## Safety model\n\n- Bounded recovery only (single restart pass per run)\n- No destructive state wipes\n- No blind reinstall behavior\n- Recovery actions are explicit in output\n\n## Notes\n\n- Cron wiring sets both primary and backup watchdog lanes to `xhigh` thinking.\n- Telegram target is auto-derived from config when available, with a safe fallback.\n- Healthy runs can be summarized as a single line to reduce operator noise.","tags":["swarm","self","heal","cacheforge","skills","cacheforge-ai","agent-skills","ai-agents","clawhub","devops","discord-v2","kubernetes"],"capabilities":["skill","source-cacheforge-ai","skill-swarm-self-heal","topic-agent-skills","topic-ai-agents","topic-cacheforge","topic-clawhub","topic-devops","topic-discord-v2","topic-kubernetes","topic-openclaw","topic-prometheus"],"categories":["cacheforge-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/cacheforge-ai/cacheforge-skills/swarm-self-heal","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add cacheforge-ai/cacheforge-skills","source_repo":"https://github.com/cacheforge-ai/cacheforge-skills","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,789 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:09:04.901Z","embedding":null,"createdAt":"2026-05-18T13:14:39.142Z","updatedAt":"2026-05-18T19:09:04.901Z","lastSeenAt":"2026-05-18T19:09:04.901Z","tsv":"'-1':155 '-2':157 '/.openclaw/workspace-studio/scripts/anvil_watchdog.sh':104,118 '180':116 'action':186,208 'activ':146 'agent':40,183,185 'audit':20,64 'auto':58,230 'auto-deriv':229 'auto-recoveri':57 'avail':235 'backup':76,220 'bash':81,87,94,103,117 'behavior':206 'blind':204 'bound':16,56,160,191 'builder':154,156 'canari':92 'captur':63 'channel':49,127,131 'check':47,93,121 'command':80 'compat':145 'config':233 'contract':175 'cron':85,214 'deploy':100 'deriv':231 'design':159 'destruct':200 'diagnos':35 'direct':98 'emit':19 'everi':13 'explicit':210 'fail':184 'failur':173 'fallback':239 'feel':42 'gateway':48,122 'gateway/channel':11 'heal':4 'health':123,126 'healthi':240 'immedi':91 'incid':67 'includ':179 'increas':106 'infra':172 'install/refresh':82 'json':133,141 'keep':69 'lane':14,50,73,77,107,136,147,222 'latest':142 'line':248 'live':51 'main':153 'model':190 'multi':39 'multi-ag':38 'nois':252 'note':213 'offlin':46 'ok':182 'one':53 'openclaw':9,125,130,139,144 'openclaw-compat':143 'oper':251 'option':105 'output':174,178,212 'partial':45 'pass':166,196 'passiv':135 'path':102 'per':197 'perform':15,55 'ping':108,113 'place':79 'plus':74 'primari':71,218 'probe':134,148,170 'provid':112 're':169 're-prob':168 'readi':128 'receipt':21,65,188 'recenc':137 'recoveri':17,59,161,192,207 'reduc':250 'reinstal':205 'reliabl':6 'restart':60,165,195 'retri':61 'review':158 'run':54,89,96,198,241 'safe':238 'safeti':189 'script':84 'second':115 'self':3 'set':216 'singl':164,194,247 'skill':26,29 'skill-swarm-self-heal' 'skills/swarm-self-heal/scripts/check.sh':95 'skills/swarm-self-heal/scripts/setup.sh':88 'slower':111 'source-cacheforge-ai' 'stale':151 'state':201 'status':132,140 'stuck':43 'summar':244 'swarm':2,5,41 'swarm-self-h':1 'target':167,181,227 'telegram':226 'think':225 'timelin':68 'timeout':109,114 'timestamp':180 'topic-agent-skills' 'topic-ai-agents' 'topic-cacheforge' 'topic-clawhub' 'topic-devops' 'topic-discord-v2' 'topic-kubernetes' 'topic-openclaw' 'topic-prometheus' 'use':24,27,99 'user':32 'valid':10 'verdict':187 'via':124,129,138 'want':33 'watchdog':7,72,83,97,177,221 'wipe':202 'wire':86,215 'workspac':101 'xhigh':224","prices":[{"id":"2a4be9fb-5aa2-4f51-ab5a-52e6871b4c3c","listingId":"792d838c-ddc6-4dc5-a570-28c0ea569c85","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"cacheforge-ai","category":"cacheforge-skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:14:39.142Z"}],"sources":[{"listingId":"792d838c-ddc6-4dc5-a570-28c0ea569c85","source":"github","sourceId":"cacheforge-ai/cacheforge-skills/swarm-self-heal","sourceUrl":"https://github.com/cacheforge-ai/cacheforge-skills/tree/main/skills/swarm-self-heal","isPrimary":false,"firstSeenAt":"2026-05-18T13:14:39.142Z","lastSeenAt":"2026-05-18T19:09:04.901Z"}],"details":{"listingId":"792d838c-ddc6-4dc5-a570-28c0ea569c85","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"cacheforge-ai","slug":"swarm-self-heal","github":{"repo":"cacheforge-ai/cacheforge-skills","stars":8,"topics":["agent-skills","ai-agents","cacheforge","clawhub","devops","discord-v2","kubernetes","openclaw","prometheus"],"license":"mit","html_url":"https://github.com/cacheforge-ai/cacheforge-skills","pushed_at":"2026-02-22T20:49:48Z","description":"⚡ SOTA agent skills for OpenClaw — observability, security, code quality, incident response, and more. Built by Anvil AI.","skill_md_sha":"78db63fb31234f021809f5177c7b97eecb6ae78f","skill_md_path":"skills/swarm-self-heal/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/cacheforge-ai/cacheforge-skills/tree/main/skills/swarm-self-heal"},"layout":"multi","source":"github","category":"cacheforge-skills","frontmatter":{"name":"swarm-self-heal","license":"MIT","description":"Swarm reliability watchdog for OpenClaw — validates gateway/channel and every lane, performs bounded recovery, and emits auditable receipts."},"skills_sh_url":"https://skills.sh/cacheforge-ai/cacheforge-skills/swarm-self-heal"},"updatedAt":"2026-05-18T19:09:04.901Z"}}