{"id":"5f81553f-0a5e-4c52-8071-59a822da5340","shortId":"zeNHUY","kind":"skill","title":"sre-agent","tagline":">-","description":"# sre-agent\n\n## Description\n\nSRE Agent. Four operating modes, which can invoke each other.\n\n## Setup\n\nBefore using sre-agent, configure the following:\n\n| Variable | Description | Required For |\n|----------|-------------|-------------|\n| `PAGERDUTY_API_TOKEN` | PagerDuty API v2 Access Key | oncall / diagnosis / patrol |\n| `NOTIFICATION_WEBHOOK_URL` | Notification webhook URL (e.g. Slack, Feishu, Teams) | oncall / patrol notifications |\n| `NOTIFICATION_WEBHOOK_SECRET` | Webhook signing secret (if applicable) | oncall / patrol notifications |\n\nAdditionally, populate `references/infra-context.md` with your infrastructure details:\n- Prometheus/Thanos/VictoriaMetrics endpoints\n- Cloud account IDs and VPC CIDRs\n- Kubernetes cluster contexts\n- Available diagnostic skills\n\n## Mode Routing\n\nRoute to the appropriate mode based on `$ARGUMENTS` or user input characteristics:\n\n| Input Characteristics | Mode | Rules File |\n|----------|------|---------|\n| \"oncall\", \"check alerts\", scheduled trigger | **oncall** | `references/mode-oncall.md` |\n| Contains specific incidents / alert / alert content | **diagnosis** | `references/mode-diagnosis.md` |\n| \"patrol\", \"health check\", \"inspection\" | **patrol** | `references/mode-patrol.md` |\n| \"iterate\", \"retrospective\", \"improve sre-agent\" | **iteration** | `references/mode-iteration.md` |\n| \"check alerts\", \"ack\", \"resolve\", PagerDuty operations | **Use PagerDuty capability directly** | `references/capability-pagerduty.md` |\n\nAfter entering the corresponding mode, the **rules file for that mode must be read** and strictly followed.\n\n## Inter-Mode Call Relations\n\n```\noncall ──invokes──> diagnosis (Triage dispatches Diagnosis Agent for deep investigation)\npatrol ──invokes──> diagnosis (deep analysis of critical-level patrol findings)\ndiagnosis ─references─> patrol-playbook (consults known failure patterns to assist investigation)\noncall ──persists──> known-issues (written after user confirmation)\ndiagnosis ─reads─> known-issues (references known issues)\niteration ─reads/writes─> all references (improves sre-agent itself based on feedback)\n```\n\n## Rules\n\nThe following rules apply across all modes and do not require additional file reads.\n\n### Security Boundary (Read-Only)\n\n**Absolutely prohibited** (in oncall / patrol / diagnosis modes):\n- Do not autonomously call PagerDuty API acknowledge / resolve endpoints\n- Do not perform any infrastructure changes (kubectl apply/delete, argocd sync, AWS resource modifications)\n- Do not restart services, roll back deployments, or modify configurations\n- Do not expose secrets in reports (passwords, tokens, connection strings)\n\n**Allowed**: All GET / list / describe / logs / query read-only operations.\n\n### No Human Intervention Principle\n\nsre-agent is designed for autonomous operation, independent of human interaction.\n\n- **Do not ask the user questions**: Don't ask \"Should I continue investigating?\" or \"Want me to dig deeper?\". Make autonomous decisions and proactively explore all available data sources\n- **Handle blockages independently**: If a data source is inaccessible, try alternative paths; if all paths are blocked, document in the report's `missing_signals`, do not stop and wait for a person\n- **Surface limitations in reports**: When unable to obtain certain information due to permissions or network issues, explicitly annotate in the report what was attempted, why it failed, and how to fill the gap\n\n### Environment and Endpoint Lookup\n\n- All infrastructure context is in `references/infra-context.md`\n- **Never guess endpoint domains or cluster names**; look them up from the reference file\n\n### Out of Scope\n\n- No change operations\n- No service topology inference (Phase 3)\n- No automated remediation (Phase 3)\n\n### Command Execution Standards\n\nThree absolute prohibitions (violations trigger mandatory human review):\n1. **Do not create files using heredoc / cat / echo** -- use the Write tool\n2. **Do not chain multiple commands in Bash** -- no `&&`, `||`, or `;`; one Bash call executes one command only\n3. **Do not add redirections** -- no `2>&1`, `2>/dev/null`, or `> file`\n\nCore principle: simple commands (one command + arguments, no shell syntax) are executed directly; commands with pipes, redirections, or special characters must be written as sh/py scripts using the Write tool first.\n\n### Environment Error Guidance\n\nWhen script execution errors occur (such as missing environment variables, uninstalled tools, or authentication failures), read `references/setup.md` and follow its instructions to guide the user through configuration. Do not guess at solutions.\n\n## Shared Capabilities\n\n- **PagerDuty**: Alert querying and operations across all modes -> `references/capability-pagerduty.md`\n- **Feishu notifications**: Sending notifications from any mode -> `references/capability-feishu.md`\n- **Temp script cleanup**: Cleaning up `.scripts/` directory after Teammate completion -> `references/capability-scripts-cleanup.md`\n\n## Layered Loading\n\n```\nLayer 0: SKILL.md       — loaded on skill trigger (routing + global rules)\nLayer 1: mode-*.md      — Lead reads when entering a mode (orchestration logic)\nLayer 2: role-*.md      — Lead reads when creating a Teammate (role contract, prompt blueprint)\nLayer 3: capability/data — each Teammate reads on demand during execution (tool usage + data)\n```\n\nEach layer is only loaded when needed, avoiding reading all files at once.\n\n## Examples\n\n### Bad Example\n\n```\nUser: oncall\nAgent: What do you want me to do? Should I check alerts? Or do you want to see the patrol report?\n```\n\nProblem: Violates the \"No Human Intervention Principle\". Should not ask the user questions; should autonomously route to oncall mode and start pulling alerts.\n\n### Good Example\n\n```\nUser: oncall\nAgent: [read mode-oncall.md] -> [call PagerDuty API to pull triggered incidents]\n      -> [deduplicate and correlate] -> [triage by severity] -> [dispatch diagnosis agents in parallel]\n      -> [output structured incident_report] -> [Feishu notification]\n```\n\nCorrect: Autonomously routes to oncall mode, executes the full diagnostic pipeline, no human intervention needed.\n\n## References\n\n| File | Layer | Content |\n|------|------|------|\n| `references/mode-oncall.md` | Orchestration | oncall Lead orchestration: architecture, lifecycle, messaging protocol |\n| `references/mode-diagnosis.md` | Orchestration | Direct diagnosis invocation orchestration (simple -> direct, complex -> create Team) |\n| `references/mode-patrol.md` | Orchestration | patrol Lead orchestration: entry discovery, report aggregation, lifecycle |\n| `references/mode-iteration.md` | Orchestration | Iteration methodology (self-learning, diagnosis quality assessment, incident retrospective) |\n| `references/role-entry.md` | Role | Entry: alert pulling (cron poll PagerDuty) |\n| `references/role-triage.md` | Role | Triage: triage dispatch (dedup/correlate/dispatch) |\n| `references/role-diagnosis.md` | Role | Diagnosis: diagnostic investigation (multi-dimensional parallel) |\n| `references/role-patrol-l1.md` | Role | Patrol L1: service discovery + five-domain inspection |\n| `references/role-patrol-l2.md` | Role | Patrol L2: targeted deep inspection |\n| `references/capability-pagerduty.md` | Capability | PagerDuty script usage |\n| `references/capability-feishu.md` | Capability | Feishu notifications (including patrol card templates) |\n| `references/capability-scripts-cleanup.md` | Capability | Temp script cleanup |\n| `references/infra-context.md` | Data | Infrastructure mapping (endpoints, accounts, clusters) |\n| `references/known-issues.md` | Data | Known issues database |\n| `references/report-standard.md` | Data | Unified report standard (incident_report YAML structure + Feishu mapping, shared by Diagnosis + Triage) |\n| `references/known-issue-evidence-standard.md` | Data | expected_evidence quality standard (shared by Triage + iteration mode) |\n| `references/patrol-playbook.md` | Data | Patrol experience database |\n| `references/setup.md` | Data | Installation and configuration (environment variables, required tools, troubleshooting) |","tags":["sre","agent","enterprise","harness","engineering","addxai","agent-skills","ai-agent","ai-engineering","claude-code","code-review","cursor"],"capabilities":["skill","source-addxai","skill-sre-agent","topic-agent-skills","topic-ai-agent","topic-ai-engineering","topic-claude-code","topic-code-review","topic-cursor","topic-devops","topic-enterprise","topic-sre","topic-windsurf"],"categories":["enterprise-harness-engineering"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/addxai/enterprise-harness-engineering/sre-agent","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add addxai/enterprise-harness-engineering","source_repo":"https://github.com/addxai/enterprise-harness-engineering","install_from":"skills.sh"}},"qualityScore":"0.458","qualityRationale":"deterministic score 0.46 from registry signals: · indexed on github topic:agent-skills · 16 github stars · SKILL.md body (8,096 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-04-22T01:02:12.863Z","embedding":null,"createdAt":"2026-04-21T19:04:02.333Z","updatedAt":"2026-04-22T01:02:12.863Z","lastSeenAt":"2026-04-22T01:02:12.863Z","tsv":"'/dev/null':511 '0':613 '1':472,509,623 '2':485,508,510,635 '3':455,460,502,649 'absolut':250,465 'access':37 'account':76,878 'ack':137 'acknowledg':263 'across':235,587 'add':505 'addit':66,242 'agent':3,6,9,23,132,174,225,316,679,727,745 'aggreg':801 'alert':108,116,117,136,583,690,722,818 'allow':299 'altern':365 'analysi':182 'annot':404 'api':32,35,262,732 'appli':234 'applic':62 'apply/delete':273 'appropri':92 'architectur':778 'argocd':274 'argument':96,520 'ask':328,334,709 'assess':812 'assist':199 'attempt':410 'authent':561 'autom':457 'autonom':259,320,346,714,755 'avail':84,352 'avoid':668 'aw':276 'back':284 'bad':675 'base':94,227 'bash':492,496 'block':371 'blockag':356 'blueprint':647 'boundari':246 'call':166,260,497,730 'capability/data':650 'capabl':143,581,856,861,869 'card':866 'cat':479 'certain':395 'chain':488 'chang':271,448 'charact':533 'characterist':100,102 'check':107,123,135,689 'cidr':80 'clean':602 'cleanup':601,872 'cloud':75 'cluster':82,435,879 'command':461,490,500,517,519,527 'complet':608 'complex':790 'configur':24,288,574,920 'confirm':209 'connect':297 'consult':194 'contain':113 'content':118,772 'context':83,426 'continu':337 'contract':645 'core':514 'correct':754 'correl':739 'correspond':149 'creat':475,641,791 'critic':185 'critical-level':184 'cron':820 'data':353,360,660,874,881,886,901,912,917 'databas':884,915 'decis':347 'dedup/correlate/dispatch':828 'dedupl':737 'deep':176,181,853 'deeper':344 'demand':655 'deploy':285 'describ':303 'descript':7,28 'design':318 'detail':72 'diagnosi':40,119,170,173,180,189,210,255,744,785,810,831,898 'diagnost':85,763,832 'dig':343 'dimension':836 'direct':144,526,784,789 'directori':605 'discoveri':799,843 'dispatch':172,743,827 'document':372 'domain':433,846 'due':397 'e.g':48 'echo':480 'endpoint':74,265,422,432,877 'enter':147,629 'entri':798,817 'environ':420,545,556,921 'error':546,551 'evid':903 'exampl':674,676,724 'execut':462,498,525,550,657,760 'expect':902 'experi':914 'explicit':403 'explor':350 'expos':291 'fail':413 'failur':196,562 'feedback':229 'feishu':50,591,752,862,894 'file':105,153,243,443,476,513,671,770 'fill':417 'find':188 'first':544 'five':845 'five-domain':844 'follow':26,162,232,566 'four':10 'full':762 'gap':419 'get':301 'global':620 'good':723 'guess':431,577 'guid':570 'guidanc':547 'handl':355 'health':122 'heredoc':478 'human':311,324,470,704,766 'id':77 'improv':129,222 'inaccess':363 'incid':115,736,750,813,890 'includ':864 'independ':322,357 'infer':453 'inform':396 'infrastructur':71,270,425,875 'input':99,101 'inspect':124,847,854 'instal':918 'instruct':568 'inter':164 'inter-mod':163 'interact':325 'intervent':312,705,767 'investig':177,200,338,833 'invoc':786 'invok':15,169,179 'issu':205,214,217,402,883 'iter':127,133,218,805,909 'key':38 'known':195,204,213,216,882 'known-issu':203,212 'kubectl':272 'kubernet':81 'l1':841 'l2':851 'layer':610,612,622,634,648,662,771 'lead':626,638,776,796 'learn':809 'level':186 'lifecycl':779,802 'limit':388 'list':302 'load':611,615,665 'log':304 'logic':633 'look':437 'lookup':423 'make':345 'mandatori':469 'map':876,895 'md':625,637 'messag':780 'methodolog':806 'miss':377,555 'mode':12,87,93,103,150,156,165,237,256,589,597,624,631,718,759,910 'mode-oncall.md':729 'modif':278 'modifi':287 'multi':835 'multi-dimension':834 'multipl':489 'must':157,534 'name':436 'need':667,768 'network':401 'never':430 'notif':42,45,54,55,65,592,594,753,863 'obtain':394 'occur':552 'oncal':39,52,63,106,111,168,201,253,678,717,726,758,775 'one':495,499,518 'oper':11,140,309,321,449,586 'orchestr':632,774,777,783,787,794,797,804 'output':748 'pagerduti':31,34,139,142,261,582,731,822,857 'parallel':747,837 'password':295 'path':366,369 'patrol':41,53,64,121,125,178,187,192,254,698,795,840,850,865,913 'patrol-playbook':191 'pattern':197 'perform':268 'permiss':399 'persist':202 'person':386 'phase':454,459 'pipe':529 'pipelin':764 'playbook':193 'poll':821 'popul':67 'principl':313,515,706 'proactiv':349 'problem':700 'prohibit':251,466 'prometheus/thanos/victoriametrics':73 'prompt':646 'protocol':781 'pull':721,734,819 'qualiti':811,904 'queri':305,584 'question':331,712 'read':159,211,244,248,307,563,627,639,653,669,728 'read-on':247,306 'reads/writes':219 'redirect':506,530 'refer':190,215,221,442,769 'references/capability-feishu.md':598,860 'references/capability-pagerduty.md':145,590,855 'references/capability-scripts-cleanup.md':609,868 'references/infra-context.md':68,429,873 'references/known-issue-evidence-standard.md':900 'references/known-issues.md':880 'references/mode-diagnosis.md':120,782 'references/mode-iteration.md':134,803 'references/mode-oncall.md':112,773 'references/mode-patrol.md':126,793 'references/patrol-playbook.md':911 'references/report-standard.md':885 'references/role-diagnosis.md':829 'references/role-entry.md':815 'references/role-patrol-l1.md':838 'references/role-patrol-l2.md':848 'references/role-triage.md':823 'references/setup.md':564,916 'relat':167 'remedi':458 'report':294,375,390,407,699,751,800,888,891 'requir':29,241,923 'resolv':138,264 'resourc':277 'restart':281 'retrospect':128,814 'review':471 'role':636,644,816,824,830,839,849 'roll':283 'rout':88,89,619,715,756 'rule':104,152,230,233,621 'schedul':109 'scope':446 'script':539,549,600,604,858,871 'secret':57,60,292 'secur':245 'see':696 'self':808 'self-learn':807 'send':593 'servic':282,451,842 'setup':18 'sever':742 'sh/py':538 'share':580,896,906 'shell':522 'sign':59 'signal':378 'simpl':516,788 'skill':86,617 'skill-sre-agent' 'skill.md':614 'slack':49 'solut':579 'sourc':354,361 'source-addxai' 'special':532 'specif':114 'sre':2,5,8,22,131,224,315 'sre-ag':1,4,21,130,223,314 'standard':463,889,905 'start':720 'stop':381 'strict':161 'string':298 'structur':749,893 'surfac':387 'sync':275 'syntax':523 'target':852 'team':51,792 'teammat':607,643,652 'temp':599,870 'templat':867 'three':464 'token':33,296 'tool':484,543,559,658,924 'topic-agent-skills' 'topic-ai-agent' 'topic-ai-engineering' 'topic-claude-code' 'topic-code-review' 'topic-cursor' 'topic-devops' 'topic-enterprise' 'topic-sre' 'topic-windsurf' 'topolog':452 'tri':364 'triag':171,740,825,826,899,908 'trigger':110,468,618,735 'troubleshoot':925 'unabl':392 'unifi':887 'uninstal':558 'url':44,47 'usag':659,859 'use':20,141,477,481,540 'user':98,208,330,572,677,711,725 'v2':36 'variabl':27,557,922 'violat':467,701 'vpc':79 'wait':383 'want':340,683,694 'webhook':43,46,56,58 'write':483,542 'written':206,536 'yaml':892","prices":[{"id":"5e9da0fb-7152-42da-8046-02d08698db38","listingId":"5f81553f-0a5e-4c52-8071-59a822da5340","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"addxai","category":"enterprise-harness-engineering","install_from":"skills.sh"},"createdAt":"2026-04-21T19:04:02.333Z"}],"sources":[{"listingId":"5f81553f-0a5e-4c52-8071-59a822da5340","source":"github","sourceId":"addxai/enterprise-harness-engineering/sre-agent","sourceUrl":"https://github.com/addxai/enterprise-harness-engineering/tree/main/skills/sre-agent","isPrimary":false,"firstSeenAt":"2026-04-21T19:04:02.333Z","lastSeenAt":"2026-04-22T01:02:12.863Z"}],"details":{"listingId":"5f81553f-0a5e-4c52-8071-59a822da5340","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"addxai","slug":"sre-agent","github":{"repo":"addxai/enterprise-harness-engineering","stars":16,"topics":["agent-skills","ai-agent","ai-engineering","claude-code","code-review","cursor","devops","enterprise","sre","windsurf"],"license":"apache-2.0","html_url":"https://github.com/addxai/enterprise-harness-engineering","pushed_at":"2026-04-17T08:57:37Z","description":"Enterprise-grade AI Agent Skills for software development, DevOps, SRE, security, and product teams. Compatible with Claude Code, Cursor, Windsurf, Gemini CLI, GitHub Copilot, and 30+ AI coding agents.","skill_md_sha":"98e942c4d5aab877df49ddd59e25f4ffce0950a2","skill_md_path":"skills/sre-agent/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/addxai/enterprise-harness-engineering/tree/main/skills/sre-agent"},"layout":"multi","source":"github","category":"enterprise-harness-engineering","frontmatter":{"name":"sre-agent","description":">-"},"skills_sh_url":"https://skills.sh/addxai/enterprise-harness-engineering/sre-agent"},"updatedAt":"2026-04-22T01:02:12.863Z"}}