{"id":"2da5d6e2-4ef5-4bd9-a33b-81af64225d19","shortId":"s7A4Xw","kind":"skill","title":"review-observability","tagline":"Use when the user asks for an observability review, telemetry review, logging review, metrics review, tracing review, OpenTelemetry review, OTel review, Prometheus review, \"are we observable\", logging audit, structured logging audit, trace coverage check, span coverage check, met","description":"# Observability Review\n\nStructured review of logging, metrics, and tracing sufficiency. Produces actionable, prioritized findings with code-level references.\n\n**Out of scope** (defer to siblings):\n- Health/readiness probes, gRPC health, retry/timeout semantics → `review-reliability`\n- Log injection, PII leakage, log-as-attack-channel → `review-security` (but still flag obvious secrets in logs here)\n- Dashboard / alert authoring (where to set thresholds) → typically out of scope; observability review covers signal *availability*, not alert tuning\n\n## Workflow\n\n### 1. Scope and explore\n\n- Confirm scope with the user: full codebase, specific packages/directories, changed files only (PR or branch diff), or a specific concern.\n- **Resolve scope to a file/package list.** Based on what the user requested:\n  - **Changed files (PR or branch):** Run `git diff --name-only --diff-filter=d <base>...HEAD` (the lowercase `d` excludes deleted files). If the user references a PR number, use `gh pr diff <number> --name-only`. Filter to source files (`.go`, `.ts`, `.tsx`, `.js`, `.py`, etc.) and configs (`prometheus.yml`, `otel-collector-config.yaml`).\n  - **Explicit paths/packages:** Include all files under the given directories.\n  - **Full codebase:** No filtering.\n- **If invoked from review-all**: receive `file_list`, `package_paths`, `has_changes`, `base_ref`, `REVIEW_DIR`, and `pr_url` from the orchestrator. Skip your own scope confirmation.\n- **Pass the resolved scope** to all exploration and investigation subagents.\n\n### 2. 
System overview\n\nProduce a brief telemetry summary covering:\n- **Logging library/sink**: structured (`slog`, `zap`, `zerolog`, `logrus`) vs. unstructured (`fmt.Println`, `log.Printf`). Output sink (stdout, file, syslog, log aggregator).\n- **Metrics library/exporter**: Prometheus client, OTel metrics, statsd. Where scraping happens.\n- **Tracing library/exporter**: OTel tracer, vendor-specific (Datadog, Honeycomb, Jaeger). Sampling strategy.\n- **Correlation**: are trace IDs threaded through logs (e.g. `slog.With(\"trace_id\", ...)`)? Are span context propagated across RPC boundaries (`otelgrpc`, `otelhttp`)?\n- **Collector / pipeline**: is there an OTel Collector in the path? What processors and exporters?\n\n### 3. Launch investigation subagent\n\nLaunch a single investigation subagent (`subagent_type=\"generalPurpose\"`, `model: sonnet` per `subagent-model-routing`) with the system overview and in-scope file list.\n\nPrompt it to:\n- Read all in-scope source files and any observability config files (Prometheus, OTel Collector, dashboard JSON).\n- Apply the checklists in [reference.md](reference.md):\n  - Logging sufficiency and structure\n  - Metrics sufficiency and cardinality\n  - Tracing sufficiency and propagation\n  - Correlation across signals\n- Identify gaps: RPC handlers without spans, error paths without log entries, hot loops emitting per-iteration metrics, unbounded label values.\n- For each finding, search nearby code for existing tracking (TODO/FIXME/HACK).\n- Return findings using the **observability findings** template.\n\n### 4. Run static analyzers (when applicable)\n\nMost observability gaps are not lint-discoverable; the work is semantic. 
A few mechanical checks:\n\n```sh\n# Go: files that register HTTP/gRPC handlers but never reference otel instrumentation\n# (heuristics for the subagent to triage, not gates; handlers wrapped by middleware in another file show up as false positives)\ngrep -rl --null --include='*.go' -e 'http\\.HandleFunc' -e 'grpc\\.NewServer' <paths> \\\n  | xargs -0 -r grep -L -e 'otelhttp' -e 'otelgrpc' -e 'otel\\.Tracer'\n\n# Cardinality smell: dynamic strings in metric labels\ngrep -rn --include='*.go' -e '\\.With(prometheus\\.Labels{' -e '\\.WithLabelValues(' <paths>\ngrep -rn \"labels\\.NewBuilder\\|labels.FromMap\" --include='*.go' <paths>\n```\n\nThese are inputs for the subagent, and false positives are expected. The `--null` / `xargs -0` pairing keeps file names containing spaces intact.\n\n### 5. Present results\n\nResolve the review output directory (same pattern as siblings):\n\n```sh\nREVIEW_DATE=$(date +%Y-%m-%d)\nREVIEW_DIR=\"reviews/${REVIEW_DATE}\"\nif [ -d \"$REVIEW_DIR\" ]; then REVIEW_DIR=\"reviews/${REVIEW_DATE}-$(date +%H%M)\"; fi\nmkdir -p \"$REVIEW_DIR\"\n```\n\nCapture run metadata (see [Run metadata header](#run-metadata-header)) and prepend it to `${REVIEW_DIR}/OBSERVABILITY-REVIEW.md`.\n\nOutput structure:\n1. Run metadata header\n2. Telemetry overview (from step 2)\n3. Findings table (grouped by signal: logging / metrics / tracing / correlation)\n4. 
Recommended fix order\n\nPresent the report to the user.\n\n---\n\n## Run metadata header\n\n```sh\nRUN_DATETIME=$(date -u +\"%Y-%m-%d %H:%M UTC\")\nGIT_BRANCH=$(git rev-parse --abbrev-ref HEAD)\nGIT_COMMIT=$(git rev-parse --short HEAD)\nGIT_COMMIT_FULL=$(git rev-parse HEAD)\nGIT_SUBJECT=$(git log -1 --pretty=%s)\n# When scope is diff-based: BASE_REF=<base>; BASE_COMMIT=$(git rev-parse --short \"$BASE_REF\")\n```\n\n```markdown\n> **Run:** {RUN_DATETIME}\n> **Branch:** {GIT_BRANCH} @ {GIT_COMMIT} (`{GIT_COMMIT_FULL}`)\n> **Subject:** {GIT_SUBJECT}\n> **Base:** {BASE_REF} @ {BASE_COMMIT}   <!-- omit when scope is not diff-based -->\n> **Scope:** {scope description}\n```\n\n---\n\n## Finding link wrapping (PR mode)\n\nWhen `pr_url` is provided (or `gh pr view --json url -q .url 2>/dev/null` returns one for standalone runs), wrap every `path:line` reference inside the finding tables as a Markdown link:\n\n```sh\n~/.claude/scripts/pr-deeplink.sh \"$pr_url\" <path> <line>\n```\n\nThe display text stays `path:line`. Pass `L` as the fourth argument for findings about removed code. Findings follow `terse-comments`: concrete fix, optional `bug:`/`risk:`/`nit:`/`unsure:` prefix.\n\n---\n\n## Output Templates\n\n### Observability findings\n\n```markdown\n| Priority | Signal | Finding | Impact | Effort | Tracked |\n|----------|--------|---------|--------|--------|---------|\n| P0 | tracing | Description with code references | Impact on debuggability / MTTR | trivial / small / moderate / large | — |\n| P1 | metrics | Description with code references | Cardinality / cost impact | Effort estimate | TODO in file:line |\n```\n\n**Signal column values:** `logging`, `metrics`, `tracing`, `correlation`.\n\n### Re-evaluation table (for follow-up reviews)\n\n```markdown\n| Finding | Status | What Changed |\n|---------|--------|--------------|\n| ~~1. Description~~ | FIXED | Brief explanation of the fix |\n| 2. 
Description | Still applicable | No changes |\n```\n\n---\n\n## Guidelines\n\n- Observability is about debuggability under failure. The bar is \"could on-call figure out what broke at 3 a.m. from these signals alone?\"\n- Cardinality is cost: every distinct label combination becomes its own time series. Flag high-cardinality labels (user IDs, request IDs, raw URLs) on Prometheus metrics; the same values are fine as log fields and span attributes.\n- Search the codebase for the organization's existing logger / tracer / metric patterns before recommending new ones.\n- Include an effort estimate for every finding.\n- When the user asks for a follow-up review, find the most recent review directory and append a re-evaluation table.\n- For detailed framework categories, see [reference.md](reference.md).\n- **REVIEW.md integration**: If a `REVIEW.md` context section was provided by the review-all orchestrator (or a `REVIEW.md` exists at the repository root when running standalone), treat its rules as additional review criteria. \"Always check\" items are HIGH severity; domain-specific items (in its Observability section) are MEDIUM severity. \"Skip\" patterns exclude matching files from review scope.\n- Findings must cite probed evidence (`path:line`, grep output, command result), not pattern-matched suspicion. 
Per `~/.claude/rules/probe-not-assume.md`.","tags":["review","observability","skill","issue","paultyng","agent-skills","ai-tools","claude-code","cursor","dotfiles"],"capabilities":["skill","source-paultyng","skill-review-observability","topic-agent-skills","topic-ai-tools","topic-claude-code","topic-cursor","topic-dotfiles"],"categories":["skill-issue"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/paultyng/skill-issue/review-observability","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add paultyng/skill-issue","source_repo":"https://github.com/paultyng/skill-issue","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (7,372 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:09:02.605Z","embedding":null,"createdAt":"2026-05-18T13:21:27.567Z","updatedAt":"2026-05-18T19:09:02.605Z","lastSeenAt":"2026-05-18T19:09:02.605Z","prices":[{"id":"05d78e6f-7dbc-45c5-9c73-8cc68f07c1b9","listingId":"2da5d6e2-4ef5-4bd9-a33b-81af64225d19","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"paultyng","category":"skill-issue","install_from":"skills.sh"},"createdAt":"2026-05-18T13:21:27.567Z"}],"sources":[{"listingId":"2da5d6e2-4ef5-4bd9-a33b-81af64225d19","source":"github","sourceId":"paultyng/skill-issue/review-observability","sourceUrl":"https://github.com/paultyng/skill-issue/tree/main/skills/review-observability","isPrimary":false,"firstSeenAt":"2026-05-18T13:21:27.567Z","lastSeenAt":"2026-05-18T19:09:02.605Z"}],"details":{"listingId":"2da5d6e2-4ef5-4bd9-a33b-81af64225d19","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"paultyng","slug":"review-observability","github":{"repo":"paultyng/skill-issue","stars":8,"topics":["agent-skills","ai-tools","claude-code","cursor","dotfiles"],"license":"mit","html_url":"https://github.com/paultyng/skill-issue","pushed_at":"2026-05-18T18:26:54Z","description":"Personal Claude Code / Cursor agent skills, rules, and config","skill_md_sha":"c5a33155658840ad942afef9b79fbe0917292989","skill_md_path":"skills/review-observability/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/paultyng/skill-issue/tree/main/skills/review-observability"},"layout":"multi","source":"github","category":"skill-issue","frontmatter":{"name":"review-observability","description":"Use when the user asks for an observability review, telemetry review, logging review, metrics review, tracing review, 
OpenTelemetry review, OTel review, Prometheus review, \"are we observable\", logging audit, structured logging audit, trace coverage check, span coverage check, metrics cardinality review, or production observability assessment."},"skills_sh_url":"https://skills.sh/paultyng/skill-issue/review-observability"},"updatedAt":"2026-05-18T19:09:02.605Z"}}