analyze-knowledge
Analyze Knowledge
Analyze git history to surface code expertise, suggest reviewers, and assess knowledge concentration risk.
1. Determine Mode and Scope
Auto-detect from user intent:
| Mode | Triggers | Default scope |
|---|---|---|
| Reviewer | "who should review", "find reviewers", "suggest reviewers" | Changed files: git diff --name-only $(git merge-base HEAD main)..HEAD |
| Distribution | "who knows", "lottery factor", "bus factor", "knowledge spread", "expertise" | User-specified directory, or repo root |
If scope is ambiguous, ask. Accept:
- Explicit paths or globs
- A PR number (extract changed files via `gh pr diff --name-only`)
- A directory
- "whole repo" (default for distribution mode)
Exclude generated files: check for // Code generated or DO NOT EDIT headers.
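The header check above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the function name `is_generated` and the five-line scan window are assumptions, and only the two markers named in the text are checked.

```python
# Markers named in the text above.
GENERATED_MARKERS = ("// Code generated", "DO NOT EDIT")


def is_generated(path, max_lines=5):
    """Return True if one of the first max_lines lines contains a marker.

    Assumption: generated-file headers sit near the top of the file, so
    only a short prefix is scanned. Unreadable files are treated as
    not generated.
    """
    try:
        with open(path, "r", encoding="utf-8", errors="replace") as fh:
            for _, line in zip(range(max_lines), fh):
                if any(marker in line for marker in GENERATED_MARKERS):
                    return True
    except OSError:
        return False
    return False
```

Files for which `is_generated` returns True would be dropped from the scope before any history is collected.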
2. Normalize Authors
# Check for .mailmap
if [ -f .mailmap ]; then
  MAILMAP_FLAG="--use-mailmap"
else
  MAILMAP_FLAG=""
fi
- Pass `$MAILMAP_FLAG` to all git commands.
- If no `.mailmap`, group by author name (not email) to merge multiple addresses.
- Filter bots: exclude authors matching `[bot]`, `dependabot`, `renovate`, `github-actions`, or emails containing `noreply@github.com` with non-human names.
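The bot filter and name-based grouping can be sketched roughly as below. The helper names `is_bot` and `merge_by_name` are hypothetical. The text does not define "non-human names", so this sketch assumes it means a single-token name such as "GitHub"; adjust that heuristic to the repository at hand.

```python
from collections import defaultdict

# Name fragments listed in the text above, matched case-insensitively.
BOT_NAME_MARKERS = ("[bot]", "dependabot", "renovate", "github-actions")


def is_bot(name, email):
    """Heuristic bot check based on the patterns listed above."""
    lowered = name.lower()
    if any(marker in lowered for marker in BOT_NAME_MARKERS):
        return True
    # Assumption: a "non-human name" is a single-token name such as "GitHub".
    return "noreply@github.com" in email.lower() and len(name.split()) < 2


def merge_by_name(entries):
    """Sum counts per author name, dropping bots.

    entries: iterable of (name, email, count) tuples. Grouping by name
    merges one person's multiple email addresses when no .mailmap exists.
    """
    totals = defaultdict(int)
    for name, email, count in entries:
        if is_bot(name, email):
            continue
        totals[name.strip()] += count
    return dict(totals)
```

Grouping by name can wrongly merge two different people who share a name, which is one reason a `.mailmap` is preferred when it exists.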
2.5. Fan out for broad scope
When scope is large, delegate per-area data collection to subagents so the main context never sees raw blame/log output for hundreds of files. This aligns with parallelize-subagents (delegate when output would flood the main context) and delegate-investigation (read-only history work belongs in Explore).
| Trigger | Strategy |
|---|---|
| Distribution mode, "whole repo" or >20 directories in scope | One Explore subagent per top-level directory; each runs its tiered collection and returns the per-area summary only. |
| Reviewer mode, >20 changed files | Batch files in groups of ~10; one Explore subagent per batch returning per-file expert lists. |
| Tier 3 line-level on >5 files | One Explore subagent per file (blame output is voluminous). |
| Anything smaller | Run inline. Don't spin up a subagent for one git shortlog. |
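The decision table above could be encoded along these lines. This is only a sketch: `plan_fanout`, `batch`, and the returned strategy labels are hypothetical names, and the precedence between rows (Tier 3 checked first) is an assumption the table itself does not state.

```python
def plan_fanout(mode, *, whole_repo=False, n_dirs=0, n_files=0, tier3=False):
    """Pick a fan-out strategy from the thresholds in the table above."""
    if tier3 and n_files > 5:
        return "one-subagent-per-file"
    if mode == "distribution" and (whole_repo or n_dirs > 20):
        return "one-subagent-per-top-level-directory"
    if mode == "reviewer" and n_files > 20:
        return "batches-of-10"
    return "inline"


def batch(files, size=10):
    """Split a file list into consecutive groups of at most `size` items."""
    return [files[i:i + size] for i in range(0, len(files), size)]
```

For example, a reviewer-mode run over 25 changed files would yield three batches of 10, 10, and 5 files.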
Each subagent prompt follows subagent-prompt-contract:
- Goal: "Return top-N authors per file in <area>, with weighted contribution scores"
- Inline context: paths, `$MAILMAP_FLAG` state, the recency weighting table from step 4, the `git` command(s) to run
- Output shape: Markdown table only, ≤30 lines
- Constraints: read-only (`Explore` enforces this); do not spawn further subagents
- Model: `haiku` (per `subagent-model-routing`): mechanical aggregation; output is bounded and requires no architectural reasoning
- Return: prefix the summary with `Status: DONE | DONE_WITH_CONCERNS | BLOCKED | NEEDS_CONTEXT`
Parent merges per-area returns into the final table in step 5.
3. Collect Data (Tiered)
Tier 1: Commit counts (always run, fast)
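git shortlog -sne --no-merges $MAILMAP_FLAG HEAD -- <paths> | head -20
Use this for a quick overview; it is sufficient when the user only wants a rough sense. The explicit HEAD matters: with no revision argument, git shortlog reads the log from standard input whenever stdin is not a terminal, which is typical when commands run non-interactively.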
Tier 2: File-level changes (default)
git log --no-merges --since="2 years ago" $MAILMAP_FLAG \
--format='COMMIT:%H|%an|%aI' --numstat -- <paths>
Parse with jq pipelines from reference.md. This produces per-author, per-file change counts with timestamps for recency weighting.
For large repos or broad scope, narrow the window to --since="1 year ago" to keep it fast.
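To make the shape of the Tier 2 output concrete, here is a Python sketch that turns it into per-file records. It is an illustration of the record structure, not the jq pipelines from reference.md, which are not reproduced here. The function name `parse_numstat_log` is hypothetical, and renamed paths (which `--numstat` prints in `old => new` form) are passed through unchanged.

```python
def parse_numstat_log(text):
    """Parse `git log --format='COMMIT:%H|%an|%aI' --numstat` output.

    Returns a list of (author, iso_date, path, lines_changed) tuples,
    one per file touched by each commit.
    """
    records = []
    author = date = None
    for line in text.splitlines():
        if line.startswith("COMMIT:"):
            # The hash and the date never contain "|", so split from both
            # ends; this keeps author names that contain "|" intact.
            _sha, rest = line[len("COMMIT:"):].split("|", 1)
            author, date = rest.rsplit("|", 1)
            continue
        parts = line.split("\t")
        if author is None or len(parts) != 3:
            continue  # blank separator lines and anything unexpected
        added, deleted, path = parts
        # Binary files report "-" for both counts; count them as zero.
        changed = sum(int(n) for n in (added, deleted) if n.isdigit())
        records.append((author, date, path, changed))
    return records
```

Each record carries the author date, so recency weighting can be applied per record before aggregating.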
Tier 3: Line-level blame (opt-in, small file sets only)
Only use when:
- User explicitly asks for deep/line-level analysis
- Scope is ≤15 files
# --line-porcelain repeats the author header for every line, so the counts are per line
git blame --line-porcelain $MAILMAP_FLAG <file> | \
awk '/^author /{print}' | sort | uniq -c | sort -rn
For >5 files, fan out per step 2.5 (one Explore subagent per file). For ≤5 files, parallelize inline with xargs -P4.
4. Analyze
Recency Weighting
Weight contributions by age:
| Age | Weight |
|---|---|
| < 6 months | 1.0 |
| 6–12 months | 0.7 |
| 12–24 months | 0.4 |
| > 24 months | 0.1 |
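The weighting table can be expressed as a small lookup. This sketch makes two assumptions the table leaves open: a month is approximated as 30.44 days, and each bucket includes its lower bound (so exactly 12 months maps to 0.4, and exactly 24 months still maps to 0.4). The function name `recency_weight` is hypothetical.

```python
from datetime import datetime, timezone

DAYS_PER_MONTH = 30.44  # assumption: average month length


def recency_weight(commit_iso, now):
    """Map a commit's ISO 8601 author date (as printed by %aI) to a weight.

    `now` must be a timezone-aware datetime, since %aI includes an offset.
    """
    age_days = (now - datetime.fromisoformat(commit_iso)).days
    age_months = age_days / DAYS_PER_MONTH
    if age_months < 6:
        return 1.0
    if age_months < 12:
        return 0.7
    if age_months <= 24:
        return 0.4
    return 0.1
```

A weighted contribution is then the number of changed lines multiplied by this weight, for example `changed * recency_weight(date, datetime.now(timezone.utc))`.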
Reviewer Mode
- For each file in scope, compute weighted contribution score per author
- Aggregate across all files: an author touching many of the changed files scores higher
- Exclude the PR author (current `git config user.name`) from suggestions
- Rank by score, present top 3–5
- For each reviewer, list the directories/files where they have the most expertise
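One way to implement the ranking steps above is sketched below. The text does not specify how per-file scores are combined, so this sketch assumes each author's contribution is normalized to a share of the file's total and then averaged over all files in scope, which keeps scores between 0 and 1 and rewards authors who touched many of the changed files. `rank_reviewers` and its parameters are hypothetical names.

```python
from collections import defaultdict


def rank_reviewers(contributions, pr_author, top_n=5):
    """Rank candidate reviewers.

    contributions: iterable of (path, author, weighted_lines) tuples,
    where weighted_lines is already recency-weighted.
    Returns up to top_n (author, score) pairs, best first.
    """
    per_file = defaultdict(lambda: defaultdict(float))
    for path, author, weighted in contributions:
        per_file[path][author] += weighted

    totals = defaultdict(float)
    for authors in per_file.values():
        file_total = sum(authors.values())
        if file_total == 0:
            continue
        for author, weighted in authors.items():
            # Assumption: use per-file shares so large files do not dominate.
            totals[author] += weighted / file_total

    n_files = len(per_file) or 1
    ranked = sorted(
        ((a, s / n_files) for a, s in totals.items() if a != pr_author),
        key=lambda item: (-item[1], item[0]),  # score desc, then name
    )
    return ranked[:top_n]
```

The `pr_author` argument would typically be the output of `git config user.name`, matched against the same normalized names used during collection.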
Distribution Mode
- Group files by directory (auto-detect depth: <20 files → no grouping, 20–200 → 1 level, 200+ → 2 levels)
- Per area, compute:
  - Top experts: authors ranked by weighted contribution
  - Lottery factor: count of authors contributing ≥10% of weighted changes
  - Concentration: percentage of changes from the top contributor
- Flag risk levels:
  - HIGH: lottery factor = 1 (single expert)
  - MEDIUM: lottery factor = 2
  - LOW: lottery factor ≥ 3
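The per-area metrics can be computed as in the sketch below. The names `grouping_depth` and `assess_area` are hypothetical. Two cases are not covered by the text and are handled by assumption here: exactly 200 files is treated as 1 level, and a lottery factor of 0 (no author reaches 10%, or no changes at all) is reported as "UNDEFINED" rather than mapped to a risk level.

```python
def grouping_depth(n_files):
    """Directory grouping depth from the file-count thresholds above."""
    if n_files < 20:
        return 0
    if n_files <= 200:  # assumption: exactly 200 files stays at 1 level
        return 1
    return 2


def assess_area(weighted_by_author, threshold=0.10):
    """Compute lottery factor, concentration (%), and risk for one area.

    weighted_by_author: {author: recency-weighted change total}.
    """
    total = sum(weighted_by_author.values())
    if total <= 0:
        return {"lottery_factor": 0, "concentration": 0.0, "risk": "UNDEFINED"}
    shares = [w / total for w in weighted_by_author.values()]
    lottery = sum(1 for share in shares if share >= threshold)
    if lottery >= 3:
        risk = "LOW"
    elif lottery == 2:
        risk = "MEDIUM"
    elif lottery == 1:
        risk = "HIGH"
    else:
        risk = "UNDEFINED"  # assumption: the text defines no level for 0
    return {
        "lottery_factor": lottery,
        "concentration": max(shares) * 100,
        "risk": risk,
    }
```

For instance, an area where one author holds 94% of the weighted changes and another holds 6% has a lottery factor of 1 and is flagged HIGH.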
5. Present Results
Output directly to terminal. No file output.
Reviewer Mode
## Suggested Reviewers (<N> files from <source>)
| Rank | Reviewer | Score | Key areas | Last active |
|------|----------------|-------|-----------------------------|--------------|
| 1 | Alice Smith | 0.82 | internal/auth/, pkg/tokens/ | 2 weeks ago |
| 2 | Bob Jones | 0.65 | internal/auth/ | 1 month ago |
| 3 | Carol Lee | 0.41 | pkg/tokens/, cmd/server/ | 3 months ago |
Distribution Mode
## Knowledge Distribution: <scope>
| Area | Top experts | Lottery factor | Risk |
|----------------------|-----------------------------------|----------------|--------|
| internal/auth/oauth/ | Alice (0.9) | 1 | HIGH |
| internal/auth/jwt/ | Alice (0.6), Bob (0.5) | 2 | MEDIUM |
| internal/auth/rbac/ | Bob (0.4), Carol (0.3), Dan (0.3) | 3 | LOW |
### Concentration Warnings
- **internal/auth/oauth/**: single expert; Alice authored 94% of recent changes
- **internal/auth/jwt/**: two experts with significant overlap
Edge Cases
- Squash-imported repo: if total unique authors < 3 or median commits/author < 2, warn that history may not be representative
- Monorepo with many contributors: use tier 2 scoped to paths + time window; never run tier 3 on broad scope
- No changes in scope (reviewer mode): fall back to directory-level analysis of the files' parent directories
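The squash-import check in the first edge case can be sketched as a single predicate. The function name `history_looks_unrepresentative` is hypothetical, and treating an empty history as unrepresentative is an assumption.

```python
from statistics import median


def history_looks_unrepresentative(commits_per_author):
    """Apply the squash-import heuristic from the edge cases above.

    commits_per_author: {author: commit count} after bot filtering.
    Returns True when a warning should be shown.
    """
    if not commits_per_author:
        return True  # assumption: no history is also unrepresentative
    counts = list(commits_per_author.values())
    return len(counts) < 3 or median(counts) < 2
```

The input can come straight from the Tier 1 shortlog counts, so the check costs nothing extra.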
Guidelines
- Use `jq` for all JSON/pipeline processing (see reference.md for templates)
- Present findings conversationally: tables for data, plain text for warnings and recommendations
- When suggesting reviewers, mention why each person is suggested (which files/areas they know)
- For distribution analysis, lead with the highest-risk areas
- Per `~/.claude/rules/probe-not-assume.md`: confirm via tool/command before recommending; do not infer.