bisect
Use when hunting a regression, phrases like "bisect", "find the commit that broke X", "this used to work", "regression in test Y", "when did <symptom> start". Also use when escalated from ci-debug-loop because log analysis can't pinpoint the offending change, or when a previously
What it does
Bisect
Drive git bisect to isolate the first commit that introduced a bug, then report it. The skill stops at the report. Fixing belongs to the user with full context.
When to use
- User asks to "bisect", "find the commit that broke X", says "this used to work", or names a regression in a specific test/build target.
- `ci-debug-loop` exhausted log analysis without identifying a clear cause.
- A behavior, test, or build that was green at a known earlier point is now red.
Do not use when the bug is obviously in the most recent commit (inspect it), or when the reproducer is too fragile/flaky to drive an automated bisect (fix the flake first).
Inputs (confirm before starting)
Per confirm-before-implementing, gather these before touching git bisect:
- Symptom: one-line description of what's broken (e.g. "TestExchangeToken fails with unauthorized").
- Reproducer: a shell command that exits 0 when the bug is absent, 1 when present. If the user can't supply one, build it together (see Reproducer strategy).
- Known-bad ref: default `HEAD`.
- Known-good ref: default oldest commit within the last 2 weeks:

      git rev-list --since='2 weeks ago' --reverse HEAD | head -1

  If the user has a tag, branch, or specific date in mind, use it instead.
Reproducer strategy
The reproducer must work at every commit in the bisect range. The reproducer that exists today often won't: APIs change, test files don't yet exist on old commits, etc. Pick the highest-applicable strategy:
1. External reproducer (preferred). A self-contained shell script in `$TMPDIR` (e.g. `/tmp/repro.sh`) that builds and exercises the project from outside the source tree. It hits a stable surface (CLI flag, HTTP endpoint, exported function with stable signature) that survived the bisect range. Examples:

       # CLI: build and check observable behavior
       go build -o /tmp/foo ./cmd/foo && /tmp/foo --some-flag | grep -q expected

       # HTTP: spin up server, hit endpoint, kill.
       # Exit with curl's status so the result reflects the check, not the kill.
       go run ./cmd/server & PID=$!; sleep 1; RC=0; curl -fsS localhost:8080/health || RC=$?; kill $PID; exit $RC

   This survives any in-tree changes because the reproducer doesn't live in the tree.
2. Carry an in-tree reproducer file across checkouts. Stash the reproducer test file, `git stash apply` it at each step, run, undo. Workable but brittle; checkouts during bisect can conflict with the stash. Use only when an external reproducer is impossible.
3. Narrow the range first. If the reproducer fundamentally requires an API that didn't exist before commit X, set the known-good to X and bisect within the API-stable window. The bisect then can't find regressions older than X. Accept that as a scope limit, don't fight it.
If none of the three apply, bisect is the wrong tool. Read the diff manually instead.
Workflow
1. Resolve refs and reproducer. Confirm the reproducer exits 1 at HEAD (bug present) and 0 at known-good (bug absent). If either fails, the inputs are wrong. Stop and re-gather.
2. Estimate cost. Count commits in range:

       git rev-list --count <good>..<bad>

   Steps ≈ log2(N). If more than ~8 steps (>256 commits), ask the user to narrow the known-good; every step costs a full checkout plus a reproducer run (often including a rebuild), so long ranges get slow.
3. Snapshot working state. Auto-stash uncommitted changes including untracked files:

       STASH_REF=""
       if ! git diff --quiet || ! git diff --cached --quiet || [ -n "$(git ls-files --others --exclude-standard)" ]; then
         git stash push -u -m "bisect-auto-$(date +%s)" && STASH_REF=$(git rev-parse stash@{0})
       fi
       ORIG_HEAD=$(git rev-parse --abbrev-ref HEAD)
       # Clear the trap first so the handler runs only once (avoids a second stash pop).
       trap 'trap - EXIT INT TERM; git bisect reset >/dev/null 2>&1; git checkout "$ORIG_HEAD" 2>/dev/null; [ -n "$STASH_REF" ] && git stash pop 2>/dev/null' EXIT INT TERM

   The trap restores state on success, error, or interrupt.
4. Build the wrapper script. Write to `$TMPDIR/bisect-wrapper.sh`:

       #!/bin/sh
       set -e
       # Skip commits explicitly marked as intentionally broken.
       if git log -1 --format=%B | grep -qF '[skip-bisect]'; then
         exit 125
       fi
       # Optional project regen, uncomment per project needs.
       # task generate || exit 125
       # buf generate || exit 125
       # Run the reproducer. Exit 0 = good, 1 = bad, 125 = skip.
       <USER REPRODUCER COMMAND HERE>

   `chmod +x` the wrapper. Compile errors and missing-tool failures propagate as the wrapper's exit code: commits that can't even build typically return non-zero from the build step, and `git bisect run` then marks them bad. If the user wants to skip them rather than treat them as bad, convert those to 125 by guarding the build step with `|| exit 125`.
5. Run bisect.

       git bisect start <bad> <good>
       git bisect run "$TMPDIR/bisect-wrapper.sh"

   `git bisect run` drives to completion automatically. Capture the output.
6. Read the result. The `git bisect run` output includes the line `<sha> is the first bad commit`. Capture the SHA before resetting:

       # The log records the result as "# first bad commit: [<sha>] <subject>".
       FIRST_BAD=$(git bisect log | sed -n 's/^# first bad commit: \[\([0-9a-f]*\)\].*/\1/p' | head -1)

   If `git bisect log` doesn't have it (older git), parse the run output instead.
7. Reset state. `git bisect reset` returns HEAD to the original ref. The trap will fire on exit and restore the stash.
8. Generate report (see Output).
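The first two workflow steps (confirming the reproducer and estimating cost) can be sketched as two small shell helpers. This is a minimal sketch, not part of the skill itself: the names `expect_status` and `bisect_steps` are hypothetical, and the step count is approximated as ceil(log2(N)).

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative assumptions.

# Succeed only if the command's exit status equals the expected one.
expect_status() {
  want=$1; shift
  got=0
  "$@" >/dev/null 2>&1 || got=$?
  [ "$got" -eq "$want" ]
}

# Estimate bisect steps as ceil(log2(N)) for N commits in range.
bisect_steps() {
  n=$1
  steps=0
  pow=1
  # Double pow until it covers n; the number of doublings is ceil(log2(n)).
  while [ "$pow" -lt "$n" ]; do
    pow=$((pow * 2))
    steps=$((steps + 1))
  done
  echo "$steps"
}

# Usage (the reproducer path and refs are placeholders):
#   expect_status 1 /tmp/repro.sh    # at HEAD: bug present
#   expect_status 0 /tmp/repro.sh    # after checking out the known-good ref
#   bisect_steps "$(git rev-list --count <good>..<bad>)"
```

If either `expect_status` call fails, the inputs are wrong and the bisect should not start.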
Edge cases
- Flaky reproducer. If the same commit returns different exit codes on repeat runs, bisect is unreliable. Loop the reproducer N times (e.g. 5) and treat any failure as bad. Wrap the user's command in `for i in 1 2 3 4 5; do <repro> || exit 1; done; exit 0`. If still flaky, fix the flake first.
- Stale generated files after `git checkout` between steps. Add a regen step at the top of the wrapper (`task generate`, `buf generate`, `npm run generate`) before the reproducer.
- `[skip-bisect]` commits. Wrapper greps the message and exits 125. This relies on the `commit-per-phase` rule's marker convention; assume well-formed history.
- Compile/setup failures on old commits often surface as non-zero exits. By default `git bisect run` treats those as "bad" (any exit code from 1 to 127 other than 125). If they're really "can't tell", convert to 125 by guarding setup with `|| exit 125` in the wrapper.
- Working tree dirt. Handled by step 3's auto-stash + trap. Don't skip the trap; Ctrl-C without it leaves the user mid-bisect.
- Merges in the range. `git bisect` works on the commit graph directly and tests commits on merged side branches too. No special action.
- Refactor renames. The offending commit may be a rename or move that exposes a latent bug elsewhere. Note this in the report; don't claim the rename is the root cause.
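The flaky-reproducer loop and the `|| exit 125` guard described above can be factored into reusable wrapper helpers. The sketch below assumes hypothetical function names (`repeat_or_bad`, `setup_or_skip`). It normalizes every reproducer failure to exit status 1, because `git bisect run` aborts on exit codes above 127.

```shell
#!/bin/sh
# Hypothetical helpers for the wrapper script; names are illustrative.

# Run a command N times; the first failure marks the commit bad (status 1).
repeat_or_bad() {
  n=$1; shift
  i=0
  while [ "$i" -lt "$n" ]; do
    "$@" || return 1
    i=$((i + 1))
  done
  return 0
}

# Run a setup or build command; any failure becomes 125 ("can't tell, skip").
setup_or_skip() {
  "$@" || return 125
}

# Usage inside the wrapper (commands are placeholders):
#   setup_or_skip go build ./... || exit 125
#   repeat_or_bad 5 /tmp/repro.sh
```

Keeping setup and reproducer failures on separate paths stops a commit that merely fails to build from being reported as the first bad commit.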
Output
Produce a single Markdown report:
## Offending commit
**<sha7>** — <subject>
Author: <name> | Date: <YYYY-MM-DD>
PR: <url if found via `gh pr list --search "<sha>" --state merged`>
## Diff
<output of `git show --stat <sha>` followed by `git show <sha>`>
## Commit message
<full body from `git log -1 --format=%B <sha>`>
## Suggested next step
- [ ] Write a regression test that reproduces <symptom> at HEAD (so the bisect is repeatable)
- [ ] Revert + reimplement, OR fix forward (user's call)
- [ ] Open issue / PR comment if the offending commit was already merged and shipped
Do not auto-revert, auto-fix, or push anything. The skill ends at the report.
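A rough sketch of how the report body could be assembled from the git commands named in the template. The function name `bisect_report` is hypothetical, the PR lookup via `gh` and the "Suggested next step" checklist are left out, and the header uses a plain hyphen as separator; treat it as a starting point rather than the skill's actual implementation.

```shell
#!/bin/sh
# Hypothetical helper: print the report sections for one commit.
bisect_report() {
  sha=$1
  printf '## Offending commit\n'
  # %h is git's abbreviated hash; --date=short yields YYYY-MM-DD.
  git --no-pager log -1 --date=short \
    --format='**%h** - %s%nAuthor: %an | Date: %ad' "$sha"
  printf '## Diff\n'
  git --no-pager show --stat "$sha"
  git --no-pager show "$sha"
  printf '## Commit message\n'
  git --no-pager log -1 --format=%B "$sha"
}

# Usage: bisect_report "$FIRST_BAD" > "$TMPDIR/bisect-report.md"
```

The function only reads history, in line with the rule that the skill ends at the report.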
Cross-references
- `commit-per-phase` rule defines the `[skip-bisect]` marker the wrapper looks for.
- `ci-debug-loop` skill is the typical escalation source. When log analysis stalls, hand the failing test/command to this skill as the reproducer.
- `verify-when-complete` skill produces good reproducer one-liners (`task test -- -run '^TestX$'`).
Anti-patterns
- Running bisect without a deterministic reproducer. One wrong good/bad verdict steers every later step, so the reported commit can't be trusted.
- Skipping the auto-stash and trap. Leaves the user mid-bisect on Ctrl-C.
- Treating compile-error commits as "bad" without checking. That's a setup failure, not the bug.
- Auto-fixing the offending commit. Out of scope for this skill.
- Per `~/.claude/rules/probe-not-assume.md`: confirm via tool/command before recommending; do not infer.