devex-review
Live developer experience audit. Uses the browse tool to actually TEST the developer experience: navigates docs, tries the getting started flow, times TTHW, screenshots error messages, evaluates CLI help text. Produces a DX scorecard with evidence. Compares against /plan-devex-re
What it does
Preamble
eval "$(~/.vibestack/bin/vibe-slug 2>/dev/null)" 2>/dev/null || SLUG="unknown"
_LEARN_FILE="${VIBESTACK_HOME:-$HOME/.vibestack}/projects/${SLUG:-unknown}/learnings.jsonl"
if [ -f "$_LEARN_FILE" ]; then
_LEARN_COUNT=$(wc -l < "$_LEARN_FILE" 2>/dev/null | tr -d ' ')
echo "LEARNINGS: $_LEARN_COUNT entries loaded"
if [ "$_LEARN_COUNT" -gt 5 ] 2>/dev/null; then
~/.vibestack/bin/vibe-learnings-search --limit 5 2>/dev/null || true
fi
else
echo "LEARNINGS: none yet"
fi
Step 0: Detect platform and base branch
First, detect the git hosting platform from the remote URL:
git remote get-url origin 2>/dev/null
- If the URL contains "github.com" → platform is GitHub
- If the URL contains "gitlab" → platform is GitLab
- Otherwise, check CLI availability:
gh auth status 2>/dev/nullsucceeds → platform is GitHub (covers GitHub Enterprise)glab auth status 2>/dev/nullsucceeds → platform is GitLab (covers self-hosted)- Neither → unknown (use git-native commands only)
Determine which branch this PR/MR targets, or the repo's default branch if no PR/MR exists. Use the result as "the base branch" in all subsequent steps.
If GitHub:
gh pr view --json baseRefName -q .baseRefName— if succeeds, use itgh repo view --json defaultBranchRef -q .defaultBranchRef.name— if succeeds, use it
If GitLab:
glab mr view -F json 2>/dev/nulland extract thetarget_branchfield — if succeeds, use itglab repo view -F json 2>/dev/nulland extract thedefault_branchfield — if succeeds, use it
Git-native fallback (if unknown platform, or CLI commands fail):
git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's|refs/remotes/origin/||'- If that fails:
git rev-parse --verify origin/main 2>/dev/null→ usemain - If that fails:
git rev-parse --verify origin/master 2>/dev/null→ usemaster
If all fail, fall back to main.
Print the detected base branch name. In every subsequent git diff, git log,
git fetch, git merge, and PR/MR creation command, substitute the detected
branch name wherever the instructions say "the base branch" or <default>.
SETUP
# vibestack does not include a browse daemon.
echo "BROWSE_NOT_AVAILABLE"
If BROWSE_NOT_AVAILABLE: skip all $B commands and use text-only fallbacks (curl, open, direct HTTP checks).
DX First Principles
These are the laws. Every recommendation traces back to one of these.
- Zero friction at T0. First five minutes decide everything. One click to start. Hello world without reading docs. No credit card. No demo call.
- Incremental steps. Never force developers to understand the whole system before getting value from one part. Gentle ramp, not cliff.
- Learn by doing. Playgrounds, sandboxes, copy-paste code that works in context. Reference docs are necessary but never sufficient.
- Decide for me, let me override. Opinionated defaults are features. Escape hatches are requirements. Strong opinions, loosely held.
- Fight uncertainty. Developers need: what to do next, whether it worked, how to fix it when it didn't. Every error = problem + cause + fix.
- Show code in context. Hello world is a lie. Show real auth, real error handling, real deployment. Solve 100% of the problem.
- Speed is a feature. Iteration speed is everything. Response times, build times, lines of code to accomplish a task, concepts to learn.
- Create magical moments. What would feel like magic? Stripe's instant API response. Vercel's push-to-deploy. Find yours and make it the first thing developers experience.
The Seven DX Characteristics
| # | Characteristic | What It Means | Gold Standard |
|---|---|---|---|
| 1 | Usable | Simple to install, set up, use. Intuitive APIs. Fast feedback. | Stripe: one key, one curl, money moves |
| 2 | Credible | Reliable, predictable, consistent. Clear deprecation. Secure. | TypeScript: gradual adoption, never breaks JS |
| 3 | Findable | Easy to discover AND find help within. Strong community. Good search. | React: every question answered on SO |
| 4 | Useful | Solves real problems. Features match actual use cases. Scales. | Tailwind: covers 95% of CSS needs |
| 5 | Valuable | Reduces friction measurably. Saves time. Worth the dependency. | Next.js: SSR, routing, bundling, deploy in one |
| 6 | Accessible | Works across roles, environments, preferences. CLI + GUI. | VS Code: works for junior to principal |
| 7 | Desirable | Best-in-class tech. Reasonable pricing. Community momentum. | Vercel: devs WANT to use it, not tolerate it |
Cognitive Patterns — How Great DX Leaders Think
Internalize these; don't enumerate them.
- Chef-for-chefs — Your users build products for a living. The bar is higher because they notice everything.
- First five minutes obsession — New dev arrives. Clock starts. Can they hello-world without docs, sales, or credit card?
- Error message empathy — Every error is pain. Does it identify the problem, explain the cause, show the fix, link to docs?
- Escape hatch awareness — Every default needs an override. No escape hatch = no trust = no adoption at scale.
- Journey wholeness — DX is discover → evaluate → install → hello world → integrate → debug → upgrade → scale → migrate. Every gap = a lost dev.
- Context switching cost — Every time a dev leaves your tool (docs, dashboard, error lookup), you lose them for 10-20 minutes.
- Upgrade fear — Will this break my production app? Clear changelogs, migration guides, codemods, deprecation warnings. Upgrades should be boring.
- SDK completeness — If devs write their own HTTP wrapper, you failed. If the SDK works in 4 of 5 languages, the fifth community hates you.
- Pit of Success — "We want customers to simply fall into winning practices" (Rico Mariani). Make the right thing easy, the wrong thing hard.
- Progressive disclosure — Simple case is production-ready, not a toy. Complex case uses the same API. SwiftUI: `Button("Save") { save() }` → full customization, same API.
DX Scoring Rubric (0-10 calibration)
| Score | Meaning |
|---|---|
| 9-10 | Best-in-class. Stripe/Vercel tier. Developers rave about it. |
| 7-8 | Good. Developers can use it without frustration. Minor gaps. |
| 5-6 | Acceptable. Works but with friction. Developers tolerate it. |
| 3-4 | Poor. Developers complain. Adoption suffers. |
| 1-2 | Broken. Developers abandon after first attempt. |
| 0 | Not addressed. No thought given to this dimension. |
The gap method: For each score, explain what a 10 looks like for THIS product. Then fix toward 10.
TTHW Benchmarks (Time to Hello World)
| Tier | Time | Adoption Impact |
|---|---|---|
| Champion | < 2 min | 3-4x higher adoption |
| Competitive | 2-5 min | Baseline |
| Needs Work | 5-10 min | Significant drop-off |
| Red Flag | > 10 min | 50-70% abandon |
Hall of Fame Reference
During each review pass, load the relevant section from: `~/.claude/skills/plan-devex-review/dx-hall-of-fame.md`
Read ONLY the section for the current pass (e.g., "## Pass 1" for Getting Started). Do NOT read the entire file at once. This keeps context focused.
Scope Declaration
Browse can test web-accessible surfaces: docs pages, API playgrounds, web dashboards, signup flows, interactive tutorials, error pages.
Browse CANNOT test: CLI install friction, terminal output quality, local environment setup, email verification flows, auth requiring real credentials, offline behavior, build times, IDE integration.
For untestable dimensions, use bash (for CLI --help, README, CHANGELOG) or mark as INFERRED from artifacts. Never guess. State your evidence source for every score.
Step 0: Target Discovery
- Read CLAUDE.md for project URL, docs URL, CLI install command
- Read README.md for getting started instructions
- Read package.json or equivalent for install commands
If URLs are missing, AskUserQuestion: "What's the URL for the docs/product I should test?"
Boomerang Baseline
Check for prior /plan-devex-review scores:
eval "$(~/.vibestack/bin/vibe-slug 2>/dev/null)"
true # vibe-review-read 2>/dev/null | grep plan-devex-review || echo "NO_PRIOR_PLAN_REVIEW"
If prior scores exist, display them. These are your baseline for the boomerang comparison.
Step 1: Getting Started Audit
Navigate to the docs/landing page via browse. Screenshot it.
GETTING STARTED AUDIT
=====================
Step 1: [what dev does] Time: [est] Friction: [low/med/high] Evidence: [screenshot/bash output]
Step 2: [what dev does] Time: [est] Friction: [low/med/high] Evidence: [screenshot/bash output]
...
TOTAL: [N steps, M minutes]
Score 0-10. Load "## Pass 1" from dx-hall-of-fame.md for calibration.
Step 2: API/CLI/SDK Ergonomics Audit
Test what you can:
- CLI: Run
--helpvia bash. Evaluate output quality, flag design, discoverability. - API playground: Navigate via browse if one exists. Screenshot.
- Naming: Check consistency across the API surface.
Score 0-10. Load "## Pass 2" from dx-hall-of-fame.md for calibration.
Step 3: Error Message Audit
Trigger common error scenarios:
- Browse: Navigate to 404 pages, submit invalid forms, try unauthenticated access
- CLI: Run with missing args, invalid flags, bad input
Screenshot each error. Score against the Elm/Rust/Stripe three-tier model.
Score 0-10. Load "## Pass 3" from dx-hall-of-fame.md for calibration.
Step 4: Documentation Audit
Navigate the docs structure via browse:
- Check search functionality (try 3 common queries)
- Verify code examples are copy-paste-complete
- Check language switcher behavior
- Check information architecture (can you find what you need in <2 min?)
Screenshot key findings. Score 0-10. Load "## Pass 4" from dx-hall-of-fame.md.
Step 5: Upgrade Path Audit
Read via bash:
- CHANGELOG quality (clear? user-facing? migration notes?)
- Migration guides (exist? step-by-step?)
- Deprecation warnings in code (grep for deprecated/obsolete)
Score 0-10. Evidence: INFERRED from files. Load "## Pass 5" from dx-hall-of-fame.md.
Step 6: Developer Environment Audit
Read via bash:
- README setup instructions (steps? prerequisites? platform coverage?)
- CI/CD configuration (exists? documented?)
- TypeScript types (if applicable)
- Test utilities / fixtures
Score 0-10. Evidence: INFERRED from files. Load "## Pass 6" from dx-hall-of-fame.md.
Step 7: Community & Ecosystem Audit
Browse:
- Community links (GitHub Discussions, Discord, Stack Overflow)
- GitHub issues (response time, templates, labels)
- Contributing guide
Score 0-10. Evidence: TESTED where web-accessible, INFERRED otherwise.
Step 8: DX Measurement Audit
Check for feedback mechanisms:
- Bug report templates
- NPS or feedback widgets
- Analytics on docs
Score 0-10. Evidence: INFERRED from files/pages.
DX Scorecard with Evidence
+====================================================================+
| DX LIVE AUDIT — SCORECARD |
+====================================================================+
| Dimension | Score | Evidence | Method |
|----------------------|--------|----------|----------|
| Getting Started | __/10 | [screenshots] | TESTED |
| API/CLI/SDK | __/10 | [screenshots] | PARTIAL |
| Error Messages | __/10 | [screenshots] | PARTIAL |
| Documentation | __/10 | [screenshots] | TESTED |
| Upgrade Path | __/10 | [file refs] | INFERRED |
| Dev Environment | __/10 | [file refs] | INFERRED |
| Community | __/10 | [screenshots] | TESTED |
| DX Measurement | __/10 | [file refs] | INFERRED |
+--------------------------------------------------------------------+
| TTHW (measured) | __ min | [step count] | TESTED |
| Overall DX | __/10 | | |
+====================================================================+
Boomerang Comparison
If /plan-devex-review scores exist from the baseline check:
PLAN vs REALITY
================
| Dimension | Plan Score | Live Score | Delta | Alert |
|------------------|-----------|-----------|-------|-------|
| Getting Started | __/10 | __/10 | __ | ⚠/✓ |
| API/CLI/SDK | __/10 | __/10 | __ | ⚠/✓ |
| Error Messages | __/10 | __/10 | __ | ⚠/✓ |
| Documentation | __/10 | __/10 | __ | ⚠/✓ |
| Upgrade Path | __/10 | __/10 | __ | ⚠/✓ |
| Dev Environment | __/10 | __/10 | __ | ⚠/✓ |
| Community | __/10 | __/10 | __ | ⚠/✓ |
| DX Measurement | __/10 | __/10 | __ | ⚠/✓ |
| TTHW | __ min | __ min | __ min| ⚠/✓ |
Flag any dimension where live score < plan score - 2 (reality fell short of plan).
Review Log
PLAN MODE EXCEPTION — ALWAYS RUN:
true # vibe-review-log '{"skill":"devex-review","timestamp":"TIMESTAMP","status":"STATUS","overall_score":N,"product_type":"TYPE","tthw_measured":"TTHW","dimensions_tested":N,"dimensions_inferred":N,"boomerang":"YES_OR_NO","commit":"COMMIT"}'
Review Readiness Dashboard
After completing the review, read the review log and config to display the dashboard.
true # vibe-review-read
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between review (diff-scoped pre-landing review) and plan-eng-review (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between adversarial-review (new auto-scaled) and codex-review (legacy). For Design Review, show whichever is more recent between plan-design-review (full visual audit) and design-review-lite (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent codex-plan-review entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review.
Source attribution: If the most recent entry for a skill has a `"via"` field, append it to the status label in parentheses. Examples: plan-eng-review with via:"autoplan" shows as "CLEAR (PLAN via /autoplan)". review with via:"ship" shows as "CLEAR (DIFF via /ship)". Entries without a via field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before.
Note: autoplan-voices and design-outside-voices entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer.
Display:
+====================================================================+
| REVIEW READINESS DASHBOARD |
+====================================================================+
| Review | Runs | Last Run | Status | Required |
|-----------------|------|---------------------|-----------|----------|
| Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES |
| CEO Review | 0 | — | — | no |
| Design Review | 0 | — | — | no |
| Adversarial | 0 | — | — | no |
| Outside Voice | 0 | — | — | no |
+--------------------------------------------------------------------+
| VERDICT: CLEARED — Eng Review passed |
+====================================================================+
Review tiers:
- Eng Review (required by default): The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with `vibe-config set skip_eng_review true` (the "don't bother me" setting).
- CEO Review (optional): Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
- Design Review (optional): Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
- Adversarial Review (automatic): Always-on for every review. Every diff gets both Claude adversarial subagent and Codex adversarial challenge. Large diffs (200+ lines) additionally get Codex structured review with P1 gate. No configuration needed.
- Outside Voice (optional): Independent plan review from a different AI model. Offered after all review sections complete in /plan-ceo-review and /plan-eng-review. Falls back to Claude subagent if Codex is unavailable. Never gates shipping.
Verdict logic:
- CLEARED: Eng Review has >= 1 entry within 7 days from either `review` or `plan-eng-review` with status "clean" (or `skip_eng_review` is `true`)
- NOT CLEARED: Eng Review missing, stale (>7 days), or has open issues
- CEO, Design, and Codex reviews are shown for context but never block shipping
- If `skip_eng_review` config is `true`, Eng Review shows "SKIPPED (global)" and verdict is CLEARED
Staleness detection: After displaying the dashboard, check if any existing reviews may be stale:
- Parse the `---HEAD---` section from the bash output to get the current HEAD commit hash
- For each review entry that has a `commit` field: compare it against the current HEAD. If different, count elapsed commits: `git rev-list --count STORED_COMMIT..HEAD`. Display: "Note: {skill} review from {date} may be stale — {N} commits since review"
- For entries without a `commit` field (legacy entries): display "Note: {skill} review from {date} has no commit tracking — consider re-running for accurate staleness detection"
- If all reviews match the current HEAD, do not display any staleness notes
Plan File Review Report
After displaying the Review Readiness Dashboard in conversation output, also update the plan file itself so review status is visible to anyone reading the plan.
Detect the plan file
- Check if there is an active plan file in this conversation (the host provides plan file paths in system messages — look for plan file references in the conversation context).
- If not found, skip this section silently — not every review runs in plan mode.
Generate the report
Read the review log output you already have from the Review Readiness Dashboard step above. Parse each JSONL entry. Each skill logs different fields:
- plan-ceo-review: `status`, `unresolved`, `critical_gaps`, `mode`, `scope_proposed`, `scope_accepted`, `scope_deferred`, `commit` → Findings: "{scope_proposed} proposals, {scope_accepted} accepted, {scope_deferred} deferred" → If scope fields are 0 or missing (HOLD/REDUCTION mode): "mode: {mode}, {critical_gaps} critical gaps"
- plan-eng-review: `status`, `unresolved`, `critical_gaps`, `issues_found`, `mode`, `commit` → Findings: "{issues_found} issues, {critical_gaps} critical gaps"
- plan-design-review: `status`, `initial_score`, `overall_score`, `unresolved`, `decisions_made`, `commit` → Findings: "score: {initial_score}/10 → {overall_score}/10, {decisions_made} decisions"
- plan-devex-review: `status`, `initial_score`, `overall_score`, `product_type`, `tthw_current`, `tthw_target`, `mode`, `persona`, `competitive_tier`, `unresolved`, `commit` → Findings: "score: {initial_score}/10 → {overall_score}/10, TTHW: {tthw_current} → {tthw_target}"
- devex-review: `status`, `overall_score`, `product_type`, `tthw_measured`, `dimensions_tested`, `dimensions_inferred`, `boomerang`, `commit` → Findings: "score: {overall_score}/10, TTHW: {tthw_measured}, {dimensions_tested} tested/{dimensions_inferred} inferred"
- codex-review: `status`, `gate`, `findings`, `findings_fixed` → Findings: "{findings} findings, {findings_fixed}/{findings} fixed"
All fields needed for the Findings column are now present in the JSONL entries. For the review you just completed, you may use richer details from your own Completion Summary. For prior reviews, use the JSONL fields directly — they contain all required data.
Produce this markdown table:
```markdown
VIBESTACK REVIEW REPORT
| Review | Trigger | Why | Runs | Status | Findings |
|---|---|---|---|---|---|
| CEO Review | `/plan-ceo-review` | Scope & strategy | {runs} | {status} | {findings} |
| Codex Review | `/codex review` | Independent 2nd opinion | {runs} | {status} | {findings} |
| Eng Review | `/plan-eng-review` | Architecture & tests (required) | {runs} | {status} | {findings} |
| Design Review | `/plan-design-review` | UI/UX gaps | {runs} | {status} | {findings} |
| DX Review | `/plan-devex-review` | Developer experience gaps | {runs} | {status} | {findings} |
| ``` |
Below the table, add these lines (omit any that are empty/not applicable):
- CODEX: (only if codex-review ran) — one-line summary of codex fixes
- CROSS-MODEL: (only if both Claude and Codex reviews exist) — overlap analysis
- UNRESOLVED: total unresolved decisions across all reviews
- VERDICT: list reviews that are CLEAR (e.g., "CEO + ENG CLEARED — ready to implement"). If Eng Review is not CLEAR and not skipped globally, append "eng review required".
Write to the plan file
PLAN MODE EXCEPTION — ALWAYS RUN: This writes to the plan file, which is the one file you are allowed to edit in plan mode. The plan file review report is part of the plan's living status.
- Search the plan file for a `## VIBESTACK REVIEW REPORT` section anywhere in the file (not just at the end — content may have been added after it).
- If found, replace it entirely using the Edit tool. Match from `## VIBESTACK REVIEW REPORT` through either the next `## ` heading or end of file, whichever comes first. This ensures content added after the report section is preserved, not eaten. If the Edit fails (e.g., concurrent edit changed the content), re-read the plan file and retry once.
- If no such section exists, append it to the end of the plan file.
- Always place it as the very last section in the plan file. If it was found mid-file, move it: delete the old location and append at the end.
{{include lib/snippets/capture-learnings.md}}
Next Steps
After the audit, recommend:
- Fix the gaps found (specific, actionable fixes)
- Re-run /devex-review after fixes to verify improvement
- If boomerang showed significant gaps, re-run /plan-devex-review on the next feature plan
Formatting Rules
- NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...).
- Rate every dimension with evidence source.
- Screenshots are the gold standard. File references are acceptable. Guesses are not.
Capabilities
Install
Quality
deterministic score 0.46 from registry signals: · indexed on github topic:agent-skills · 15 github stars · SKILL.md body (23,232 chars)