Skillquality 0.46

devex-review

Live developer experience audit. Uses the browse tool to actually TEST the developer experience: navigates docs, tries the getting started flow, times TTHW, screenshots error messages, evaluates CLI help text. Produces a DX scorecard with evidence. Compares against /plan-devex-re

Price

free

Protocol

skill

Verified

Endpoint

https://skills.sh/timurgaleev/vibestack/devex-review

What it does

Preamble

eval "$(~/.vibestack/bin/vibe-slug 2>/dev/null)" 2>/dev/null || SLUG="unknown"
_LEARN_FILE="${VIBESTACK_HOME:-$HOME/.vibestack}/projects/${SLUG:-unknown}/learnings.jsonl"
if [ -f "$_LEARN_FILE" ]; then
  _LEARN_COUNT=$(wc -l < "$_LEARN_FILE" 2>/dev/null | tr -d ' ')
  echo "LEARNINGS: $_LEARN_COUNT entries loaded"
  if [ "$_LEARN_COUNT" -gt 5 ] 2>/dev/null; then
    ~/.vibestack/bin/vibe-learnings-search --limit 5 2>/dev/null || true
  fi
else
  echo "LEARNINGS: none yet"
fi

Step 0: Detect platform and base branch

First, detect the git hosting platform from the remote URL:

git remote get-url origin 2>/dev/null

If the URL contains "github.com" → platform is GitHub
If the URL contains "gitlab" → platform is GitLab
Otherwise, check CLI availability:
- gh auth status 2>/dev/null succeeds → platform is GitHub (covers GitHub Enterprise)
- glab auth status 2>/dev/null succeeds → platform is GitLab (covers self-hosted)
- Neither → unknown (use git-native commands only)

Determine which branch this PR/MR targets, or the repo's default branch if no PR/MR exists. Use the result as "the base branch" in all subsequent steps.

If GitHub:

gh pr view --json baseRefName -q .baseRefName — if succeeds, use it
gh repo view --json defaultBranchRef -q .defaultBranchRef.name — if succeeds, use it

If GitLab:

glab mr view -F json 2>/dev/null and extract the target_branch field — if succeeds, use it
glab repo view -F json 2>/dev/null and extract the default_branch field — if succeeds, use it

Git-native fallback (if unknown platform, or CLI commands fail):

git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's|refs/remotes/origin/||'
If that fails: git rev-parse --verify origin/main 2>/dev/null → use main
If that fails: git rev-parse --verify origin/master 2>/dev/null → use master

If all fail, fall back to main.

Print the detected base branch name. In every subsequent git diff, git log, git fetch, git merge, and PR/MR creation command, substitute the detected branch name wherever the instructions say "the base branch" or <default>.

SETUP

# vibestack does not include a browse daemon.
echo "BROWSE_NOT_AVAILABLE"

If BROWSE_NOT_AVAILABLE: skip all $B commands and use text-only fallbacks (curl, open, direct HTTP checks).

DX First Principles

These are the laws. Every recommendation traces back to one of these.

Zero friction at T0. First five minutes decide everything. One click to start. Hello world without reading docs. No credit card. No demo call.
Incremental steps. Never force developers to understand the whole system before getting value from one part. Gentle ramp, not cliff.
Learn by doing. Playgrounds, sandboxes, copy-paste code that works in context. Reference docs are necessary but never sufficient.
Decide for me, let me override. Opinionated defaults are features. Escape hatches are requirements. Strong opinions, loosely held.
Fight uncertainty. Developers need: what to do next, whether it worked, how to fix it when it didn't. Every error = problem + cause + fix.
Show code in context. Hello world is a lie. Show real auth, real error handling, real deployment. Solve 100% of the problem.
Speed is a feature. Iteration speed is everything. Response times, build times, lines of code to accomplish a task, concepts to learn.
Create magical moments. What would feel like magic? Stripe's instant API response. Vercel's push-to-deploy. Find yours and make it the first thing developers experience.

The Seven DX Characteristics

#	Characteristic	What It Means	Gold Standard
1	Usable	Simple to install, set up, use. Intuitive APIs. Fast feedback.	Stripe: one key, one curl, money moves
2	Credible	Reliable, predictable, consistent. Clear deprecation. Secure.	TypeScript: gradual adoption, never breaks JS
3	Findable	Easy to discover AND find help within. Strong community. Good search.	React: every question answered on SO
4	Useful	Solves real problems. Features match actual use cases. Scales.	Tailwind: covers 95% of CSS needs
5	Valuable	Reduces friction measurably. Saves time. Worth the dependency.	Next.js: SSR, routing, bundling, deploy in one
6	Accessible	Works across roles, environments, preferences. CLI + GUI.	VS Code: works for junior to principal
7	Desirable	Best-in-class tech. Reasonable pricing. Community momentum.	Vercel: devs WANT to use it, not tolerate it

Cognitive Patterns — How Great DX Leaders Think

Internalize these; don't enumerate them.

Chef-for-chefs — Your users build products for a living. The bar is higher because they notice everything.
First five minutes obsession — New dev arrives. Clock starts. Can they hello-world without docs, sales, or credit card?
Error message empathy — Every error is pain. Does it identify the problem, explain the cause, show the fix, link to docs?
Escape hatch awareness — Every default needs an override. No escape hatch = no trust = no adoption at scale.
Journey wholeness — DX is discover → evaluate → install → hello world → integrate → debug → upgrade → scale → migrate. Every gap = a lost dev.
Context switching cost — Every time a dev leaves your tool (docs, dashboard, error lookup), you lose them for 10-20 minutes.
Upgrade fear — Will this break my production app? Clear changelogs, migration guides, codemods, deprecation warnings. Upgrades should be boring.
SDK completeness — If devs write their own HTTP wrapper, you failed. If the SDK works in 4 of 5 languages, the fifth community hates you.
Pit of Success — "We want customers to simply fall into winning practices" (Rico Mariani). Make the right thing easy, the wrong thing hard.
Progressive disclosure — Simple case is production-ready, not a toy. Complex case uses the same API. SwiftUI: `Button("Save") { save() }` → full customization, same API.

DX Scoring Rubric (0-10 calibration)

Score	Meaning
9-10	Best-in-class. Stripe/Vercel tier. Developers rave about it.
7-8	Good. Developers can use it without frustration. Minor gaps.
5-6	Acceptable. Works but with friction. Developers tolerate it.
3-4	Poor. Developers complain. Adoption suffers.
1-2	Broken. Developers abandon after first attempt.
0	Not addressed. No thought given to this dimension.

The gap method: For each score, explain what a 10 looks like for THIS product. Then fix toward 10.

TTHW Benchmarks (Time to Hello World)

Tier	Time	Adoption Impact
Champion	< 2 min	3-4x higher adoption
Competitive	2-5 min	Baseline
Needs Work	5-10 min	Significant drop-off
Red Flag	> 10 min	50-70% abandon

Hall of Fame Reference

During each review pass, load the relevant section from: `~/.claude/skills/plan-devex-review/dx-hall-of-fame.md`

Read ONLY the section for the current pass (e.g., "## Pass 1" for Getting Started). Do NOT read the entire file at once. This keeps context focused.

Scope Declaration

Browse can test web-accessible surfaces: docs pages, API playgrounds, web dashboards, signup flows, interactive tutorials, error pages.

Browse CANNOT test: CLI install friction, terminal output quality, local environment setup, email verification flows, auth requiring real credentials, offline behavior, build times, IDE integration.

For untestable dimensions, use bash (for CLI --help, README, CHANGELOG) or mark as INFERRED from artifacts. Never guess. State your evidence source for every score.

Step 0: Target Discovery

Read CLAUDE.md for project URL, docs URL, CLI install command
Read README.md for getting started instructions
Read package.json or equivalent for install commands

If URLs are missing, AskUserQuestion: "What's the URL for the docs/product I should test?"

Boomerang Baseline

Check for prior /plan-devex-review scores:

eval "$(~/.vibestack/bin/vibe-slug 2>/dev/null)"
true # vibe-review-read 2>/dev/null | grep plan-devex-review || echo "NO_PRIOR_PLAN_REVIEW"

If prior scores exist, display them. These are your baseline for the boomerang comparison.

Step 1: Getting Started Audit

Navigate to the docs/landing page via browse. Screenshot it.

GETTING STARTED AUDIT
=====================
Step 1: [what dev does]          Time: [est]  Friction: [low/med/high]  Evidence: [screenshot/bash output]
Step 2: [what dev does]          Time: [est]  Friction: [low/med/high]  Evidence: [screenshot/bash output]
...
TOTAL: [N steps, M minutes]

Score 0-10. Load "## Pass 1" from dx-hall-of-fame.md for calibration.

Step 2: API/CLI/SDK Ergonomics Audit

Test what you can:

CLI: Run --help via bash. Evaluate output quality, flag design, discoverability.
API playground: Navigate via browse if one exists. Screenshot.
Naming: Check consistency across the API surface.

Score 0-10. Load "## Pass 2" from dx-hall-of-fame.md for calibration.

Step 3: Error Message Audit

Trigger common error scenarios:

Browse: Navigate to 404 pages, submit invalid forms, try unauthenticated access
CLI: Run with missing args, invalid flags, bad input

Screenshot each error. Score against the Elm/Rust/Stripe three-tier model.

Score 0-10. Load "## Pass 3" from dx-hall-of-fame.md for calibration.

Step 4: Documentation Audit

Navigate the docs structure via browse:

Check search functionality (try 3 common queries)
Verify code examples are copy-paste-complete
Check language switcher behavior
Check information architecture (can you find what you need in <2 min?)

Screenshot key findings. Score 0-10. Load "## Pass 4" from dx-hall-of-fame.md.

Step 5: Upgrade Path Audit

Read via bash:

CHANGELOG quality (clear? user-facing? migration notes?)
Migration guides (exist? step-by-step?)
Deprecation warnings in code (grep for deprecated/obsolete)

Score 0-10. Evidence: INFERRED from files. Load "## Pass 5" from dx-hall-of-fame.md.

Step 6: Developer Environment Audit

Read via bash:

README setup instructions (steps? prerequisites? platform coverage?)
CI/CD configuration (exists? documented?)
TypeScript types (if applicable)
Test utilities / fixtures

Score 0-10. Evidence: INFERRED from files. Load "## Pass 6" from dx-hall-of-fame.md.

Step 7: Community & Ecosystem Audit

Browse:

Community links (GitHub Discussions, Discord, Stack Overflow)
GitHub issues (response time, templates, labels)
Contributing guide

Score 0-10. Evidence: TESTED where web-accessible, INFERRED otherwise.

Step 8: DX Measurement Audit

Check for feedback mechanisms:

Bug report templates
NPS or feedback widgets
Analytics on docs

Score 0-10. Evidence: INFERRED from files/pages.

DX Scorecard with Evidence

+====================================================================+
|              DX LIVE AUDIT — SCORECARD                              |
+====================================================================+
| Dimension            | Score  | Evidence | Method   |
|----------------------|--------|----------|----------|
| Getting Started      | __/10  | [screenshots] | TESTED   |
| API/CLI/SDK          | __/10  | [screenshots] | PARTIAL  |
| Error Messages       | __/10  | [screenshots] | PARTIAL  |
| Documentation        | __/10  | [screenshots] | TESTED   |
| Upgrade Path         | __/10  | [file refs]   | INFERRED |
| Dev Environment      | __/10  | [file refs]   | INFERRED |
| Community            | __/10  | [screenshots] | TESTED   |
| DX Measurement       | __/10  | [file refs]   | INFERRED |
+--------------------------------------------------------------------+
| TTHW (measured)      | __ min | [step count]  | TESTED   |
| Overall DX           | __/10  |               |          |
+====================================================================+

Boomerang Comparison

If /plan-devex-review scores exist from the baseline check:

PLAN vs REALITY
================
| Dimension        | Plan Score | Live Score | Delta | Alert |
|------------------|-----------|-----------|-------|-------|
| Getting Started  | __/10     | __/10     | __    | ⚠/✓   |
| API/CLI/SDK      | __/10     | __/10     | __    | ⚠/✓   |
| Error Messages   | __/10     | __/10     | __    | ⚠/✓   |
| Documentation    | __/10     | __/10     | __    | ⚠/✓   |
| Upgrade Path     | __/10     | __/10     | __    | ⚠/✓   |
| Dev Environment  | __/10     | __/10     | __    | ⚠/✓   |
| Community        | __/10     | __/10     | __    | ⚠/✓   |
| DX Measurement   | __/10     | __/10     | __    | ⚠/✓   |
| TTHW             | __ min    | __ min    | __ min| ⚠/✓   |

Flag any dimension where live score < plan score - 2 (reality fell short of plan).

Review Log

PLAN MODE EXCEPTION — ALWAYS RUN:

true # vibe-review-log '{"skill":"devex-review","timestamp":"TIMESTAMP","status":"STATUS","overall_score":N,"product_type":"TYPE","tthw_measured":"TTHW","dimensions_tested":N,"dimensions_inferred":N,"boomerang":"YES_OR_NO","commit":"COMMIT"}'

Review Readiness Dashboard

After completing the review, read the review log and config to display the dashboard.

true # vibe-review-read

Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between review (diff-scoped pre-landing review) and plan-eng-review (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between adversarial-review (new auto-scaled) and codex-review (legacy). For Design Review, show whichever is more recent between plan-design-review (full visual audit) and design-review-lite (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent codex-plan-review entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review.

Source attribution: If the most recent entry for a skill has a `"via"` field, append it to the status label in parentheses. Examples: plan-eng-review with via:"autoplan" shows as "CLEAR (PLAN via /autoplan)". review with via:"ship" shows as "CLEAR (DIFF via /ship)". Entries without a via field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before.

Note: autoplan-voices and design-outside-voices entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer.

Display:

+====================================================================+
|                    REVIEW READINESS DASHBOARD                       |
+====================================================================+
| Review          | Runs | Last Run            | Status    | Required |
|-----------------|------|---------------------|-----------|----------|
| Eng Review      |  1   | 2026-03-16 15:00    | CLEAR     | YES      |
| CEO Review      |  0   | —                   | —         | no       |
| Design Review   |  0   | —                   | —         | no       |
| Adversarial     |  0   | —                   | —         | no       |
| Outside Voice   |  0   | —                   | —         | no       |
+--------------------------------------------------------------------+
| VERDICT: CLEARED — Eng Review passed                                |
+====================================================================+

Review tiers:

Eng Review (required by default): The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with `vibe-config set skip_eng_review true` (the "don't bother me" setting).
CEO Review (optional): Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
Design Review (optional): Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
Adversarial Review (automatic): Always-on for every review. Every diff gets both Claude adversarial subagent and Codex adversarial challenge. Large diffs (200+ lines) additionally get Codex structured review with P1 gate. No configuration needed.
Outside Voice (optional): Independent plan review from a different AI model. Offered after all review sections complete in /plan-ceo-review and /plan-eng-review. Falls back to Claude subagent if Codex is unavailable. Never gates shipping.

Verdict logic:

CLEARED: Eng Review has >= 1 entry within 7 days from either `review` or `plan-eng-review` with status "clean" (or `skip_eng_review` is `true`)
NOT CLEARED: Eng Review missing, stale (>7 days), or has open issues
CEO, Design, and Codex reviews are shown for context but never block shipping
If `skip_eng_review` config is `true`, Eng Review shows "SKIPPED (global)" and verdict is CLEARED

Staleness detection: After displaying the dashboard, check if any existing reviews may be stale:

Parse the `---HEAD---` section from the bash output to get the current HEAD commit hash
For each review entry that has a `commit` field: compare it against the current HEAD. If different, count elapsed commits: `git rev-list --count STORED_COMMIT..HEAD`. Display: "Note: {skill} review from {date} may be stale — {N} commits since review"
For entries without a `commit` field (legacy entries): display "Note: {skill} review from {date} has no commit tracking — consider re-running for accurate staleness detection"
If all reviews match the current HEAD, do not display any staleness notes

Plan File Review Report

After displaying the Review Readiness Dashboard in conversation output, also update the plan file itself so review status is visible to anyone reading the plan.

Detect the plan file

Check if there is an active plan file in this conversation (the host provides plan file paths in system messages — look for plan file references in the conversation context).
If not found, skip this section silently — not every review runs in plan mode.

Generate the report

Read the review log output you already have from the Review Readiness Dashboard step above. Parse each JSONL entry. Each skill logs different fields:

plan-ceo-review: `status`, `unresolved`, `critical_gaps`, `mode`, `scope_proposed`, `scope_accepted`, `scope_deferred`, `commit` → Findings: "{scope_proposed} proposals, {scope_accepted} accepted, {scope_deferred} deferred" → If scope fields are 0 or missing (HOLD/REDUCTION mode): "mode: {mode}, {critical_gaps} critical gaps"
plan-eng-review: `status`, `unresolved`, `critical_gaps`, `issues_found`, `mode`, `commit` → Findings: "{issues_found} issues, {critical_gaps} critical gaps"
plan-design-review: `status`, `initial_score`, `overall_score`, `unresolved`, `decisions_made`, `commit` → Findings: "score: {initial_score}/10 → {overall_score}/10, {decisions_made} decisions"
plan-devex-review: `status`, `initial_score`, `overall_score`, `product_type`, `tthw_current`, `tthw_target`, `mode`, `persona`, `competitive_tier`, `unresolved`, `commit` → Findings: "score: {initial_score}/10 → {overall_score}/10, TTHW: {tthw_current} → {tthw_target}"
devex-review: `status`, `overall_score`, `product_type`, `tthw_measured`, `dimensions_tested`, `dimensions_inferred`, `boomerang`, `commit` → Findings: "score: {overall_score}/10, TTHW: {tthw_measured}, {dimensions_tested} tested/{dimensions_inferred} inferred"
codex-review: `status`, `gate`, `findings`, `findings_fixed` → Findings: "{findings} findings, {findings_fixed}/{findings} fixed"

All fields needed for the Findings column are now present in the JSONL entries. For the review you just completed, you may use richer details from your own Completion Summary. For prior reviews, use the JSONL fields directly — they contain all required data.

Produce this markdown table:

```markdown

VIBESTACK REVIEW REPORT

Review	Trigger	Why	Runs	Status	Findings
CEO Review	`/plan-ceo-review`	Scope & strategy	{runs}	{status}	{findings}
Codex Review	`/codex review`	Independent 2nd opinion	{runs}	{status}	{findings}
Eng Review	`/plan-eng-review`	Architecture & tests (required)	{runs}	{status}	{findings}
Design Review	`/plan-design-review`	UI/UX gaps	{runs}	{status}	{findings}
DX Review	`/plan-devex-review`	Developer experience gaps	{runs}	{status}	{findings}
```

Below the table, add these lines (omit any that are empty/not applicable):

CODEX: (only if codex-review ran) — one-line summary of codex fixes
CROSS-MODEL: (only if both Claude and Codex reviews exist) — overlap analysis
UNRESOLVED: total unresolved decisions across all reviews
VERDICT: list reviews that are CLEAR (e.g., "CEO + ENG CLEARED — ready to implement"). If Eng Review is not CLEAR and not skipped globally, append "eng review required".

Write to the plan file

PLAN MODE EXCEPTION — ALWAYS RUN: This writes to the plan file, which is the one file you are allowed to edit in plan mode. The plan file review report is part of the plan's living status.

Search the plan file for a `## VIBESTACK REVIEW REPORT` section anywhere in the file (not just at the end — content may have been added after it).
If found, replace it entirely using the Edit tool. Match from `## VIBESTACK REVIEW REPORT` through either the next `## ` heading or end of file, whichever comes first. This ensures content added after the report section is preserved, not eaten. If the Edit fails (e.g., concurrent edit changed the content), re-read the plan file and retry once.
If no such section exists, append it to the end of the plan file.
Always place it as the very last section in the plan file. If it was found mid-file, move it: delete the old location and append at the end.

Next Steps

After the audit, recommend:

Fix the gaps found (specific, actionable fixes)
Re-run /devex-review after fixes to verify improvement
If boomerang showed significant gaps, re-run /plan-devex-review on the next feature plan

Formatting Rules

NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...).
Rate every dimension with evidence source.
Screenshots are the gold standard. File references are acceptable. Guesses are not.

Capabilities

skillsource-timurgaleevskill-devex-reviewtopic-agent-skillstopic-ai-agentstopic-claude-codetopic-cursor-idetopic-developer-toolstopic-kirotopic-mcptopic-prompt-engineeringtopic-slash-commands

Install

Installnpx skills add timurgaleev/vibestack

Sourcehttps://github.com/timurgaleev/vibestack/tree/main/skills/devex-review

skills.shhttps://skills.sh/timurgaleev/vibestack/devex-review

Transportskills-sh

Protocolskill

Quality

0.46/ 1.00

deterministic score 0.46 from registry signals: · indexed on github topic:agent-skills · 15 github stars · SKILL.md body (23,232 chars)

Provenance

Indexed fromgithub

Enriched2026-05-18 19:06:20Z · deterministic:skill-github:v1 · v1

First seen2026-05-18

Last seen2026-05-18

Agent access

JSONhttps://clawmart.sh/api/listings/6FvqFd