{"id":"bc0fbd1b-da55-4333-92f5-c4762447ff83","shortId":"jF6AWP","kind":"skill","title":"vibe-testing","tagline":"This skill should be used when the user asks to \"test my specs\", \"validate my design docs\", \"find gaps in my architecture\", \"stress-test the spec\", \"vibe test\", \"pressure test the docs\", or mentions spec validation before implementation begins.","description":"# Vibe Testing\n\n## Overview\n\nVibe testing validates specification documents by simulating real-world scenarios against them using LLM reasoning. Instead of writing code or test harnesses, write **natural-language scenarios** that exercise cross-cutting slices of the spec surface — then trace execution step-by-step, flagging gaps, conflicts, and ambiguities.\n\n**Core principle:** If a realistic user scenario cannot be fully traced through the specs, the specs are incomplete.\n\n**Best used:** After specs are written, before implementation begins.\n\n## When to Use\n\n- Spec docs exist but no implementation yet — validate before building\n- After major spec changes — regression-test for new gaps\n- Before implementation planning — find blocking gaps early\n- Specs span multiple documents — test cross-doc coherence\n- Designing for multiple deployment contexts — test each context separately\n\n**When NOT to use:**\n\n- Single-file specs with obvious scope — just review manually\n- Implementation bugs — use actual tests\n- API contract validation — use schema validation tools\n\n## Core Method\n\n```\n1. GATHER    — Read all spec docs in the target directory\n2. SCENARIOS — Write 3-5 vibe test cases (personas + goals + environments)\n3. SIMULATE  — Trace each scenario step-by-step against the specs\n4. CLASSIFY  — Tag findings as GAP / CONFLICT / AMBIGUITY\n5. SEVERITY  — Rate as BLOCKING / DEGRADED / COSMETIC\n6. REPORT    — Produce gap summary + spec coverage matrix\n```\n\n## Writing a Vibe Test Case\n\nEvery test case requires 7 sections:\n\n### 1. Persona (WHO)\n\nA concrete person with a name, role, and technical skill level. Not abstract — real enough to predict behavior.\n\n```markdown\n**Sarah** — First-time customer. Shopping on mobile during a commute.\nExpects checkout to take under 60 seconds. Low patience for errors.\n```\n\nNamed personas force specificity. \"A customer\" invites hand-waving. \"Sarah, shopping on mobile during a commute\" forces the spec to answer \"what happens on a slow 3G connection?\"\n\n### 2. Environment (WHERE)\n\nDeployment mode, hardware, network, access method. Different environments exercise different spec paths.\n\n```markdown\n- **Client:** Mobile browser (iOS Safari, 3G connection)\n- **Backend:** Microservices (auth, payments, inventory, orders, notifications)\n- **Scale:** Black Friday traffic — 50x normal load\n```\n\n### 3. Goal (WHAT)\n\nA single sentence in the persona's own words. Use a blockquote.\n\n```markdown\n> \"I want to buy these 3 items, pay with my credit card, and get a\n> confirmation email within a minute.\"\n```\n\n### 4. Scenario Steps (HOW)\n\n5-8 concrete steps the persona takes. Each step names:\n\n- **The user action** — what they do\n- **The primitives exercised** — which spec concepts activate\n- **Gap detection questions** — 2-3 questions the simulator must answer\n\n```markdown\n#### Step 3: Payment fails, customer retries\n\nSarah's first payment attempt is declined. She re-enters a different card.\n\n**Primitives:**\n- `payments-spec.md`: retry policy, idempotency keys\n- `inventory-spec.md`: stock hold duration during retry\n- `orders-spec.md`: order state transitions on payment failure\n\n**Questions:**\n- Q3.1: The payment spec says \"retry 3 times.\" The inventory spec\n  holds stock for 5 minutes. What if retries take longer than 5 minutes?\n- Q3.2: Does the order stay in \"pending_payment\" during retries, or\n  does it transition to \"failed\" and require a new order?\n```\n\n**Rules for good steps:**\n\n- Each step must cite at least one spec doc\n- Each step must ask at least one question the spec should answer\n- Questions use `Q<step>.<number>:` format for traceability\n- Questions must be spec-answerable (yes/no/how), not opinion questions\n\n### 5. Spec Coverage Matrix (COVERAGE)\n\nA table showing which spec docs were exercised at which steps.\n\n```markdown\n| Spec Doc | Steps Hit | Coverage |\n|----------|-----------|----------|\n| `payments-spec.md` | 3,4 | Retry covered; hold-vs-retry timing gap |\n| `inventory-spec.md` | 2,3 | Stock hold covered; expiry-during-retry unclear |\n| `shipping-spec.md` | — | Not exercised |\n```\n\nSpecs that no scenario touches are untested blind spots.\n\n### 6. Gap Detection Questions Summary\n\nCollect all Q-numbers for easy reference. The simulator answers every one.\n\n### 7. Gap Classification (After Simulation)\n\nClassify each finding by severity:\n\n| Severity | Definition | Example |\n|----------|-----------|---------|\n| **BLOCKING** | Spec cannot answer; implementation impossible | Payment retry duration can exceed inventory hold — no resolution defined |\n| **DEGRADED** | Spec is silent but a workaround exists | No spec for partial refunds on split shipments; can process manually |\n| **COSMETIC** | Missing convenience, not a correctness issue | No order timeline view for customer support |\n\n## Running a Vibe Test\n\n### Manual Simulation (Recommended)\n\nUse as a prompt to a subagent or fresh LLM context with full spec access:\n\n```\nYou are a spec validation simulator. You have been given all\nspecification documents for [system name].\n\nRead the following vibe test case. Simulate executing the scenario\nstep by step against the specs.\n\nFor each step:\n1. Identify the governing spec document and section\n2. Trace the data flow through the system primitives\n3. Answer every Q-numbered question by citing the spec\n\nFor each question, classify as:\n- COVERED: The spec answers this clearly. Cite the section.\n- GAP: The spec is silent. No document addresses this.\n- CONFLICT: Two specs give contradictory answers. Cite both.\n- AMBIGUITY: The spec addresses this but the answer is unclear.\n\nAfter all steps, produce:\n- Gap summary table (ID, description, severity, affected steps)\n- Spec coverage heatmap (which docs exercised, which not)\n- Recommended spec changes (which doc to update, what to add)\n```\n\n### Batch Execution\n\nRun all test cases and aggregate:\n\n```\nfor each test case:\n    1. Load all spec docs as context\n    2. Load one test case\n    3. Run simulator prompt\n    4. Collect gap report\n\nAggregate:\n    - Cross-test gap summary (gaps appearing in multiple tests)\n    - Spec coverage union (docs never exercised by any test)\n    - Priority ranking (blocking > degraded > cosmetic)\n```\n\n### Regression Testing\n\nAfter spec updates, re-run all vibe tests to verify:\n\n1. Previously identified gaps are now COVERED\n2. No new gaps were introduced\n3. Cross-doc references remain consistent\n\n## Designing Good Test Cases\n\n### Scenario Selection Strategy\n\nChoose scenarios that vary across dimensions:\n\n| Dimension | Variation A | Variation B | Variation C |\n|-----------|------------|------------|------------|\n| **User type** | First-time buyer | Returning customer | Admin/merchant |\n| **Device** | Mobile browser | Desktop | API client |\n| **Scale** | Single user | Normal traffic | Black Friday spike |\n| **Payment** | Happy path | Failure + retry | Partial refund |\n| **Governance** | None (consumer) | Moderate (business) | Strict (compliance) |\n| **Network** | Fast WiFi | Slow 3G | Intermittent |\n\nEach test case should differ on at least 3 dimensions. 4 test cases covering 4 quadrants give good coverage.\n\n### Question Design\n\nGood gap detection questions are:\n\n- **Specific:** \"What order state is set during payment retry?\" not \"How do orders work?\"\n- **Traceable:** Answerable by citing a spec section (or flagging its absence)\n- **Boundary-probing:** Target edges between two specs' responsibilities\n- **Scale-sensitive:** \"What happens with 10,000 concurrent checkouts?\"\n- **Failure-aware:** \"What if the payment fails after inventory is reserved?\"\n\n### Coverage Maximization\n\nAfter writing all test cases, check the coverage union. Every spec doc should appear in at least one coverage matrix. If a doc is never exercised:\n\n- Either add a step to an existing test that exercises it\n- Or the doc may be specifying something no real scenario needs (flag for review)\n\n## Gap Report Format\n\n```markdown\n## Gap Summary\n\n### BLOCKING\n| ID | Gap | Affected Tests | Recommended Fix |\n|----|-----|---------------|-----------------|\n| G-B1 | Payment retry window can exceed inventory hold | VT-1, VT-2 | Align timing in payments-spec.md and inventory-spec.md |\n\n### DEGRADED\n| ID | Gap | Affected Tests | Workaround |\n|----|-----|---------------|-----------|\n| G-D1 | No spec for partial refunds on split shipments | VT-3 | Process refunds per-shipment manually |\n\n### COSMETIC\n| ID | Gap | Affected Tests |\n|----|-----|---------------|\n| G-C1 | No order timeline view for support agents | VT-4 |\n```\n\nGap IDs use prefix: `G-B` (blocking), `G-D` (degraded), `G-C` (cosmetic).\n\n## Common Mistakes\n\n| Mistake | Fix |\n|---------|-----|\n| Abstract personas (\"a user\") | Give them names, roles, and constraints |\n| Scenario only tests happy path | Add failure steps: \"What if the payment is declined?\" |\n| Questions test opinions (\"Is this good?\") | Questions must be spec-answerable: \"Which doc defines X?\" |\n| All tests use same user type | Vary across buyer, merchant, admin, support |\n| Ignoring coverage matrix | Every spec doc must appear in at least one test |\n| Writing tests after implementation | Vibe tests validate specs BEFORE implementation |\n| Too many steps per scenario | 5-8 steps. Focused scenarios find more gaps |\n\n## Additional Resources\n\n- `references/simulator-prompt.md` — Full simulator prompt template ready to paste\n- `examples/example-vibe-test.md` — Complete example vibe test case","tags":["vibe","testing","knot0-com","agent-skills","architecture","claude-code","codex","gemini-cli","llm","skill","spec-validation","vibe-testing"],"capabilities":["skill","source-knot0-com","skill-vibe-testing","topic-agent-skills","topic-architecture","topic-claude-code","topic-codex","topic-gemini-cli","topic-llm","topic-skill","topic-spec-validation","topic-testing","topic-vibe-testing"],"categories":["vibe-testing"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/knot0-com/vibe-testing","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add knot0-com/vibe-testing","source_repo":"https://github.com/knot0-com/vibe-testing","install_from":"skills.sh"}},"qualityScore":"0.459","qualityRationale":"deterministic score 0.46 from registry signals: · indexed on github topic:agent-skills · 19 github stars · SKILL.md body (9,403 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-04-22T19:06:17.304Z","embedding":null,"createdAt":"2026-04-19T00:39:09.890Z","updatedAt":"2026-04-22T19:06:17.304Z","lastSeenAt":"2026-04-22T19:06:17.304Z","tsv":"'-1':1181 '-2':1183 '-3':444,1208 '-4':1231 '-5':214 '-8':418,1333 '000':1089 '1':200,267,770,881,939 '10':1088 '2':210,340,443,611,778,888,946 '3':213,221,377,398,452,497,600,612,787,893,952,1030 '3g':338,361,1020 '4':233,413,601,897,1032,1036 '5':241,417,505,513,577,1332 '50x':374 '6':248,633 '60':305 '7':265,651 'absenc':1072 'abstract':282,1252 'access':347,734 'across':970,1299 'action':429 'activ':439 'actual':189 'add':868,1133,1267 'addit':1340 'address':819,832 'admin':1302 'admin/merchant':987 'affect':849,1166,1193,1218 'agent':1229 'aggreg':876,901 'align':1184 'ambigu':96,240,829 'answer':332,449,560,572,648,667,788,806,826,836,1063,1287 'api':191,992 'appear':908,1119,1311 'architectur':25 'ask':12,552 'attempt':461 'auth':365 'awar':1094 'b':976,1238 'b1':1172 'backend':363 'batch':869 'begin':43,123 'behavior':287 'best':115 'black':371,999 'blind':631 'block':151,245,664,923,1163,1239 'blockquot':391 'boundari':1074 'boundary-prob':1073 'browser':358,990 'bug':187 'build':136 'busi':1013 'buy':396 'buyer':984,1300 'c':978,1246 'c1':1222 'cannot':104,666 'card':404,470 'case':217,260,263,756,874,880,892,962,1024,1034,1110,1355 'chang':140,861 'check':1111 'checkout':301,1091 'choos':966 'cite':543,795,809,827,1065 'classif':653 'classifi':234,656,801 'clear':808 'client':356,993 'code':66 'coher':162 'collect':638,898 'common':1248 'commut':299,327 'complet':1351 'complianc':1015 'concept':438 'concret':271,419 'concurr':1090 'confirm':408 'conflict':94,239,821 'connect':339,362 'consist':958 'constraint':1261 'consum':1011 'context':167,170,730,887 'contract':192 'contradictori':825 'conveni':701 'core':97,198 'correct':704 'cosmet':247,699,925,1215,1247 'cover':603,615,803,945,1035 'coverag':254,579,581,598,852,913,1040,1104,1113,1124,1305 'credit':403 'cross':78,160,903,954 'cross-cut':77 'cross-doc':159,953 'cross-test':902 'custom':293,316,455,711,986 'cut':79 'd':1242 'd1':1198 'data':781 'declin':463,1275 'defin':679,1290 'definit':662 'degrad':246,680,924,1190,1243 'deploy':166,343 'descript':847 'design':19,163,959,1042 'desktop':991 'detect':441,635,1045 'devic':988 'differ':349,352,469,1026 'dimens':971,972,1031 'directori':209 'doc':20,36,128,161,205,548,587,595,855,863,885,915,955,1117,1128,1145,1289,1309 'document':51,157,747,775,818 'durat':480,672 'earli':153 'easi':644 'edg':1077 'either':1132 'email':409 'enough':284 'enter':467 'environ':220,341,350 'error':310 'everi':261,649,789,1115,1307 'exampl':663,1352 'examples/example-vibe-test.md':1350 'exceed':674,1177 'execut':87,758,870 'exercis':76,351,435,589,623,856,917,1131,1141 'exist':129,687,1138 'expect':300 'expiri':617 'expiry-during-retri':616 'fail':454,530,1099 'failur':489,1005,1093,1268 'failure-awar':1092 'fast':1017 'file':178 'find':21,150,236,658,1337 'first':291,459,982 'first-tim':290,981 'fix':1169,1251 'flag':92,1070,1154 'flow':782 'focus':1335 'follow':753 'forc':313,328 'format':564,1159 'fresh':728 'friday':372,1000 'full':732,1343 'fulli':106 'g':1171,1197,1221,1237,1241,1245 'g-b':1236 'g-b1':1170 'g-c':1244 'g-c1':1220 'g-d':1240 'g-d1':1196 'gap':22,93,146,152,238,251,440,609,634,652,812,843,899,905,907,942,949,1044,1157,1161,1165,1192,1217,1232,1339 'gather':201 'get':406 'give':824,1038,1256 'given':744 'goal':219,378 'good':538,960,1039,1043,1281 'govern':773,1009 'hand':319 'hand-wav':318 'happen':334,1086 'happi':1003,1265 'har':69 'hardwar':345 'heatmap':853 'hit':597 'hold':479,502,605,614,676,1179 'hold-vs-retri':604 'id':846,1164,1191,1216,1233 'idempot':475 'identifi':771,941 'ignor':1304 'implement':42,122,132,148,186,668,1320,1326 'imposs':669 'incomplet':114 'instead':63 'intermitt':1021 'introduc':951 'inventori':367,500,675,1101,1178 'inventory-spec.md':477,610,1189 'invit':317 'io':359 'issu':705 'item':399 'key':476 'languag':73 'least':545,554,1029,1122,1314 'level':280 'llm':61,729 'load':376,882,889 'longer':511 'low':307 'major':138 'mani':1328 'manual':185,698,717,1214 'markdown':288,355,392,450,593,1160 'matrix':255,580,1125,1306 'maxim':1105 'may':1146 'mention':38 'merchant':1301 'method':199,348 'microservic':364 'minut':412,506,514 'miss':700 'mistak':1249,1250 'mobil':296,324,357,989 'mode':344 'moder':1012 'multipl':156,165,910 'must':448,542,551,568,1283,1310 'name':275,311,426,750,1258 'natur':72 'natural-languag':71 'need':1153 'network':346,1016 'never':916,1130 'new':145,534,948 'none':1010 'normal':375,997 'notif':369 'number':642,792 'obvious':181 'one':546,555,650,890,1123,1315 'opinion':575,1278 'order':368,484,518,535,707,1050,1060,1224 'orders-spec.md':483 'overview':46 'partial':691,1007,1202 'past':1349 'path':354,1004,1266 'patienc':308 'pay':400 'payment':366,453,460,488,493,522,670,1002,1055,1098,1173,1273 'payments-spec.md':472,599,1187 'pend':521 'per':1212,1330 'per-ship':1211 'person':272 'persona':218,268,312,385,422,1253 'plan':149 'polici':474 'predict':286 'prefix':1235 'pressur':33 'previous':940 'primit':434,471,786 'principl':98 'prioriti':921 'probe':1075 'process':697,1209 'produc':250,842 'prompt':723,896,1345 'q':563,641,791 'q-number':640,790 'q3.1':491 'q3.2':515 'quadrant':1037 'question':442,445,490,556,561,567,576,636,793,800,1041,1046,1276,1282 'rank':922 'rate':243 're':466,932 're-ent':465 're-run':931 'read':202,751 'readi':1347 'real':55,283,1151 'real-world':54 'realist':101 'reason':62 'recommend':719,859,1168 'refer':645,956 'references/simulator-prompt.md':1342 'refund':692,1008,1203,1210 'regress':142,926 'regression-test':141 'remain':957 'report':249,900,1158 'requir':264,532 'reserv':1103 'resolut':678 'resourc':1341 'respons':1081 'retri':456,473,482,496,509,524,602,607,619,671,1006,1056,1174 'return':985 'review':184,1156 'role':276,1259 'rule':536 'run':713,871,894,933 'safari':360 'sarah':289,321,457 'say':495 'scale':370,994,1083 'scale-sensit':1082 'scenario':57,74,103,211,225,414,627,760,963,967,1152,1262,1331,1336 'schema':195 'scope':182 'second':306 'section':266,777,811,1068 'select':964 'sensit':1084 'sentenc':382 'separ':171 'set':1053 'sever':242,660,661,848 'shipment':695,1206,1213 'shipping-spec.md':621 'shop':294,322 'show':584 'silent':683,816 'simul':53,222,447,647,655,718,740,757,895,1344 'singl':177,381,995 'single-fil':176 'skill':5,279 'skill-vibe-testing' 'slice':80 'slow':337,1019 'someth':1149 'source-knot0-com' 'span':155 'spec':16,30,39,83,110,112,118,127,139,154,179,204,232,253,330,353,437,494,501,547,558,571,578,586,594,624,665,681,689,733,738,766,774,797,805,814,823,831,851,860,884,912,929,1067,1080,1116,1200,1286,1308,1324 'spec-answer':570,1285 'specif':50,314,746,1048 'specifi':1148 'spike':1001 'split':694,1205 'spot':632 'state':485,1051 'stay':519 'step':89,91,227,229,415,420,425,451,539,541,550,592,596,761,763,769,841,850,1135,1269,1329,1334 'step-by-step':88,226 'stock':478,503,613 'strategi':965 'stress':27 'stress-test':26 'strict':1014 'subag':726 'summari':252,637,844,906,1162 'support':712,1228,1303 'surfac':84 'system':749,785 'tabl':583,845 'tag':235 'take':303,423,510 'target':208,1076 'technic':278 'templat':1346 'test':3,14,28,32,34,45,48,68,143,158,168,190,216,259,262,716,755,873,879,891,904,911,920,927,936,961,1023,1033,1109,1139,1167,1194,1219,1264,1277,1293,1316,1318,1322,1354 'time':292,498,608,983,1185 'timelin':708,1225 'tool':197 'topic-agent-skills' 'topic-architecture' 'topic-claude-code' 'topic-codex' 'topic-gemini-cli' 'topic-llm' 'topic-skill' 'topic-spec-validation' 'topic-testing' 'topic-vibe-testing' 'touch':628 'trace':86,107,223,779 'traceabl':566,1062 'traffic':373,998 'transit':486,528 'two':822,1079 'type':980,1297 'unclear':620,838 'union':914,1114 'untest':630 'updat':865,930 'use':8,60,116,126,175,188,194,389,562,720,1234,1294 'user':11,102,428,979,996,1255,1296 'valid':17,40,49,134,193,196,739,1323 'vari':969,1298 'variat':973,975,977 'verifi':938 'vibe':2,31,44,47,215,258,715,754,935,1321,1353 'vibe-test':1 'view':709,1226 'vs':606 'vt':1180,1182,1207,1230 'want':394 'wave':320 'wifi':1018 'window':1175 'within':410 'word':388 'work':1061 'workaround':686,1195 'world':56 'write':65,70,212,256,1107,1317 'written':120 'x':1291 'yes/no/how':573 'yet':133","prices":[{"id":"33f84808-27e7-48df-a24b-fe3c99ce6d36","listingId":"bc0fbd1b-da55-4333-92f5-c4762447ff83","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"knot0-com","category":"vibe-testing","install_from":"skills.sh"},"createdAt":"2026-04-19T00:39:09.890Z"}],"sources":[{"listingId":"bc0fbd1b-da55-4333-92f5-c4762447ff83","source":"github","sourceId":"knot0-com/vibe-testing","sourceUrl":"https://github.com/knot0-com/vibe-testing","isPrimary":false,"firstSeenAt":"2026-04-19T00:39:09.890Z","lastSeenAt":"2026-04-22T19:06:17.304Z"}],"details":{"listingId":"bc0fbd1b-da55-4333-92f5-c4762447ff83","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"knot0-com","slug":"vibe-testing","github":{"repo":"knot0-com/vibe-testing","stars":19,"topics":["agent-skills","architecture","claude-code","codex","gemini-cli","llm","skill","spec-validation","testing","vibe-testing"],"license":"mit","html_url":"https://github.com/knot0-com/vibe-testing","pushed_at":"2026-03-22T02:44:32Z","description":"Pressure-test your specs with LLM reasoning before writing code. Agent skill for Claude Code, Codex, Gemini CLI, and 14+ coding agents.","skill_md_sha":"f7f537295e09848a4e3f0bea6a26f430b4100e38","skill_md_path":"SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/knot0-com/vibe-testing"},"layout":"root","source":"github","category":"vibe-testing","frontmatter":{"name":"vibe-testing","description":"This skill should be used when the user asks to \"test my specs\", \"validate my design docs\", \"find gaps in my architecture\", \"stress-test the spec\", \"vibe test\", \"pressure test the docs\", or mentions spec validation before implementation begins."},"skills_sh_url":"https://skills.sh/knot0-com/vibe-testing"},"updatedAt":"2026-04-22T19:06:17.304Z"}}