{"id":"d258b2e5-18b2-41e9-b519-bf2781dcea30","shortId":"qtm7AY","kind":"skill","title":"ab-test-setup","tagline":"Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.","description":"# A/B Test Setup\n\n## 1️⃣ Purpose & Scope\n\nEnsure every A/B test is **valid, rigorous, and safe** before a single line of code is written.\n\n- Prevents \"peeking\"\n- Enforces statistical power\n- Blocks invalid hypotheses\n\n---\n\n## 2️⃣ Pre-Requisites\n\nYou must have:\n\n- A clear user problem\n- Access to an analytics source\n- Roughly estimated traffic volume\n\n### Hypothesis Quality Checklist\n\nA valid hypothesis includes:\n\n- Observation or evidence\n- Single, specific change\n- Directional expectation\n- Defined audience\n- Measurable success criteria\n\n---\n\n### 3️⃣ Hypothesis Lock (Hard Gate)\n\nBefore designing variants or metrics, you MUST:\n\n- Present the **final hypothesis**\n- Specify:\n  - Target audience\n  - Primary metric\n  - Expected direction of effect\n  - Minimum Detectable Effect (MDE)\n\nAsk explicitly:\n\n> “Is this the final hypothesis we are committing to for this test?”\n\n**Do NOT proceed until confirmed.**\n\n---\n\n### 4️⃣ Assumptions & Validity Check (Mandatory)\n\nExplicitly list assumptions about:\n\n- Traffic stability\n- User independence\n- Metric reliability\n- Randomization quality\n- External factors (seasonality, campaigns, releases)\n\nIf assumptions are weak or violated:\n\n- Warn the user\n- Recommend delaying or redesigning the test\n\n---\n\n### 5️⃣ Test Type Selection\n\nChoose the simplest valid test:\n\n- **A/B Test** – single change, two variants\n- **A/B/n Test** – multiple variants, higher traffic required\n- **Multivariate Test (MVT)** – interaction effects, very high traffic\n- **Split URL Test** – major structural changes\n\nDefault to **A/B** unless there is a clear reason otherwise.\n\n---\n\n### 6️⃣ Metrics Definition\n\n#### Primary Metric (Mandatory)\n\n- 
Single metric used to evaluate success\n- Directly tied to the hypothesis\n- Pre-defined and frozen before launch\n\n#### Secondary Metrics\n\n- Provide context\n- Explain _why_ results occurred\n- Must not override the primary metric\n\n#### Guardrail Metrics\n\n- Metrics that must not degrade\n- Used to prevent harmful wins\n- Trigger test stop if significantly negative\n\n---\n\n### 7️⃣ Sample Size & Duration\n\nDefine upfront:\n\n- Baseline rate\n- MDE\n- Significance level (typically α = 0.05, i.e. 95% confidence)\n- Statistical power (typically 80%)\n\nEstimate:\n\n- Required sample size per variant\n- Expected test duration\n\n**Do NOT proceed without a realistic sample size estimate.**\n\n---\n\n### 8️⃣ Execution Readiness Gate (Hard Stop)\n\nYou may proceed to implementation **only if all are true**:\n\n- Hypothesis is locked\n- Primary metric is frozen\n- Sample size is calculated\n- Test duration is defined\n- Guardrails are set\n- Tracking is verified\n\nIf any item is missing, stop and resolve it.\n\n---\n\n## Running the Test\n\n### During the Test\n\n**DO:**\n\n- Monitor technical health\n- Document external factors\n\n**DO NOT:**\n\n- Stop early due to “good-looking” results\n- Change variants mid-test\n- Add new traffic sources\n- Redefine success criteria\n\n---\n\n## Analyzing Results\n\n### Analysis Discipline\n\nWhen interpreting results:\n\n- Do NOT generalize beyond the tested population\n- Do NOT claim causality beyond the tested change\n- Do NOT override guardrail failures\n- Separate statistical significance from business judgment\n\n### Interpretation Outcomes\n\n| Result               | Action                                 |\n| -------------------- | -------------------------------------- |\n| Significant positive | Consider rollout                       |\n| Significant negative | Reject variant, document learning      |\n| Inconclusive         | Consider more traffic or bolder change |\n| Guardrail failure    | Do not ship, even if primary wins      
|\n\n---\n\n## Documentation & Learning\n\n### Test Record (Mandatory)\n\nDocument:\n\n- Hypothesis\n- Variants\n- Metrics\n- Sample size (planned vs. achieved)\n- Results\n- Decision\n- Learnings\n- Follow-up ideas\n\nStore records in a shared, searchable location to avoid repeated failures.\n\n---\n\n## Refusal Conditions (Safety)\n\nRefuse to proceed if:\n\n- Baseline rate is unknown and cannot be estimated\n- Traffic is insufficient to detect the MDE\n- Primary metric is undefined\n- Multiple variables are changed without proper design\n- Hypothesis cannot be clearly stated\n\nExplain why and recommend next steps.\n\n---\n\n## Key Principles (Non-Negotiable)\n\n- One hypothesis per test\n- One primary metric\n- Commit before launch\n- No peeking\n- Learning over winning\n- Statistical rigor first\n\n---\n\n## Final Reminder\n\nA/B testing is not about proving ideas right.\nIt is about **learning the truth with confidence**.\n\nIf you feel tempted to rush, simplify, or “just try it”,\nthat is the signal to **slow down and re-check the design**.\n\n## When to Use\nThis skill is applicable to execute the workflow or actions described in the overview.\n\n## Limitations\n- Use this skill only when the task clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are 
missing.","tags":["test","setup","antigravity","awesome","skills","sickn33","agent-skills","agentic-skills","ai-agent-skills","ai-agents","ai-coding","ai-workflows"],"capabilities":["skill","source-sickn33","skill-ab-test-setup","topic-agent-skills","topic-agentic-skills","topic-ai-agent-skills","topic-ai-agents","topic-ai-coding","topic-ai-workflows","topic-antigravity","topic-antigravity-skills","topic-claude-code","topic-claude-code-skills","topic-codex-cli","topic-codex-skills"],"categories":["antigravity-awesome-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/sickn33/antigravity-awesome-skills/ab-test-setup","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add sickn33/antigravity-awesome-skills","source_repo":"https://github.com/sickn33/antigravity-awesome-skills","install_from":"skills.sh"}},"qualityScore":"0.700","qualityRationale":"deterministic score 0.70 from registry signals: · indexed on github topic:agent-skills · 34997 github stars · SKILL.md body (4,994 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-04-25T06:50:21.707Z","embedding":null,"createdAt":"2026-04-18T21:30:16.534Z","updatedAt":"2026-04-25T06:50:21.707Z","lastSeenAt":"2026-04-25T06:50:21.707Z","tsv":"'1️⃣':24 '2️⃣':52 '3️⃣':92 '4️⃣':140 '5️⃣':177 '6️⃣':223 '7️⃣':279 '80':295 '8️⃣':314 '95':291 'a/b':10,21,29,186,215,558 'a/b/n':192 'ab':2 'ab-test-setup':1 'access':63 'achiev':470 'action':431,610 'add':388 'analysi':397 'analyt':66 'analyz':395 'applic':604 'ask':121,648 'assumpt':141,147,163 'audienc':88,110 'avoid':486 'baselin':285,496 'beyond':405,413 'block':49 'bolder':447 'boundari':656 'busi':426 'calcul':340 'campaign':160 'cannot':501,523 
'causal':412 'chang':84,189,212,383,416,448,518 'check':143,595 'checklist':74 'choos':181 'claim':411 'clarif':650 'clear':60,220,525,623 'code':41 'commit':130,545 'condit':490 'confid':573 'confirm':139 'consid':434,443 'context':250 'criteria':91,394,659 'decis':472 'default':213 'defin':87,242,283,344 'definit':225 'degrad':267 'delay':172 'describ':611,627 'design':98,521,597 'detect':118,508 'direct':85,114,235 'disciplin':398 'document':370,440,458,463 'due':377 'durat':282,304,342 'earli':376 'effect':116,119,203 'enforc':46 'ensur':27 'environ':639 'environment-specif':638 'estim':69,296,313,503 'evalu':233 'even':454 'everi':28 'evid':81 'execut':19,315,606 'expect':86,113,302 'expert':644 'explain':251,527 'explicit':122,145 'extern':157,371 'factor':158,372 'failur':421,450,488 'feel':576 'final':106,126,556 'first':555 'follow':475 'follow-up':474 'frozen':244,336 'gate':14,96,317 'general':404 'good':380 'good-look':379 'guardrail':261,345,420,449 'guid':6 'hard':95,318 'harm':271 'health':369 'high':205 'higher':196 'hypothes':51 'hypothesi':16,72,77,93,107,127,239,330,464,522,539 'idea':477,564 'implement':324 'includ':78 'inconclus':442 'independ':152 'input':653 'insuffici':506 'interact':202 'interpret':400,428 'invalid':50 'item':353 'judgment':427 'key':533 'launch':246,547 'learn':441,459,473,550,569 'level':289 'limit':615 'line':39 'list':146 'locat':484 'lock':94,332 'look':381 'major':210 'mandatori':13,144,228,462 'match':624 'may':321 'mde':120,287,510 'measur':89 'metric':17,101,112,153,224,227,230,248,260,262,263,334,466,512,544 'mid':386 'mid-test':385 'minimum':117 'miss':355,661 'monitor':367 'multipl':194,515 'multivari':199 'must':57,103,255,265 'mvt':201 'negat':278,437 'negoti':537 'new':389 'next':531 'non':536 'non-negoti':535 'observ':79 'occur':254 'one':538,542 'otherwis':222 'outcom':429 'output':633 'overrid':257,419 'overview':614 'peek':45,549 'per':300,540 'permiss':654 'popul':408 'posit':433 'power':48,293 
'pre':54,241 'pre-defin':240 'pre-requisit':53 'present':104 'prevent':44,270 'primari':111,226,259,333,456,511,543 'principl':534 'problem':62 'proceed':137,307,322,494 'proper':520 'prove':563 'provid':249 'purpos':25 'qualiti':73,156 'random':155 'rate':286,497 're':594 're-check':593 'readi':20,316 'realist':310 'reason':221 'recommend':171,530 'record':461,479 'redefin':392 'redesign':174 'refus':489,492 'reject':438 'releas':161 'reliabl':154 'remind':557 'repeat':487 'requir':198,297,652 'requisit':55 'resolv':358 'result':253,382,396,401,430,471 'review':645 'right':565 'rigor':33,554 'rollout':435 'rough':68 'run':360 'rush':579 'safe':35 'safeti':491,655 'sampl':280,298,311,337,467 'scope':26,626 'searchabl':483 'season':159 'secondari':247 'select':180 'separ':422 'set':8,347 'setup':4,23 'share':482 'ship':453 'signal':588 'signific':277,288,424,432,436 'simplest':183 'simplifi':580 'singl':38,82,188,229 'size':281,299,312,338,468 'skill':602,618 'skill-ab-test-setup' 'slow':590 'sourc':67,391 'source-sickn33' 'specif':83,640 'specifi':108 'split':207 'stabil':150 'state':526 'statist':47,292,423,553 'step':532 'stop':275,319,356,375,646 'store':478 'structur':5,211 'substitut':636 'success':90,234,393,658 'target':109 'task':622 'technic':368 'tempt':577 'test':3,11,22,30,134,176,178,185,187,193,200,209,274,303,341,362,365,387,407,415,460,541,559,642 'tie':236 'topic-agent-skills' 'topic-agentic-skills' 'topic-ai-agent-skills' 'topic-ai-agents' 'topic-ai-coding' 'topic-ai-workflows' 'topic-antigravity' 'topic-antigravity-skills' 'topic-claude-code' 'topic-claude-code-skills' 'topic-codex-cli' 'topic-codex-skills' 'track':348 'traffic':70,149,197,206,390,445,504 'treat':631 'tri':583 'trigger':273 'true':329 'truth':571 'two':190 'type':179 'typic':290,294 'undefin':514 'unknown':499 'unless':216 'upfront':284 'url':208 'use':231,268,600,616 'user':61,151,170 'valid':32,76,142,184,641 'variabl':516 'variant':99,191,195,301,384,439,465 'verifi':350 
'violat':167 'volum':71 'vs':469 'warn':168 'weak':165 'win':272,457,552 'without':308,519 'workflow':608 'written':43","prices":[{"id":"e9517b9e-4fec-418f-b8d8-e38e8b94a62b","listingId":"d258b2e5-18b2-41e9-b519-bf2781dcea30","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"sickn33","category":"antigravity-awesome-skills","install_from":"skills.sh"},"createdAt":"2026-04-18T21:30:16.534Z"}],"sources":[{"listingId":"d258b2e5-18b2-41e9-b519-bf2781dcea30","source":"github","sourceId":"sickn33/antigravity-awesome-skills/ab-test-setup","sourceUrl":"https://github.com/sickn33/antigravity-awesome-skills/tree/main/skills/ab-test-setup","isPrimary":false,"firstSeenAt":"2026-04-18T21:30:16.534Z","lastSeenAt":"2026-04-25T06:50:21.707Z"}],"details":{"listingId":"d258b2e5-18b2-41e9-b519-bf2781dcea30","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"sickn33","slug":"ab-test-setup","github":{"repo":"sickn33/antigravity-awesome-skills","stars":34997,"topics":["agent-skills","agentic-skills","ai-agent-skills","ai-agents","ai-coding","ai-workflows","antigravity","antigravity-skills","claude-code","claude-code-skills","codex-cli","codex-skills","cursor","cursor-skills","developer-tools","gemini-cli","gemini-skills","kiro","mcp","skill-library"],"license":"mit","html_url":"https://github.com/sickn33/antigravity-awesome-skills","pushed_at":"2026-04-25T06:33:17Z","description":"Installable GitHub library of 1,400+ agentic skills for Claude Code, Cursor, Codex CLI, Gemini CLI, Antigravity, and more. 
Includes installer CLI, bundles, workflows, and official/community skill collections.","skill_md_sha":"e2fa2623271e6ae2397fc826dd330293ea8ae583","skill_md_path":"skills/ab-test-setup/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/sickn33/antigravity-awesome-skills/tree/main/skills/ab-test-setup"},"layout":"multi","source":"github","category":"antigravity-awesome-skills","frontmatter":{"name":"ab-test-setup","description":"Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness."},"skills_sh_url":"https://skills.sh/sickn33/antigravity-awesome-skills/ab-test-setup"},"updatedAt":"2026-04-25T06:50:21.707Z"}}