{"id":"d97945f0-35b5-4f29-8163-729f01c5c764","shortId":"g8sn6C","kind":"skill","title":"usability-testing","tagline":"Plan, run, and synthesize usability tests: test plan, tasks, script, findings, recommendations.","description":"# Usability Testing\n\n## Scope\n\n**Covers**\n- Designing task-based usability studies tied to a specific product decision\n- Testing live flows, prototypes, and “faked” implementations (fake door, Wizard of Oz)\n- Running moderated sessions (remote or in-person) and capturing high-quality evidence\n- Turning findings into a prioritized fix list (including high-ROI microcopy/CTA improvements)\n\n**When to use**\n- “Create a usability test plan and script for <flow>.”\n- “We need to test a prototype with 5–8 users next week.”\n- “Validate a value proposition before building (fake door / Wizard of Oz).”\n- “Help me synthesize usability findings into a prioritized backlog.”\n\n**When NOT to use**\n- You need statistically reliable estimates or causal impact (use analytics/experimentation)\n- You need open-ended discovery (“what problems do users have?”) without a specific flow to evaluate (use `conducting-user-interviews`)\n- You need a **design critique or heuristic review** without live user sessions (use `running-design-reviews`)\n- You need to **write specs or design docs** for a feature, not test an existing flow (use `writing-specs-designs`)\n- You need to apply **behavioral/persuasion design patterns** to a flow (use `behavioral-product-design`); this skill evaluates usability, not designs behavioral nudges\n- You’re working with high-risk populations or sensitive topics (medical, legal, minors) without appropriate approvals/training\n- You don’t have a concrete scenario/flow to evaluate (clarify the decision first)\n\n## Inputs\n\n**Minimum required**\n- Product + target user segment (who, context of use)\n- The decision this test should inform (what will change) + timeline\n- What you’re testing (flow/feature) + prototype/build link (or “recommend stimulus”)\n- Platform + environment (web/mobile/desktop; remote/in-person)\n- Constraints: session type, number of participants, incentives, recording policy, privacy constraints\n\n**Missing-info strategy**\n- Ask up to 5 questions from [references/INTAKE.md](references/INTAKE.md).\n- If still unknown, proceed with explicit assumptions and list **Open questions** that would change the plan.\n\n## Outputs (deliverables)\n\nProduce a **Usability Test Pack** in Markdown (in-chat; or as files if requested):\n\n1) **Context snapshot** (decision, users, what’s being tested, constraints)\n2) **Test plan** (method, prototype strategy, hypotheses/risks, success criteria)\n3) **Participant plan** (criteria, recruiting channels, schedule + backups)\n4) **Moderator guide + task script** (neutral tasks, probes, wrap-up)\n5) **Note-taking template + issue log** (severity/impact, evidence)\n6) **Synthesis readout** (findings, prioritized issues, recommendations, quick wins)\n7) **Risks / Open questions / Next steps** (always included)\n\nTemplates: [references/TEMPLATES.md](references/TEMPLATES.md)  \nExpanded heuristics: [references/WORKFLOW.md](references/WORKFLOW.md)\n\n## Workflow (8 steps)\n\n### 1) Frame the decision and the “why now”\n- **Inputs:** User context; [references/INTAKE.md](references/INTAKE.md).\n- **Actions:** Define the decision, primary unknowns, and the minimum you need to learn to make the call.\n- **Outputs:** Context snapshot + research questions/hypotheses.\n- **Checks:** You can answer: “What will we do differently after this test?”\n\n### 2) Choose the right stimulus (real vs prototype vs faked)\n- **Inputs:** What’s being tested; constraints.\n- **Actions:** Select the cheapest valid setup: live product, clickable prototype, fake door, Wizard of Oz, or concierge flow.\n- **Outputs:** Prototype strategy + what will be real vs simulated.\n- **Checks:** The setup tests the core value/behavior (not pixel perfection).\n\n### 3) Define tasks and success criteria (keep it neutral)\n- **Inputs:** User goals + scenarios.\n- **Actions:** Write 5–8 realistic tasks (each with a starting state), success criteria, and key observables (hesitation, errors, workarounds).\n- **Outputs:** Task list (draft) + observation plan.\n- **Checks:** Tasks don’t reveal UI labels (“Click the X button”); they reflect real intent.\n\n### 4) Pick participants + recruiting plan (include buffers)\n- **Inputs:** Target segment, access to users.\n- **Actions:** Set inclusion/exclusion criteria; choose channels; build a schedule with backups and slack for no-shows and busy participants.\n- **Outputs:** Participant plan + recruiting copy/screener (as needed).\n- **Checks:** Participants match the scenario (behavior/context), not just demographics.\n\n### 5) Build the moderator guide + instrumentation\n- **Inputs:** Task list + prototype.\n- **Actions:** Create the script (intro/consent, warm-up, tasks, probes, wrap-up). Assign note-taker roles; decide what to record.\n- **Outputs:** Moderator guide + notes template + issue log.\n- **Checks:** The guide avoids leading questions and includes “what would you do next?” probes.\n\n### 6) Run sessions and capture evidence (optional “reality checks”)\n- **Inputs:** Guide, logistics, participants.\n- **Actions:** Run sessions; capture verbatims, errors, rough time-on-task, and moments of confusion. Optionally observe comparable flows “in the wild.”\n- **Outputs:** Completed notes per session + populated issue log.\n- **Checks:** Every issue has at least one concrete example (quote/screenshot/time/step) attached.\n\n### 7) Synthesize into prioritized fixes (micro wins count)\n- **Inputs:** Notes + issue log.\n- **Actions:** Cluster issues; label severity and frequency; connect to funnel/business impact; propose fixes (including microcopy/CTA tweaks).\n- **Outputs:** Synthesis readout + prioritized recommendations/backlog.\n- **Checks:** Each recommendation ties to evidence and an expected impact (directional).\n\n### 8) Share, decide, and run the quality gate\n- **Inputs:** Draft pack.\n- **Actions:** Produce a shareable readout, propose next steps (design iteration, follow-up test, experiment). Run [references/CHECKLISTS.md](references/CHECKLISTS.md) and score [references/RUBRIC.md](references/RUBRIC.md).\n- **Outputs:** Final Usability Test Pack + Risks/Open questions/Next steps.\n- **Checks:** A stakeholder can make a “ship / fix / retest” decision asynchronously.\n\n## Quality gate (required)\n- Use [references/CHECKLISTS.md](references/CHECKLISTS.md) and [references/RUBRIC.md](references/RUBRIC.md).\n- Always include: **Risks**, **Open questions**, **Next steps**.\n\n## Anti-patterns (common failure modes)\n\n1. **Task-label leakage** — Writing tasks like “Click the Settings gear icon” instead of “Change your notification preferences.” Tasks should reflect user intent, not reveal UI labels or locations.\n2. **Happy-path-only testing** — Only testing the golden path and missing error states, edge cases, and recovery flows. Include at least one task that tests what happens when things go wrong.\n3. **Moderator bias / leading** — Helping participants when they struggle (“Try clicking there”) instead of letting them work through confusion. The struggle IS the data; document it, don’t fix it.\n4. **Over-indexing on opinions** — Asking “Did you like it?” after each task instead of observing behavior. Post-task ratings are supplementary; observed friction, errors, and workarounds are the primary signal.\n5. **Severity-blind issue list** — Listing all issues as equal without severity/frequency classification. A cosmetic label issue and a flow-blocking error require different urgency; classify every finding.\n\n## Examples\n\n**Example 1 (Prototype test):** “Create a usability test plan + moderator guide to evaluate our new onboarding flow (web) with 6 first-time users next week.”\nExpected: full Usability Test Pack with neutral tasks, recruiting criteria, session logistics, and a synthesis structure.\n\n**Example 2 (Wizard of Oz):** “We want to test an ‘AI auto-triage’ feature before building it. Design a Wizard of Oz usability test plan and script for 5 sessions.”\nExpected: stimulus plan defining what’s simulated, tasks focused on value, and an issue log + readout.\n\n**Boundary example (redirect to conducting-user-interviews):** “We don’t have a prototype yet, but we want to understand what problems users face during onboarding.”\nResponse: redirect to `conducting-user-interviews` for open-ended discovery; return here once you have a concrete flow or prototype to evaluate.\n\n**Boundary example (redirect to running-design-reviews):** “Review our new checkout designs for usability issues without running user sessions.”\nResponse: redirect to `running-design-reviews` for expert heuristic evaluation; this skill requires live user sessions with task-based observation.\n\n**Boundary example (causality):** “Run a usability test to prove the redesign will increase retention by 10%.”\nResponse: explain limits of small-n usability; recommend pairing with instrumentation/experimentation for causality and use usability to diagnose friction.","tags":["usability","testing","lenny","skills","plus","liqiongyu","agent-skills","ai-agents","automation","claude","codex","prompt-engineering"],"capabilities":["skill","source-liqiongyu","skill-usability-testing","topic-agent-skills","topic-ai-agents","topic-automation","topic-claude","topic-codex","topic-prompt-engineering","topic-refoundai","topic-skillpack"],"categories":["lenny_skills_plus"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/liqiongyu/lenny_skills_plus/usability-testing","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add liqiongyu/lenny_skills_plus","source_repo":"https://github.com/liqiongyu/lenny_skills_plus","install_from":"skills.sh"}},"qualityScore":"0.474","qualityRationale":"deterministic score 0.47 from registry signals: · indexed on github topic:agent-skills · 49 github stars · SKILL.md body (8,840 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-04-22T00:56:26.067Z","embedding":null,"createdAt":"2026-04-18T22:17:12.196Z","updatedAt":"2026-04-22T00:56:26.067Z","lastSeenAt":"2026-04-22T00:56:26.067Z","tsv":"'1':332,406,833,991 '10':1186 '2':342,453,863,1033 '3':351,506,896 '4':359,559,926 '5':89,294,370,521,608,959,1061 '6':379,661,1009 '7':388,715 '8':90,404,522,759 'access':569 'action':419,469,519,572,618,674,727,770 'ai':1042 'alway':394,820 'analytics/experimentation':127 'answer':444 'anti':828 'anti-pattern':827 'appli':191 'appropri':226 'approvals/training':227 'ask':291,932 'assign':631 'assumpt':305 'asynchron':810 'attach':714 'auto':1044 'auto-triag':1043 'avoid':650 'backlog':113 'backup':358,582 'base':23,1169 'behavior':200,209,943 'behavior/context':604 'behavioral-product-design':199 'behavioral/persuasion':192 'bias':898 'blind':962 'block':981 'boundari':1079,1129,1171 'buffer':565 'build':99,578,609,1048 'busi':590 'button':554 'call':435 'captur':53,665,677 'case':879 'causal':124,1173,1200 'chang':260,312,848 'channel':356,577 'chat':326 'cheapest':472 'check':441,496,544,599,647,669,704,748,800 'checkout':1140 'choos':454,576 'clarifi':237 'classif':972 'classifi':986 'click':551,841,906 'clickabl':477 'cluster':728 'common':830 'compar':691 'complet':697 'concierg':485 'concret':233,711,1123 'conduct':147,1084,1109 'conducting-user-interview':146,1083,1108 'confus':688,914 'connect':734 'constraint':276,286,341,468 'context':249,333,416,437 'copy/screener':596 'core':501 'cosmet':974 'count':722 'cover':19 'creat':74,619,994 'criteria':350,354,511,531,575,1025 'critiqu':154 'data':919 'decid':636,761 'decis':31,239,253,335,409,422,809 'defin':420,507,1066 'deliver':316 'demograph':607 'design':20,153,165,173,187,193,202,208,778,1050,1135,1141,1154 'diagnos':1205 'differ':449,984 'direct':758 'discoveri':133,1116 'doc':174 'document':920 'door':40,101,480 'draft':541,768 'edg':878 'end':132,1115 'environ':273 'equal':969 'error':536,679,876,952,982 'estim':122 'evalu':144,205,236,1002,1128,1159 'everi':705,987 'evid':57,378,666,753 'exampl':712,989,990,1032,1080,1130,1172 'exist':181 'expand':399 'expect':756,1016,1063 'experi':784 'expert':1157 'explain':1188 'explicit':304 'face':1102 'failur':831 'fake':37,39,100,462,479 'featur':177,1046 'file':329 'final':793 'find':14,59,109,382,988 'first':240,1011 'first-tim':1010 'fix':63,719,739,807,924 'flow':34,142,182,197,486,692,882,980,1006,1124 'flow-block':979 'flow/feature':266 'focus':1071 'follow':781 'follow-up':780 'frame':407 'frequenc':733 'friction':951,1206 'full':1017 'funnel/business':736 'gate':766,812 'gear':844 'go':894 'goal':517 'golden':872 'guid':361,612,642,649,671,1000 'happen':891 'happi':865 'happy-path-on':864 'help':105,900 'hesit':535 'heurist':156,400,1158 'high':55,67,216 'high-qual':54 'high-risk':215 'high-roi':66 'hypotheses/risks':348 'icon':845 'impact':125,737,757 'implement':38 'improv':70 'in-chat':324 'in-person':49 'incent':282 'includ':65,395,564,654,740,821,883 'inclusion/exclusion':574 'increas':1183 'index':929 'info':289 'inform':257 'input':241,414,463,515,566,614,670,723,767 'instead':846,908,940 'instrument':613 'instrumentation/experimentation':1198 'intent':558,856 'interview':149,1086,1111 'intro/consent':622 'issu':375,384,645,702,706,725,729,963,967,976,1076,1144 'iter':779 'keep':512 'key':533 'label':550,730,836,860,975 'lead':651,899 'leakag':837 'learn':431 'least':709,885 'legal':223 'let':910 'like':840,935 'limit':1189 'link':268 'list':64,307,540,616,964,965 'live':33,159,475,1163 'locat':862 'log':376,646,703,726,1077 'logist':672,1027 'make':433,804 'markdown':323 'match':601 'medic':222 'method':345 'micro':720 'microcopy/cta':69,741 'minimum':242,427 'minor':224 'miss':288,875 'missing-info':287 'mode':832 'moder':45,360,611,641,897,999 'moment':686 'n':1193 'need':83,119,129,151,168,189,429,598 'neutral':364,514,1022 'new':1004,1139 'next':92,392,659,776,825,1014 'no-show':586 'note':372,633,643,698,724 'note-tak':371,632 'notif':850 'nudg':210 'number':279 'observ':534,542,690,942,950,1170 'onboard':1005,1104 'one':710,886 'open':131,308,390,823,1114 'open-end':130,1113 'opinion':931 'option':667,689 'output':315,436,487,538,592,640,696,743,792 'over-index':927 'oz':43,104,483,1036,1054 'pack':321,769,796,1020 'pair':1196 'particip':281,352,561,591,593,600,673,901 'path':866,873 'pattern':194,829 'per':699 'perfect':505 'person':51 'pick':560 'pixel':504 'plan':4,11,78,314,344,353,543,563,594,998,1057,1065 'platform':272 'polici':284 'popul':218,701 'post':945 'post-task':944 'prefer':851 'primari':423,957 'priorit':62,112,383,718,746 'privaci':285 'probe':366,627,660 'problem':135,1100 'proceed':302 'produc':317,771 'product':30,201,244,476 'propos':738,775 'proposit':97 'prototyp':35,87,346,460,478,488,617,992,1092,1126 'prototype/build':267 'prove':1179 'qualiti':56,765,811 'question':295,309,391,652,824 'questions/hypotheses':440 'questions/next':798 'quick':386 'quote/screenshot/time/step':713 'rate':947 're':212,264 'readout':381,745,774,1078 'real':458,493,557 'realist':523 'realiti':668 'recommend':15,270,385,750,1195 'recommendations/backlog':747 'record':283,639 'recoveri':881 'recruit':355,562,595,1024 'redesign':1181 'redirect':1081,1106,1131,1150 'references/checklists.md':786,787,815,816 'references/intake.md':297,298,417,418 'references/rubric.md':790,791,818,819 'references/templates.md':397,398 'references/workflow.md':401,402 'reflect':556,854 'reliabl':121 'remot':47 'remote/in-person':275 'request':331 'requir':243,813,983,1162 'research':439 'respons':1105,1149,1187 'retent':1184 'retest':808 'return':1117 'reveal':548,858 'review':157,166,1136,1137,1155 'right':456 'risk':217,389,822 'risks/open':797 'roi':68 'role':635 'rough':680 'run':5,44,164,662,675,763,785,1134,1146,1153,1174 'running-design-review':163,1133,1152 'scenario':518,603 'scenario/flow':234 'schedul':357,580 'scope':18 'score':789 'script':13,80,363,621,1059 'segment':247,568 'select':470 'sensit':220 'session':46,161,277,663,676,700,1026,1062,1148,1165 'set':573,843 'setup':474,498 'sever':731,961 'severity-blind':960 'severity/frequency':971 'severity/impact':377 'share':760 'shareabl':773 'ship':806 'show':588 'signal':958 'simul':495,1069 'skill':204,1161 'skill-usability-testing' 'slack':584 'small':1192 'small-n':1191 'snapshot':334,438 'source-liqiongyu' 'spec':171,186 'specif':29,141 'stakehold':802 'start':528 'state':529,877 'statist':120 'step':393,405,777,799,826 'still':300 'stimulus':271,457,1064 'strategi':290,347,489 'structur':1031 'struggl':904,916 'studi':25 'success':349,510,530 'supplementari':949 'synthes':7,107,716 'synthesi':380,744,1030 'take':373 'taker':634 'target':245,567 'task':12,22,362,365,508,524,539,545,615,626,684,835,839,852,887,939,946,1023,1070,1168 'task-bas':21,1167 'task-label':834 'templat':374,396,644 'test':3,9,10,17,32,77,85,179,255,265,320,340,343,452,467,499,783,795,868,870,889,993,997,1019,1040,1056,1177 'thing':893 'tie':26,751 'time':682,1012 'time-on-task':681 'timelin':261 'topic':221 'topic-agent-skills' 'topic-ai-agents' 'topic-automation' 'topic-claude' 'topic-codex' 'topic-prompt-engineering' 'topic-refoundai' 'topic-skillpack' 'tri':905 'triag':1045 'turn':58 'tweak':742 'type':278 'ui':549,859 'understand':1098 'unknown':301,424 'urgenc':985 'usability-test':1 'usabl':2,8,16,24,76,108,206,319,794,996,1018,1055,1143,1176,1194,1203 'use':73,117,126,145,162,183,198,251,814,1202 'user':91,137,148,160,246,336,415,516,571,855,1013,1085,1101,1110,1147,1164 'valid':94,473 'valu':96,1073 'value/behavior':502 'verbatim':678 'vs':459,461,494 'want':1038,1096 'warm':624 'warm-up':623 'web':1007 'web/mobile/desktop':274 'week':93,1015 'wild':695 'win':387,721 'without':139,158,225,970,1145 'wizard':41,102,481,1034,1052 'work':213,912 'workaround':537,954 'workflow':403 'would':311,656 'wrap':368,629 'wrap-up':367,628 'write':170,185,520,838 'writing-specs-design':184 'wrong':895 'x':553 'yet':1093","prices":[{"id":"88997f77-a148-4458-af31-5f4a4394e1eb","listingId":"d97945f0-35b5-4f29-8163-729f01c5c764","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"liqiongyu","category":"lenny_skills_plus","install_from":"skills.sh"},"createdAt":"2026-04-18T22:17:12.196Z"}],"sources":[{"listingId":"d97945f0-35b5-4f29-8163-729f01c5c764","source":"github","sourceId":"liqiongyu/lenny_skills_plus/usability-testing","sourceUrl":"https://github.com/liqiongyu/lenny_skills_plus/tree/main/skills/usability-testing","isPrimary":false,"firstSeenAt":"2026-04-18T22:17:12.196Z","lastSeenAt":"2026-04-22T00:56:26.067Z"}],"details":{"listingId":"d97945f0-35b5-4f29-8163-729f01c5c764","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"liqiongyu","slug":"usability-testing","github":{"repo":"liqiongyu/lenny_skills_plus","stars":49,"topics":["agent-skills","ai-agents","automation","claude","codex","prompt-engineering","refoundai","skillpack"],"license":"apache-2.0","html_url":"https://github.com/liqiongyu/lenny_skills_plus","pushed_at":"2026-04-04T06:30:11Z","description":"86 agent-executable skill packs converted from RefoundAI’s Lenny skills (unofficial). Works with Codex + Claude Code.","skill_md_sha":"0c7e2cc44c167ee940e8695aa17541da8e937fc5","skill_md_path":"skills/usability-testing/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/liqiongyu/lenny_skills_plus/tree/main/skills/usability-testing"},"layout":"multi","source":"github","category":"lenny_skills_plus","frontmatter":{"name":"usability-testing","description":"Plan, run, and synthesize usability tests: test plan, tasks, script, findings, recommendations."},"skills_sh_url":"https://skills.sh/liqiongyu/lenny_skills_plus/usability-testing"},"updatedAt":"2026-04-22T00:56:26.067Z"}}