{"id":"fa8e776d-5a6f-4bb5-8745-32a6f547576f","shortId":"ZjjWpp","kind":"skill","title":"twin-test","tagline":"GAN-style identity verification -- tests clone fidelity by comparing clone responses against real user messages. Run /twin-test to start a blind taste test, or /twin-test score to see your fidelity score over time.","description":"# Twin Test -- Adversarial Clone Fidelity Check\n\nA blind taste test where the clone generates responses to the same contexts as real user messages, then a discriminator identifies which is real and which is the clone. Specific style corrections feed back into the user model.\n\n## How It Works\n\n1. **Sample** -- Pull 3-5 real sent messages from memory (the \"ground truth\")\n2. **Generate** -- For each message, generate a clone response to the same context\n3. **Discriminate** -- A separate agent compares pairs and identifies the real message\n4. **Score** -- Calculate fidelity (% of times the discriminator is fooled)\n5. **Correct** -- Extract specific style corrections from discriminator feedback\n\n## Commands\n\n- `/twin-test` -- Run a full twin test session (3-5 message pairs)\n- `/twin-test score` -- Show fidelity score history\n\n## Session Protocol\n\nWhen the user invokes `/twin-test`, follow this exact protocol:\n\n### Phase 1: Sample Selection\n\n1. Call `memory_search` with category \"exemplar\" to find high-quality real messages\n2. If no exemplars, search for sent messages in memory (source: \"sent\", direction: \"outgoing\")\n3. Select 3-5 diverse messages that cover different contexts (work, casual, technical)\n4. For each message, extract the conversational context (what was the user replying to?)\n\n### Phase 2: Clone Generation\n\nFor each sampled message:\n\n1. Reconstruct the context: who the user was talking to and what the conversation was about\n2. Generate YOUR response to the same context, using all personality data (user model, style profiles, exemplars, values)\n3. 
Try your absolute best to match the user's voice -- this is the test\n\nImportant: generate responses BEFORE showing the user any results. Do not look at the user's real reply while generating; work only from the reconstructed context.\n\n### Phase 3: Discrimination\n\nFor each message pair (real + clone):\n\n1. Present both messages to yourself in randomized order, labeled A and B\n2. Act as a discriminator: which message came from the real user, and which came from the clone?\n3. Explain your reasoning: what specific markers distinguish the messages?\n4. Note your confidence in each assessment\n\n### Phase 4: Results\n\nPresent a summary:\n\n```\nTwin Test Results\n=================\n\nFidelity Score: XX% (X/Y pairs where discriminator was fooled)\n\nPair 1: [context summary]\n  Real: \"...\" (correctly/incorrectly identified)\n  Clone: \"...\"\n  Discriminator notes: [what gave it away]\n\nPair 2: ...\n\nStyle Corrections:\n- [specific corrections based on discriminator feedback]\n```\n\n### Phase 5: Corrections\n\nFor each pair where the discriminator correctly identified the clone:\n\n1. Extract what was different (tone, word choice, length, punctuation, emoji usage, formality)\n2. Store these as corrections in the user model at confidence 0.8\n3. Update style profiles if specific style markers were identified\n4. Tell the user what you learned\n\n### Phase 6: User Review\n\nAsk the user:\n\n- \"Were my assessments accurate? 
Did I identify the right messages as real?\"\n- \"Were any of these clone responses actually close to what you'd say?\"\n- Accept the user's corrections and store them\n\n## Important Rules\n\n- **Blind generation** -- generate clone responses BEFORE comparing them to real messages\n- **Honest assessment** -- if you can't tell which is real, say so (that's a good fidelity sign)\n- **Specific corrections** -- \"slightly too formal\" is better than \"didn't match\"\n- **Store corrections at 0.8** -- adversarial testing is a high-quality signal\n- **Track over time** -- compare against previous twin-test scores\n- **Diverse samples** -- try to test across different contexts and platforms\n- **Don't game it** -- the goal is an honest assessment of where the clone falls short","tags":["twin","test","nomos","project-nomos","agent-memory","agent-skills","agentic-ai","ai-agents","ai-assistant","autonomous-agents","claude","claude-ai"],"capabilities":["skill","source-project-nomos","skill-twin-test","topic-agent-memory","topic-agent-skills","topic-agentic-ai","topic-ai-agents","topic-ai-assistant","topic-autonomous-agents","topic-claude","topic-claude-ai","topic-claude-code","topic-claude-skills","topic-digital-clone","topic-llm"],"categories":["nomos"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/project-nomos/nomos/twin-test","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add project-nomos/nomos","source_repo":"https://github.com/project-nomos/nomos","install_from":"skills.sh"}},"qualityScore":"0.457","qualityRationale":"deterministic score 0.46 from registry signals: · indexed on github topic:agent-skills · 14 github stars · SKILL.md body (3,656 
chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-04-22T01:02:20.036Z","embedding":null,"createdAt":"2026-04-21T19:04:09.031Z","updatedAt":"2026-04-22T01:02:20.036Z","lastSeenAt":"2026-04-22T01:02:20.036Z","tsv":"'-5':89,151,206 '/twin-test':21,29,143,154,166 '0.8':437,537 '1':85,172,175,238,313,377,413 '2':98,189,231,253,325,391,426 '3':88,111,150,203,205,271,305,341,438 '4':123,216,351,359,448 '5':133,401 '6':456 'absolut':274 'accept':488 'accur':465 'across':560 'act':326 'actual':481 'adversari':40,538 'agent':115 'ask':459 'assess':357,464,506,573 'away':389 'b':324 'back':77 'base':396 'best':275 'better':529 'blind':25,45,495 'calcul':125 'call':176 'casual':214 'categori':180 'check':43 'choic':420 'clone':10,14,41,50,72,105,232,312,340,383,412,477,498,577 'close':482 'command':142 'compar':13,116,501,548 'confid':354,436 'context':56,110,212,223,241,260,378,562 'convers':222,251 'correct':75,134,138,393,395,402,409,430,489,524,535 'correctly/incorrectly':381 'cover':210 'd':486 'data':264 'didn':531 'differ':211,417,561 'direct':201 'discrimin':63,112,130,140,306,329,373,384,398,408 'distinguish':348 'divers':207,555 'emoji':423 'exact':169 'exemplar':181,192,269 'explain':342 'extract':135,220,414 'fall':578 'feed':76 'feedback':141,399 'fidel':11,34,42,126,157,367,521 'find':183 'follow':167 'fool':132,375 'formal':425,527 'full':146 'game':567 'gan':5 'gan-styl':4 'gave':387 'generat':51,99,103,233,254,287,303,496,497 'goal':570 'good':520 'ground':96 'high':185,542 'high-qual':184,541 'histori':159 'honest':505,572 'ident':7 'identifi':64,119,382,410,447,468 'import':286,493 'invok':165 'learn':454 'length':421 'look':297 'marker':347,445 'match':277,533 'memori':94,177,198 
'messag':19,60,92,102,122,152,188,196,208,219,237,301,309,316,331,350,471,504 'model':81,266,434 'note':352,385 'order':321 'outgo':202 'pair':117,153,310,371,376,390,405 'person':263 'phase':171,230,304,358,400,455 'platform':564 'present':314,361 'previous':550 'profil':268,441 'protocol':161,170 'pull':87 'punctuat':422 'qualiti':186,543 'random':320 'real':17,58,67,90,121,187,300,311,334,380,473,503,514 'reason':344 'reconstruct':239 'repli':228 'respons':15,52,106,256,288,478,499 'result':294,360,366 'review':458 'right':470 'rule':494 'run':20,144 'sampl':86,173,236,556 'say':487,515 'score':30,35,124,155,158,368,554 'search':178,193 'see':32 'select':174,204 'sent':91,195,200 'separ':114 'session':149,160 'short':579 'show':156,290 'sign':522 'signal':544 'skill' 'skill-twin-test' 'slight':525 'sourc':199 'source-project-nomos' 'specif':73,136,346,394,443,523 'start':23 'store':427,491,534 'style':6,74,137,267,392,440,444 'summari':363,379 'talk':246 'tast':26,46 'technic':215 'tell':449,511 'test':3,9,27,39,47,148,285,365,539,553,559 'time':37,128,547 'tone':418 'topic-agent-memory' 'topic-agent-skills' 'topic-agentic-ai' 'topic-ai-agents' 'topic-ai-assistant' 'topic-autonomous-agents' 'topic-claude' 'topic-claude-ai' 'topic-claude-code' 'topic-claude-skills' 'topic-digital-clone' 'topic-llm' 'track':545 'tri':272,557 'truth':97 'twin':2,38,147,364,552 'twin-test':1,551 'updat':439 'usag':424 'use':261 'user':18,59,80,164,227,245,265,279,292,335,433,451,457,461 'valu':270 'verif':8 'voic':281 'word':419 'work':84,213 'x/y':370 
'xx':369","prices":[{"id":"afc04422-c11f-49c1-84e3-15617a3029b2","listingId":"fa8e776d-5a6f-4bb5-8745-32a6f547576f","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"project-nomos","category":"nomos","install_from":"skills.sh"},"createdAt":"2026-04-21T19:04:09.031Z"}],"sources":[{"listingId":"fa8e776d-5a6f-4bb5-8745-32a6f547576f","source":"github","sourceId":"project-nomos/nomos/twin-test","sourceUrl":"https://github.com/project-nomos/nomos/tree/main/skills/twin-test","isPrimary":false,"firstSeenAt":"2026-04-21T19:04:09.031Z","lastSeenAt":"2026-04-22T01:02:20.036Z"}],"details":{"listingId":"fa8e776d-5a6f-4bb5-8745-32a6f547576f","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"project-nomos","slug":"twin-test","github":{"repo":"project-nomos/nomos","stars":14,"topics":["agent-memory","agent-skills","agentic-ai","ai-agents","ai-assistant","autonomous-agents","claude","claude-ai","claude-code","claude-skills","digital-clone","llm","mcp","multi-agent","multi-agent-systems","ollama","self-hosted"],"license":"mit","html_url":"https://github.com/project-nomos/nomos","pushed_at":"2026-04-18T00:18:33Z","description":"Your AI digital clone — learns who you are, acts on your behalf, remembers everything. Persistent vector memory, multi-agent teams, 60+ skills, smart model routing. Self-hosted, encrypted, multi-provider (Claude/Ollama/OpenRouter). 
Deploy to Slack, Discord, Telegram, WhatsApp & more in minutes.","skill_md_sha":"d34aee5aa90ce7c883cb66f1fb8e8c2cd934a990","skill_md_path":"skills/twin-test/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/project-nomos/nomos/tree/main/skills/twin-test"},"layout":"multi","source":"github","category":"nomos","frontmatter":{"name":"twin-test","description":"GAN-style identity verification -- tests clone fidelity by comparing clone responses against real user messages. Run /twin-test to start a blind taste test, or /twin-test score to see your fidelity score over time."},"skills_sh_url":"https://skills.sh/project-nomos/nomos/twin-test"},"updatedAt":"2026-04-22T01:02:20.036Z"}}