{"id":"1ab4e674-74db-46dd-bb99-468507c27392","shortId":"kd7sCF","kind":"skill","title":"generate-synthetic-data","tagline":"Generate synthetic test data for LLM evaluations using dimension-based tuple expansion. Use when the user needs synthetic traces, test cases, eval datasets, or when create-evaluation needs synthetic fallback data.","description":"# Generate Synthetic Data\n\nGenerate realistic synthetic traces for LLM evaluation datasets using dimension-based variation.\n\n## When to use\n\n- User needs test data for an evaluation but has no production traces.\n- `create-evaluation` delegates here when real traces are unavailable.\n- User wants to augment sparse real data with targeted synthetic examples.\n\n<HARD-GATE>\nDo NOT generate any synthetic data until scoping is complete and the user has approved the generation plan (dimensions, tuple count, trace structure, output destination).\n</HARD-GATE>\n\n<HARD-GATE>\nBEFORE the first scoping question, search for a structured question tool (e.g., `AskUserQuestion` or similar interactive widget) and load it. Use that tool for EVERY scoping question. Fall back to plain-text lettered options ONLY if no such tool exists in the environment.\n</HARD-GATE>\n\n## Scoping protocol\n\nAsk these five questions. Skip any already answered in the conversation (e.g., if `create-evaluation` already established the system type, do not re-ask).\n\n1. **System type.** What kind of AI system produces the traces?\n   - Simple RAG, tool-calling agent, multi-turn chat, support bot, classification pipeline, other\n2. **Trace structure.** What columns does each trace contain?\n   - Offer common patterns based on system type:\n     - Simple RAG: `user_query`, `retrieved_context`, `response`\n     - Tool-calling agent: `user_request`, `tool_calls`, `final_answer`\n     - Multi-turn chat: `conversation_history`, `assistant_response`\n     - Support bot: `customer_message`, `kb_lookup`, `agent_reply`\n   - Let the user rename, add, or remove columns\n3. **Dimensions of variation.** What axes should drive diversity?\n   - Propose 3-5 starter dimensions based on system type and known failure modes\n   - Each dimension needs 3-6 discrete values\n   - Example for RAG: query complexity (simple factual, multi-hop, ambiguous, comparative), domain coverage (billing, technical support, account management), context quality (perfect match, partial match, irrelevant, missing)\n4. **Dataset size.** How many final traces?\n   - Default recommendation: 50-100 for initial eval scoping, 200+ for statistical significance\n5. **Output destination.** Where should the data go?\n   - Default: MCP upload when called from `create-evaluation`, file when standalone\n   - Options: Truesight dataset (via MCP), JSONL file, CSV file, both\n\nUse the structured question tool (loaded per the HARD-GATE above) for every question. One question per message.\n\n## Core methodology\n\nFollow this sequence exactly. Do not skip steps or combine them.\n\n### Step 1: Draft seed tuples with the user\n\nGenerate ~20 tuples as dimension-value combinations. Each tuple is a row of dimension values that will become one trace.\n\nFormat tuples as a table:\n\n| # | query_complexity | domain | context_quality | edge_case |\n|---|-----------------|--------|-----------------|-----------|\n| 1 | simple factual | billing | perfect match | none |\n| 2 | multi-hop | technical support | partial match | none |\n| 3 | ambiguous | compliance | irrelevant | non-English |\n\nRules:\n- Cover every dimension value at least once across the ~20 tuples.\n- Avoid uniform distribution. Weight toward failure-prone combinations.\n- Present the table to the user and ask: \"Do these combinations represent realistic scenarios your system encounters? Any to add, remove, or adjust?\"\n\nDo not proceed until the user validates the tuples.\n\n### Step 2: LLM-expand tuples\n\nAfter user approval of seed tuples:\n- Generate 10+ additional tuples using the same dimensions.\n- No duplicate dimension-value combinations allowed.\n- Prioritize underrepresented dimension values and novel cross-dimension pairings.\n- Present the expanded set for optional user review (do not block on this).\n\n### Step 3: Convert tuples to natural language (two-step)\n\nThis is the key quality technique. Do NOT generate traces in a single step.\n\n**Step 3a: Tuple to scenario sketch.**\nFor each tuple, write a 1-2 sentence scenario description that captures the dimension values in natural terms.\n\nExample tuple: `(multi-hop, billing, partial match, none)`\nScenario: \"A customer asks whether upgrading their plan mid-cycle affects their next invoice and any unused credits. The knowledge base has pricing docs but nothing about proration.\"\n\n**Step 3b: Scenario to full trace.**\nConvert each scenario sketch into the full trace structure (matching the columns from scoping).\n\nExample trace for the above scenario:\n```json\n{\n  \"user_query\": \"If I upgrade from Basic to Pro halfway through my billing cycle, will my next invoice be higher? And what happens to the unused days on Basic?\",\n  \"retrieved_context\": \"Pro plan costs $49/month. Basic plan costs $19/month. Upgrades take effect immediately.\",\n  \"response\": \"When you upgrade mid-cycle, your next invoice will reflect the Pro plan price of $49/month. The remaining days on your Basic plan will be prorated as a credit on your next invoice.\"\n}\n```\n\nWhy two steps: single-step generation produces repetitive phrasing and shallow variation. The scenario sketch forces diverse framing before trace generation locks in wording.\n\n### Step 4: Filter for quality\n\nReview generated traces and remove:\n- Awkward or unnatural phrasing\n- Dimension-value mismatches (trace doesn't reflect its tuple)\n- Near-duplicate traces (high textual similarity despite different tuples)\n- Traces that could not plausibly come from the target system\n\nReport how many traces survived filtering and the final count.\n\n## Output\n\n### MCP-first path (default when called from create-evaluation)\n\nInvoke the `upload_dataset` tool with:\n- `name` set to a descriptive dataset name\n- `columns` set to an array of all column names (input columns + any judgment/notes columns if provided by the caller)\n- `input_columns` set to the trace structure columns\n- `rows` set to the generated trace data\n- `idempotency_key` set to a unique string for safe retries\n\nIf the caller provided `judgment_configs`, pass them through to `upload_dataset`.\n\n### File-first path (default when standalone)\n\nWrite traces to the user's chosen format:\n- **JSONL** (default): one JSON object per line. Filename: `synthetic-traces-YYYY-MM-DD.jsonl`\n- **CSV**: standard CSV with headers matching trace columns. Filename: `synthetic-traces-YYYY-MM-DD.csv`\n\nAfter writing, offer to upload to Truesight via the `upload_dataset` tool.\n\n## Optional: pipeline execution\n\nIf the user has a live system available, recommend running the synthetic inputs through it to get real outputs. Synthetic inputs with real outputs are more valuable than fully synthetic traces for evaluation scoping.\n\n## Anti-patterns\n\n- Generating traces without dimension-based variation. This produces clustered, non-diverse data.\n- Single-step tuple-to-trace generation. This reduces phrasing diversity compared to two-step.\n- Dimensions disconnected from actual failure modes. This generates variety without evaluation value.\n- Skipping user validation of seed tuples. This risks generating unrealistic scenarios.\n- Using synthetic data where real traces are available. Synthetic is a fallback, not a preference.","tags":["generate","synthetic","data","truesight","mcp","skills","goodeye-labs","agent-skills","ai-evaluation","chatgpt","claude","cursor"],"capabilities":["skill","source-goodeye-labs","skill-generate-synthetic-data","topic-agent-skills","topic-ai-evaluation","topic-chatgpt","topic-claude","topic-cursor","topic-llm","topic-mcp","topic-truesight","topic-vscode","topic-windsurf"],"categories":["truesight-mcp-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/Goodeye-Labs/truesight-mcp-skills/generate-synthetic-data","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add Goodeye-Labs/truesight-mcp-skills","source_repo":"https://github.com/Goodeye-Labs/truesight-mcp-skills","install_from":"skills.sh"}},"qualityScore":"0.453","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 6 github stars · SKILL.md body (7,145 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T13:22:57.435Z","embedding":null,"createdAt":"2026-05-18T13:22:57.435Z","updatedAt":"2026-05-18T13:22:57.435Z","lastSeenAt":"2026-05-18T13:22:57.435Z","tsv":"'-100':336 '-2':610 '-5':281 '-6':296 '1':187,408,448,609 '10':537 '19/month':725 '2':213,455,525 '20':416,481 '200':341 '3':270,280,295,464,575 '3a':599 '3b':661 '4':326,791 '49/month':721,747 '5':345 '50':335 'account':316 'across':479 'actual':1045 'add':266,511 'addit':538 'adjust':514 'affect':642 'agent':203,239,260 'ai':193 'allow':550 'alreadi':167,177 'ambigu':309,465 'answer':168,245 'anti':1009 'anti-pattern':1008 'approv':104,532 'array':873 'ask':161,186,499,634 'askuserquest':127 'assist':252 'augment':82 'avail':981,1072 'avoid':483 'awkward':800 'axe':275 'back':143 'base':15,52,225,284,652,1016 'basic':693,715,722,753 'becom':433 'bill':313,451,627,699 'block':571 'bot':209,255 'call':202,238,243,357,851 'caller':887,915 'captur':615 'case':26,447 'chat':207,249 'chosen':938 'classif':210 'cluster':1020 'column':217,269,677,869,876,879,882,889,895,956 'combin':405,422,491,502,549 'come':829 'common':223 'compar':310,1037 'complet':99 'complex':303,442 'complianc':466 'config':918 'contain':221 'context':234,318,444,717 'convers':171,250 'convert':576,666 'core':394 'cost':720,724 'could':826 'count':110,843 'cover':472 'coverag':312 'creat':32,70,175,360,854 'create-evalu':31,69,174,359,853 'credit':649,760 'cross':558 'cross-dimens':557 'csv':372,949,951 'custom':256,633 'cycl':641,700,736 'data':4,8,37,40,60,85,95,351,902,1024,1067 'dataset':28,48,327,367,859,867,924,969 'day':713,750 'default':333,353,849,929,941 'deleg':72 'descript':613,866 'despit':821 'destin':114,347 'differ':822 'dimens':14,51,108,271,283,293,420,429,474,543,547,553,559,617,805,1015,1042 'dimension-bas':13,50,1014 'dimension-valu':419,546,804 'disconnect':1043 'discret':297 'distribut':485 'divers':278,782,1023,1036 'doc':655 'doesn':809 'domain':311,443 'draft':409 'drive':277 'duplic':545,816 'e.g':126,172 'edg':446 'effect':728 'encount':508 'english':470 'environ':158 'establish':178 'eval':27,339 'evalu':11,33,47,63,71,176,361,855,1006,1052 'everi':139,388,473 'exact':399 'exampl':89,299,622,680 'execut':973 'exist':155 'expand':528,563 'expans':17 'factual':305,450 'failur':290,489,1046 'failure-pron':488 'fall':142 'fallback':36,1076 'file':362,371,373,926 'file-first':925 'filenam':947,957 'filter':792,839 'final':244,331,842 'first':117,847,927 'five':163 'follow':396 'forc':781 'format':436,939 'frame':783 'full':664,672 'fulli':1002 'gate':385 'generat':2,5,38,41,92,106,415,536,592,771,786,796,900,1011,1032,1049,1062 'generate-synthetic-data':1 'get':990 'go':352 'halfway':696 'happen':709 'hard':384 'hard-gat':383 'header':953 'high':818 'higher':706 'histori':251 'hop':308,458,626 'idempot':903 'immedi':729 'initi':338 'input':878,888,986,994 'interact':130 'invoic':645,704,739,764 'invok':856 'irrelev':324,467 'json':686,943 'jsonl':370,940 'judgment':917 'judgment/notes':881 'kb':258 'key':587,904 'kind':191 'knowledg':651 'known':289 'languag':580 'least':477 'let':262 'letter':148 'line':946 'live':979 'llm':10,46,527 'llm-expand':526 'load':133,380 'lock':787 'lookup':259 'manag':317 'mani':330,836 'match':321,323,453,462,629,675,954 'mcp':354,369,846 'mcp-first':845 'messag':257,393 'methodolog':395 'mid':640,735 'mid-cycl':639,734 'mismatch':807 'miss':325 'mode':291,1047 'multi':205,247,307,457,625 'multi-hop':306,456,624 'multi-turn':204,246 'name':862,868,877 'natur':579,620 'near':815 'near-dupl':814 'need':22,34,58,294 'next':644,703,738,763 'non':469,1022 'non-divers':1021 'non-english':468 'none':454,463,630 'noth':657 'novel':556 'object':944 'offer':222,961 'one':390,434,942 'option':149,365,566,971 'output':113,346,844,992,997 'pair':560 'partial':322,461,628 'pass':919 'path':848,928 'pattern':224,1010 'per':381,392,945 'perfect':320,452 'phrase':774,803,1035 'pipelin':211,972 'plain':146 'plain-text':145 'plan':107,638,719,723,744,754 'plausibl':828 'prefer':1079 'present':492,561 'price':654,745 'priorit':551 'pro':695,718,743 'proceed':517 'produc':195,772,1019 'product':67 'prone':490 'propos':279 'prorat':659,757 'protocol':160 'provid':884,916 'qualiti':319,445,588,794 'queri':232,302,441,688 'question':119,124,141,164,378,389,391 'rag':199,230,301 're':185 're-ask':184 'real':75,84,991,996,1069 'realist':42,504 'recommend':334,982 'reduc':1034 'reflect':741,811 'remain':749 'remov':268,512,799 'renam':265 'repetit':773 'repli':261 'report':834 'repres':503 'request':241 'respons':235,253,730 'retri':912 'retriev':233,716 'review':568,795 'risk':1061 'row':427,896 'rule':471 'run':983 'safe':911 'scenario':505,602,612,631,662,668,685,779,1064 'scope':97,118,140,159,340,679,1007 'search':120 'seed':410,534,1058 'sentenc':611 'sequenc':398 'set':564,863,870,890,897,905 'shallow':776 'signific':344 'similar':129,820 'simpl':198,229,304,449 'singl':596,769,1026 'single-step':768,1025 'size':328 'sketch':603,669,780 'skill' 'skill-generate-synthetic-data' 'skip':165,402,1054 'source-goodeye-labs' 'spars':83 'standalon':364,931 'standard':950 'starter':282 'statist':343 'step':403,407,524,574,583,597,598,660,767,770,790,1027,1041 'string':909 'structur':112,123,215,377,674,894 'support':208,254,315,460 'surviv':838 'synthet':3,6,23,35,39,43,88,94,985,993,1003,1066,1073 'synthetic-traces-yyyy-mm-dd.csv':958 'synthetic-traces-yyyy-mm-dd.jsonl':948 'system':180,188,194,227,286,507,833,980 'tabl':440,494 'take':727 'target':87,832 'technic':314,459 'techniqu':589 'term':621 'test':7,25,59 'text':147 'textual':819 'tool':125,137,154,201,237,242,379,860,970 'tool-cal':200,236 'topic-agent-skills' 'topic-ai-evaluation' 'topic-chatgpt' 'topic-claude' 'topic-cursor' 'topic-llm' 'topic-mcp' 'topic-truesight' 'topic-vscode' 'topic-windsurf' 'toward':487 'trace':24,44,68,76,111,197,214,220,332,435,593,665,673,681,785,797,808,817,824,837,893,901,933,955,1004,1012,1031,1070 'truesight':366,965 'tupl':16,109,411,417,424,437,482,523,529,535,539,577,600,606,623,813,823,1029,1059 'tuple-to-trac':1028 'turn':206,248 'two':582,766,1040 'two-step':581,1039 'type':181,189,228,287 'unavail':78 'underrepres':552 'uniform':484 'uniqu':908 'unnatur':802 'unrealist':1063 'unus':648,712 'upgrad':636,691,726,733 'upload':355,858,923,963,968 'use':12,18,49,56,135,375,540,1065 'user':21,57,79,102,231,240,264,414,497,520,531,567,687,936,976,1055 'valid':521,1056 'valu':298,421,430,475,548,554,618,806,1053 'valuabl':1000 'variat':53,273,777,1017 'varieti':1050 'via':368,966 'want':80 'weight':486 'whether':635 'widget':131 'without':1013,1051 'word':789 'write':607,932,960","prices":[{"id":"d22201c5-520f-4940-9773-4679c3ca59e3","listingId":"1ab4e674-74db-46dd-bb99-468507c27392","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"Goodeye-Labs","category":"truesight-mcp-skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:22:57.435Z"}],"sources":[{"listingId":"1ab4e674-74db-46dd-bb99-468507c27392","source":"github","sourceId":"Goodeye-Labs/truesight-mcp-skills/generate-synthetic-data","sourceUrl":"https://github.com/Goodeye-Labs/truesight-mcp-skills/tree/main/skills/generate-synthetic-data","isPrimary":false,"firstSeenAt":"2026-05-18T13:22:57.435Z","lastSeenAt":"2026-05-18T13:22:57.435Z"}],"details":{"listingId":"1ab4e674-74db-46dd-bb99-468507c27392","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"Goodeye-Labs","slug":"generate-synthetic-data","github":{"repo":"Goodeye-Labs/truesight-mcp-skills","stars":6,"topics":["agent-skills","ai-evaluation","chatgpt","claude","cursor","llm","mcp","truesight","vscode","windsurf"],"license":"mit","html_url":"https://github.com/Goodeye-Labs/truesight-mcp-skills","pushed_at":"2026-03-26T06:15:56Z","description":"Agent skills for the Truesight MCP. Step-by-step workflow playbooks for scoring inputs, building live evaluations, error analysis, and the review loop. Works with Claude Code, Cursor, ChatGPT, VS Code, Windsurf, and any client that supports the agent skills standard.","skill_md_sha":"b14064c09af39b664727d97cb65d29f8d28fe2ca","skill_md_path":"skills/generate-synthetic-data/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/Goodeye-Labs/truesight-mcp-skills/tree/main/skills/generate-synthetic-data"},"layout":"multi","source":"github","category":"truesight-mcp-skills","frontmatter":{"name":"generate-synthetic-data","description":"Generate synthetic test data for LLM evaluations using dimension-based tuple expansion. Use when the user needs synthetic traces, test cases, eval datasets, or when create-evaluation needs synthetic fallback data."},"skills_sh_url":"https://skills.sh/Goodeye-Labs/truesight-mcp-skills/generate-synthetic-data"},"updatedAt":"2026-05-18T13:22:57.435Z"}}