{"id":"62cf370f-ad73-4c52-995a-4dd06db1870b","shortId":"GZraXA","kind":"skill","title":"recipe-eval-prompt","tagline":"Compares original and optimized prompts by parallel execution in git worktrees. Use when evaluating prompt improvement effects or learning prompt engineering through concrete examples.","description":"# Prompt Evaluation\n\n## Orchestrator Definition\n\n**Purpose**: Provide accurate feedback on prompt optimization effects, enabling users to learn effective prompting through concrete comparison results.\n\n**Core Identity**: \"I route information between specialized agents. I pass user input to analyzers. I present agent outputs to users.\"\n\n**Pass-through Principle**: User requests flow directly to agents. Agent outputs flow directly to users. Both prompts execute under identical conditions.\n\n**Execution Protocol**:\n1. **Delegate all work** to sub-agents (orchestrator role only)\n2. **Register all steps via TaskCreate** before starting, and update status via TaskUpdate upon completion\n\n## Phase Boundaries\n\nNo user confirmation is required between phases unless explicitly requested.\nEach phase must complete all required outputs before proceeding.\n\n## Input\n\nThe user provides a natural language request. Pass it directly to prompt-analyzer.\n\n**Exception**: If the request lacks any identifiable target (no file, function, or scope mentioned at all), ask ONE question to establish scope, then pass through.\n\n**Extended timeout**: If the user mentions needing more time, use up to 1800 seconds (default: 300 seconds).\n\n## Execution Flow\n\n**Task Registration**: Register execution steps via TaskCreate and proceed systematically.\n\n### Step 1. Run Required Skills\n\nRun the worktree-execution skill.\n\n### Step 2. 
Prompt Analysis and Optimization\n\n**Invoke**: prompt-analyzer agent\n\nInput:\n- User's exact request text\n\nOutput:\n- Analysis results (detected patterns)\n- Optimized prompt\n- Applied optimizations list\n\n**Quality Gate**:\n- [ ] Input contains user's request text only\n- [ ] Output presented to user matches agent's output\n\n### Step 3. Execution Environment Setup\n\nExecute environment setup per worktree-execution skill \"Creation\" section.\n\n### Step 4. Parallel Execution\n\n**Invoke**: Two prompt-executor agents simultaneously (single message, parallel Task calls)\n\n```yaml\nSubagent 1:\n  agent: prompt-executor\n  working_directory: {worktree_original_path}\n  prompt: {original_request}\n\nSubagent 2:\n  agent: prompt-executor\n  working_directory: {worktree_optimized_path}\n  prompt: {optimized_request}\n```\n\nEach subagent executes the prompt as a development task within its isolated worktree.\n\n**CRITICAL**: Both Task tool calls MUST be in the same message to achieve true parallel execution.\n\n### Step 5. Environment Cleanup\n\nExecute worktree cleanup per worktree-execution skill \"Cleanup\" section.\n\n### Step 6. Report Generation\n\n**Invoke**: report-generator agent\n\nInput:\n- Original and optimized prompts\n- Execution results from both subagents\n- Applied optimizations list\n\nOutput:\n- Comparison report (markdown)\n- Improvement classification (structural / context addition / expressive / variance)\n\n**Quality Gate**:\n- [ ] Output presented to user matches agent's output\n\n### Step 7. 
Retrospective\n\n**Trigger**: Report generation completes\n\n**Action**: Ask user for feedback on comparison results, then delegate to knowledge-optimizer agent\n\n## Improvement Classification\n\nApply the execution quality criteria from the prompt-optimization skill.\n\n| Classification | Definition | Interpretation |\n|---------------|------------|----------------|\n| **Structural** | Prompt structure, clarity, specificity improvements | Prompt writing technique |\n| **Context Addition** | Project-specific information added from codebase investigation | Information advantage |\n| **Expressive** | Different phrasing, equivalent substance | Neutral |\n| **Variance** | Within LLM probabilistic variance | Original prompt sufficient |\n\n**Key Principle**: Distinguish between prompt writing improvements (Structural) and information additions (Context Addition).\n\n## Final Output to User\n\nPresent report-generator's complete output to user.\nOptimized prompt must appear in full. This is the core learning value of the report.\n\nThe report includes (defined in report-generator):\n- Input Prompts (original and optimized full text)\n- Optimizations Applied\n- Execution Results\n- Comparison Analysis\n- Learning Points\n\n## Error Handling\n\n| Scenario | Behavior |\n|----------|----------|\n| One subagent fails | Continue with successful result, report as \"partial\" |\n| Both subagents fail | Report full failure with diagnostics |\n| Timeout | Terminate, capture partial results, cleanup |\n| Worktree creation fails | Report git error, suggest checking repository state |\n\n## Prerequisites\n\n- Git repository (git 2.5+ for worktree support)\n- Claude Code subagent execution permissions\n- Sufficient disk space for worktree copies\n\n## Usage Examples\n\n```\n/recipe-eval-prompt\nAdd error handling to generateResponse in geminiService.ts. 
Handle 429, timeout, and invalid responses.\n```\n\n```\n/recipe-eval-prompt\nGenerate code following this skill: .claude/skills/my-skill/SKILL.md\n```\n\nFor complex tasks:\n```\n/recipe-eval-prompt\nRefactor the message pipeline for readability. This may take a while.\n```","tags":["recipe","eval","prompt","rashomon","shinpr","agent-skills","ai-tools","claude-code","claude-code-plugin","developer-tools","evaluation","llm"],"capabilities":["skill","source-shinpr","skill-recipe-eval-prompt","topic-agent-skills","topic-ai-tools","topic-claude-code","topic-claude-code-plugin","topic-developer-tools","topic-evaluation","topic-llm","topic-prompt-engineering","topic-prompt-evaluation","topic-prompt-optimization","topic-skills"],"categories":["rashomon"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/shinpr/rashomon/recipe-eval-prompt","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add shinpr/rashomon","source_repo":"https://github.com/shinpr/rashomon","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 9 github stars · SKILL.md body (4,961 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-04-24T07:03:39.702Z","embedding":null,"createdAt":"2026-04-23T13:04:21.279Z","updatedAt":"2026-04-24T07:03:39.702Z","lastSeenAt":"2026-04-24T07:03:39.702Z",
"prices":[{"id":"77b76a0b-9227-4ede-98c3-e82cd7809b48","listingId":"62cf370f-ad73-4c52-995a-4dd06db1870b","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"shinpr","category":"rashomon","install_from":"skills.sh"},"createdAt":"2026-04-23T13:04:21.279Z"}],"sources":[{"listingId":"62cf370f-ad73-4c52-995a-4dd06db1870b","source":"github","sourceId":"shinpr/rashomon/recipe-eval-prompt","sourceUrl":"https://github.com/shinpr/rashomon/tree/main/skills/recipe-eval-prompt","isPrimary":false,"firstSeenAt":"2026-04-23T13:04:21.279Z","lastSeenAt":"2026-04-24T07:03:39.702Z"}],"details":{"listingId":"62cf370f-ad73-4c52-995a-4dd06db1870b","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"shinpr","slug":"recipe-eval-prompt","github":{"repo":"shinpr/rashomon","stars":9,"topics":["agent-skills","ai-tools","claude-code","claude-code-plugin","developer-tools","evaluation","llm","prompt-engineering","prompt-evaluation","prompt-optimization","skills"],"license":"mit","html_url":"https://github.com/shinpr/rashomon","pushed_at":"2026-04-04T07:32:14Z","description":"Measure prompt and skill improvements with blind A/B comparison.","skill_md_sha":"de415b887ff70742dcd2778954e9594f654425ad","skill_md_path":"skills/recipe-eval-prompt/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/shinpr/rashomon/tree/main/skills/recipe-eval-prompt"},"layout":"multi","source":"github","category":"rashomon","frontmatter":{"name":"recipe-eval-prompt","description":"Compares original and optimized prompts by parallel execution in git worktrees. 
Use when evaluating prompt improvement effects or learning prompt engineering through concrete examples."},"skills_sh_url":"https://skills.sh/shinpr/rashomon/recipe-eval-prompt"},"updatedAt":"2026-04-24T07:03:39.702Z"}}