{"id":"83e0d71f-3bc5-490d-9ebd-47e4dd765917","shortId":"dCfEUJ","kind":"skill","title":"Arize Prompt Optimization","tagline":"Awesome Copilot skill by Github","description":"# Arize Prompt Optimization Skill\n\n## Concepts\n\n### Where Prompts Live in Trace Data\n\nLLM applications emit spans following OpenInference semantic conventions. Prompts are stored in different span attributes depending on the span kind and instrumentation:\n\n| Column | What it contains | When to use |\n|--------|-----------------|-------------|\n| `attributes.llm.input_messages` | Structured chat messages (system, user, assistant, tool) in role-based format | **Primary source** for chat-based LLM prompts |\n| `attributes.llm.input_messages.roles` | Array of roles: `system`, `user`, `assistant`, `tool` | Extract individual message roles |\n| `attributes.llm.input_messages.contents` | Array of message content strings | Extract message text |\n| `attributes.input.value` | Serialized prompt or user question (generic, all span kinds) | Fallback when structured messages are not available |\n| `attributes.llm.prompt_template.template` | Template with `{variable}` placeholders (e.g., `\"Answer {question} using {context}\"`) | When the app uses prompt templates |\n| `attributes.llm.prompt_template.variables` | Template variable values (JSON object) | See what values were substituted into the template |\n| `attributes.output.value` | Model response text | See what the LLM produced |\n| `attributes.llm.output_messages` | Structured model output (including tool calls) | Inspect tool-calling responses |\n\n### Finding Prompts by Span Kind\n\n- **LLM span** (`attributes.openinference.span.kind = 'LLM'`): Check `attributes.llm.input_messages` for structured chat messages, OR `attributes.input.value` for a serialized prompt. Check `attributes.llm.prompt_template.template` for the template.\n- **Chain/Agent span**: `attributes.input.value` contains the user's question. 
The actual LLM prompt lives on **child LLM spans** -- navigate down the trace tree.\n- **Tool span**: `attributes.input.value` has tool input, `attributes.output.value` has tool result. Not typically where prompts live.\n\n### Performance Signal Columns\n\nThese columns carry the feedback data used for optimization:\n\n| Column pattern | Source | What it tells you |\n|---------------|--------|-------------------|\n| `annotation.<name>.label` | Human reviewers | Categorical grade (e.g., `correct`, `incorrect`, `partial`) |\n| `annotation.<name>.score` | Human reviewers | Numeric quality score (e.g., 0.0 - 1.0) |\n| `annotation.<name>.text` | Human reviewers | Freeform explanation of the grade |\n| `eval.<name>.label` | LLM-as-judge evals | Automated categorical assessment |\n| `eval.<name>.score` | LLM-as-judge evals | Automated numeric score |\n| `eval.<name>.explanation` | LLM-as-judge evals | Why the eval gave that score -- **most valuable for optimization** |\n| `attributes.input.value` | Trace data | What went into the LLM |\n| `attributes.output.value` | Trace data | What the LLM produced |\n| `{experiment_name}.output` | Experiment runs | Output from a specific experiment |\n\n## Prerequisites\n\nProceed directly with the task — run the `ax` command you need. Do NOT check versions, env vars, or profiles upfront.\n\nIf an `ax` command fails, troubleshoot based on the error:\n- `command not found` or version error → see references/ax-setup.md\n- `401 Unauthorized` / missing API key → run `ax profiles show` to inspect the current profile. If the profile is missing or the API key is wrong: check `.env` for `ARIZE_API_KEY` and use it to create/update the profile via references/ax-profiles.md. 
If `.env` has no key either, ask the user for their Arize API key (https://app.arize.com/admin > API Keys)\n- Space ID unknown → check `.env` for `ARIZE_SPACE_ID`, or run `ax spaces list -o json`, or ask the user\n- Project unclear → check `.env` for `ARIZE_DEFAULT_PROJECT`, or ask, or run `ax projects list -o json --limit 100` and present as selectable options\n- LLM provider call fails (missing OPENAI_API_KEY / ANTHROPIC_API_KEY) → check `.env`, load if present, otherwise ask the user\n\n## Phase 1: Extract the Current Prompt\n\n### Find LLM spans containing prompts\n\n```bash\n# List LLM spans (where prompts live)\nax spans list PROJECT_ID --filter \"attributes.openinference.span.kind = 'LLM'\" --limit 10\n\n# Filter by model\nax spans list PROJECT_ID --filter \"attributes.llm.model_name = 'gpt-4o'\" --limit 10\n\n# Filter by span name (e.g., a specific LLM call)\nax spans list PROJECT_ID --filter \"name = 'ChatCompletion'\" --limit 10\n```\n\n### Export a trace to inspect prompt structure\n\n```bash\n# Export all spans in a trace\nax spans export --trace-id TRACE_ID --project PROJECT_ID\n\n# Export a single span\nax spans export --span-id SPAN_ID --project PROJECT_ID\n```\n\n### Extract prompts from exported JSON\n\n```bash\n# Extract structured chat messages (system + user + assistant)\njq '.[0] | {\n  messages: .attributes.llm.input_messages,\n  model: .attributes.llm.model_name\n}' trace_*/spans.json\n\n# Extract the system prompt specifically\njq '[.[] | select(.attributes.llm.input_messages.roles[]? 
== \"system\")] | .[0].attributes.llm.input_messages' trace_*/spans.json\n\n# Extract prompt template and variables\njq '.[0].attributes.llm.prompt_template' trace_*/spans.json\n\n# Extract from input.value (fallback for non-structured prompts)\njq '.[0].attributes.input.value' trace_*/spans.json\n```\n\n### Reconstruct the prompt as messages\n\nOnce you have the span data, reconstruct the prompt as a messages array:\n\n```json\n[\n  {\"role\": \"system\", \"content\": \"You are a helpful assistant that...\"},\n  {\"role\": \"user\", \"content\": \"Given {input}, answer the question: {question}\"}\n]\n```\n\nIf the span has `attributes.llm.prompt_template.template`, the prompt uses variables. Preserve these placeholders (`{variable}` or `{{variable}}`) -- they are substituted at runtime.\n\n## Phase 2: Gather Performance Data\n\n### From traces (production feedback)\n\n```bash\n# Find error spans -- these indicate prompt failures\nax spans list PROJECT_ID \\\n  --filter \"status_code = 'ERROR' AND attributes.openinference.span.kind = 'LLM'\" \\\n  --limit 20\n\n# Find spans with low eval scores\nax spans list PROJECT_ID \\\n  --filter \"annotation.correctness.label = 'incorrect'\" \\\n  --limit 20\n\n# Find spans with high latency (may indicate overly complex prompts)\nax spans list PROJECT_ID \\\n  --filter \"attributes.openinference.span.kind = 'LLM' AND latency_ms > 10000\" \\\n  --limit 20\n\n# Export error traces for detailed inspection\nax spans export --trace-id TRACE_ID --project PROJECT_ID\n```\n\n### From datasets and experiments\n\n```bash\n# Export a dataset (ground truth examples)\nax datasets export DATASET_ID\n# -> dataset_*/examples.json\n\n# Export experiment results (what the LLM produced)\nax experiments export EXPERIMENT_ID\n# -> experiment_*/runs.json\n```\n\n### Merge dataset + experiment for analysis\n\nJoin the two files by `example_id` to see inputs alongside outputs and evaluations:\n\n```bash\n# Count examples and runs\njq 'length' 
dataset_*/examples.json\njq 'length' experiment_*/runs.json\n\n# View a single joined record\njq -s '\n  .[0] as $dataset |\n  .[1][0] as $run |\n  ($dataset[] | select(.id == $run.example_id)) as $example |\n  {\n    input: $example,\n    output: $run.output,\n    evaluations: $run.evaluations\n  }\n' dataset_*/examples.json experiment_*/runs.json\n\n# Find failed examples (where eval score < threshold)\njq '[.[] | select(.evaluations.correctness.score < 0.5)]' experiment_*/runs.json\n```\n\n### Identify what to optimize\n\nLook for patterns across failures:\n\n1. **Compare outputs to ground truth**: Where does the LLM output differ from expected?\n2. **Read eval explanations**: `eval.*.explanation` tells you WHY something failed\n3. **Check annotation text**: Human feedback describes specific issues\n4. **Look for verbosity mismatches**: If outputs are too long/short vs ground truth\n5. **Check format compliance**: Are outputs in the expected format?\n\n## Phase 3: Optimize the Prompt\n\n### The Optimization Meta-Prompt\n\nUse this template to generate an improved version of the prompt. Fill in the three placeholders and send it to your LLM (GPT-4o, Claude, etc.):\n\n````\nYou are an expert in prompt optimization. Given the original baseline prompt\nand the associated performance data (inputs, outputs, evaluation labels, and\nexplanations), generate a revised version that improves results.\n\nORIGINAL BASELINE PROMPT\n========================\n\n{PASTE_ORIGINAL_PROMPT_HERE}\n\n========================\n\nPERFORMANCE DATA\n================\n\nThe following records show how the current prompt performed. Each record\nincludes the input, the LLM output, and evaluation feedback:\n\n{PASTE_RECORDS_HERE}\n\n================\n\nHOW TO USE THIS DATA\n\n1. Compare outputs: Look at what the LLM generated vs what was expected\n2. Review eval scores: Check which examples scored poorly and why\n3. Examine annotations: Human feedback shows what worked and what didn't\n4. 
Identify patterns: Look for common issues across multiple examples\n5. Focus on failures: The rows where the output DIFFERS from the expected\n   value are the ones that need fixing\n\nALIGNMENT STRATEGY\n\n- If outputs have extra text or reasoning not present in the ground truth,\n  remove instructions that encourage explanation or verbose reasoning\n- If outputs are missing information, add instructions to include it\n- If outputs are in the wrong format, add explicit format instructions\n- Focus on the rows where the output differs from the target -- these are\n  the failures to fix\n\nRULES\n\nMaintain Structure:\n- Use the same template variables as the current prompt ({var} or {{var}})\n- Don't change sections that are already working\n- Preserve the exact return format instructions from the original prompt\n\nAvoid Overfitting:\n- DO NOT copy examples verbatim into the prompt\n- DO NOT quote specific test data outputs exactly\n- INSTEAD: Extract the ESSENCE of what makes good vs bad outputs\n- INSTEAD: Add general guidelines and principles\n- INSTEAD: If adding few-shot examples, create SYNTHETIC examples that\n  demonstrate the principle, not real data from above\n\nGoal: Create a prompt that generalizes well to new inputs, not one that\nmemorizes the test data.\n\nOUTPUT FORMAT\n\nReturn the revised prompt as a JSON array of messages:\n\n[\n  {\"role\": \"system\", \"content\": \"...\"},\n  {\"role\": \"user\", \"content\": \"...\"}\n]\n\nAlso provide a brief reasoning section (bulleted list) explaining:\n- What problems you found\n- How the revised prompt addresses each one\n````\n\n### Preparing the performance data\n\nFormat the records as a JSON array before pasting into the template:\n\n```bash\n# From dataset + experiment: join and select relevant columns\njq -s '\n  .[0] as $ds |\n  [.[1][] | . 
as $run |\n    ($ds[] | select(.id == $run.example_id)) as $ex |\n    {\n      input: $ex.input,\n      expected: $ex.expected_output,\n      actual_output: $run.output,\n      eval_score: $run.evaluations.correctness.score,\n      eval_label: $run.evaluations.correctness.label,\n      eval_explanation: $run.evaluations.correctness.explanation\n    }\n  ]\n' dataset_*/examples.json experiment_*/runs.json\n\n# From exported spans: extract input/output pairs with annotations\njq '[.[] | select(.attributes.openinference.span.kind == \"LLM\") | {\n  input: .attributes.input.value,\n  output: .attributes.output.value,\n  status: .status_code,\n  model: .attributes.llm.model_name\n}]' trace_*/spans.json\n```\n\n### Applying the revised prompt\n\nAfter the LLM returns the revised messages array:\n\n1. Compare the original and revised prompts side by side\n2. Verify all template variables are preserved\n3. Check that format instructions are intact\n4. Test on a few examples before full deployment\n\n## Phase 4: Iterate\n\n### The optimization loop\n\n```\n1. Extract prompt    -> Phase 1 (once)\n2. Run experiment    -> ax experiments create ...\n3. Export results    -> ax experiments export EXPERIMENT_ID\n4. Analyze failures  -> jq to find low scores\n5. Run meta-prompt   -> Phase 3 with new failure data\n6. Apply revised prompt\n7. 
Repeat from step 2\n```\n\n### Measure improvement\n\n```bash\n# Compare scores across experiments\n# Experiment A (baseline)\njq '[.[] | .evaluations.correctness.score] | add / length' experiment_a/runs.json\n\n# Experiment B (optimized)\njq '[.[] | .evaluations.correctness.score] | add / length' experiment_b/runs.json\n\n# Find examples that flipped from fail to pass\njq -s '\n  [.[0][] | select(.evaluations.correctness.label == \"incorrect\")] as $fails |\n  [.[1][] | select(.evaluations.correctness.label == \"correct\") |\n    select(.example_id as $id | $fails | any(.example_id == $id))\n  ] | length\n' experiment_a/runs.json experiment_b/runs.json\n```\n\n### A/B compare two prompts\n\n1. Create two experiments against the same dataset, each using a different prompt version\n2. Export both: `ax experiments export EXP_A` and `ax experiments export EXP_B`\n3. Compare average scores, failure rates, and specific example flips\n4. Check for regressions -- examples that passed with prompt A but fail with prompt B\n\n## Prompt Engineering Best Practices\n\nApply these when writing or revising prompts:\n\n| Technique | When to apply | Example |\n|-----------|--------------|---------|\n| Clear, detailed instructions | Output is vague or off-topic | \"Classify the sentiment as exactly one of: positive, negative, neutral\" |\n| Instructions at the beginning | Model ignores later instructions | Put the task description before examples |\n| Step-by-step breakdowns | Complex multi-step processes | \"First extract entities, then classify each, then summarize\" |\n| Specific personas | Need consistent style/tone | \"You are a senior financial analyst writing for institutional investors\" |\n| Delimiter tokens | Sections blend together | Use `---`, `###`, or XML tags to separate input from instructions |\n| Few-shot examples | Output format needs clarification | Show 2-3 synthetic input/output pairs |\n| Output length specifications | Responses are too long or short | 
\"Respond in exactly 2-3 sentences\" |\n| Reasoning instructions | Accuracy is critical | \"Think step by step before answering\" |\n| \"I don't know\" guidelines | Hallucination is a risk | \"If the answer is not in the provided context, say 'I don't have enough information'\" |\n\n### Variable preservation\n\nWhen optimizing prompts that use template variables:\n\n- **Single braces** (`{variable}`): Python f-string / Jinja style. Most common in Arize.\n- **Double braces** (`{{variable}}`): Mustache style. Used when the framework requires it.\n- Never add or remove variable placeholders during optimization\n- Never rename variables -- the runtime substitution depends on exact names\n- If adding few-shot examples, use literal values, not variable placeholders\n\n## Workflows\n\n### Optimize a prompt from a failing trace\n\n1. Find failing traces:\n   ```bash\n   ax traces list PROJECT_ID --filter \"status_code = 'ERROR'\" --limit 5\n   ```\n2. Export the trace:\n   ```bash\n   ax spans export --trace-id TRACE_ID --project PROJECT_ID\n   ```\n3. Extract the prompt from the LLM span:\n   ```bash\n   jq '[.[] | select(.attributes.openinference.span.kind == \"LLM\")][0] | {\n     messages: .attributes.llm.input_messages,\n     template: .attributes.llm.prompt_template,\n     output: .attributes.output.value,\n     error: .attributes.exception.message\n   }' trace_*/spans.json\n   ```\n4. Identify what failed from the error message or output\n5. Fill in the optimization meta-prompt (Phase 3) with the prompt and error context\n6. Apply the revised prompt\n\n### Optimize using a dataset and experiment\n\n1. Find the dataset and experiment:\n   ```bash\n   ax datasets list\n   ax experiments list --dataset-id DATASET_ID\n   ```\n2. Export both:\n   ```bash\n   ax datasets export DATASET_ID\n   ax experiments export EXPERIMENT_ID\n   ```\n3. Prepare the joined data for the meta-prompt\n4. Run the optimization meta-prompt\n5. 
Create a new experiment with the revised prompt to measure improvement\n\n### Debug a prompt that produces wrong format\n\n1. Export spans where the output format is wrong:\n   ```bash\n   ax spans list PROJECT_ID \\\n     --filter \"attributes.openinference.span.kind = 'LLM' AND annotation.format.label = 'incorrect'\" \\\n     --limit 10 -o json > bad_format.json\n   ```\n2. Look at what the LLM is producing vs what was expected\n3. Add explicit format instructions to the prompt (JSON schema, examples, delimiters)\n4. Common fix: add a few-shot example showing the exact desired output format\n\n### Reduce hallucination in a RAG prompt\n\n1. Find traces where the model hallucinated:\n   ```bash\n   ax spans list PROJECT_ID \\\n     --filter \"annotation.faithfulness.label = 'unfaithful'\" \\\n     --limit 20\n   ```\n2. Export and inspect the retriever + LLM spans together:\n   ```bash\n   ax spans export --trace-id TRACE_ID --project PROJECT_ID\n   jq '[.[] | {kind: .attributes.openinference.span.kind, name, input: .attributes.input.value, output: .attributes.output.value}]' trace_*/spans.json\n   ```\n3. Check if the retrieved context actually contained the answer\n4. Add grounding instructions to the system prompt: \"Only use information from the provided context. If the answer is not in the context, say so.\"\n\n## Troubleshooting\n\n| Problem | Solution |\n|---------|----------|\n| `ax: command not found` | See references/ax-setup.md |\n| `No profile found` | No profile is configured. See references/ax-profiles.md to create one. |\n| No `input_messages` on span | Check span kind -- Chain/Agent spans store prompts on child LLM spans, not on themselves |\n| Prompt template is `null` | Not all instrumentations emit `prompt_template`. 
Use `input_messages` or `input.value` instead |\n| Variables lost after optimization | Verify the revised prompt preserves all `{var}` placeholders from the original |\n| Optimization makes things worse | Check for overfitting -- the meta-prompt may have memorized test data. Ensure few-shot examples are synthetic |\n| No eval/annotation columns | Run evaluations first (via Arize UI or SDK), then re-export |\n| Experiment output column not found | The column name is `{experiment_name}.output` -- check exact experiment name via `ax experiments get` |\n| `jq` errors on span JSON | Ensure you're targeting the correct file path (e.g., `trace_*/spans.json`) |","tags":["arize","prompt","optimization","awesome","copilot","github"],"capabilities":["skill","source-github","category-awesome-copilot"],"categories":["awesome-copilot"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/github/awesome-copilot/arize-prompt-optimization","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"install_from":"skills.sh"}},"qualityScore":"0.300","qualityRationale":"deterministic score 0.30 from registry signals: · indexed on skills.sh · published under github/awesome-copilot","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill:v1","enrichmentVersion":1,"enrichedAt":"2026-04-22T03:40:37.822Z","embedding":null,"createdAt":"2026-04-18T20:35:58.926Z","updatedAt":"2026-04-22T03:40:37.822Z","lastSeenAt":"2026-04-22T03:40:37.822Z","tsv":"","prices":[{"id":"bd51d92d-0f56-4823-a4c8-1970462f9b06","listingId":"83e0d71f-3bc5-490d-9ebd-47e4dd765917","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"github","category":"awesome-copilot","install_from":"skills.sh"},"createdAt":"2026-04-18T20:35:58.926Z"}],"sources":[{"listingId":"83e0d71f-3bc5-490d-9ebd-47e4dd765917","source":"github","sourceId":"github/awesome-copilot/arize-prompt-optimization","sourceUrl":"https://github.com/github/awesome-copilot/tree/main/skills/arize-prompt-optimization","isPrimary":false,"firstSeenAt":"2026-04-18T21:48:17.322Z","lastSeenAt":"2026-04-22T00:52:03.875Z"},{"listingId":"83e0d71f-3bc5-490d-9ebd-47e4dd765917","source":"skills_sh","sourceId":"github/awesome-copilot/arize-prompt-optimization","sourceUrl":"https://skills.sh/github/awesome-copilot/arize-prompt-optimization","isPrimary":true,"firstSeenAt":"2026-04-18T20:35:58.926Z","lastSeenAt":"2026-04-22T03:40:37.822Z"}],"details":{"listingId":"83e0d71f-3bc5-490d-9ebd-47e4dd765917","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"github","slug":"arize-prompt-optimization","source":"skills_sh","category":"awesome-copilot","skills_sh_url":"https://skills.sh
/github/awesome-copilot/arize-prompt-optimization"},"updatedAt":"2026-04-22T03:40:37.822Z"}}