{"id":"d65c572b-f35d-4d50-9633-6072ec63b18a","shortId":"xFtTYp","kind":"skill","title":"LLM Cost Optimizer","tagline":"Audits an AI application for unnecessary token spend and recommends prompt caching, model routing, and token reduction techniques to cut costs.","description":"# LLM Cost Optimizer\n\n## What this skill does\n\nThis skill audits an LLM application's prompts, call patterns, and model selection to identify cost reduction opportunities. It covers prompt caching, model routing (right-sizing), token reduction, batching, and output length control — the techniques that typically cut LLM costs by 40–80% without sacrificing quality.\n\n## How to use\n\n### Claude Code / Cline\n\nCopy this file to `.agents/skills/llm-cost-optimizer/SKILL.md` in your project root.\n\nThen ask:\n- *\"Use the LLM Cost Optimizer to audit our AI application.\"*\n- *\"How can I reduce our OpenAI API costs? Here are our prompts...\"*\n\nProvide:\n- Your system prompt(s)\n- Approximate daily call volume\n- Which model(s) you're using\n- Typical input/output token counts if known\n- Whether calls are real-time (low latency required) or batch (latency tolerant)\n\n### Cursor / Codex\n\nPaste your prompts, call patterns, and current monthly spend alongside these instructions.\n\n## The Prompt / Instructions for the Agent\n\nWhen asked to optimize LLM costs, audit the following areas in order of typical savings impact:\n\n### Audit 1 — Prompt Caching (savings: 50–90% on repeated prefixes)\n\n**Check:** Does the system prompt stay the same across calls?\n\nIf yes, enable prompt caching. 
The cached prefix is still sent with every request, but calls that hit the cache are billed for it at a reduced cached-read rate; only the new user tokens are charged at the full input price.\n\n```python\nimport anthropic\n\nclient = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment\n\n# Anthropic Claude — cache_control on system prompt\nresponse = client.messages.create(\n    model=\"claude-opus-4-6\",\n    max_tokens=1024,  # required by the Messages API; size it to your use case\n    system=[{\n        \"type\": \"text\",\n        \"text\": your_system_prompt,\n        \"cache_control\": {\"type\": \"ephemeral\"}  # cached for 5 minutes\n    }],\n    messages=[{\"role\": \"user\", \"content\": user_message}]\n)\n\n# OpenAI — automatic prompt caching for prompts of 1024+ tokens\n# No code change needed — cached automatically, check usage.prompt_tokens_details.cached_tokens\n```\n\n**When it applies:** Any app where the system prompt is at least about 1024 tokens (the minimum cacheable length varies by model) and is reused across calls. Support bots, coding assistants, document analyzers.\n\n**Savings estimate:** If system prompt = 2000 tokens, 10,000 calls/day → ~20M input tokens/day billed at the cached rate instead of the full input price.\n\n### Audit 2 — Model Right-Sizing (savings: 60–90% on over-specified models)\n\n**Check:** Are you using a frontier model (GPT-4o, Claude Opus) for tasks that a smaller model handles just as well?\n\n| Task | Recommended Model |\n|---|---|\n| Classification, routing, yes/no decisions | GPT-4o-mini, Claude Haiku |\n| Summarization, extraction, translation | GPT-4o-mini, Claude Sonnet |\n| Complex reasoning, code generation | GPT-4o, Claude Sonnet |\n| Novel research, multi-step agent planning | Claude Opus, o1 |\n\n**Implement a model router:**\n```python\ndef route_model(task_type: str, complexity: str) -> str:\n    if task_type in (\"classify\", \"extract\", \"translate\"):\n        return \"claude-haiku-4-5-20251001\"\n    if complexity == \"high\" or task_type == \"code_generation\":\n        return \"claude-sonnet-4-6\"\n    return \"claude-haiku-4-5-20251001\"  # default to cheap\n```\n\n### Audit 3 — Token Reduction (savings: 20–40% on bloated prompts)\n\n**Check:** Is the system prompt longer than it needs to be?\n\nCommon bloat patterns:\n- Repeating the same instruction 
multiple ways (\"Be concise. Keep answers short. Don't ramble.\")\n- Long examples when one would do\n- Full document context when only a section is needed\n- Verbose role descriptions\n\n**Token reduction techniques:**\n\n1. **Compress examples** — use 1 example instead of 3 if the task is clear\n2. **Use structured format** — bullet points often use fewer tokens than prose instructions\n3. **Trim RAG context** — retrieve top-3 chunks, not top-10; rerank before sending\n4. **Limit output length** — set `max_tokens` to the minimum needed:\n```python\n# If you only need a one-sentence answer, cap it\nresponse = client.messages.create(max_tokens=100, ...)\n```\n\n### Audit 4 — Response Caching (savings: 30–70% for repetitive queries)\n\n**Check:** Do users ask similar questions repeatedly?\n\nCache model responses by a hash of the (system_prompt + user_input) pair:\n\n```python\nimport hashlib, json\n\nimport redis\n\nredis_client = redis.Redis()  # assumes a local Redis; adjust host/port as needed\n\n# call_llm(system, user) is a placeholder for your own model-call wrapper\ndef get_cached_or_call(system: str, user: str) -> str:\n    key = hashlib.sha256(f\"{system}:{user}\".encode()).hexdigest()\n    cached = redis_client.get(key)\n    if cached is not None:\n        return json.loads(cached)\n\n    response = call_llm(system, user)\n    redis_client.setex(key, 3600, json.dumps(response))  # cache 1hr\n    return response\n```\n\nIf the exact-match hit rate is low, match on semantic similarity instead to get fuzzy cache hits.\n\n### Audit 5 — Batching (savings: 50% on cost for latency-tolerant async workloads)\n\n**Check:** Are you running background jobs (document processing, bulk analysis) one-at-a-time?\n\nBoth OpenAI and Anthropic offer Batch APIs at 50% discount for async workloads:\n\n```python\n# Anthropic Batch API\nbatch = client.messages.batches.create(\n    requests=[\n        {\"custom_id\": f\"doc_{i}\", \"params\": {\"model\": \"...\", \"messages\": [...]}}\n        for i, doc in enumerate(documents)\n    ]\n)\n# Results are typically available within 24 hours at 50% of standard price\n```\n\nUse when: processing 100+ documents, nightly summarization jobs, bulk classification.\n\n### 
Audit 6 — Streaming Efficiency\n\n**Check:** Are you streaming responses but storing the full output anyway?\n\nIf you don't need to stream to the user, disable streaming — it has slightly higher overhead for short responses. Only stream when showing real-time output to users.\n\n### Cost Estimate Template\n\nAfter auditing, produce a cost breakdown:\n\n| Optimization | Monthly Savings Estimate | Effort |\n|---|---|---|\n| Prompt caching | $X | Low |\n| Switch summarization to Haiku | $X | Low |\n| Cap max_tokens on short-answer routes | $X | Low |\n| Response caching (top 20% queries) | $X | Medium |\n| Batch API for nightly jobs | $X | Medium |\n| **Total** | **$X** | |\n\n## Example\n\n**Input:**\n> \"We use Claude Opus for everything. System prompt is 3000 tokens. We do 5000 calls/day for customer support — mostly classifying intent and drafting short replies.\"\n\n**Output:**\n> **Critical finding: Wrong model for workload.**\n> Intent classification and short reply drafting = Haiku-level tasks. Switching to claude-haiku-4-5-20251001 saves ~85% per token.\n>\n> **Prompt caching:** 3000-token system prompt × 5000 calls = 15M cached tokens/day. 
Enable `cache_control` on your system prompt.\n>\n> **Combined monthly savings estimate: ~$2,800/month** based on Anthropic pricing, down from ~$3,400 to ~$600.","tags":["llm","cost","optimizer","openagentskills","notysoty","agent-skills","claude","claude-code","claude-skills","cline","cursor","llm-skills"],"capabilities":["skill","source-notysoty","skill-llm-cost-optimizer","topic-agent-skills","topic-claude","topic-claude-code","topic-claude-skills","topic-cline","topic-cursor","topic-llm","topic-llm-skills","topic-skills"],"categories":["openagentskills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/Notysoty/openagentskills/llm-cost-optimizer","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add Notysoty/openagentskills","source_repo":"https://github.com/Notysoty/openagentskills","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (6,402 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:13:22.549Z","embedding":null,"createdAt":"2026-05-18T13:20:43.787Z","updatedAt":"2026-05-18T19:13:22.549Z","lastSeenAt":"2026-05-18T19:13:22.549Z","tsv":"'-10':546 '-20251001':426,447,890 '-3':542 '-5':425,446,889 '-6':245,440 '000':314 '1':189,510,514 '10':313 '100':577,737 '1024':273,294 '15m':903 '1hr':648 '2':323,524,917 '20':456,826 '2000':311 '20m':317 '24hrs':728 '3':452,518,536,925 '30':583 '3000':850,897 '3600':644 '4':244,424,439,445,550,579,888 '40':74,457 '400':926 '4o':345,367,376,386 '5':259,667 '50':193,670,699,730 '5000':854,901 '6':745 '60':329 '600':928 '70':584 '80':75 '800/month':918 
'85':892 '90':194,330 'across':206,298 'agent':171,394 'agents/skills/llm-cost-optimizer/skill.md':89 'ai':6,104 'alongsid':163 'analysi':685 'analyz':305 'answer':484,570,819 'anthrop':231,694,705,921 'anyway':758 'api':112,697,707,831 'app':288 'appli':286 'applic':7,37,105 'approxim':123 'area':181 'ask':95,173,591 'assist':303 'async':674,702 'audit':4,34,102,178,188,322,451,578,666,744,793 'automat':268,280 'avail':726 'background':680 'base':919 'batch':61,149,668,696,706,708,830 'bloat':459,473 'bot':301 'breakdown':797 'bulk':684,742 'bullet':528 'cach':15,53,191,212,220,233,253,257,270,279,581,595,614,629,633,636,647,656,662,804,824,896,904,907 'call':40,125,140,157,207,222,299,616,638,902 'calls/day':315,855 'cap':571,813 'chang':277 'cheap':450 'check':198,281,336,461,588,676,748 'chunk':543 'classif':361,743,874 'classifi':417,860 'claud':82,232,242,346,369,378,387,396,422,437,443,843,886 'claude-haiku':421,442,885 'claude-opus':241 'claude-sonnet':436 'clear':523 'client.messages.batches.create':709 'client.messages.create':239,574 'cline':84 'code':83,276,302,382,433 'codex':153 'combin':913 'common':472 'complex':380,410,428 'compress':511 'concis':482 'content':264 'context':497,539 'control':65,234,254,908 'copi':85 'cost':2,24,26,47,72,99,113,177,321,671,789,796 'count':136 'cover':51 'critic':867 'current':160 'cursor':152 'custom':711,857 'cut':23,70 'daili':124 'decis':364 'def':404,612 'default':448 'descript':506 'disabl':769 'discount':700 'doc':714,721 'document':304,496,682,724,738 'draft':863,878 'effici':747 'effort':802 'enabl':210,906 'encod':627 'enumer':723 'ephemer':256 'estim':307,790,801,916 'everyth':846 'exact':660 'exact-match':659 'exampl':490,512,515,839 'extract':372,418 'f':624,713 'fewer':531 'file':87 'find':868 'follow':180 'format':527 'frontier':341 'full':495,756 'fuzzi':655 'generat':383,434 'get':613 'gpt':344,366,375,385 'gpt-4o':343,384 'gpt-4o-mini':365,374 'haiku':370,423,444,810,880,887 'haiku-level':879 
'handl':354 'hash':600 'hashlib':610 'hashlib.sha256':623 'hexdigest':628 'high':429 'higher':774 'hit':657 'id':712 'identifi':46 'impact':187 'implement':399 'import':609 'input':320,606,840 'input/output':134 'instead':516 'instruct':165,168,478,535 'intent':861,873 'job':681,741,834 'json':611 'json.dumps':645 'json.loads':635 'keep':483 'key':622,631,643 'known':138 'latenc':146,150,672 'length':64,553 'level':881 'limit':551 'llm':1,25,36,71,98,176,639 'long':489 'longer':466 'low':145,665,806,812,822 'match':661 'max':555,575,814 'medium':829,836 'messag':261,266,718 'mini':368,377 'minimum':559 'minut':260 'model':16,43,54,128,240,324,335,342,353,360,401,406,596,717,870 'month':161,799,914 'most':859 'multi':392 'multi-step':391 'multipl':479 'need':278,469,503,560,565,763 'new':227 'night':739,833 'novel':389 'o1':398 'offer':695 'one':492,568,687 'one-at-a-tim':686 'one-sent':567 'openai':111,267,692 'opportun':49 'optim':3,27,100,175,798 'opus':243,347,397,844 'order':183 'output':63,552,757,786,866 'over-specifi':332 'overhead':775 'pair':607 'param':716 'past':154 'pattern':41,158,474 'pay':224 'per':893 'plan':395 'point':529 'prefix':197 'price':733,922 'process':683,736 'produc':794 'project':92 'prompt':14,39,52,117,121,156,167,190,202,211,215,237,252,269,272,292,310,460,465,604,803,848,895,900,912 'prose':534 'provid':118 'python':230,403,561,608,704 'qualiti':78 'queri':587,827 'question':593 'rag':538 'rambl':488 'rate':663 're':131 'real':143,784 'real-tim':142,783 'reason':381 'recommend':13,359 'redis_client.get':630 'redis_client.setex':642 'reduc':109 'reduct':20,48,60,454,508 'repeat':196,475,594 'repetit':586 'repli':865,877 'request':710 'requir':147 'rerank':547 'research':390 'respons':238,573,580,597,637,646,650,752,778,823 'result':725 'retriev':540 'return':420,435,441,634,649 'reus':297 'right':57,326 'right-siz':56,325 'role':262,505 'root':93 'rout':17,55,362,405,820 'router':402 'run':679 'sacrif':77 
'save':186,192,306,316,328,455,582,669,800,891,915 'section':501 'select':44 'semant':652 'send':549 'sent':217 'sentenc':569 'set':554 'short':485,777,818,864,876 'short-answ':817 'show':782 'similar':592,653 'size':58,327 'skill':30,33 'skill-llm-cost-optimizer' 'slight':773 'smaller':352 'sonnet':379,388,438 'source-notysoty' 'specifi':334 'spend':11,162 'standard':732 'stay':203 'step':393 'store':754 'str':409,411,412,618,620,621 'stream':746,751,765,770,780 'structur':526 'subsequ':221 'summar':371,740,808 'support':300,858 'switch':807,883 'system':120,201,214,236,246,251,291,309,464,603,617,625,640,847,899,911 'task':349,358,407,414,431,521,882 'techniqu':21,67,509 'templat':791 'text':248,249 'time':144,690,785 'token':10,19,59,135,229,274,283,295,312,453,507,532,556,576,815,851,894,898 'tokens/day':318,905 'toler':151 'top':541,545,825 'topic-agent-skills' 'topic-claude' 'topic-claude-code' 'topic-claude-skills' 'topic-cline' 'topic-cursor' 'topic-llm' 'topic-llm-skills' 'topic-skills' 'total':837 'translat':373,419 'trim':537 'type':247,255,408,415,432 'typic':69,133,185 'unnecessari':9 'usage.prompt_tokens_details.cached':282 'use':81,96,132,339,513,525,530,651,734,842 'user':228,263,265,590,605,619,626,641,768,788 'verbos':504 'volum':126 'way':480 'well':357 'whether':139 'within':727 'without':76 'workload':675,703,872 'would':493 'wrong':869 'x':805,811,821,828,835,838 'yes':209 
'yes/no':363","prices":[{"id":"de3c7c64-c0fa-401b-8365-c873f6e55576","listingId":"d65c572b-f35d-4d50-9633-6072ec63b18a","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"Notysoty","category":"openagentskills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:20:43.787Z"}],"sources":[{"listingId":"d65c572b-f35d-4d50-9633-6072ec63b18a","source":"github","sourceId":"Notysoty/openagentskills/llm-cost-optimizer","sourceUrl":"https://github.com/Notysoty/openagentskills/tree/main/skills/llm-cost-optimizer","isPrimary":false,"firstSeenAt":"2026-05-18T13:20:43.787Z","lastSeenAt":"2026-05-18T19:13:22.549Z"}],"details":{"listingId":"d65c572b-f35d-4d50-9633-6072ec63b18a","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"Notysoty","slug":"llm-cost-optimizer","github":{"repo":"Notysoty/openagentskills","stars":8,"topics":["agent-skills","claude","claude-code","claude-skills","cline","cursor","llm","llm-skills","skills"],"license":"mit","html_url":"https://github.com/Notysoty/openagentskills","pushed_at":"2026-03-28T06:50:19Z","description":"A  community-driven library of reusable AI agent skills for Claude Code, Cursor, Codex, Cline, and more.","skill_md_sha":"76873d6db439caa10182907567e9cb40142be043","skill_md_path":"skills/llm-cost-optimizer/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/Notysoty/openagentskills/tree/main/skills/llm-cost-optimizer"},"layout":"multi","source":"github","category":"openagentskills","frontmatter":{"name":"LLM Cost Optimizer","description":"Audits an AI application for unnecessary token spend and recommends prompt caching, model routing, and token reduction techniques to cut 
costs."},"skills_sh_url":"https://skills.sh/Notysoty/openagentskills/llm-cost-optimizer"},"updatedAt":"2026-05-18T19:13:22.549Z"}}