{"id":"3be43edb-c239-41d9-aafb-bc74c02c5cb7","shortId":"aKvjK7","kind":"skill","title":"building-with-llms","tagline":"Produce an LLM Build Pack (prompt+tool contract, data/eval plan, architecture+safety, launch checklist). See also: ai-evals (eval only), ai-product-strategy (strategy only).","description":"# Building with LLMs\n\n## Scope\n\n**Covers**\n- Building and shipping LLM-powered features/apps (assistant, copilot, light agent workflows)\n- Prompt + tool contract design (instructions, schemas, examples, guardrails)\n- Data quality + evaluation (test sets, rubrics, red teaming, iteration loop)\n- Production readiness (latency/cost budgets, logging, fallbacks, safety/security checks)\n- Using coding agents (Codex/Claude Code) to accelerate engineering safely\n\n**When to use**\n- “Turn this LLM feature idea into a build plan with prompts, evals, and launch checks.”\n- “We need a system prompt + tool definitions + output schema for our LLM workflow.”\n- “Our LLM is flaky—design an eval plan and iteration loop to stabilize quality.”\n- “Design a RAG/tool-using agent approach with safety and monitoring.”\n- “We want to use an AI coding agent to implement this—set constraints and review gates.”\n\n**When NOT to use**\n- You need product/portfolio strategy and positioning (use `ai-product-strategy`).\n- You need a full PRD/spec set for cross-functional alignment (use `writing-prds` / `writing-specs-designs`).\n- You need primary user research (use `conducting-user-interviews` / `usability-testing`).\n- You are doing model training/research, infra architecture, or bespoke model tuning (delegate to ML/eng; this skill assumes API models).\n- You only want “which model/provider should we pick?” (treat as an input; if it dominates, do a separate evaluation doc).\n- You want to design an eval/benchmark framework without building a specific feature (use `ai-evals`).\n- You need to evaluate a vendor/tool for adoption rather than build an LLM feature (use 
`evaluating-new-technology`).\n- You want to quickly prototype or vibe-code an idea without production planning (use `vibe-coding`).\n\n## Inputs\n\n**Minimum required**\n- Use case + target user + what “good” looks like (success metrics + failure modes)\n- The LLM’s job: generate text, transform data, classify, extract, plan, or take actions via tools\n- Constraints: privacy/compliance, data sensitivity, latency, cost, reliability, supported regions\n- Integration surface: UI/workflow, downstream systems/APIs/tools, and any required output schema\n\n**Missing-info strategy**\n- Ask up to 5 questions from [references/INTAKE.md](references/INTAKE.md) (3–5 at a time).\n- If details remain missing, proceed with explicit assumptions and provide 2–3 options (prompting vs RAG vs tool use; autonomy level).\n- If asked to write code or run commands, request confirmation and use least privilege (no secrets; avoid destructive changes).\n\n## Outputs (deliverables)\n\nProduce an **LLM Build Pack** (in chat; or as files if requested), in this order:\n\n1) **Feature brief** (goal, users, non-goals, constraints, success + guardrails)\n2) **System design sketch** (pattern + architecture, context strategy, budgets, failure handling)\n3) **Prompt + tool contract** (system prompt, tool schemas, output schema, examples, refusal/guardrails)\n4) **Data + evaluation plan** (test set, rubrics, automated checks, red-team suite, acceptance thresholds)\n5) **Build + iteration plan** (prototype slice, instrumentation, debugging loop, how to use coding agents safely)\n6) **Launch + monitoring plan** (logging, dashboards/alerts, fallback/rollback, incident playbook hooks)\n7) **Risks / Open questions / Next steps** (always included)\n\nTemplates: [references/TEMPLATES.md](references/TEMPLATES.md)\n\n## Workflow (8 steps)\n\n### 1) Frame the job, boundary, and “good”\n- **Inputs:** Use case, target user, constraints.\n- **Actions:** Write a crisp job statement (“The LLM must…”) + 3–5 non-goals. 
Define success metrics and guardrails (quality, safety, cost, latency).\n- **Outputs:** Draft **Feature brief**.\n- **Checks:** A stakeholder can restate what the LLM does and does not do, and how success is measured.\n\n### 2) Choose the minimum viable autonomy pattern\n- **Inputs:** Workflow + risk tolerance.\n- **Actions:** Decide assistant vs copilot vs agent-like tool use. Identify “human control points” (review/approve moments) and what the model is never allowed to do.\n- **Outputs:** Autonomy decisions captured in **Feature brief**.\n- **Checks:** Any action-taking behavior has explicit permissions, confirmations, and an undo/rollback story.\n\n### 3) Design the context strategy (prompting → RAG → tools)\n- **Inputs:** Data sources, integration points, constraints.\n- **Actions:** Decide how the model gets reliable context: instruction hierarchy, retrieval strategy, tool calls, structured inputs. Define the “source of truth” and how conflicts are handled.\n- **Outputs:** Draft **System design sketch**.\n- **Checks:** You can explain (a) what data is used, (b) where it comes from, (c) how freshness/authority is enforced.\n\n### 4) Draft the prompt + tool contract (make the system legible)\n- **Inputs:** Job statement + context strategy + output schema needs.\n- **Actions:** Write the system prompt, tool descriptions, and output schema. Add examples and explicit DO/DO NOT rules. Include safe failure behavior (ask clarifying questions, abstain, cite sources).\n- **Outputs:** **Prompt + tool contract**.\n- **Checks:** A reviewer can predict behavior for 5–10 representative inputs; contract includes at least 3 hard constraints and examples.\n\n### 5) Build the eval set + rubric (debug like software)\n- **Inputs:** Expected behaviors + failure modes + edge cases.\n- **Actions:** Create a test set covering normal cases, tricky cases, and red-team cases. Define a scoring rubric and acceptance thresholds. 
Add automated checks where possible (schema validity, citation presence, forbidden content).\n- **Outputs:** **Data + evaluation plan**.\n- **Checks:** You can run the same prompts repeatedly and measure improvement/regression; evals cover the top failure modes.\n\n### 6) Prototype a thin slice, using coding agents safely\n- **Inputs:** System sketch + prompt contract + eval plan.\n- **Actions:** Implement the smallest end-to-end slice. Use coding agents for “low-hanging fruit” tasks, but keep tight constraints: small diffs, tests, code review, no secret handling.\n- **Outputs:** **Build + iteration plan** (and optionally a prototype plan/checklist).\n- **Checks:** You can explain what the agent changed, why, and how it was validated (tests, evals, manual review).\n\n### 7) Production readiness: budgets, monitoring, and failure handling\n- **Inputs:** Prototype learnings + constraints.\n- **Actions:** Define cost/latency budgets, fallbacks, rate limits, logging fields, and alert thresholds. Address prompt injection/tool misuse risks; add safeguards and review processes.\n- **Outputs:** **Launch + monitoring plan**.\n- **Checks:** There is a clear path to detect regressions, cap cost, and safely degrade when the model misbehaves.\n\n### 8) Quality gate + finalize\n- **Inputs:** Full draft pack.\n- **Actions:** Run [references/CHECKLISTS.md](references/CHECKLISTS.md) and score with [references/RUBRIC.md](references/RUBRIC.md). Tighten unclear contracts, add missing tests, and always include **Risks / Open questions / Next steps**.\n- **Outputs:** Final **LLM Build Pack**.\n- **Checks:** A team can execute the plan without a meeting; unknowns are explicit and owned.\n\n## Anti-patterns (common failure modes)\n\n1. **\"Prompt and pray\"** — Shipping a system prompt with no eval set, no automated checks, and no iteration loop. Result: quality regresses silently after every model update.\n2. 
**Skipping the tool contract** — Giving the LLM tool access without explicit schemas, permission boundaries, and confirmation gates. Result: unintended side effects (e.g., deleting records, sending emails) in production.\n3. **RAG without retrieval quality metrics** — Building a retrieval pipeline but never measuring recall, precision, or freshness of retrieved chunks. Result: the LLM confidently hallucinates from stale or irrelevant context.\n4. **Optimizing cost before correctness** — Compressing prompts, switching to smaller models, or removing examples to save tokens before the eval set proves the system works. Result: false savings with broken quality.\n5. **Ignoring adversarial inputs** — No red-team cases or prompt-injection tests. Result: the first creative user bypasses guardrails in ways the team never imagined.\n\n## Quality gate (required)\n- Use [references/CHECKLISTS.md](references/CHECKLISTS.md) and [references/RUBRIC.md](references/RUBRIC.md).\n- Always include: **Risks**, **Open questions**, **Next steps**.\n\n## Examples\n\n**Example 1 (RAG copilot):** “Use `building-with-llms` to plan a support-response copilot that drafts replies using our internal KB. Constraints: no PII leakage; must cite sources; p95 latency < 3s; cost < $0.10/ticket.”  \nExpected: LLM Build Pack with prompt/tool contract, eval set (including privacy red-team cases), and monitoring/rollback plan.\n\n**Example 2 (tool-using workflow):** “Use `building-with-llms` to design an LLM workflow that turns meeting notes into action items and Jira tickets (human review required). Output must be valid JSON.”  \nExpected: output schema + tool contract + eval plan for structured extraction + guardrails against over-creation.\n\n**Boundary example (out of scope — redirect):** “We need to decide whether to adopt an AI coding assistant tool for our engineering team.”  \nResponse: This is a technology evaluation/adoption decision, not an LLM build task. 
Redirect to `evaluating-new-technology` for an options matrix, pilot plan, and decision memo. If they later decide to build a custom coding assistant, return here.\n\n**Boundary example (out of scope — ML/infra):** “Fine-tune/train a new LLM from scratch.”\nResponse: Out of scope; this skill assumes API-hosted models. Propose an API-model approach first and highlight what ML/infra work is required if training is truly needed.","tags":["building","with","llms","lenny","skills","plus","liqiongyu","agent-skills","ai-agents","automation","claude","codex"],"capabilities":["skill","source-liqiongyu","skill-building-with-llms","topic-agent-skills","topic-ai-agents","topic-automation","topic-claude","topic-codex","topic-prompt-engineering","topic-refoundai","topic-skillpack"],"categories":["lenny_skills_plus"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/liqiongyu/lenny_skills_plus/building-with-llms","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add liqiongyu/lenny_skills_plus","source_repo":"https://github.com/liqiongyu/lenny_skills_plus","install_from":"skills.sh"}},"qualityScore":"0.474","qualityRationale":"deterministic score 0.47 from registry signals: · indexed on github topic:agent-skills · 49 github stars · SKILL.md body (9,789 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-04-22T06:56:19.158Z","embedding":null,"createdAt":"2026-04-18T22:16:18.331Z","updatedAt":"2026-04-22T06:56:19.158Z","lastSeenAt":"2026-04-22T06:56:19.158Z","tsv":"'/ticket':1206 '/train':1344 '0.10':1205 '1':417,505,1009,1172 '10':742 '2':370,428,563,1036,1226 '3':355,371,439,527,621,749,1066 '3s':1203 '4':451,685,1096 '5':350,356,466,528,741,754,1127 '6':481,824 
'7':491,896 '8':503,952 'abstain':727 'acceler':81 'accept':464,790 'access':1045 'action':321,518,574,610,635,703,770,840,908,960,1246 'action-tak':609 'add':713,792,925,972 'address':920 'adopt':263,1286 'adversari':1129 'agent':47,77,132,145,479,581,831,851,884 'agent-lik':580 'ai':22,27,143,166,254,1288 'ai-ev':21,253 'ai-product-strategi':26,165 'alert':918 'align':179 'allow':597 'also':20 'alway':497,976,1163 'anti':1004 'anti-pattern':1003 'api':218,1358,1364 'api-host':1357 'api-model':1363 'approach':133,1366 'architectur':15,207,433 'ask':347,382,724 'assist':44,576,1290,1332 'assum':217,1356 'assumpt':367 'autom':458,793,1022 'autonomi':379,568,601 'avoid':397 'b':675 'behavior':612,723,739,765 'bespok':209 'boundari':509,1050,1274,1335 'brief':419,544,606 'broken':1125 'budget':70,436,899,911 'build':2,8,32,37,94,248,266,405,467,755,870,986,1072,1177,1209,1233,1306,1328 'building-with-llm':1,1176,1232 'bypass':1146 'c':680 'call':648 'cap':943 'captur':603 'case':297,514,769,777,779,784,1135,1221 'chang':399,885 'chat':408 'check':74,101,459,545,607,666,734,794,807,878,934,988,1023 'checklist':18 'choos':564 'chunk':1085 'citat':799 'cite':728,1199 'clarifi':725 'classifi':316 'clear':938 'code':76,79,144,283,292,385,478,830,850,864,1289,1331 'codex/claude':78 'come':678 'command':388 'common':1006 'compress':1101 'conduct':195 'conducting-user-interview':194 'confid':1089 'confirm':390,616,1052 'conflict':658 'constraint':150,324,425,517,634,751,860,907,1194 'content':802 'context':434,624,642,698,1095 'contract':12,51,442,690,733,745,837,971,1040,1213,1263 'control':587 'copilot':45,578,1174,1186 'correct':1100 'cost':329,539,944,1098,1204 'cost/latency':910 'cover':36,775,819 'creat':771 'creation':1273 'creativ':1144 'crisp':521 'cross':177 'cross-funct':176 'custom':1330 'dashboards/alerts':486 'data':57,315,326,452,630,672,804 'data/eval':13 'debug':473,760 'decid':575,636,1283,1326 'decis':602,1302,1321 'defin':532,651,785,909 'definit':108 
'degrad':947 'deleg':212 'delet':1060 'deliver':401 'descript':709 'design':52,119,129,187,243,430,622,664,1237 'destruct':398 'detail':361 'detect':941 'diff':862 'do/do':717 'doc':239 'domin':234 'downstream':336 'draft':542,662,686,958,1188 'e.g':1059 'edg':768 'effect':1058 'email':1063 'end':845,847 'end-to-end':844 'enforc':684 'engin':82,1294 'eval':23,24,98,121,255,757,818,838,893,1019,1115,1214,1264 'eval/benchmark':245 'evalu':59,238,259,272,453,805,1311 'evaluating-new-technolog':271,1310 'evaluation/adoption':1301 'everi':1033 'exampl':55,449,714,753,1109,1170,1171,1225,1275,1336 'execut':992 'expect':764,1207,1259 'explain':669,881 'explicit':366,614,716,1000,1047 'extract':317,1268 'failur':306,437,722,766,822,902,1007 'fallback':72,912 'fallback/rollback':487 'fals':1122 'featur':90,251,269,418,543,605 'features/apps':43 'field':916 'file':411 'final':955,984 'fine':1342 'fine-tun':1341 'first':1143,1367 'flaki':118 'forbidden':801 'frame':506 'framework':246 'fresh':1082 'freshness/authority':682 'fruit':855 'full':172,957 'function':178 'gate':153,954,1053,1155 'generat':312 'get':640 'give':1041 'goal':420,424,531 'good':301,511 'guardrail':56,427,536,1147,1269 'hallucin':1090 'handl':438,660,868,903 'hang':854 'hard':750 'hierarchi':644 'highlight':1369 'hook':490 'host':1359 'human':586,1251 'idea':91,285 'identifi':585 'ignor':1128 'imagin':1153 'implement':147,841 'improvement/regression':817 'incid':488 'includ':498,720,746,977,1164,1216 'info':345 'infra':206 'inject':1139 'injection/tool':922 'input':231,293,512,570,629,650,695,744,763,833,904,956,1130 'instruct':53,643 'instrument':472 'integr':333,632 'intern':1192 'interview':197 'irrelev':1094 'item':1247 'iter':65,124,468,871,1026 'jira':1249 'job':311,508,522,696 'json':1258 'kb':1193 'keep':858 'latenc':328,540,1202 'latency/cost':69 'later':1325 'launch':17,100,482,931 'leakag':1197 'learn':906 'least':393,748 'legibl':694 'level':380 'light':46 'like':303,582,761 'limit':914 
'llm':7,41,89,113,116,268,309,404,525,552,985,1043,1088,1208,1239,1305,1347 'llm-power':40 'llms':4,34,1179,1235 'log':71,485,915 'look':302 'loop':66,125,474,1027 'lower':853 'make':691 'manual':894 'matrix':1317 'measur':562,816,1078 'meet':997,1243 'memo':1322 'metric':305,534,1071 'minimum':294,566 'misbehav':951 'miss':344,363,973 'missing-info':343 'misus':923 'ml/eng':214 'ml/infra':1340,1371 'mode':307,767,823,1008 'model':204,210,219,594,639,950,1034,1106,1360,1365 'model/provider':224 'moment':590 'monitor':137,483,900,932 'monitoring/rollback':1223 'must':526,1198,1255 'need':103,159,170,189,257,702,1281,1379 'never':596,1077,1152 'new':273,1312,1346 'next':495,981,1168 'non':423,530 'non-goal':422,529 'normal':776 'note':1244 'open':493,979,1166 'optim':1097 'option':372,874,1316 'order':416 'output':109,341,400,447,541,600,661,700,711,730,803,869,930,983,1254,1260 'over-cr':1271 'own':1002 'p95':1201 'pack':9,406,959,987,1210 'path':939 'pattern':432,569,1005 'permiss':615,1049 'pick':227 'pii':1196 'pilot':1318 'pipelin':1075 'plan':14,95,122,288,318,454,469,484,806,839,872,933,994,1181,1224,1265,1319 'plan/checklist':877 'playbook':489 'point':588,633 'posit':163 'possibl':796 'power':42 'pray':1012 'prd/spec':173 'prds':183 'precis':1080 'predict':738 'presenc':800 'primari':190 'privaci':1217 'privacy/compliance':325 'privileg':394 'proceed':364 'process':929 'produc':5,402 'product':28,67,167,287,897,1065 'product/portfolio':160 'prompt':10,49,97,106,373,440,444,626,688,707,731,813,836,921,1010,1016,1102,1138 'prompt-inject':1137 'prompt/tool':1212 'propos':1361 'prototyp':279,470,825,876,905 'prove':1117 'provid':369 'qualiti':58,128,537,953,1029,1070,1126,1154 'question':351,494,726,980,1167 'quick':278 'rag':375,627,1067,1173 'rag/tool-using':131 'rate':913 'rather':264 'readi':68,898 'recal':1079 'record':1061 'red':63,461,782,1133,1219 'red-team':460,781,1132,1218 'redirect':1279,1308 'references/checklists.md':962,963,1158,1159 
'references/intake.md':353,354 'references/rubric.md':967,968,1161,1162 'references/templates.md':500,501 'refusal/guardrails':450 'region':332 'regress':942,1030 'reliabl':330,641 'remain':362 'remov':1108 'repeat':814 'repli':1189 'repres':743 'request':389,413 'requir':295,340,1156,1253,1374 'research':192 'respons':1185,1296,1350 'restat':549 'result':1028,1054,1086,1121,1141 'retriev':645,1069,1074,1084 'return':1333 'review':152,736,865,895,928,1252 'review/approve':589 'risk':492,572,924,978,1165 'rubric':62,457,759,788 'rule':719 'run':387,810,961 'safe':83,480,721,832,946 'safeguard':926 'safeti':16,135,538 'safety/security':73 'save':1111,1123 'schema':54,110,342,446,448,701,712,797,1048,1261 'scope':35,1278,1339,1353 'score':787,965 'scratch':1349 'secret':396,867 'see':19 'send':1062 'sensit':327 'separ':237 'set':61,149,174,456,758,774,1020,1116,1215 'ship':39,1013 'side':1057 'side-effect':1056 'silent':1031 'sketch':431,665,835 'skill':216,1355 'skill-building-with-llms' 'skip':1037 'slice':471,828,848 'small':861 'smaller':1105 'smallest':843 'softwar':762 'sourc':631,653,729,1200 'source-liqiongyu' 'spec':186 'specif':250 'stabil':127 'stakehold':547 'stale':1092 'statement':523,697 'step':496,504,982,1169 'stori':620 'strategi':29,30,161,168,346,435,625,646,699 'structur':649,1267 'success':304,426,533,560 'suit':463 'support':331,1184 'support-respons':1183 'surfac':334 'switch':1103 'system':105,429,443,663,693,706,834,1015,1119 'systems/apis/tools':337 'take':320,611 'target':298,515 'task':856,1307 'team':64,462,783,990,1134,1151,1220,1295 'technolog':274,1300,1313 'templat':499 'test':60,200,455,773,863,892,974,1140 'text':313 'thin':827 'threshold':465,791,919 'ticket':1250 'tight':859 'tighten':969 'time':359 'token':1112 'toler':573 'tool':11,50,107,323,377,441,445,583,628,647,689,708,732,1039,1044,1228,1262,1291 'tool-us':1227 'top':821 'topic-agent-skills' 'topic-ai-agents' 'topic-automation' 'topic-claude' 'topic-codex' 
'topic-prompt-engineering' 'topic-refoundai' 'topic-skillpack' 'train':1376 'training/research':205 'transform':314 'treat':228 'tricki':778 'truli':1378 'truth':655 'tune':211,1343 'turn':87,1242 'ui/workflow':335 'unclear':970 'undo/rollback':619 'unintend':1055 'unknown':998 'updat':1035 'usability-test':198 'usabl':199 'use':75,86,141,157,164,180,193,252,270,289,296,378,392,477,513,584,674,829,849,1157,1175,1190,1229,1231 'user':191,196,299,421,516,1145 'valid':798,891,1257 'vendor/tool':261 'via':322 'viabl':567 'vibe':282,291 'vibe-cod':281,290 'vs':374,376,577,579 'want':139,222,241,276 'way':1149 'whether':1284 'without':247,286,995,1046,1068 'work':1120,1372 'workflow':48,114,502,571,1230,1240 'write':182,185,384,519,704 'writing-prd':181 'writing-specs-design':184","prices":[{"id":"fca01efa-c016-488d-a88c-5b582a333063","listingId":"3be43edb-c239-41d9-aafb-bc74c02c5cb7","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"liqiongyu","category":"lenny_skills_plus","install_from":"skills.sh"},"createdAt":"2026-04-18T22:16:18.331Z"}],"sources":[{"listingId":"3be43edb-c239-41d9-aafb-bc74c02c5cb7","source":"github","sourceId":"liqiongyu/lenny_skills_plus/building-with-llms","sourceUrl":"https://github.com/liqiongyu/lenny_skills_plus/tree/main/skills/building-with-llms","isPrimary":false,"firstSeenAt":"2026-04-18T22:16:18.331Z","lastSeenAt":"2026-04-22T06:56:19.158Z"}],"details":{"listingId":"3be43edb-c239-41d9-aafb-bc74c02c5cb7","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"liqiongyu","slug":"building-with-llms","github":{"repo":"liqiongyu/lenny_skills_plus","stars":49,"topics":["agent-skills","ai-agents","automation","claude","codex","prompt-engineering","refoundai","skillpack"],"license":"apache-2.0","ht
ml_url":"https://github.com/liqiongyu/lenny_skills_plus","pushed_at":"2026-04-04T06:30:11Z","description":"86 agent-executable skill packs converted from RefoundAI’s Lenny skills (unofficial). Works with Codex + Claude Code.","skill_md_sha":"7743977e92b6fc44880df7eeba02488bbc95cbf0","skill_md_path":"skills/building-with-llms/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/liqiongyu/lenny_skills_plus/tree/main/skills/building-with-llms"},"layout":"multi","source":"github","category":"lenny_skills_plus","frontmatter":{"name":"building-with-llms","description":"Produce an LLM Build Pack (prompt+tool contract, data/eval plan, architecture+safety, launch checklist). See also: ai-evals (eval only), ai-product-strategy (strategy only)."},"skills_sh_url":"https://skills.sh/liqiongyu/lenny_skills_plus/building-with-llms"},"updatedAt":"2026-04-22T06:56:19.158Z"}}