{"id":"3c7cdd73-f8b1-4442-a909-7ffb0a87e52c","shortId":"caNDTy","kind":"skill","title":"airunway-aks-setup","tagline":"Set up AI Runway on AKS — from bare cluster to running model. Covers cluster verification, controller install, GPU assessment, provider setup, and first deployment. WHEN: \"setup AI Runway\", \"onboard AKS cluster\", \"install AI Runway\", \"airunway setup\", \"deploy model to AKS\", \"GPU ","description":"# AI Runway AKS Setup\n\nThis skill walks users from a bare Kubernetes cluster to a running AI model deployment. Follow each step in sequence unless the user provides `skip-to-step N` to resume from a specific phase.\n\n> **Cost awareness:** GPU node pools incur significant compute charges (A100-80GB can cost $3–5+/hr). Confirm the user understands cost implications before provisioning GPU resources.\n\n## Prerequisites\n\nThis skill assumes an AKS cluster already exists. If the user does not have a cluster, hand off to the `azure-kubernetes` skill first to provision one (with a GPU node pool unless CPU-only inference is acceptable), then return here.\n\n## Quick Reference\n\n| Property | Value |\n|----------|-------|\n| Best for | End-to-end AI Runway onboarding on AKS |\n| CLI tools | `kubectl`, `make`, `curl` |\n| MCP tools | None |\n| Related skills | `azure-kubernetes` (cluster setup), `azure-diagnostics` (troubleshooting) |\n\n## When to Use This Skill\n\nUse this skill when the user wants to:\n- Set up AI Runway on an existing AKS cluster from scratch\n- Install the AI Runway controller and CRDs\n- Assess GPU hardware compatibility for model deployment\n- Choose and install an inference provider (KAITO, Dynamo, KubeRay)\n- Deploy their first AI model to AKS via AI Runway\n- Resume a partially-complete AI Runway setup from a specific step\n\n## MCP Tools\n\nThis skill uses no MCP tools. All cluster operations are performed directly via `kubectl` and `make`.\n\n## Rules\n\n1. Execute steps in sequence — load the reference for each step as you reach it\n2. Report cluster state at each step: ✓ healthy, ✗ missing/failed\n3. Ask for user confirmation before any install or deployment action\n4. If a step is already complete, report status and skip to the next step\n5. If the user provides `skip-to-step N`, start at step N; assume prior steps are complete\n\n## Steps\n\n| # | Step | Reference |\n|---|------|-----------|\n| 1 | **Cluster Verification** — context check, node inventory, GPU detection | [step-1-verify.md](references/steps/step-1-verify.md) |\n| 2 | **Controller Installation** — CRD + controller deployment | [step-2-controller.md](references/steps/step-2-controller.md) |\n| 3 | **GPU Assessment** — detect GPU models, flag dtype/attention constraints | [step-3-gpu.md](references/steps/step-3-gpu.md) |\n| 4 | **Provider Setup** — recommend and install inference provider | [step-4-provider.md](references/steps/step-4-provider.md) |\n| 5 | **First Deployment** — pick a model, deploy, verify Ready | [step-5-deploy.md](references/steps/step-5-deploy.md) |\n| 6 | **Summary** — recap, smoke test, next steps | [step-6-summary.md](references/steps/step-6-summary.md) |\n\n## Error Handling\n\n| Error / Symptom | Likely Cause | Remediation |\n|-----------------|--------------|-------------|\n| No kubeconfig context | Not connected to a cluster | Run `az aks get-credentials` or equivalent |\n| Controller in CrashLoopBackOff | Config or RBAC issue | `kubectl logs -n airunway-system -l control-plane=controller-manager --previous` |\n| Provider not ready | Image pull or RBAC issue | `kubectl logs <pod-name> -n <namespace>` for the provider pod |\n| ModelDeployment stuck in Pending | GPU scheduling failure or provider not ready | `kubectl describe modeldeployment <name> -n <namespace>` events |\n| `bfloat16` errors at inference | T4 or V100 lacks bfloat16 support | Add `--dtype float16` to serving args |\n\nFor full error handling and rollback procedures, see [troubleshooting.md](references/troubleshooting.md).","tags":["airunway","aks","setup","azure","skills","microsoft","agent-skills"],"capabilities":["skill","source-microsoft","skill-airunway-aks-setup","topic-agent-skills"],"categories":["azure-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/microsoft/azure-skills/airunway-aks-setup","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add microsoft/azure-skills","source_repo":"https://github.com/microsoft/azure-skills","install_from":"skills.sh"}},"qualityScore":"0.950","qualityRationale":"deterministic score 0.95 from registry signals: · indexed on github topic:agent-skills · official publisher · 658 github stars · SKILL.md body (3,508 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-04-22T06:53:17.779Z","embedding":null,"createdAt":"2026-04-21T18:40:35.881Z","updatedAt":"2026-04-22T06:53:17.779Z","lastSeenAt":"2026-04-22T06:53:17.779Z","tsv":"'/hr':101 '1':278,350 '2':293,361 '3':99,302,369 '4':313,380 '5':100,328,390 '6':401 '80gb':96 'a100':95 'a100-80gb':94 'accept':152 'action':312 'add':495 'ai':7,31,37,46,62,166,205,216,240,245,252 'airunway':2,39,444 'airunway-aks-setup':1 'airunway-system':443 'ak':3,10,34,44,48,117,170,210,243,427 'alreadi':119,318 'arg':500 'ask':303 'assess':23,221,371 'assum':115,342 'awar':86 'az':426 'azur':134,182,187 'azure-diagnost':186 'azure-kubernet':133,181 'bare':12,56 'best':160 'bfloat16':485,493 'caus':415 'charg':93 'check':354 'choos':228 'cli':171 'cluster':13,18,35,58,118,128,184,211,268,295,351,424 'compat':224 'complet':251,319,346 'comput':92 'config':436 'confirm':102,306 'connect':421 'constraint':377 'context':353,419 'control':20,218,362,365,433,448,451 'control-plan':447 'controller-manag':450 'cost':85,98,106 'cover':17 'cpu':148 'cpu-on':147 'crashloopbackoff':435 'crd':364 'crds':220 'credenti':430 'curl':175 'deploy':28,41,64,227,237,311,366,392,396 'describ':481 'detect':358,372 'diagnost':188 'direct':272 'dtype':496 'dtype/attention':376 'dynamo':235 'end':163,165 'end-to-end':162 'equival':432 'error':410,412,486,503 'event':484 'execut':279 'exist':120,209 'failur':475 'first':27,137,239,391 'flag':375 'float16':497 'follow':65 'full':502 'get':429 'get-credenti':428 'gpu':22,45,87,110,143,222,357,370,373,473 'hand':129 'handl':411,504 'hardwar':223 'healthi':300 'imag':457 'implic':107 'incur':90 'infer':150,232,386,488 'instal':21,36,214,230,309,363,385 'inventori':356 'issu':439,461 'kaito':234 'kubeconfig':418 'kubectl':173,274,440,462,480 'kuberay':236 'kubernet':57,135,183 'l':446 'lack':492 'like':414 'load':283 'log':441,463 'make':174,276 'manag':452 'mcp':176,259,265 'missing/failed':301 'model':16,42,63,226,241,374,395 'modeldeploy':469,482 'n':78,337,341,442,464,483 'next':326,406 'node':88,144,355 'none':178 'onboard':33,168 'one':140 'oper':269 'partial':250 'partially-complet':249 'pend':472 'perform':271 'phase':84 'pick':393 'plane':449 'pod':468 'pool':89,145 'prerequisit':112 'previous':453 'prior':343 'procedur':507 'properti':158 'provid':24,73,233,332,381,387,454,467,477 'provis':109,139 'pull':458 'quick':156 'rbac':438,460 'reach':291 'readi':398,456,479 'recap':403 'recommend':383 'refer':157,285,349 'references/steps/step-1-verify.md':360 'references/steps/step-2-controller.md':368 'references/steps/step-3-gpu.md':379 'references/steps/step-4-provider.md':389 'references/steps/step-5-deploy.md':400 'references/steps/step-6-summary.md':409 'references/troubleshooting.md':510 'relat':179 'remedi':416 'report':294,320 'resourc':111 'resum':80,247 'return':154 'rollback':506 'rule':277 'run':15,61,425 'runway':8,32,38,47,167,206,217,246,253 'schedul':474 'scratch':213 'see':508 'sequenc':69,282 'serv':499 'set':5,203 'setup':4,25,30,40,49,185,254,382 'signific':91 'skill':51,114,136,180,194,197,262 'skill-airunway-aks-setup' 'skip':75,323,334 'skip-to-step':74,333 'smoke':404 'source-microsoft' 'specif':83,257 'start':338 'state':296 'status':321 'step':67,77,258,280,288,299,316,327,336,340,344,347,348,407 'step-1-verify.md':359 'step-2-controller.md':367 'step-3-gpu.md':378 'step-4-provider.md':388 'step-5-deploy.md':399 'step-6-summary.md':408 'stuck':470 'summari':402 'support':494 'symptom':413 'system':445 't4':489 'test':405 'tool':172,177,260,266 'topic-agent-skills' 'troubleshoot':189 'troubleshooting.md':509 'understand':105 'unless':70,146 'use':192,195,263 'user':53,72,104,123,200,305,331 'v100':491 'valu':159 'verif':19,352 'verifi':397 'via':244,273 'walk':52 'want':201","prices":[{"id":"3de4122f-d9d8-453d-b167-b2ac4ac0f05f","listingId":"3c7cdd73-f8b1-4442-a909-7ffb0a87e52c","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"microsoft","category":"azure-skills","install_from":"skills.sh"},"createdAt":"2026-04-21T18:40:35.881Z"}],"sources":[{"listingId":"3c7cdd73-f8b1-4442-a909-7ffb0a87e52c","source":"github","sourceId":"microsoft/azure-skills/airunway-aks-setup","sourceUrl":"https://github.com/microsoft/azure-skills/tree/main/skills/airunway-aks-setup","isPrimary":false,"firstSeenAt":"2026-04-21T18:53:19.280Z","lastSeenAt":"2026-04-22T06:53:17.779Z"},{"listingId":"3c7cdd73-f8b1-4442-a909-7ffb0a87e52c","source":"skills_sh","sourceId":"microsoft/azure-skills/airunway-aks-setup","sourceUrl":"https://skills.sh/microsoft/azure-skills/airunway-aks-setup","isPrimary":true,"firstSeenAt":"2026-04-21T18:40:35.881Z","lastSeenAt":"2026-04-22T06:40:20.597Z"}],"details":{"listingId":"3c7cdd73-f8b1-4442-a909-7ffb0a87e52c","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"microsoft","slug":"airunway-aks-setup","github":{"repo":"microsoft/azure-skills","stars":658,"topics":["agent-skills"],"license":"mit","html_url":"https://github.com/microsoft/azure-skills","pushed_at":"2026-04-21T21:13:52Z","description":"Official agent plugin providing skills and MCP server configurations for Azure scenarios.","skill_md_sha":"d13d7cd3b7d62e3627eff3cc184db53866994ea4","skill_md_path":"skills/airunway-aks-setup/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/microsoft/azure-skills/tree/main/skills/airunway-aks-setup"},"layout":"multi","source":"github","category":"azure-skills","frontmatter":{"name":"airunway-aks-setup","license":"MIT","description":"Set up AI Runway on AKS — from bare cluster to running model. Covers cluster verification, controller install, GPU assessment, provider setup, and first deployment. WHEN: \"setup AI Runway\", \"onboard AKS cluster\", \"install AI Runway\", \"airunway setup\", \"deploy model to AKS\", \"GPU inference on AKS\", \"KAITO setup on AKS\", \"run LLM on AKS\", \"vLLM on AKS\", \"set up model serving on AKS\", \"AI Runway controller\"."},"skills_sh_url":"https://skills.sh/microsoft/azure-skills/airunway-aks-setup"},"updatedAt":"2026-04-22T06:53:17.779Z"}}