{"id":"9e9ee4da-5b9c-4271-8742-5bf2344d3022","shortId":"jazmHT","kind":"skill","title":"llama.cpp Portable LLM Inference Engine in C/C++","tagline":"llama.cpp is a high-performance C/C++ implementation for running LLM inference across diverse hardware. It supports GGUF model quantization, GPU acceleration on NVIDIA/AMD/Apple Silicon, and provides both a CLI and an OpenAI-compatible HTTP server for local model serving.","description":"# llama.cpp Portable LLM Inference Engine in C/C++\n\nllama.cpp is a high-performance C/C++ implementation for running LLM inference across diverse hardware. It supports GGUF model quantization, GPU acceleration on NVIDIA/AMD/Apple Silicon, and provides both a CLI and an OpenAI-compatible HTTP server for local model serving.\n\n## Installation\n\nUse the upstream install or setup path that matches your environment:\n- Run with Docker - see our [Docker documentation](docs/docker.md)\n\nRequirements and caveats from upstream:\n- Python: [ddh0/easy-llama](https://github.com/ddh0/easy-llama)\n- Python: [abetlen/llama-cpp-python](https://github.com/abetlen/llama-cpp-python)\n- Node.js: [withcatai/node-llama-cpp](https://github.com/withcatai/node-llama-cpp)\n\nBasic usage or getting-started notes:\n- Install llama.cpp using [brew, nix or winget](docs/install.md)\n- Download pre-built binaries from the [releases page](https://github.com/ggml-org/llama.cpp/releases)\n- Build from source by cloning this repository - check out [our build guide](docs/build.md)\n\n- Source: https://github.com/ggml-org/llama.cpp\n- Extracted from upstream docs: https://raw.githubusercontent.com/ggml-org/llama.cpp/HEAD/README.md\n\n## Source\n\n- [Agent Skill Exchange](https://agentskillexchange.com/skills/llama-cpp-portable-llm-inference/)","tags":["llama","cpp","portable","llm","inference","skills","agentskillexchange","agent-skills","ai-agents","ai-tools","awesome-list","claude-code"],"capabilities":["skill","source-agentskillexchange","skill-llama-cpp-portable-llm-inference","topic-agent-skills","topic-ai-agents","topic-ai-tools","topic-awesome-list","topic-claude-code","topic-codex","topic-cursor","topic-llm","topic-mcp","topic-npx-skills","topic-openclaw","topic-skills-catalog"],"categories":["skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/agentskillexchange/skills/llama-cpp-portable-llm-inference","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add agentskillexchange/skills","source_repo":"https://github.com/agentskillexchange/skills","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,307 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:11:12.193Z","embedding":null,"createdAt":"2026-05-18T13:17:34.738Z","updatedAt":"2026-05-18T19:11:12.193Z","lastSeenAt":"2026-05-18T19:11:12.193Z","tsv":"'/abetlen/llama-cpp-python)':131 '/ddh0/easy-llama)':126 '/ggml-org/llama.cpp':180 '/ggml-org/llama.cpp/head/readme.md':187 '/ggml-org/llama.cpp/releases)':163 '/skills/llama-cpp-portable-llm-inference/)':194 '/withcatai/node-llama-cpp)':136 'abetlen/llama-cpp-python':128 'acceler':29,77 'across':20,68 'agent':189 'agentskillexchange.com':193 'agentskillexchange.com/skills/llama-cpp-portable-llm-inference/)':192 'basic':137 'binari':156 'brew':147 'build':164,174 'built':155 'c/c':7,14,55,62 'caveat':119 'check':171 'cli':37,85 'clone':168 'compat':42,90 'ddh0/easy-llama':123 'divers':21,69 'doc':184 'docker':111,114 'docs/build.md':176 'docs/docker.md':116 'docs/install.md':151 'document':115 'download':152 'engin':5,53 'environ':108 'exchang':191 'extract':181 'get':141 'getting-start':140 'gguf':25,73 'github.com':125,130,135,162,179 'github.com/abetlen/llama-cpp-python)':129 'github.com/ddh0/easy-llama)':124 'github.com/ggml-org/llama.cpp':178 'github.com/ggml-org/llama.cpp/releases)':161 'github.com/withcatai/node-llama-cpp)':134 'gpu':28,76 'guid':175 'hardwar':22,70 'high':12,60 'high-perform':11,59 'http':43,91 'implement':15,63 'infer':4,19,52,67 'instal':97,101,144 'llama.cpp':1,8,49,56,145 'llm':3,18,51,66 'local':46,94 'match':106 'model':26,47,74,95 'nix':148 'node.js':132 'note':143 'nvidia/amd/apple':31,79 'openai':41,89 'openai-compat':40,88 'page':160 'path':104 'perform':13,61 'portabl':2,50 'pre':154 'pre-built':153 'provid':34,82 'python':122,127 'quantiz':27,75 'raw.githubusercontent.com':186 'raw.githubusercontent.com/ggml-org/llama.cpp/head/readme.md':185 'releas':159 'repositori':170 'requir':117 'run':17,65,109 'see':112 'serv':48,96 'server':44,92 'setup':103 'silicon':32,80 'skill':190 'skill-llama-cpp-portable-llm-inference' 'sourc':166,177,188 'source-agentskillexchange' 'start':142 'support':24,72 'topic-agent-skills' 'topic-ai-agents' 'topic-ai-tools' 'topic-awesome-list' 'topic-claude-code' 'topic-codex' 'topic-cursor' 'topic-llm' 'topic-mcp' 'topic-npx-skills' 'topic-openclaw' 'topic-skills-catalog' 'upstream':100,121,183 'usag':138 'use':98,146 'winget':150 'withcatai/node-llama-cpp':133","prices":[{"id":"f7825cba-0c1b-4709-8477-1c7b54cdcc8c","listingId":"9e9ee4da-5b9c-4271-8742-5bf2344d3022","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"agentskillexchange","category":"skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:17:34.738Z"}],"sources":[{"listingId":"9e9ee4da-5b9c-4271-8742-5bf2344d3022","source":"github","sourceId":"agentskillexchange/skills/llama-cpp-portable-llm-inference","sourceUrl":"https://github.com/agentskillexchange/skills/tree/main/skills/llama-cpp-portable-llm-inference","isPrimary":false,"firstSeenAt":"2026-05-18T13:17:34.738Z","lastSeenAt":"2026-05-18T19:11:12.193Z"}],"details":{"listingId":"9e9ee4da-5b9c-4271-8742-5bf2344d3022","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"agentskillexchange","slug":"llama-cpp-portable-llm-inference","github":{"repo":"agentskillexchange/skills","stars":8,"topics":["agent-skills","ai-agents","ai-tools","awesome-list","claude-code","codex","cursor","llm","mcp","npx-skills","openclaw","skills-catalog"],"license":"mit","html_url":"https://github.com/agentskillexchange/skills","pushed_at":"2026-05-18T19:02:17Z","description":"The open catalog of AI agent skills — 2,000+ security-scanned skills for Claude Code, Cursor, Codex, and more.","skill_md_sha":"bd13632c7f78b218ccb7715f2f5d84b09c4f7e58","skill_md_path":"skills/llama-cpp-portable-llm-inference/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/agentskillexchange/skills/tree/main/skills/llama-cpp-portable-llm-inference"},"layout":"multi","source":"github","category":"skills","frontmatter":{"name":"llama.cpp Portable LLM Inference Engine in C/C++","description":"llama.cpp is a high-performance C/C++ implementation for running LLM inference across diverse hardware. It supports GGUF model quantization, GPU acceleration on NVIDIA/AMD/Apple Silicon, and provides both a CLI and an OpenAI-compatible HTTP server for local model serving."},"skills_sh_url":"https://skills.sh/agentskillexchange/skills/llama-cpp-portable-llm-inference"},"updatedAt":"2026-05-18T19:11:12.193Z"}}