{"id":"b516658d-7aa7-4ab6-9afc-bf0311447238","shortId":"DBnXJa","kind":"skill","title":"openai-whisper","tagline":"Speech-to-text transcription via OpenAI Whisper. Supports two modes — Local CLI (no API key, runs on-device) and Cloud API (fast, scalable, requires OPENAI_API_KEY). Use when the user needs to transcribe audio files, translate speech, or convert audio to text.","description":"# OpenAI Whisper — Speech-to-Text\n\nTranscribe audio files using OpenAI's Whisper model. Two modes available depending on your needs:\n\n| Mode | Latency | Cost | Privacy | Setup |\n|------|---------|------|---------|-------|\n| Local CLI | Slower (on-device GPU/CPU) | Free | Audio never leaves machine | Install `whisper` binary |\n| Cloud API | Fast | Per-minute pricing | Audio sent to OpenAI | `OPENAI_API_KEY` required |\n\n---\n\n## Mode 1: Local CLI\n\nRun Whisper locally with no API key required. Models download to `~/.cache/whisper` on first run.\n\n### Quick Start\n\n```bash\nwhisper /path/audio.mp3 --model medium --output_format txt --output_dir .\n```\n\n### Common Commands\n\n```bash\n# Transcribe to text file\nwhisper /path/audio.mp3 --model medium --output_format txt --output_dir .\n\n# Transcribe with translation to English\nwhisper /path/audio.m4a --task translate --output_format srt\n\n# Transcribe with specific language\nwhisper /path/audio.wav --model large --language en --output_format json\n```\n\n### Model Selection\n\n| Model | Speed | Accuracy | VRAM |\n|-------|-------|----------|------|\n| `tiny` | Fastest | Lowest | ~1 GB |\n| `base` | Fast | Low | ~1 GB |\n| `small` | Medium | Good | ~2 GB |\n| `medium` | Slow | Better | ~5 GB |\n| `large` | Slowest | Best | ~10 GB |\n| `turbo` | Fast | Good (default) | ~6 GB |\n\n### Output Formats\n\n- `txt` — Plain text transcript\n- `srt` — SubRip subtitle format with timestamps\n- `vtt` — WebVTT subtitle format\n- `json` — Detailed JSON with word-level timestamps\n- `tsv` — Tab-separated values\n\n### Notes\n\n- `--model` defaults to `turbo` on most installs\n- Use smaller models for speed, larger for accuracy\n- GPU acceleration used automatically when available\n\n---\n\n## Mode 2: Cloud API\n\nTranscribe via OpenAI's `/v1/audio/transcriptions` endpoint. Faster for large batches, no local GPU needed.\n\n### Quick Start\n\n```bash\n{baseDir}/scripts/transcribe.sh /path/to/audio.m4a\n```\n\nDefaults:\n- Model: `whisper-1`\n- Output: `<input>.txt`\n\n### Common Commands\n\n```bash\n# Basic transcription\n{baseDir}/scripts/transcribe.sh /path/to/audio.m4a\n\n# Specify model and output\n{baseDir}/scripts/transcribe.sh /path/to/audio.ogg --model whisper-1 --out /tmp/transcript.txt\n\n# With language hint\n{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --language en\n\n# With speaker name hints (improves accuracy)\n{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --prompt \"Speaker names: Peter, Daniel\"\n\n# JSON output with timestamps\n{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --json --out /tmp/transcript.json\n```\n\n### Raw curl Example\n\n```bash\ncurl https://api.openai.com/v1/audio/transcriptions \\\n  -H \"Authorization: Bearer $OPENAI_API_KEY\" \\\n  -H \"Content-Type: multipart/form-data\" \\\n  -F file=\"@/path/to/audio.m4a\" \\\n  -F model=\"whisper-1\" \\\n  -F response_format=\"text\"\n```\n\n### API Key Setup\n\nSet `OPENAI_API_KEY` environment variable, or configure in `~/.clawdbot/clawdbot.json`:\n\n```json5\n{\n  skills: {\n    \"openai-whisper-api\": {\n      apiKey: \"OPENAI_KEY_HERE\"\n    }\n  }\n}\n```\n\n---\n\n## Choosing Between Modes\n\n| Consideration | Local CLI | Cloud API |\n|---------------|-----------|-----------|\n| Privacy-sensitive audio | Best | Audio sent to OpenAI |\n| Large batch processing | Slow without GPU | Fast and parallel |\n| Offline usage | Works offline | Requires internet |\n| Cost | Free (hardware cost) | Per-minute pricing |\n| Setup complexity | Install binary + models | API key only |\n| Audio format support | Most formats | Most formats |","tags":["openai","whisper","coco","rkz91","agent-skills","agents-md","ai-agents","claude-code","codex","cursor","developer-tools","llm-tools"],"capabilities":["skill","source-rkz91","skill-openai-whisper","topic-agent-skills","topic-agents-md","topic-ai-agents","topic-claude-code","topic-codex","topic-cursor","topic-developer-tools","topic-llm-tools","topic-mcp","topic-pm-tools","topic-product-management","topic-productivity"],"categories":["coco"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/rkz91/coco/openai-whisper","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add rkz91/coco","source_repo":"https://github.com/rkz91/coco","install_from":"skills.sh"}},"qualityScore":"0.453","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 7 github stars · SKILL.md body (3,483 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:14:08.145Z","embedding":null,"createdAt":"2026-05-18T13:21:40.925Z","updatedAt":"2026-05-18T19:14:08.145Z","lastSeenAt":"2026-05-18T19:14:08.145Z","tsv":"'-1':292,312,372 '/.cache/whisper':120 '/.clawdbot/clawdbot.json':389 '/path/audio.m4a':158 '/path/audio.mp3':128,144 '/path/audio.wav':169 '/path/to/audio.m4a':288,302,320,331,343,368 '/path/to/audio.ogg':309 '/scripts/transcribe.sh':287,301,308,319,330,342 '/tmp/transcript.json':346 '/tmp/transcript.txt':314 '/v1/audio/transcriptions':273,354 '1':106,186,191 '10':206 '2':196,266 '5':201 '6':212 'acceler':260 'accuraci':181,258,328 'api':18,26,31,91,102,114,268,359,377,382,395,407,445 'api.openai.com':353 'api.openai.com/v1/audio/transcriptions':352 'apikey':396 'audio':40,46,56,83,97,411,413,448 'author':356 'automat':262 'avail':65,264 'base':188 'basedir':286,300,307,318,329,341 'bash':126,138,285,297,350 'basic':298 'batch':278,418 'bearer':357 'best':205,412 'better':200 'binari':89,443 'choos':400 'cli':16,76,108,405 'cloud':25,90,267,406 'command':137,296 'common':136,295 'complex':441 'configur':387 'consider':403 'content':363 'content-typ':362 'convert':45 'cost':72,432,435 'curl':348,351 'daniel':336 'default':211,245,289 'depend':66 'detail':231 'devic':23,80 'dir':135,151 'download':118 'en':173,322 'endpoint':274 'english':156 'environ':384 'exampl':349 'f':366,369,373 'fast':27,92,189,209,423 'faster':275 'fastest':184 'file':41,57,142,367 'first':122 'format':132,148,162,175,215,223,229,375,449,452,454 'free':82,433 'gb':187,192,197,202,207,213 'good':195,210 'gpu':259,281,422 'gpu/cpu':81 'h':355,361 'hardwar':434 'hint':317,326 'improv':327 'instal':87,250,442 'internet':431 'json':176,230,232,337,344 'json5':390 'key':19,32,103,115,360,378,383,398,446 'languag':167,172,316,321 'larg':171,203,277,417 'larger':256 'latenc':71 'leav':85 'level':236 'local':15,75,107,111,280,404 'low':190 'lowest':185 'machin':86 'medium':130,146,194,198 'minut':95,438 'mode':14,64,70,105,265,402 'model':62,117,129,145,170,177,179,244,253,290,304,310,370,444 'multipart/form-data':365 'name':325,334 'need':37,69,282 'never':84 'note':243 'offlin':426,429 'on-devic':21,78 'openai':2,10,30,49,59,100,101,271,358,381,393,397,416 'openai-whisp':1 'openai-whisper-api':392 'output':131,134,147,150,161,174,214,293,306,338 'parallel':425 'per':94,437 'per-minut':93,436 'peter':335 'plain':217 'price':96,439 'privaci':73,409 'privacy-sensit':408 'process':419 'prompt':332 'quick':124,283 'raw':347 'requir':29,104,116,430 'respons':374 'run':20,109,123 'scalabl':28 'select':178 'sensit':410 'sent':98,414 'separ':241 'set':380 'setup':74,379,440 'skill':391 'skill-openai-whisper' 'slow':199,420 'slower':77 'slowest':204 'small':193 'smaller':252 'source-rkz91' 'speaker':324,333 'specif':166 'specifi':303 'speech':5,43,52 'speech-to-text':4,51 'speed':180,255 'srt':163,220 'start':125,284 'subrip':221 'subtitl':222,228 'support':12,450 'tab':240 'tab-separ':239 'task':159 'text':7,48,54,141,218,376 'timestamp':225,237,340 'tini':183 'topic-agent-skills' 'topic-agents-md' 'topic-ai-agents' 'topic-claude-code' 'topic-codex' 'topic-cursor' 'topic-developer-tools' 'topic-llm-tools' 'topic-mcp' 'topic-pm-tools' 'topic-product-management' 'topic-productivity' 'transcrib':39,55,139,152,164,269 'transcript':8,219,299 'translat':42,154,160 'tsv':238 'turbo':208,247 'two':13,63 'txt':133,149,216,294 'type':364 'usag':427 'use':33,58,251,261 'user':36 'valu':242 'variabl':385 'via':9,270 'vram':182 'vtt':226 'webvtt':227 'whisper':3,11,50,61,88,110,127,143,157,168,291,311,371,394 'without':421 'word':235 'word-level':234 'work':428","prices":[{"id":"2a213b3d-ef8b-4c87-a196-6dbbc844dcce","listingId":"b516658d-7aa7-4ab6-9afc-bf0311447238","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"rkz91","category":"coco","install_from":"skills.sh"},"createdAt":"2026-05-18T13:21:40.925Z"}],"sources":[{"listingId":"b516658d-7aa7-4ab6-9afc-bf0311447238","source":"github","sourceId":"rkz91/coco/openai-whisper","sourceUrl":"https://github.com/rkz91/coco/tree/main/skills/openai-whisper","isPrimary":false,"firstSeenAt":"2026-05-18T13:21:40.925Z","lastSeenAt":"2026-05-18T19:14:08.145Z"}],"details":{"listingId":"b516658d-7aa7-4ab6-9afc-bf0311447238","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"rkz91","slug":"openai-whisper","github":{"repo":"rkz91/coco","stars":7,"topics":["agent-skills","agents-md","ai","ai-agents","claude-code","codex","cursor","developer-tools","llm-tools","mcp","pm-tools","product-management","productivity","prompt-engineering","workflow-automation"],"license":"mit","html_url":"https://github.com/rkz91/coco","pushed_at":"2026-04-26T01:51:27Z","description":"Open-source library of AI superpowers — 59 skills, 34 commands, 10 agents + 24 GSD subagents, 3 system bundles. An entire team, wherever your AI lives. Vendor-neutral across Claude Code, Cursor, Codex, and any AGENTS.md tool.","skill_md_sha":"df5510217fed7e312c95689205426beac3ffeea7","skill_md_path":"skills/openai-whisper/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/rkz91/coco/tree/main/skills/openai-whisper"},"layout":"multi","source":"github","category":"coco","frontmatter":{"name":"openai-whisper","description":"Speech-to-text transcription via OpenAI Whisper. Supports two modes — Local CLI (no API key, runs on-device) and Cloud API (fast, scalable, requires OPENAI_API_KEY). Use when the user needs to transcribe audio files, translate speech, or convert audio to text."},"skills_sh_url":"https://skills.sh/rkz91/coco/openai-whisper"},"updatedAt":"2026-05-18T19:14:08.145Z"}}