Skill quality: 0.45

openai-whisper

Speech-to-text transcription via OpenAI Whisper. Supports two modes — Local CLI (no API key, runs on-device) and Cloud API (fast, scalable, requires OPENAI_API_KEY). Use when the user needs to transcribe audio files, translate speech, or convert audio to text.

Price: free
Protocol: skill
Verified: no

What it does

OpenAI Whisper — Speech-to-Text

Transcribe audio files using OpenAI's Whisper model. Two modes are available, depending on your needs:

| Mode | Latency | Cost | Privacy | Setup |
|---|---|---|---|---|
| Local CLI | Slower (on-device GPU/CPU) | Free | Audio never leaves machine | Install whisper binary |
| Cloud API | Fast | Per-minute pricing | Audio sent to OpenAI | OPENAI_API_KEY required |

Mode 1: Local CLI

Run Whisper locally with no API key required. Models download to ~/.cache/whisper on first run.

Quick Start

whisper /path/audio.mp3 --model medium --output_format txt --output_dir .

Common Commands

# Transcribe to text file
whisper /path/audio.mp3 --model medium --output_format txt --output_dir .

# Translate speech to English (use medium or large; turbo is not trained for translation)
whisper /path/audio.m4a --model medium --task translate --output_format srt

# Transcribe with specific language
whisper /path/audio.wav --model large --language en --output_format json
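The single-file commands above extend to a batch loop. The following is a minimal sketch, assuming a flat directory of .mp3, .m4a, and .wav files; transcribe_dir is a hypothetical helper name and not part of Whisper, and the medium default is only an illustration.

```shell
# Transcribe every matching audio file in a directory with the local CLI.
# Usage: transcribe_dir <input_dir> <output_dir> [model]
# Note: POSIX sh has no local variables, so these names are global.
transcribe_dir() {
    in_dir=$1
    out_dir=$2
    model=${3:-medium}

    mkdir -p "$out_dir" || return 1

    for f in "$in_dir"/*.mp3 "$in_dir"/*.m4a "$in_dir"/*.wav; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$f" ] || continue
        whisper "$f" --model "$model" --output_format txt --output_dir "$out_dir" \
            || echo "failed: $f" >&2
    done
}
```

A call such as transcribe_dir ./recordings ./transcripts small would process each file in turn and keep going if one file fails.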

Model Selection

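| Model | Speed | Accuracy | VRAM |
|---|---|---|---|
| tiny | Fastest | Lowest | ~1 GB |
| base | Fast | Low | ~1 GB |
| small | Medium | Good | ~2 GB |
| medium | Slow | Better | ~5 GB |
| large | Slowest | Best | ~10 GB |
| turbo | Fast | Good (default) | ~6 GB |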
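The table can be turned into a simple selection rule. This is a rough sketch, assuming the approximate VRAM figures listed above and a whole-number GB budget supplied by the caller; pick_model and its speed/accuracy switch are hypothetical and not part of the whisper CLI.

```shell
# Pick a model name from the table above for a given VRAM budget.
# Usage: pick_model <available_vram_gb> [accuracy|speed]
pick_model() {
    vram_gb=$1
    priority=${2:-accuracy}

    if [ "$priority" = speed ]; then
        # turbo is the fast option if it fits; otherwise fall back to tiny.
        if [ "$vram_gb" -ge 6 ]; then echo turbo; else echo tiny; fi
    elif [ "$vram_gb" -ge 10 ]; then echo large
    elif [ "$vram_gb" -ge 5 ];  then echo medium
    elif [ "$vram_gb" -ge 2 ];  then echo small
    else                             echo base
    fi
}
```

The result can be passed straight to the CLI, for example whisper /path/audio.mp3 --model "$(pick_model 8)". The thresholds are only as precise as the table's estimates, so treat them as a starting point.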

Output Formats

  • txt: Plain text transcript
  • srt: SubRip subtitle format with timestamps
  • vtt: WebVTT subtitle format
  • json: Detailed JSON with segment-level timestamps (add --word_timestamps True for word-level timing)
  • tsv: Tab-separated values
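When scripting around the CLI it helps to know where the output file will land. The sketch below assumes that whisper names each output after the input file's base name with the format as the extension, written into --output_dir; whisper_output_path is a hypothetical helper for illustration.

```shell
# Print the path where whisper is expected to write its output.
# Usage: whisper_output_path <audio_file> <output_dir> <format>
whisper_output_path() {
    name=$(basename "$1")   # drop leading directories
    stem=${name%.*}         # drop the final extension, e.g. ".mp3"
    printf '%s/%s.%s\n' "$2" "$stem" "$3"
}
```

For example, after running whisper /path/audio.mp3 --output_format srt --output_dir /tmp/out, a script could read the file at "$(whisper_output_path /path/audio.mp3 /tmp/out srt)".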

Notes

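  • --model defaults to turbo in recent releases; pass --model explicitly if results must be consistent across installs
  • Use smaller models for speed, larger models for accuracy
  • GPU acceleration is used automatically when a CUDA-capable GPU is available; otherwise Whisper runs on the CPU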

Mode 2: Cloud API

Transcribe via OpenAI's /v1/audio/transcriptions endpoint. This mode is faster for large batches and needs no local GPU.

Quick Start

{baseDir}/scripts/transcribe.sh /path/to/audio.m4a

Defaults:

  • Model: whisper-1
  • Output: <input>.txt

Common Commands

# Basic transcription
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a

# Specify model and output
{baseDir}/scripts/transcribe.sh /path/to/audio.ogg --model whisper-1 --out /tmp/transcript.txt

# With language hint
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --language en

# With speaker name hints (can improve the spelling of names)
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --prompt "Speaker names: Peter, Daniel"

# JSON output with timestamps
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --json --out /tmp/transcript.json

Raw curl Example

curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/audio.m4a" \
  -F model="whisper-1" \
  -F response_format="text"
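The raw request can be wrapped in a small function that fails early on common mistakes. This is a sketch rather than the bundled transcribe.sh, whose contents are not shown here; transcribe_api and its return codes are hypothetical. It relies on curl setting the multipart header itself when -F is used, and on --fail to turn HTTP errors into a non-zero exit status.

```shell
# Send one file to the transcription endpoint and print the plain-text result.
# Usage: transcribe_api <audio_file> [model]
# Returns 2 if the API key is missing, 1 if the file does not exist.
transcribe_api() {
    file=$1
    model=${2:-whisper-1}

    if [ -z "${OPENAI_API_KEY:-}" ]; then
        echo "OPENAI_API_KEY is not set" >&2
        return 2
    fi
    if [ ! -f "$file" ]; then
        echo "no such file: $file" >&2
        return 1
    fi

    # -sS hides the progress meter but still prints errors.
    curl -sS --fail https://api.openai.com/v1/audio/transcriptions \
        -H "Authorization: Bearer $OPENAI_API_KEY" \
        -F file="@$file" \
        -F model="$model" \
        -F response_format="text"
}
```

A typical call would be transcribe_api /path/to/audio.m4a > transcript.txt, which leaves transcript.txt empty and returns non-zero if the request fails.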

API Key Setup

Set the OPENAI_API_KEY environment variable, or configure the key in ~/.clawdbot/clawdbot.json:

{
  skills: {
    "openai-whisper-api": {
      apiKey: "OPENAI_KEY_HERE"
    }
  }
}

Choosing Between Modes

| Consideration | Local CLI | Cloud API |
|---|---|---|
| Privacy-sensitive audio | Best | Audio sent to OpenAI |
| Large batch processing | Slow without GPU | Fast and parallel |
| Offline usage | Works offline | Requires internet |
| Cost | Free (hardware cost) | Per-minute pricing |
| Setup complexity | Install binary + models | API key only |
| Audio format support | Most formats | Most formats |
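The privacy and setup rows above can be sketched as a small decision helper. It assumes the caller labels each job as private or public and that the cloud mode is acceptable only for public audio with a key configured; choose_mode and those labels are hypothetical, not part of the skill.

```shell
# Decide which mode to use for a job.
# Usage: choose_mode <private|public>
# Prints "local" or "cloud"; returns 1 if neither mode is usable.
choose_mode() {
    sensitivity=${1:-private}

    # Cloud is chosen only for non-private audio when a key is configured.
    if [ "$sensitivity" = public ] && [ -n "${OPENAI_API_KEY:-}" ]; then
        echo cloud
    elif command -v whisper >/dev/null 2>&1; then
        echo local
    else
        echo "no usable mode: install whisper, or set OPENAI_API_KEY for non-private audio" >&2
        return 1
    fi
}
```

Private audio never falls through to the cloud branch, so a missing local install surfaces as an error instead of an upload.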

Capabilities

skill, source-rkz91, skill-openai-whisper, topic-agent-skills, topic-agents-md, topic-ai-agents, topic-claude-code, topic-codex, topic-cursor, topic-developer-tools, topic-llm-tools, topic-mcp, topic-pm-tools, topic-product-management, topic-productivity

Install

Install: npx skills add rkz91/coco
Transport: skills-sh
Protocol: skill

Quality

0.45 / 1.00

Deterministic score 0.45 from registry signals: indexed on github topic:agent-skills · 7 github stars · SKILL.md body (3,483 chars)

Provenance

Indexed from: github
Enriched: 2026-05-18 19:14:08Z · deterministic:skill-github:v1 · v1
First seen: 2026-05-18
Last seen: 2026-05-18
