{"id":"e7479509-4184-4e28-8fac-0795ff6279d2","shortId":"GkLWtG","kind":"skill","title":"WhisperX Speech Recognition with Word-Level Timestamps and Diarization","tagline":"WhisperX extends OpenAI Whisper with batched inference for 70x realtime transcription, phoneme-based word-level timestamp alignment via wav2vec2, voice activity detection, and speaker diarization. It produces accurate per-word timestamps and speaker labels from audio files.","description":"# WhisperX Speech Recognition with Word-Level Timestamps and Diarization\n\nWhisperX extends OpenAI Whisper with batched inference for 70x realtime transcription, phoneme-based word-level timestamp alignment via wav2vec2, voice activity detection, and speaker diarization. It produces accurate per-word timestamps and speaker labels from audio files.\n\n## Installation\n\nUse the upstream install or setup path that matches your environment:\n- pip install whisperx\n- git clone https://github.com/m-bain/whisperX.git\n- uv sync --all-extras --dev\n\nRequirements and caveats from upstream:\n- 🪶 [faster-whisper](https://github.com/guillaumekln/faster-whisper) backend, requires <8GB gpu memory for large-v2 with beam_size=5\n- ## Python usage 🐍\n- python\n\nBasic usage or getting-started notes:\n- **Phoneme-Based ASR** A suite of models finetuned to recognise the smallest unit of speech distinguishing one word from another, e.g. the element p in \"tap\". A popular example model is [wav2vec2.0](https://huggingface...\n- You may also need to install ffmpeg, rust etc. Follow openAI instructions here https://github.com/openai/whisper#setup.\n- <h2 align=\"left\" id=\"example\">Usage 💬 (command line)</h2>\n\n- Source: https://github.com/m-bain/whisperX\n- Extracted from upstream docs: https://raw.githubusercontent.com/m-bain/whisperX/HEAD/README.md\n\n## Source\n\n- [Agent Skill Exchange](https://agentskillexchange.com/skills/whisperx-speech-recognition-timestamps-diarization/)","tags":["whisperx","speech","recognition","timestamps","diarization","skills","agentskillexchange","agent-skills","ai-agents","ai-tools","awesome-list","claude-code"],"capabilities":["skill","source-agentskillexchange","skill-whisperx-speech-recognition-timestamps-diarization","topic-agent-skills","topic-ai-agents","topic-ai-tools","topic-awesome-list","topic-claude-code","topic-codex","topic-cursor","topic-llm","topic-mcp","topic-npx-skills","topic-openclaw","topic-skills-catalog"],"categories":["skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/agentskillexchange/skills/whisperx-speech-recognition-timestamps-diarization","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add agentskillexchange/skills","source_repo":"https://github.com/agentskillexchange/skills","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,460 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:13:05.892Z","embedding":null,"createdAt":"2026-05-18T13:20:18.550Z","updatedAt":"2026-05-18T19:13:05.892Z","lastSeenAt":"2026-05-18T19:13:05.892Z","tsv":"'/guillaumekln/faster-whisper)':137 '/m-bain/whisperx':217 '/m-bain/whisperx.git':120 '/m-bain/whisperx/head/readme.md':224 '/openai/whisper#setup.':210 '/skills/whisperx-speech-recognition-timestamps-diarization/)':231 '5':150 '70x':19,69 '8gb':140 'accur':40,90 'activ':33,83 'agent':226 'agentskillexchange.com':230 'agentskillexchange.com/skills/whisperx-speech-recognition-timestamps-diarization/)':229 'align':29,79 'all-extra':123 'also':197 'anoth':181 'asr':164 'audio':49,99 'backend':138 'base':24,74,163 'basic':154 'batch':16,66 'beam':148 'caveat':129 'clone':117 'command':212 'detect':34,84 'dev':126 'diariz':10,37,60,87 'distinguish':177 'doc':221 'e.g':182 'element':184 'environ':112 'etc':203 'exampl':190 'exchang':228 'extend':12,62 'extra':125 'extract':218 'faster':133 'faster-whisp':132 'ffmpeg':201 'file':50,100 'finetun':169 'follow':204 'get':158 'getting-start':157 'git':116 'github.com':119,136,209,216 'github.com/guillaumekln/faster-whisper)':135 'github.com/m-bain/whisperx':215 'github.com/m-bain/whisperx.git':118 'github.com/openai/whisper#setup.':208 'gpu':141 'huggingfac':194 'infer':17,67 'instal':101,105,114,200 'instruct':206 'label':47,97 'larg':145 'large-v2':144 'level':7,27,57,77 'line':213 'match':110 'may':196 'memori':142 'model':168,191 'need':198 'note':160 'one':178 'openai':13,63,205 'p':185 'path':108 'per':42,92 'per-word':41,91 'phonem':23,73,162 'phoneme-bas':22,72,161 'pip':113 'popular':189 'produc':39,89 'python':151,153 'raw.githubusercontent.com':223 'raw.githubusercontent.com/m-bain/whisperx/head/readme.md':222 'realtim':20,70 'recognis':171 'recognit':3,53 'requir':127,139 'rust':202 'setup':107 'size':149 'skill':227 'skill-whisperx-speech-recognition-timestamps-diarization' 'smallest':173 'sourc':214,225 'source-agentskillexchange' 'speaker':36,46,86,96 'speech':2,52,176 'start':159 'suit':166 'sync':122 'tap':187 'timestamp':8,28,44,58,78,94 'topic-agent-skills' 'topic-ai-agents' 'topic-ai-tools' 'topic-awesome-list' 'topic-claude-code' 'topic-codex' 'topic-cursor' 'topic-llm' 'topic-mcp' 'topic-npx-skills' 'topic-openclaw' 'topic-skills-catalog' 'transcript':21,71 'unit':174 'upstream':104,131,220 'usag':152,155,211 'use':102 'uv':121 'v2':146 'via':30,80 'voic':32,82 'wav2vec2':31,81 'wav2vec2.0':193 'whisper':14,64,134 'whisperx':1,11,51,61,115 'word':6,26,43,56,76,93,179 'word-level':5,25,55,75","prices":[{"id":"7104363d-16d0-4ee0-b223-66954833d821","listingId":"e7479509-4184-4e28-8fac-0795ff6279d2","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"agentskillexchange","category":"skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:20:18.550Z"}],"sources":[{"listingId":"e7479509-4184-4e28-8fac-0795ff6279d2","source":"github","sourceId":"agentskillexchange/skills/whisperx-speech-recognition-timestamps-diarization","sourceUrl":"https://github.com/agentskillexchange/skills/tree/main/skills/whisperx-speech-recognition-timestamps-diarization","isPrimary":false,"firstSeenAt":"2026-05-18T13:20:18.550Z","lastSeenAt":"2026-05-18T19:13:05.892Z"}],"details":{"listingId":"e7479509-4184-4e28-8fac-0795ff6279d2","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"agentskillexchange","slug":"whisperx-speech-recognition-timestamps-diarization","github":{"repo":"agentskillexchange/skills","stars":8,"topics":["agent-skills","ai-agents","ai-tools","awesome-list","claude-code","codex","cursor","llm","mcp","npx-skills","openclaw","skills-catalog"],"license":"mit","html_url":"https://github.com/agentskillexchange/skills","pushed_at":"2026-05-18T19:02:17Z","description":"The open catalog of AI agent skills — 2,000+ security-scanned skills for Claude Code, Cursor, Codex, and more.","skill_md_sha":"bbc9d9849b64c71786d4f47a30a144ca30696b70","skill_md_path":"skills/whisperx-speech-recognition-timestamps-diarization/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/agentskillexchange/skills/tree/main/skills/whisperx-speech-recognition-timestamps-diarization"},"layout":"multi","source":"github","category":"skills","frontmatter":{"name":"WhisperX Speech Recognition with Word-Level Timestamps and Diarization","description":"WhisperX extends OpenAI Whisper with batched inference for 70x realtime transcription, phoneme-based word-level timestamp alignment via wav2vec2, voice activity detection, and speaker diarization. It produces accurate per-word timestamps and speaker labels from audio files."},"skills_sh_url":"https://skills.sh/agentskillexchange/skills/whisperx-speech-recognition-timestamps-diarization"},"updatedAt":"2026-05-18T19:13:05.892Z"}}