WhisperX Speech Recognition with Word-Level Timestamps and Diarization

WhisperX extends OpenAI Whisper with batched inference for 70x realtime transcription, phoneme-based word-level timestamp alignment via wav2vec2, voice activity detection, and speaker diarization. It produces accurate per-word timestamps and speaker labels from audio files.

Price

free

Protocol

skill

Verified

Endpoint

https://skills.sh/agentskillexchange/skills/whisperx-speech-recognition-timestamps-diarization

What it does

WhisperX Speech Recognition with Word-Level Timestamps and Diarization
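
To make the per-word timestamps and speaker labels concrete, here is a minimal sketch that merges consecutive words from the same speaker into turns. The `sample_result` dictionary is hypothetical data shaped like aligned, diarized output (segments holding a `words` list with `word`, `start`, `end`, and `speaker` keys), and `speaker_turns` is an illustrative helper, not part of WhisperX. It assumes some words may carry no timing, for example tokens the alignment model cannot match.

```python
# Hypothetical sample shaped like aligned, diarized output (illustration only).
sample_result = {
    "segments": [
        {"words": [
            {"word": "Hello", "start": 0.10, "end": 0.42, "speaker": "SPEAKER_00"},
            {"word": "there.", "start": 0.45, "end": 0.80, "speaker": "SPEAKER_00"},
        ]},
        {"words": [
            {"word": "Hi!", "start": 1.20, "end": 1.50, "speaker": "SPEAKER_01"},
            # Assumed case: a word that could not be aligned has no timestamps.
            {"word": "2024", "speaker": "SPEAKER_01"},
        ]},
    ]
}


def speaker_turns(result):
    """Merge consecutive words from the same speaker into turns.

    Illustrative helper (not a WhisperX function). Words without timing
    are kept in the text but do not change the turn boundaries.
    """
    turns = []
    for segment in result["segments"]:
        for word in segment.get("words", []):
            speaker = word.get("speaker", "UNKNOWN")
            if turns and turns[-1]["speaker"] == speaker:
                turn = turns[-1]
                turn["text"] += " " + word["word"]
            else:
                turn = {"speaker": speaker, "start": None, "end": None,
                        "text": word["word"]}
                turns.append(turn)
            # Only timed words move the start/end of the turn.
            if turn["start"] is None and word.get("start") is not None:
                turn["start"] = word["start"]
            if word.get("end") is not None:
                turn["end"] = word["end"]
    return turns


for turn in speaker_turns(sample_result):
    print(turn["speaker"], turn["start"], turn["end"], turn["text"])
```

Grouping at the word level rather than the segment level keeps a turn boundary accurate when a speaker change falls in the middle of a segment.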

Installation

Use the upstream install path that matches your environment.

For a standard install:

pip install whisperx

For a development setup from source:

git clone https://github.com/m-bain/whisperX.git
cd whisperX
uv sync --all-extras --dev

Requirements and caveats from upstream:

🪶 faster-whisper backend: requires less than 8 GB of GPU memory for large-v2 with beam_size=5.
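
A minimal Python sketch of the transcribe-then-align flow, assuming the `whisperx` package is installed and a CUDA GPU is available. The wrapper `transcribe_with_word_timestamps` and its default values are illustrative choices, not part of the library; the `whisperx` calls inside it follow the usage pattern in the upstream README and may differ between versions. Speaker diarization is left out here because it additionally requires a Hugging Face access token.

```python
def transcribe_with_word_timestamps(audio_path, model_name="large-v2",
                                    device="cuda", batch_size=16,
                                    compute_type="float16"):
    """Transcribe audio, then align the transcript for per-word timestamps.

    Illustrative wrapper (hypothetical name and defaults).
    """
    # Imported lazily so this module loads even where whisperx is absent.
    import whisperx

    # 1. Batched transcription with the faster-whisper backend.
    model = whisperx.load_model(model_name, device, compute_type=compute_type)
    audio = whisperx.load_audio(audio_path)
    result = model.transcribe(audio, batch_size=batch_size)

    # 2. Phoneme-based alignment using a wav2vec2 model chosen for the
    #    language detected during transcription.
    align_model, metadata = whisperx.load_align_model(
        language_code=result["language"], device=device)
    return whisperx.align(result["segments"], align_model, metadata,
                          audio, device, return_char_alignments=False)


# Example call (hypothetical file name; needs a GPU and model downloads):
# aligned = transcribe_with_word_timestamps("audio.mp3")
# print(aligned["segments"][0]["words"])
```

If GPU memory is tight, lowering `batch_size` or passing `compute_type="int8"` are the usual first adjustments, with the latter possibly reducing accuracy.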

Basic usage or getting-started notes:

Phoneme-Based ASR: a suite of models fine-tuned to recognise the smallest unit of speech distinguishing one word from another, e.g. the element p in "tap". A popular example model is [wav2vec2.0](https://huggingface...
You may also need to install ffmpeg, Rust, etc. Follow the OpenAI instructions at https://github.com/openai/whisper#setup.
Source: https://github.com/m-bain/whisperX
Extracted from upstream docs: https://raw.githubusercontent.com/m-bain/whisperX/HEAD/README.md

Source

Agent Skill Exchange

Capabilities

skill · source-agentskillexchange · skill-whisperx-speech-recognition-timestamps-diarization · topic-agent-skills · topic-ai-agents · topic-ai-tools · topic-awesome-list · topic-claude-code · topic-codex · topic-cursor · topic-llm · topic-mcp · topic-npx-skills · topic-openclaw · topic-skills-catalog

Install

Install: npx skills add agentskillexchange/skills

Source: https://github.com/agentskillexchange/skills/tree/main/skills/whisperx-speech-recognition-timestamps-diarization

skills.sh: https://skills.sh/agentskillexchange/skills/whisperx-speech-recognition-timestamps-diarization

Transport: skills-sh

Protocol: skill

Quality

0.45 / 1.00

Deterministic score of 0.45 from registry signals: indexed on GitHub topic:agent-skills · 8 GitHub stars · SKILL.md body (1,460 chars)

Provenance

Indexed from: GitHub

Enriched: 2026-05-18 19:13:05Z · deterministic:skill-github:v1 · v1

First seen: 2026-05-18

Last seen: 2026-05-18

Agent access

JSON: https://clawmart.sh/api/listings/GkLWtG
