Skill · quality 0.45

WhisperX Speech Recognition with Word-Level Timestamps and Diarization

WhisperX extends OpenAI Whisper with batched inference (upstream reports 70x realtime transcription with large-v2), phoneme-based word-level timestamp alignment using wav2vec2, voice activity detection, and speaker diarization. Given an audio file, it produces per-word timestamps and speaker labels.

Price: free
Protocol: skill
Verified: no

What it does

Installation

Use the upstream install or setup path that matches your environment.

Requirements and caveats from upstream:

  • faster-whisper backend: large-v2 with beam_size=5 requires less than 8 GB of GPU memory

Basic usage or getting-started notes:
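The upstream snippet did not survive on this page, so the following is only a minimal sketch of the transcribe-then-align flow described above, not the skill's own code. It assumes the `whisperx` package is installed and exposes `load_model`, `load_audio`, `load_align_model`, and `align` as in upstream's README. The function names `transcribe_aligned` and `speaker_turns`, the default arguments, and the sample data are hypothetical. Diarization is a separate upstream step that attaches `speaker` keys to words; the helper below tolerates their absence.

```python
from typing import Any


def transcribe_aligned(audio_path: str, device: str = "cuda",
                       model_name: str = "large-v2",
                       batch_size: int = 16) -> dict[str, Any]:
    """Transcribe an audio file, then align it to get per-word timestamps.

    Hypothetical wrapper; assumes whisperx is installed.
    """
    import whisperx  # imported lazily so speaker_turns works without it

    model = whisperx.load_model(model_name, device, compute_type="float16")
    audio = whisperx.load_audio(audio_path)
    result = model.transcribe(audio, batch_size=batch_size)

    # wav2vec2-based alignment model for the detected language
    align_model, metadata = whisperx.load_align_model(
        language_code=result["language"], device=device
    )
    return whisperx.align(result["segments"], align_model, metadata, audio, device)


def speaker_turns(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Merge consecutive words from the same speaker into turns.

    Assumes a WhisperX-style result: result["segments"][i]["words"][j] holds
    "word" plus optional "start", "end", and "speaker" keys.
    """
    turns: list[dict[str, Any]] = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            # Fall back to the segment speaker, then to a placeholder label.
            speaker = word.get("speaker", segment.get("speaker", "UNKNOWN"))
            start, end = word.get("start"), word.get("end")
            if turns and turns[-1]["speaker"] == speaker:
                turn = turns[-1]
                turn["text"] += " " + word["word"]
                if turn["start"] is None:
                    turn["start"] = start
                if end is not None:
                    turn["end"] = end
            else:
                turns.append({"speaker": speaker, "start": start,
                              "end": end, "text": word["word"]})
    return turns


# Hand-written sample in the assumed output shape (not real model output).
sample = {"segments": [
    {"speaker": "SPEAKER_00", "words": [
        {"word": "hello", "start": 0.0, "end": 0.5, "speaker": "SPEAKER_00"},
        {"word": "there", "start": 0.6, "end": 1.2, "speaker": "SPEAKER_00"}]},
    {"speaker": "SPEAKER_01", "words": [
        {"word": "hi", "start": 1.5, "end": 2.0, "speaker": "SPEAKER_01"}]},
]}
turns = speaker_turns(sample)
```

Words that the aligner could not place may lack `start` and `end`, which is why the helper reads them with `.get` and only overwrites a turn's end time when a value is present.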

Capabilities

skill · source-agentskillexchange · skill-whisperx-speech-recognition-timestamps-diarization · topic-agent-skills · topic-ai-agents · topic-ai-tools · topic-awesome-list · topic-claude-code · topic-codex · topic-cursor · topic-llm · topic-mcp · topic-npx-skills · topic-openclaw · topic-skills-catalog

Quality

0.45 / 1.00

Deterministic score of 0.45 from registry signals: indexed on GitHub topic agent-skills · 8 GitHub stars · SKILL.md body (1,460 chars)

Provenance

Indexed from: GitHub
Enriched: 2026-05-18 19:13:05Z · deterministic:skill-github:v1 · v1
First seen: 2026-05-18
Last seen: 2026-05-18
