faster-whisper High-Performance Speech Transcription Engine
What it does
faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, a fast inference engine for Transformer models. Upstream reports it to be up to 4x faster than openai/whisper at the same accuracy while using less memory. It supports CPU and GPU inference with 8-bit quantization, batched processing, word-level timestamps, and VAD (voice activity detection) filtering for accurate speech-to-text conversion.
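A minimal sketch of how these features surface in the Python API, assuming the `WhisperModel` class and the `transcribe` options documented in the upstream README. The helper names `format_word` and `transcribe_with_words`, the model size `"small"`, and any audio path passed in are illustrative placeholders, not part of the library.

```python
def format_word(start: float, end: float, word: str) -> str:
    """Render one word-level timestamp as a single line."""
    return f"[{start:.2f}s -> {end:.2f}s] {word.strip()}"


def transcribe_with_words(path: str, model_size: str = "small") -> list:
    """Transcribe `path` on CPU with int8 quantization, VAD filtering,
    and word-level timestamps. Requires faster-whisper to be installed."""
    # Imported lazily so the formatting helper works without the package.
    from faster_whisper import WhisperModel

    # 8-bit quantization on CPU (hypothetical choice of model size).
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(
        path,
        vad_filter=True,       # skip stretches without speech
        word_timestamps=True,  # populate segment.words
    )
    lines = []
    for segment in segments:   # generator: decoding happens during iteration
        for word in segment.words:
            lines.append(format_word(word.start, word.end, word.word))
    return lines
```

Calling `transcribe_with_words("audio.mp3")` downloads the model on first use, so it is best tried on a short clip first.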
Installation
Use the upstream install or setup path that matches your environment:
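- Use Docker
- Install the NVIDIA libraries with pip (Linux only): pip install nvidia-cublas-cu12 nvidia-cudnn-cu12==9.*
- Install the released package from PyPI: pip install faster-whisper
- Install the current master branch: pip install --force-reinstall "faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/refs/heads/master.tar.gz"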
Requirements and caveats from upstream:
- Python 3.9 or greater
- Unlike openai-whisper, FFmpeg does not need to be installed on the system. The audio is decoded with the Python library PyAV which bundles the FFmpeg libraries in its package.
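- GPU execution requires the NVIDIA cuBLAS and cuDNN 9 libraries for CUDA 12 to be installed (the nvidia-cublas-cu12 and nvidia-cudnn-cu12 packages listed under Installation).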
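The requirements above can be checked before installing or running anything. The following is a rough sketch using only the standard library plus an optional call to `ctranslate2.get_cuda_device_count()`; the function name `check_environment` and the shape of the returned dict are assumptions made for illustration.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 9)):
    """Return a dict summarizing whether the stated requirements look satisfied."""
    report = {
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        "faster_whisper_installed": importlib.util.find_spec("faster_whisper") is not None,
        "cuda_devices": None,  # stays None when ctranslate2 cannot be queried
    }
    if importlib.util.find_spec("ctranslate2") is not None:
        try:
            import ctranslate2

            # Number of CUDA devices visible to CTranslate2 (0 means CPU only).
            report["cuda_devices"] = ctranslate2.get_cuda_device_count()
        except Exception:
            pass  # leave as None if the library fails to load
    return report


if __name__ == "__main__":
    print(check_environment())
```

A `cuda_devices` value of 0 or None suggests falling back to `device="cpu"` when constructing the model.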
Basic usage or getting-started notes:
- For reference, upstream reports the time and memory usage required to transcribe 13 minutes of audio using different implementations, in two tables whose rows were not captured in this extract.
- GPU benchmark columns: Implementation, Precision, Beam size, Time, VRAM Usage
- CPU benchmark columns: Implementation, Precision, Beam size, Time, RAM Usage
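To get a comparable timing on your own audio, a basic transcription call can be wrapped in a timer. This is a sketch under stated assumptions: it uses `WhisperModel`, `transcribe(..., beam_size=...)`, and the `duration` field of the returned info object as described upstream, while `timed_transcribe`, `real_time_factor`, and the default model size are hypothetical names and choices. Model loading is deliberately excluded from the measured time.

```python
import time


def real_time_factor(audio_seconds: float, elapsed_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return elapsed_seconds / audio_seconds


def timed_transcribe(path: str, model_size: str = "small", beam_size: int = 5):
    """Return (text, elapsed_seconds, real_time_factor) for one audio file."""
    # Lazy import so real_time_factor is usable without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    start = time.perf_counter()
    segments, info = model.transcribe(path, beam_size=beam_size)
    # transcribe() returns a generator, so decoding only runs while it is consumed.
    text = "".join(segment.text for segment in segments)
    elapsed = time.perf_counter() - start
    return text, elapsed, real_time_factor(info.duration, elapsed)
```

Because the segments are produced lazily, stopping the timer before iterating over them would measure almost nothing; the join above is what forces the full transcription.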
Extracted from upstream docs: https://raw.githubusercontent.com/SYSTRAN/faster-whisper/HEAD/README.md
Quality
Deterministic score 0.45 from registry signals: indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,684 chars)