Force-align narration and transcript text into subtitle or SMIL timing maps
What it does
Use aeneas when an agent already has both the audio and its text but still needs timing. The workflow aligns spoken narration against fragments of plain text or XML and emits sync maps that can be turned into subtitles, EPUB 3 media overlays, JSON timing data, or other downstream caption assets.
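To make "sync map" concrete, here is a minimal sketch that turns a JSON sync map into SRT subtitle text. It assumes the JSON layout is a top-level "fragments" list whose entries carry "begin", "end", and "lines"; the sample data and the helper names (to_srt_time, sync_map_to_srt) are hypothetical and are not part of aeneas. aeneas can also write SRT directly, so a converter like this is only useful when JSON is the stored format.

```python
import json


def to_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(float(seconds) * 1000))
    hours, rest = divmod(total_ms, 3600000)
    minutes, rest = divmod(rest, 60000)
    secs, millis = divmod(rest, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, secs, millis)


def sync_map_to_srt(sync_map):
    """Turn a parsed JSON sync map into SRT text, one cue per fragment."""
    cues = []
    for index, fragment in enumerate(sync_map["fragments"], start=1):
        start = to_srt_time(fragment["begin"])
        end = to_srt_time(fragment["end"])
        text = "\n".join(fragment["lines"])
        cues.append("%d\n%s --> %s\n%s\n" % (index, start, end, text))
    return "\n".join(cues)


# Hypothetical two-fragment sync map for illustration, not real aeneas output.
example = json.loads("""
{"fragments": [
  {"id": "f000001", "begin": "0.000", "end": "2.680", "lines": ["First line"]},
  {"id": "f000002", "begin": "2.680", "end": "5.120", "lines": ["Second line"]}
]}
""")
srt_text = sync_map_to_srt(example)
print(srt_text)
```

Each fragment becomes one numbered cue, so the granularity of the subtitles follows the granularity of the text fragments given to the aligner.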
Prerequisites
Python, pip, FFmpeg, and eSpeak
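A quick way to check the external tools before installing is to look for their executables on PATH. The sketch below assumes the executable names are ffmpeg, ffprobe, and espeak, which can differ by platform or package; the helper name find_missing_tools is hypothetical.

```python
import shutil
import sys


def find_missing_tools(tools=("ffmpeg", "ffprobe", "espeak")):
    """Return the names from `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


# Assumed executable names; adjust them if your system packages differ.
missing = find_missing_tools()
print("Python:", sys.version.split()[0])
print("Missing tools:", ", ".join(missing) if missing else "none")
```

An empty result only shows that the executables are discoverable, not that aeneas can use them; upstream documents a fuller check via `python -m aeneas.diagnostics` once the package is installed.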
Installation
Use the upstream install path that matches your environment. With pip, install numpy first, then aeneas:
- pip install numpy
- pip install aeneas
Requirements and caveats from upstream:
- aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment).
- Python 2.7 (Linux, OS X, Windows) or 3.5 or later (Linux, OS X)
Basic usage or getting-started notes:
- All-in-one installers are available for Mac OS X and Windows, and a Bash script for deb-based Linux distributions (Debian, Ubuntu).
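As a getting-started sketch, an alignment run can be assembled as a command line for the aeneas.tools.execute_task module, which takes an audio file, a text file, a configuration string, and an output path. The file names below are placeholders and the helpers build_config and build_command are hypothetical; the block only builds and prints the command, so it runs without aeneas installed.

```python
import sys


def build_config(language="eng", text_type="plain", output_format="json"):
    """Join task parameters into the pipe-separated configuration string."""
    return "|".join([
        "task_language=%s" % language,
        "is_text_type=%s" % text_type,
        "os_task_file_format=%s" % output_format,
    ])


def build_command(audio_path, text_path, output_path, config):
    """Build the argument list for running the execute_task tool."""
    return [sys.executable, "-m", "aeneas.tools.execute_task",
            audio_path, text_path, config, output_path]


# Placeholder file names; replace them with real paths.
config = build_config(output_format="srt")
command = build_command("narration.mp3", "transcript.txt", "narration.srt", config)
print(" ".join(command))

# To run the alignment for real (requires aeneas, FFmpeg, and eSpeak):
#     import subprocess
#     subprocess.run(command, check=True)
```

With the plain text type, each line of the transcript is treated as one fragment, so splitting the text into lines is how the caller controls where cues begin and end.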
Extracted from upstream docs: https://raw.githubusercontent.com/readbeyond/aeneas/HEAD/README.md
Quality
Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills · 8 GitHub stars · SKILL.md body (1,564 chars)