Pay-per-call text-to-speech via xAI Grok models, settled on Tempo L2.
What it does
This endpoint provides text-to-speech synthesis powered by xAI's Grok models, accessible through the Locus MPP (Micropayment Protocol). It converts up to 15,000 characters of text into audio, supports five output codecs (MP3, WAV, PCM, mulaw, alaw), and offers five voices: eve (energetic), ara (warm), rex (confident), sal (smooth), and leo (authoritative). Inline speech tags such as [pause], [laugh], and <whisper> give expressive control. The language field accepts BCP-47 codes (English, Chinese, French, German, and Japanese are among those listed) or automatic detection.
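The limits described above can be checked on the client before a paid call is made. The following is a minimal sketch, not the service's own validation: the field names come from the quick start request on this page, while the helper name `build_tts_payload`, the rejection of empty text, and the choice to count Python characters (rather than bytes) against the 15,000 limit are assumptions.

```python
MAX_TEXT_CHARS = 15_000  # limit stated in the listing; counting unit is an assumption
VOICES = {"eve", "ara", "rex", "sal", "leo"}  # the five documented voices


def build_tts_payload(text: str, language: str = "en", voice_id: str = "eve") -> dict:
    """Validate inputs against the documented limits and return a request body.

    Hypothetical helper: the real endpoint may apply different or extra rules.
    """
    if not text:
        raise ValueError("text must not be empty")  # assumed, not documented
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text is {len(text)} characters; limit is {MAX_TEXT_CHARS}")
    if voice_id not in VOICES:
        raise ValueError(f"unknown voice_id {voice_id!r}; expected one of {sorted(VOICES)}")
    return {"text": text, "language": language, "voice_id": voice_id}


payload = build_tts_payload("Hello. [pause] This is a test.", voice_id="ara")
```

Failing early on the client avoids paying for a request that the service would likely reject.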
Payment is handled via the MPP charge intent using the Tempo method, settling in pathUSD on Tempo L2. Pricing is approximately $0.005 per 1,000 characters according to the OpenAPI spec. The endpoint is a POST-only route; the probe returned 404 on HEAD and GET, which is expected since TTS requires a POST request body with text and language fields.
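As a rough budgeting aid, the approximate rate quoted above can be turned into a per-request estimate. This sketch assumes the price scales linearly with character count, with no minimum charge and no rounding up to whole 1,000-character blocks; the spec leaves the exact amount unspecified, so treat the result as an estimate only. The function name `estimate_cost_usd` is hypothetical.

```python
from decimal import Decimal

# Approximate rate from the listing; the exact charge is not given in the spec.
APPROX_USD_PER_1K_CHARS = Decimal("0.005")


def estimate_cost_usd(char_count: int) -> Decimal:
    """Estimate the charge for one call, assuming linear per-character pricing."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return APPROX_USD_PER_1K_CHARS * Decimal(char_count) / Decimal(1000)


# A maximum-length request of 15,000 characters comes to roughly 0.075 USD.
max_request_cost = estimate_cost_usd(15_000)
```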
This endpoint is part of a broader Grok service suite hosted at grok.mpp.paywithlocus.com that also includes chat completions, web/X search, code execution, and image generation/editing on separate paths. Documentation is referenced at https://docs.x.ai and https://beta.paywithlocus.com/mpp/grok.md.
Capabilities
Use cases
- Converting article or blog text into spoken audio for podcasts or accessibility
- Adding voice narration to applications or chatbots
- Generating multilingual speech output for localization workflows
- Producing expressive audio with inline speech tags for creative content
- Building voice interfaces that pay per call without API key management
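Articles and blog posts can exceed the 15,000-character limit per request, so a caller would need to split long text into several calls. The sketch below shows one possible approach, assuming the limit is counted in characters; `split_text` is a hypothetical helper, and it breaks on spaces or newlines where it can, falling back to a hard cut only when a chunk contains no whitespace. Each chunk is a separate paid call, and a hard cut could split a speech tag.

```python
def split_text(text: str, limit: int = 15_000) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring whitespace breaks."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: list[str] = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Find the last space or newline at index <= limit.
        cut = max(remaining.rfind(" ", 0, limit + 1), remaining.rfind("\n", 0, limit + 1))
        if cut <= 0:
            cut = limit  # no usable whitespace: fall back to a hard split
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks


parts = split_text("aaa bbb ccc", limit=7)
```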
Fit
Best for
- Agents needing pay-per-call TTS without subscription commitments
- Multilingual speech synthesis with language auto-detection
- Applications requiring multiple distinct voice personas
- Developers wanting crypto-settled micropayments for audio generation
Not for
- Real-time streaming speech synthesis (this is a charge-per-call endpoint, not a session)
- Free or high-volume batch TTS where per-call pricing is uneconomical
- Use cases requiring voices outside the five provided options
Quick start
curl -X POST https://grok.mpp.paywithlocus.com/grok/tts \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test of Grok text-to-speech.",
    "language": "en",
    "voice_id": "eve"
  }'
Example
Request
{
  "text": "Hello, this is a test of Grok text-to-speech. [pause] Pretty cool, right?",
  "language": "en",
  "voice_id": "eve",
  "output_format": "{ \"codec\": \"mp3\", \"sample_rate\": 24000, \"bit_rate\": 128000 }"
}
Endpoint
Quality
The OpenAPI spec provides a clear schema with field descriptions, voice options, language support, and approximate pricing. However, the probe did not capture a 402 challenge on this specific endpoint (only HEAD/GET were tried on a POST-only route), no example response is available, and all crawled pages returned 404 with no additional documentation content. Liveness is plausible but not confirmed by the probe.
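Because the probe only tried HEAD and GET, a more informative liveness check would send a POST with a small body. The sketch below builds such a request with the Python standard library; the URL is the one from the quick start, and `build_probe_request` and `probe_status` are hypothetical helper names. Whether an unpaid POST actually returns a 402 payment challenge is an assumption that the probe data on this page does not confirm.

```python
import json
import urllib.error
import urllib.request

TTS_URL = "https://grok.mpp.paywithlocus.com/grok/tts"  # URL from the quick start


def build_probe_request(url: str = TTS_URL) -> urllib.request.Request:
    """Build an unpaid POST carrying the two fields the listing names as required."""
    body = json.dumps({"text": "ping", "language": "en"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def probe_status(req: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status, including 4xx/5xx codes."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # A 402 here would suggest a live payment challenge (assumed behavior).
        return err.code


req = build_probe_request()
# status = probe_status(req)  # uncomment to send the request over the network
```

A 404 on this POST would point to a dead route more strongly than the HEAD and GET results do.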
Warnings
- Probe returned 404 on HEAD and GET; endpoint is POST-only so liveness is not confirmed by the probe bundle
- No example response schema or sample response available in the OpenAPI spec
- Pricing is approximate ('~$0.005 per 1,000 characters') and the exact amount field is null in the spec
- The output_format field is typed as string but documented as a JSON object; callers may need to pass a serialized JSON string
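The last warning means the output settings may need to be encoded twice: once as a JSON string for the output_format field, then again as part of the request body. The sketch below illustrates that under the assumption that the string-typed reading of the spec is the correct one; `serialize_output_format` is a hypothetical helper, and the codec names and example values are taken from this page.

```python
import json

CODECS = {"mp3", "wav", "pcm", "mulaw", "alaw"}  # codecs named in the listing


def serialize_output_format(codec: str, sample_rate: int, bit_rate: int) -> str:
    """Return the output settings as a JSON string, as the string-typed field suggests."""
    if codec not in CODECS:
        raise ValueError(f"unknown codec {codec!r}; expected one of {sorted(CODECS)}")
    return json.dumps({"codec": codec, "sample_rate": sample_rate, "bit_rate": bit_rate})


payload = {
    "text": "Hello, this is a test of Grok text-to-speech.",
    "language": "en",
    "voice_id": "eve",
    # A string containing JSON, not a nested object (assumption based on the spec typing).
    "output_format": serialize_output_format("mp3", 24000, 128000),
}
body = json.dumps(payload)  # the inner quotes are escaped automatically here
```

If the service turns out to accept a nested object instead, passing the dictionary directly would be the simpler choice.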
Citations
- TTS endpoint accepts text up to 15,000 characters with inline speech tags: https://grok.mpp.paywithlocus.com
- Five voice options: eve, ara, rex, sal, leo: https://grok.mpp.paywithlocus.com
- Pricing is approximately $0.005 per 1,000 characters: https://grok.mpp.paywithlocus.com
- Supports codecs mp3, wav, pcm, mulaw, alaw: https://grok.mpp.paywithlocus.com
- Payment settled via Tempo method with pathUSD: https://grok.mpp.paywithlocus.com
- API reference at docs.x.ai and grok.md skill file: https://grok.mpp.paywithlocus.com