Generate audio from text prompts via Stability AI's Stable Audio models, paid per-call over MPP.
What it does
This endpoint provides text-to-audio generation powered by Stability AI's Stable Audio models (stable-audio-2 and stable-audio-2.5), accessible through the Locus MPP (Machine Payments Protocol) gateway. You send a text prompt describing the desired audio (music, sound effects, or ambient soundscapes) and receive generated audio up to 190 seconds long. Payment is settled per-call via the Tempo L2 network using pathUSD.
The text-to-audio endpoint is part of a broader Stability AI MPP service that also exposes image generation (Ultra, Core, SD3), image editing (erase, inpaint, outpaint, search-and-replace, search-and-recolor), background removal and replacement, upscaling (fast, conservative, creative), sketch-to-image, structure/style transfer, 3D model generation, and audio-to-audio/audio-inpaint endpoints. Each endpoint is individually priced. For text-to-audio specifically, the listed price is approximately $0.23 per generation at 50 inference steps with stable-audio-2. Parameters include prompt, model selection, duration (up to 190s), inference steps, CFG scale, output format (mp3 or wav), and a reproducibility seed.
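The parameter list above can be sketched as a small request-body builder. This is a minimal Python sketch, not part of the service's documentation: the field names are taken from the example request on this page, while the function name `build_payload`, the positive-duration check, and the choice to omit unset fields are assumptions made for illustration.

```python
from typing import Any, Dict, Optional

MAX_DURATION_S = 190                              # upper bound stated in the listing
MODELS = ("stable-audio-2", "stable-audio-2.5")   # models named in the listing
OUTPUT_FORMATS = ("mp3", "wav")                   # formats named in the listing


def build_payload(
    prompt: str,
    *,
    model: Optional[str] = None,
    duration: Optional[float] = None,
    steps: Optional[int] = None,
    cfg_scale: Optional[float] = None,
    output_format: Optional[str] = None,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a JSON-serializable body; only `prompt` is required."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if model is not None and model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    # Assumption: duration must be positive; only the 190 s cap is documented.
    if duration is not None and not (0 < duration <= MAX_DURATION_S):
        raise ValueError(f"duration must be in (0, {MAX_DURATION_S}] seconds")
    if output_format is not None and output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output_format: {output_format!r}")

    optional = {
        "model": model,
        "duration": duration,
        "steps": steps,
        "cfg_scale": cfg_scale,
        "output_format": output_format,
        "seed": seed,
    }
    payload: Dict[str, Any] = {"prompt": prompt}
    # Assumption: unset fields are omitted so the service applies its defaults.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Calling `build_payload("soft rain", duration=30, output_format="mp3")` yields a dictionary that can be passed to `json.dumps` and sent as the POST body.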
The endpoint accepts POST requests with a JSON body. The only required field is `prompt`. Some operations on this service are asynchronous — a generation ID is returned and results are fetched via the `/stability-ai/result` endpoint at no additional cost. The MPP probe returned 404 on HEAD/GET, which is expected since this endpoint only accepts POST. The OpenAPI spec confirms the 402 payment challenge is configured for POST.
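Because the response format is not documented, a client may want to branch on status code and content type rather than assume one shape. The Python sketch below is hypothetical: the function name, the returned labels, and in particular the JSON key `id` for an asynchronous generation ID are assumptions, not documented behavior.

```python
import json
from typing import Any, Tuple


def classify_response(status: int, content_type: str, body: bytes) -> Tuple[str, Any]:
    """Sort a raw HTTP response into one of a few coarse outcomes."""
    if status == 402:
        # Payment challenge: the caller must settle payment and retry.
        return ("payment_required", None)
    if status >= 400:
        return ("error", body.decode("utf-8", errors="replace"))
    if content_type.startswith("audio/"):
        # Synchronous case: the body is the generated audio itself.
        return ("audio", body)
    if content_type.startswith("application/json"):
        data = json.loads(body)
        # Assumption: an async response carries the generation ID under "id".
        gen_id = data.get("id") if isinstance(data, dict) else None
        if gen_id is not None:
            return ("pending", gen_id)
        return ("json", data)
    return ("unknown", body)
```

A "pending" result would then be followed by a request to `/stability-ai/result` with the returned ID; the exact shape of that request is not described on this page.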
Capabilities
Use cases
- Generating background music or soundscapes from text descriptions for video or game projects
- Creating sound effects on demand without a sound library
- Prototyping audio content for podcasts, ads, or creative projects
- Building AI-powered audio generation into automated content pipelines
- Generating ambient audio for interactive or immersive applications
Fit
Best for
- Developers needing on-demand audio generation via API
- AI agents that need to produce audio assets programmatically
- Projects requiring pay-per-use audio generation without subscription commitments
- Workflows that combine image and audio generation from a single provider
Not for
- High-volume batch audio generation where per-call costs may accumulate significantly
- Real-time low-latency audio streaming applications
- Users who need speech synthesis / text-to-speech (this generates music and sound effects, not speech)
Quick start
curl -X POST https://stability-ai.mpp.paywithlocus.com/stability-ai/text-to-audio \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <MPP_TOKEN>" \
-d '{"prompt": "A calm ambient piano melody with soft rain in the background", "duration": 30, "output_format": "mp3"}'
Example
Request
{
  "seed": 42,
  "model": "stable-audio-2.5",
  "steps": 50,
  "prompt": "A calm ambient piano melody with soft rain in the background",
  "duration": 30,
  "cfg_scale": 7,
  "output_format": "mp3"
}
Endpoint
Quality
The OpenAPI spec is detailed with full request schemas, pricing descriptions, and model options for 22 endpoints including this one. However, the MPP probe returned 404 because it only tried HEAD/GET on a POST-only endpoint, so liveness via 402 challenge was not directly confirmed. No response schema or example response is documented. Crawled pages all returned generic 404 JSON, providing no additional documentation.
Warnings
- MPP probe returned 404 on HEAD and GET; the endpoint is POST-only, so the 402 challenge was not captured directly and liveness is not fully confirmed
- No response schema documented in the OpenAPI spec; response format (binary audio vs. async generation ID) is unclear
- Price listed as description text ('$0.23 at 50 steps') rather than a fixed base-unit amount; actual cost may vary with step count
- Currency address 0x20c000000000000000000000b9537d11c60e8b50 is assumed to be pathUSD (6 decimals) based on Tempo L2 context but not independently verified
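To make the base-unit point concrete, the listed dollar price can be converted to token base units under the 6-decimal assumption noted above. This is a minimal Python sketch: the helper name `to_base_units` is hypothetical, and both the 6-decimal precision and the 1:1 dollar peg are assumptions carried over from the warning, not verified facts.

```python
from decimal import Decimal

PATHUSD_DECIMALS = 6  # assumption from the warning above, not verified


def to_base_units(amount_usd: str, decimals: int = PATHUSD_DECIMALS) -> int:
    """Convert a decimal dollar string to integer token base units."""
    scaled = Decimal(amount_usd) * (Decimal(10) ** decimals)
    # Reject amounts finer than the token's smallest unit instead of rounding.
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{amount_usd} has more than {decimals} decimal places")
    return int(scaled)


# Listed price of about $0.23 at 50 steps, expressed in base units.
print(to_base_units("0.23"))  # 230000
```

Using `Decimal` with a string input avoids the binary floating-point error that `0.23 * 10**6` would introduce.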
Citations
- Text-to-audio endpoint accepts prompt, model, duration, steps, cfg_scale, output_format, seed parameters with only prompt required (https://stability-ai.mpp.paywithlocus.com)
- Price is approximately $0.23 at 50 steps with stable-audio-2; models available are stable-audio-2.5 and stable-audio-2 (https://stability-ai.mpp.paywithlocus.com)
- Payment method is Tempo with intent charge; part of broader Stability AI service with 22 endpoints (https://stability-ai.mpp.paywithlocus.com)
- API reference available at platform.stability.ai/docs/api-reference (https://platform.stability.ai/docs/api-reference)
- LLM-readable docs available at beta.paywithlocus.com/mpp/stability-ai.md (https://beta.paywithlocus.com/mpp/stability-ai.md)