{"id":"93bc1c0c-a13b-4ec6-bd68-a1061a067682","shortId":"t35SDm","kind":"skill","title":"AudioMind","tagline":"Tired of juggling multiple audio APIs? This skill gives you one-command access to TTS, music generation, sound effects, and voice cloning. Use when you want to generate any audio without managing multiple API keys.","description":"# 🎙️ AudioMind\n\n**Use when:** The user asks to generate speech, narrate text, create a voice-over, compose music, or produce a sound effect.\n\nAudioMind is a smart audio dispatcher. It analyzes your request, routes it to the best available model (ElevenLabs for speech, sound effects, and voice cloning; fal.ai for music), and returns a ready-to-use audio URL.\n\n---\n\n## Quick Reference\n\n| Request Type | Best Model | Latency |\n|---|---|---|\n| Narrate text / Voice-over | `elevenlabs-tts-v3` | ~3s |\n| Low-latency TTS (real-time) | `elevenlabs-tts-turbo` | <1s |\n| Background music | `cassetteai-music` | ~15s |\n| Sound effect | `elevenlabs-sfx` | ~5s |\n| Clone a voice from audio | `elevenlabs-voice-clone` | ~10s |\n\n---\n\n## How to Use\n\n### 1. Start the AudioMind server (once per session)\n\n```bash\nbash {baseDir}/tools/start_server.sh\n```\n\nThis starts the ElevenLabs MCP server on port 8124. The skill uses it for all audio generation.\n\n### 2. Route the request\n\nAnalyze the user's request and call the appropriate tool via the MCP server:\n\n**Text-to-Speech (TTS)**\n\nWhen the user asks to \"narrate\", \"read aloud\", \"say\", or \"create a voice-over\":\n\n```\nUse MCP tool: text_to_speech\n  text: \"<the text to narrate>\"\n  voice_id: \"JBFqnCBsd6RMkjVDRZzb\"   # Default: \"George\" (professional, neutral)\n  model_id: \"eleven_multilingual_v2\"   # Use \"eleven_turbo_v2_5\" for low latency\n```\n\n**Music Generation**\n\nWhen the user asks to \"compose\", \"create background music\", or \"make a soundtrack\":\n\n```\nUse MCP tool: text_to_sound_effects  (via cassetteai-music on fal.ai)\n  prompt: \"<music description, e.g. 
'upbeat lo-fi hip hop, 90 seconds'>\"\n  duration_seconds: <duration>\n```\n\n**Sound Effect (SFX)**\n\nWhen the user asks for a specific sound (e.g., \"a door creaking\", \"rain on a window\"):\n\n```\nUse MCP tool: text_to_sound_effects\n  text: \"<sound description>\"\n  duration_seconds: <1-22>\n```\n\n**Voice Cloning**\n\nWhen the user provides an audio sample and wants to clone the voice:\n\n```\nUse MCP tool: voice_add\n  name: \"<voice name>\"\n  files: [\"<audio_file_url>\"]\n```\n\n---\n\n## Example Conversations\n\n**User:** \"Narrate this text for me: Welcome to our product launch event\"\n\n```\n→ Route to: text_to_speech\n  text: \"Welcome to our product launch event\"\n  voice_id: \"JBFqnCBsd6RMkjVDRZzb\"\n  model_id: \"eleven_multilingual_v2\"\n```\n\n> 🎙️ Narration complete! [Click to listen](audio_url)\n\n---\n\n**User:** \"Generate 60 seconds of relaxing background music for me, suitable for a podcast\"\n\n```\n→ Route to: cassetteai-music (fal.ai)\n  prompt: \"relaxing lo-fi background music for a podcast, gentle piano and soft beats, 60 seconds\"\n  duration_seconds: 60\n```\n\n> 🎵 Background music generated! [Click to listen](audio_url)\n\n---\n\n**User:** \"Generate a sci-fi-style door-opening sound effect\"\n\n```\n→ Route to: text_to_sound_effects\n  text: \"a futuristic sci-fi door sliding open with a hydraulic hiss\"\n  duration_seconds: 3\n```\n\n---\n\n## Setup\n\n### Required\n\nSet `ELEVENLABS_API_KEY` in `~/.openclaw/openclaw.json`:\n\n```json\n{\n  \"skills\": {\n    \"entries\": {\n      \"audiomind\": {\n        \"enabled\": true,\n        \"env\": {\n          \"ELEVENLABS_API_KEY\": \"your_elevenlabs_key_here\"\n        }\n      }\n    }\n  }\n}\n```\n\nGet your key at [elevenlabs.io/app/settings/api-keys](https://elevenlabs.io/app/settings/api-keys).\n\n### Optional (for fal.ai music & SFX models)\n\n```json\n\"FAL_KEY\": \"your_fal_key_here\"\n```\n\nGet your key at [fal.ai/dashboard/keys](https://fal.ai/dashboard/keys).\n\n---\n\n## Self-Hosting the Proxy\n\nThe `cli.js` connects to a hosted proxy by default. 
If you want full control — or need to serve users in regions where `vercel.app` is blocked — you can deploy your own instance from the `proxy/` directory.\n\n### Quick Deploy (Vercel)\n\n```bash\ncd proxy\nnpm install\nvercel --prod\n```\n\n### Environment Variables\n\nSet these in your Vercel project (Dashboard → Settings → Environment Variables):\n\n| Variable | Required For | Where to Get |\n|---|---|---|\n| `ELEVENLABS_API_KEY` | TTS, SFX, Voice Clone | [elevenlabs.io/app/settings/api-keys](https://elevenlabs.io/app/settings/api-keys) |\n| `FAL_KEY` | Music generation | [fal.ai/dashboard/keys](https://fal.ai/dashboard/keys) |\n| `VALID_PRO_KEYS` | (Optional) Restrict access | Comma-separated list of allowed client keys |\n\n### Point cli.js to Your Proxy\n\n```bash\nexport AUDIOMIND_PROXY_URL=\"https://your-domain.com/api/audio\"\n```\n\nOr set it in `~/.openclaw/openclaw.json`:\n\n```json\n{\n  \"skills\": {\n    \"entries\": {\n      \"audiomind\": {\n        \"env\": {\n          \"AUDIOMIND_PROXY_URL\": \"https://your-domain.com/api/audio\"\n        }\n      }\n    }\n  }\n}\n```\n\n### Custom Domain (Recommended)\n\nIf your users are in mainland China, bind a custom domain in Vercel Dashboard → Settings → Domains to avoid DNS issues with `vercel.app`.\n\n---\n\n## Model Reference\n\n| Model ID | Type | Provider | Notes |\n|---|---|---|---|\n| `eleven_multilingual_v2` | TTS | ElevenLabs | Best quality, supports 29 languages |\n| `eleven_turbo_v2_5` | TTS | ElevenLabs | Ultra-low latency, ideal for real-time |\n| `eleven_monolingual_v1` | TTS | ElevenLabs | English only, fastest |\n| `cassetteai-music` | Music | fal.ai | Reliable, fast music generation |\n| `elevenlabs-sfx` | SFX | ElevenLabs | High-quality sound effects (up to 22s) |\n| `elevenlabs-voice-clone` | Clone | ElevenLabs | Clone any voice from a short audio sample |\n\n---\n\n## Changelog\n\n### v3.0.0\n- **Simplified routing table**: Removed unstable/offline models from the main reference. 
The skill now only surfaces models that reliably work.\n- **Clearer use-case triggers**: Added \"Use when\" section so the agent activates this skill at the right moment.\n- **Unified setup**: Single `ELEVENLABS_API_KEY` is all you need to get started. `FAL_KEY` is now optional.\n- **Removed polling complexity**: Music generation now uses `cassetteai-music` by default, which completes synchronously.\n\n### v2.1.0\n- Added async workflow for long-running music generation tasks.\n- Added `cassetteai-music` as a stable alternative for music generation.\n\n### v2.0.0\n- Migrated to ElevenLabs MCP server architecture.\n- Added voice cloning support.\n\n### v1.0.0\n- Initial release with TTS, music, and SFX routing.","tags":["audiomind","media","skills","wells1137","agent-skills","agentskills","audio-generation","claude-code","claude-code-marketplace","claude-code-plugin","claude-code-skill","claude-code-skills"],"capabilities":["skill","source-wells1137","skill-audiomind","topic-agent-skills","topic-agentskills","topic-audio-generation","topic-claude-code","topic-claude-code-marketplace","topic-claude-code-plugin","topic-claude-code-skill","topic-claude-code-skills","topic-claude-skills","topic-content-creation","topic-image-generation","topic-openclaw"],"categories":["media-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/wells1137/media-skills/audiomind","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add wells1137/media-skills","source_repo":"https://github.com/wells1137/media-skills","install_from":"skills.sh"}},"qualityScore":"0.462","qualityRationale":"deterministic score 0.46 from registry signals: · indexed on github topic:agent-skills · 24 github stars · SKILL.md body (5,915 
chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-01T07:01:41.374Z","embedding":null,"createdAt":"2026-04-18T22:23:50.320Z","updatedAt":"2026-05-01T07:01:41.374Z","lastSeenAt":"2026-05-01T07:01:41.374Z","tsv":"'-22':313 '/.openclaw/openclaw.json':425,583 '/api/audio':578,594 '/app/settings/api-keys](https://elevenlabs.io/app/settings/api-keys)':544 '/app/settings/api-keys](https://elevenlabs.io/app/settings/api-keys).':446 '/dashboard/keys](https://fal.ai/dashboard/keys)':551 '/dashboard/keys](https://fal.ai/dashboard/keys).':466 '/tools/start_server.sh':161 '1':150,312 '10s':146 '15s':130 '1s':124 '2':179 '22s':681 '29':635 '3':417 '3s':112 '5':239,640 '5s':136 '60':361,385,389 '8124':170 '90':280 'access':15,557 'activ':729 'ad':722,770,780,798 'add':332 'agent':728 'allow':563 'aloud':208 'altern':787 'analyz':67,183 'api':7,36,422,434,536,740 'appropri':191 'architectur':797 'ask':42,204,247,289 'async':771 'audio':6,32,64,94,141,177,320,357,392,694 'audiomind':1,38,60,153,429,573,587,589 'avail':76 'avoid':615 'background':125,251,375 'basedir':160 'bash':158,159,510,571 'beat':384 'best':75,100,632 'bind':605 'block':496 'call':189 'case':720 'cassetteai':128,266,367,661,762,782 'cassetteai-mus':127,265,366,660,761,781 'cd':511 'changelog':696 'china':604 'clearer':717 'cli.js':473,567 'client':564 'clone':24,137,145,315,325,541,685,686,688,800 'comma':559 'comma-separ':558 'command':14 'complet':767 'complex':756 'compos':53,249 'connect':474 'control':485 'convers':336 'creak':297 'creat':48,211,250 'custom':595,607 'dashboard':525,611 'default':226,480,765 'deploy':499,508 'descript':272 'directori':506 'dispatch':65 'dns':616 'domain':596,608,613 'door':296,408 'durat':282,310,387,415 
'e.g':273,294 'effect':21,59,132,263,285,308,401,678 'eleven':232,236,352,627,637,652 'elevenlab':78,109,121,134,143,165,421,433,437,535,631,642,656,670,673,683,687,739,794 'elevenlabs-sfx':133,669 'elevenlabs-tts-turbo':120 'elevenlabs-tts-v3':108 'elevenlabs-voice-clon':142,682 'elevenlabs.io':445,543 'elevenlabs.io/app/settings/api-keys](https://elevenlabs.io/app/settings/api-keys)':542 'elevenlabs.io/app/settings/api-keys](https://elevenlabs.io/app/settings/api-keys).':444 'enabl':430 'english':657 'entri':428,586 'env':432,588 'environ':517,527 'exampl':335 'export':572 'fal':454,457,545,749 'fal.ai':83,269,369,449,465,550,664 'fal.ai/dashboard/keys](https://fal.ai/dashboard/keys)':549 'fal.ai/dashboard/keys](https://fal.ai/dashboard/keys).':464 'fast':85,666 'fastest':659 'fi':277,374,407 'file':334 'full':484 'futurist':404 'generat':19,30,44,178,244,548,668,758,778,790 'gentl':380 'georg':227 'get':440,460,534,747 'give':10 'high':675 'high-qual':674 'hip':278 'hiss':414 'hop':279 'host':469,477 'hydraul':413 'id':224,231,348,351,623 'ideal':647 'initi':803 'instal':514 'instanc':502 'issu':617 'jbfqncbsd6rmkjvdrzzb':225,349 'json':426,453,584 'juggl':4 'key':37,423,435,438,442,455,458,462,537,546,554,565,741,750 'languag':636 'latenc':102,115,242,646 'list':561 'lo':276,373 'lo-fi':275,372 'long':775 'long-run':774 'low':114,241,645 'low-lat':113 'main':706 'mainland':603 'make':254 'manag':34 'mcp':166,195,217,258,303,329,795 'migrat':792 'model':77,101,230,350,452,620,622,703,713 'moment':735 'monolingu':653 'multilingu':233,353,628 'multipl':5,35 'music':18,54,82,126,129,243,252,267,271,368,376,450,547,662,663,667,757,763,777,783,789,807 'name':333 'narrat':46,103,206 'need':487,745 'neutral':229 'note':626 'npm':513 'one':13 'one-command':12 'open':410 'option':447,555,753 'per':156 'piano':381 'podcast':379 'point':566 'poll':755 'port':169 'pro':553 'prod':516 'produc':56 'profession':228 'project':524 'prompt':270,370 'provid':318,625 
'proxi':471,478,505,512,570,574,590 'qualiti':633,676 'quick':96,507 'rain':298 'read':207 'readi':91 'ready-to-us':90 'real':118,650 'real-tim':117,649 'recommend':597 'refer':97,621,707 'region':492 'relax':371 'releas':804 'reliabl':665,715 'remov':701,754 'request':69,98,182,187 'requir':419,530 'restrict':556 'return':88 'right':734 'rout':71,180,340,364,396,699,810 'run':776 'sampl':321,695 'say':209 'sci':406 'sci-fi':405 'second':281,283,311,386,388,416 'section':725 'self':468 'self-host':467 'separ':560 'serv':489 'server':154,167,196,796 'session':157 'set':420,519,526,580,612 'setup':418,737 'sfx':86,135,286,451,539,671,672,809 'short':693 'simplifi':698 'singl':738 'skill':9,172,427,585,709,731 'skill-audiomind' 'slide':409 'smart':63 'soft':383 'sound':20,58,131,262,284,293,307,400,677 'soundtrack':256 'source-wells1137' 'specif':292 'speech':45,80,200,221,344 'stabl':786 'start':151,163,748 'support':634,801 'surfac':712 'synchron':768 'tabl':700 'task':779 'text':47,104,198,219,222,260,305,309,342,345,398,402 'text-to-speech':197 'time':119,651 'tire':2 'tool':192,218,259,304,330 'topic-agent-skills' 'topic-agentskills' 'topic-audio-generation' 'topic-claude-code' 'topic-claude-code-marketplace' 'topic-claude-code-plugin' 'topic-claude-code-skill' 'topic-claude-code-skills' 'topic-claude-skills' 'topic-content-creation' 'topic-image-generation' 'topic-openclaw' 'trigger':721 'true':431 'tts':17,110,116,122,201,538,630,641,655,806 'turbo':123,237,638 'type':99,624 'ultra':644 'ultra-low':643 'unifi':736 'unstable/offline':702 'upbeat':274 'url':95,358,393,575,591 'use':25,39,93,149,173,216,235,257,302,328,719,723,760 'use-cas':718 'user':41,185,203,246,288,317,337,359,394,490,600 'v1':654 'v1.0.0':802 'v2':234,238,354,629,639 'v2.0.0':791 'v2.1.0':769 'v3':111 'v3.0.0':697 'valid':552 'variabl':518,528,529 'vercel':509,515,523,610 'vercel.app':494,619 'via':193,264 'voic':23,51,106,139,144,214,223,314,327,331,347,540,684,690,799 'voice-ov':50,105,213 
'want':28,323,483 'window':301 'without':33 'work':716 'workflow':772 'your-domain.com':577,593 'your-domain.com/api/audio':576,592 '帮我把这段文字配音':338 '欢迎来到我们的产品发布会':339,346 '点击收听':356,391 '生成一个科幻风格的门开启音效':395 '秒的轻松背景音乐':362 '给我生成一段':360 '背景音乐生成完成':390 '适合播客':363 '配音完成':355","prices":[{"id":"5a8e95f7-46ec-49e1-9aff-cf56be29dff1","listingId":"93bc1c0c-a13b-4ec6-bd68-a1061a067682","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"wells1137","category":"media-skills","install_from":"skills.sh"},"createdAt":"2026-04-18T22:23:50.320Z"}],"sources":[{"listingId":"93bc1c0c-a13b-4ec6-bd68-a1061a067682","source":"github","sourceId":"wells1137/media-skills/audiomind","sourceUrl":"https://github.com/wells1137/media-skills/tree/main/skills/audiomind","isPrimary":false,"firstSeenAt":"2026-04-18T22:23:50.320Z","lastSeenAt":"2026-05-01T07:01:41.374Z"}],"details":{"listingId":"93bc1c0c-a13b-4ec6-bd68-a1061a067682","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"wells1137","slug":"audiomind","github":{"repo":"wells1137/media-skills","stars":24,"topics":["agent-skills","agentskills","audio-generation","claude-code","claude-code-marketplace","claude-code-plugin","claude-code-skill","claude-code-skills","claude-skills","content-creation","image-generation","openclaw","skill-md","skillsmp"],"license":null,"html_url":"https://github.com/wells1137/media-skills","pushed_at":"2026-03-04T08:32:42Z","description":"A collection of open-source Agent Skills for content creation — images, audio, and 
video.","skill_md_sha":"5863a60176b9df9267484b1dc402f34504a4d37a","skill_md_path":"skills/audiomind/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/wells1137/media-skills/tree/main/skills/audiomind"},"layout":"multi","source":"github","category":"media-skills","frontmatter":{"name":"AudioMind","description":"Tired of juggling multiple audio APIs? This skill gives you one-command access to TTS, music generation, sound effects, and voice cloning. Use when you want to generate any audio without managing multiple API keys."},"skills_sh_url":"https://skills.sh/wells1137/media-skills/audiomind"},"updatedAt":"2026-05-01T07:01:41.374Z"}}