videoagent-audio-studio
🎙️ VideoAgent Audio Studio
Use when: The user asks to generate speech, narrate text, create a voice-over, compose music, or produce a sound effect.
VideoAgent Audio Studio is a smart audio dispatcher. It analyzes your request, routes it to the best available model (ElevenLabs for speech, sound effects, and voice cloning; fal.ai for music), and returns a ready-to-use audio URL.
Quick Reference
| Request Type | Best Model | Latency |
|---|---|---|
| Narrate text / Voice-over | | ~3s |
| Low-latency TTS (real-time) | | <1s |
| Background music | cassetteai-music | ~15s |
| Sound effect | | ~5s |
| Clone a voice from audio | | ~10s |
How to Use
1. Start the AudioMind server (once per session)
    bash {baseDir}/tools/start_server.sh

This starts the ElevenLabs MCP server on port 8124. The skill uses it for all audio generation.
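Before routing any request, it can help to confirm that the server is actually accepting connections. The following is a minimal sketch, assuming the server listens for TCP connections on the local machine; only the port number 8124 comes from the text above, and `is_port_open` is a hypothetical helper rather than part of the skill.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_port_open("127.0.0.1", 8124)` after running the start script gives a quick yes/no answer; the host address is an assumption.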
2. Route the request
Analyze the user's request and call the appropriate tool via the MCP server:
Text-to-Speech (TTS)
When the user asks to "narrate", "read aloud", "say", or "create a voice-over":

    Use MCP tool: text_to_speech
      text: "<the text to narrate>"
      voice_id: "JBFqnCBsd6RMkjVDRZzb"   # Default: "George" (professional, neutral)
      model_id: "eleven_multilingual_v2" # Use "eleven_turbo_v2_5" for low latency

Music Generation
When the user asks to "compose", "create background music", or "make a soundtrack":

    Use MCP tool: text_to_sound_effects (via cassetteai-music on fal.ai)
      prompt: "<music description, e.g. 'upbeat lo-fi hip hop, 90 seconds'>"
      duration_seconds: <duration>

Sound Effect (SFX)
When the user asks for a specific sound (e.g., "a door creaking", "rain on a window"):

    Use MCP tool: text_to_sound_effects
      text: "<sound description>"
      duration_seconds: <1-22>

Voice Cloning
When the user provides an audio sample and wants to clone the voice:

    Use MCP tool: voice_add
      name: "<voice name>"
      files: ["<audio_file_url>"]
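The routing rules above can be sketched as a small keyword matcher. This is only an illustration: in practice the agent interprets the request itself, and the keyword patterns, the `route_request` name, and the fallback to sound effects are all assumptions. The returned names are the tool names listed above, with `cassetteai-music` standing in for the music route.

```python
import re

# Ordered rules: the first matching pattern wins.
# The keyword lists are illustrative assumptions, not the skill's actual logic.
ROUTES = [
    ("voice_add", r"\bclone\b"),
    ("cassetteai-music", r"\b(compose|background music|soundtrack)\b"),
    ("text_to_speech", r"\b(narrate|read aloud|say|voice-?over|voice this)\b"),
]


def route_request(request: str) -> str:
    """Return the tool or model name a request would be routed to."""
    lowered = request.lower()
    for target, pattern in ROUTES:
        if re.search(pattern, lowered):
            return target
    # Anything else is treated as a sound-effect description.
    return "text_to_sound_effects"
```

Checking voice cloning first keeps a request such as "clone this voice and narrate the intro" from being treated as plain narration; that ordering is a design choice for this sketch.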
Example Conversations
User: "Voice this text for me: Welcome to our product launch"

    → Route to: text_to_speech
      text: "Welcome to our product launch"
      voice_id: "JBFqnCBsd6RMkjVDRZzb"
      model_id: "eleven_multilingual_v2"

🎙️ Voiceover done! Listen here

User: "Generate 60 seconds of relaxing background music for a podcast"

    → Route to: cassetteai-music (fal.ai)
      prompt: "relaxing lo-fi background music for a podcast, gentle piano and soft beats, 60 seconds"
      duration_seconds: 60

🎵 Background music ready! Listen here

User: "Generate a sci-fi style door opening sound effect"

    → Route to: text_to_sound_effects
      text: "a futuristic sci-fi door sliding open with a hydraulic hiss"
      duration_seconds: 3
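The argument sets in these conversations follow a fixed pattern, which small helper functions can capture. The sketch below uses only the parameter names and values shown in this document; `tts_args` and `sfx_args` are hypothetical helpers, and rejecting an out-of-range duration (rather than clamping it) is a design choice for the example, not documented behavior.

```python
DEFAULT_VOICE_ID = "JBFqnCBsd6RMkjVDRZzb"  # "George", the default voice named above


def tts_args(text: str, low_latency: bool = False) -> dict:
    """Build text_to_speech arguments, switching models for low latency."""
    model_id = "eleven_turbo_v2_5" if low_latency else "eleven_multilingual_v2"
    return {"text": text, "voice_id": DEFAULT_VOICE_ID, "model_id": model_id}


def sfx_args(description: str, duration_seconds: float) -> dict:
    """Build text_to_sound_effects arguments, enforcing the 1-22 second range."""
    if not 1 <= duration_seconds <= 22:
        raise ValueError("duration_seconds must be between 1 and 22")
    return {"text": description, "duration_seconds": duration_seconds}
```

For instance, `tts_args("Welcome to our product launch")` reproduces the arguments of the first conversation.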
Setup
Required
Set ELEVENLABS_API_KEY in ~/.openclaw/openclaw.json:

    {
      "skills": {
        "entries": {
          "videoagent-audio-studio": {
            "enabled": true,
            "env": {
              "ELEVENLABS_API_KEY": "your_elevenlabs_key_here"
            }
          }
        }
      }
    }

Get your key at elevenlabs.io/app/settings/api-keys.
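A quick way to catch a missing or placeholder key is to inspect the configuration before starting the server. The sketch below assumes only the JSON layout shown above; `skill_env` and `has_api_key` are hypothetical helpers, and how the skill itself loads this file is not covered here.

```python
import json


def skill_env(config_text: str, skill: str = "videoagent-audio-studio") -> dict:
    """Return the env mapping for a skill entry, or {} if the entry is missing."""
    config = json.loads(config_text)
    entry = config.get("skills", {}).get("entries", {}).get(skill, {})
    return entry.get("env", {})


def has_api_key(config_text: str) -> bool:
    """True if a non-empty, non-placeholder ELEVENLABS_API_KEY is configured."""
    key = skill_env(config_text).get("ELEVENLABS_API_KEY", "")
    return bool(key) and key != "your_elevenlabs_key_here"
```

To use it, read the contents of `~/.openclaw/openclaw.json` into a string and pass that string to `has_api_key`.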
Optional (for fal.ai music & SFX models)
Self-Hosting the Proxy
cli.js connects to a hosted proxy by default. If you want full control, or need to serve users in regions where vercel.app is blocked, you can deploy your own instance from the proxy/ directory.
Quick Deploy (Vercel)

    cd proxy
    npm install
    vercel --prod

Environment Variables
Set these in your Vercel project (Dashboard → Settings → Environment Variables):
| Variable | Required For | Where to Get |
|---|---|---|
| ELEVENLABS_API_KEY | TTS, SFX, Voice Clone | elevenlabs.io/app/settings/api-keys |
| FAL_KEY | Music generation | fal.ai/dashboard/keys |
| | (Optional) Restrict access | Comma-separated list of allowed client keys |
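The optional access restriction takes a comma-separated list of allowed client keys. A minimal sketch of how such a value could be parsed and checked is shown below. The function names are hypothetical, and treating an empty or unset list as "no restriction" is an assumption based on the variable being optional, not confirmed proxy behavior.

```python
from typing import Optional, Set


def parse_allowed_keys(raw: Optional[str]) -> Set[str]:
    """Split a comma-separated value into a set of trimmed, non-empty keys."""
    if not raw:
        return set()
    return {part.strip() for part in raw.split(",") if part.strip()}


def is_client_allowed(client_key: str, raw_allow_list: Optional[str]) -> bool:
    """Check a client key against the configured allow list."""
    allowed = parse_allowed_keys(raw_allow_list)
    # Assumption: an empty or unset list means access is unrestricted.
    if not allowed:
        return True
    return client_key in allowed
```

Trimming whitespace around each entry makes a value such as `"key-a, key-b"` behave the same as `"key-a,key-b"`.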
Point cli.js to Your Proxy
    export AUDIOMIND_PROXY_URL="https://your-domain.com/api/audio"

Or set it in ~/.openclaw/openclaw.json:

    {
      "skills": {
        "entries": {
          "videoagent-audio-studio": {
            "env": {
              "AUDIOMIND_PROXY_URL": "https://your-domain.com/api/audio"
            }
          }
        }
      }
    }
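Because the proxy URL can come from two places, a client needs some rule for choosing between them. The sketch below shows one possible lookup order. The precedence (shell environment first, then the skill's `env` block, then a caller-supplied default) is an assumption, and `resolve_proxy_url` is a hypothetical helper that does not describe how cli.js actually resolves the value.

```python
from typing import Mapping, Optional


def resolve_proxy_url(
    environ: Mapping[str, str],
    skill_env: Mapping[str, str],
    default_url: Optional[str] = None,
) -> Optional[str]:
    """Pick the proxy URL: process environment first, then the skill's env block."""
    for source in (environ, skill_env):
        url = source.get("AUDIOMIND_PROXY_URL", "").strip()
        if url:
            return url
    # Neither source set the variable: fall back to whatever the caller supplies.
    return default_url
```

In a real script, `environ` would typically be `os.environ` and `skill_env` the `env` mapping read from `~/.openclaw/openclaw.json`.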
Custom Domain (Recommended)
If your users are in mainland China, bind a custom domain in Vercel Dashboard → Settings → Domains to avoid DNS issues with vercel.app.
Model Reference
| Model ID | Type | Provider | Notes |
|---|---|---|---|
| | TTS | ElevenLabs | Best quality, supports 29 languages |
| | TTS | ElevenLabs | Ultra-low latency, ideal for real-time |
| | TTS | ElevenLabs | English only, fastest |
| cassetteai-music | Music | fal.ai | Reliable, fast music generation |
| | SFX | ElevenLabs | High-quality sound effects (up to 22s) |
| | Clone | ElevenLabs | Clone any voice from a short audio sample |
Changelog
v3.0.0
- Simplified routing table: Removed unstable/offline models from the main reference. The skill now only surfaces models that reliably work.
- Clearer use-case triggers: Added "Use when" section so the agent activates this skill at the right moment.
- Unified setup: A single ELEVENLABS_API_KEY is all you need to get started. FAL_KEY is now optional.
- Removed polling complexity: Music generation now uses cassetteai-music by default, which completes synchronously.
v2.1.0
- Added async workflow for long-running music generation tasks.
- Added cassetteai-music as a stable alternative for music generation.
v2.0.0
- Migrated to ElevenLabs MCP server architecture.
- Added voice cloning support.
v1.0.0
- Initial release with TTS, music, and SFX routing.