audio-generation
Audio Generation
Generate audio using `generate_media` with `mode="audio"`. Supports speech (TTS), music, and sound effects. ElevenLabs is preferred when available, with OpenAI as the fallback.
Quick Start

    # Text-to-speech (auto-selects ElevenLabs if key available)
    generate_media(prompt="Hello, welcome to our presentation!", mode="audio")

    # With specific voice
    generate_media(prompt="Hello!", mode="audio", voice="Rachel")

    # Music generation (ElevenLabs only)
    generate_media(prompt="Upbeat jazz piano with soft drums", mode="audio",
                   audio_type="music", duration=30)

    # Sound effects (ElevenLabs only)
    generate_media(prompt="Thunder rolling across a mountain valley", mode="audio",
                   audio_type="sound_effect", duration=5)

Audio Types
| Type | Backends | Description |
|---|---|---|
| Speech | ElevenLabs, OpenAI | Text-to-speech with voice selection |
| Music (`audio_type="music"`) | ElevenLabs only | Music generation from text prompt |
| Sound effect (`audio_type="sound_effect"`) | ElevenLabs only | Sound effect generation |
| Voice conversion | ElevenLabs only | Change voice of existing audio (speech-to-speech) |
| Audio isolation | ElevenLabs only | Remove background noise, isolate vocals |
| Voice design | ElevenLabs only | Create a new synthetic voice from text description |
| Voice cloning | ElevenLabs only | Clone a voice from audio samples |
| Dubbing | ElevenLabs only | Translate and dub audio to another language |
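
The audio types table can be read as a capability map from audio type to the backends that can produce it. The following is a minimal sketch of that idea, not the tool's implementation. The helper names and backend labels are hypothetical, and the `"speech"` key is an assumed label; only `"music"` and `"sound_effect"` appear as `audio_type` values in this document.

```python
# Illustrative capability map based on the audio types table.
# "speech" is an assumed label; "music" and "sound_effect" come from
# the Quick Start examples. Backend labels are hypothetical.
SUPPORTED_BACKENDS = {
    "speech": ("elevenlabs", "openai"),
    "music": ("elevenlabs",),
    "sound_effect": ("elevenlabs",),
}


def backends_for(audio_type):
    """Return the backends that can produce audio_type, in priority order."""
    if audio_type not in SUPPORTED_BACKENDS:
        raise ValueError(f"unknown audio type: {audio_type!r}")
    return SUPPORTED_BACKENDS[audio_type]


def needs_elevenlabs(audio_type):
    """True when no backend other than ElevenLabs can serve the request."""
    return backends_for(audio_type) == ("elevenlabs",)
```

A check like this could run before a call to report early that a music or sound effect request cannot be served when only an OpenAI key is configured.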
Backend Comparison
| Backend | Default Model | Supports | API Key |
|---|---|---|---|
| ElevenLabs (priority 1) | | Speech, music, SFX | |
| OpenAI (priority 2) | | Speech only | |
If ElevenLabs TTS fails, the system automatically falls back to OpenAI TTS.
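
The fallback described above amounts to trying backends in priority order and returning the first success. The sketch below illustrates that pattern under stated assumptions: the two backend functions are stand-ins that simulate an ElevenLabs failure and an OpenAI success, and `synthesize_with_fallback` is a hypothetical helper. `generate_media` handles fallback internally, and this is not its actual code.

```python
def synthesize_with_fallback(text, backends):
    """Try each (name, backend) pair in priority order; return the first success."""
    errors = []
    for name, backend in backends:
        try:
            return name, backend(text)
        except Exception as exc:  # a real implementation would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Stand-in backends (hypothetical): the first simulates an outage.
def failing_elevenlabs(text):
    raise ConnectionError("simulated outage")


def fake_openai(text):
    return text.encode("utf-8")


used, audio = synthesize_with_fallback(
    "Hello!",
    [("elevenlabs", failing_elevenlabs), ("openai", fake_openai)],
)
print(used)  # openai
```

Collecting the per-backend errors keeps the original failure visible when every backend fails, instead of reporting only the last one.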
Key Parameters
| Parameter | Description | Example |
|---|---|---|
| `prompt` | Text to speak (speech) or description (music/SFX) | |
| `voice` | Voice name or ID | |
| `audio_type` | Type of audio | |
| `duration` | Length in seconds (music/SFX only) | |
| `instructions` | Speaking style (OpenAI `gpt-4o-mini-tts` only) | |
| | Output format | |
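
Two constraints in the parameter table are easy to violate: `duration` applies to music and sound effects only, and `instructions` is honored only by the OpenAI backend. A minimal sketch of a pre-flight check follows. `check_audio_args` is a hypothetical helper that is not part of the tool, and the `"speech"` default is an assumed label for the text-to-speech case.

```python
def check_audio_args(audio_type="speech", backend_type=None,
                     duration=None, instructions=None):
    """Return warnings for argument combinations the parameter table rules out."""
    warnings = []
    if duration is not None and audio_type not in ("music", "sound_effect"):
        warnings.append("duration applies to music and sound effects only")
    if instructions is not None and backend_type != "openai":
        warnings.append("instructions takes effect only with backend_type='openai'")
    return warnings


print(check_audio_args(audio_type="music", duration=30))  # []
```

Returning warnings instead of raising leaves the decision to the caller, who may prefer to drop the unsupported argument and proceed.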
Voice Quick Reference
ElevenLabs (top voices):
| Voice | Character |
|---|---|
| Rachel | Warm, conversational female |
| Sarah | Clear, professional female |
| Josh | Friendly male |
| Adam | Deep, authoritative male |
| Emily | Bright, energetic female |
OpenAI voices: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`, `coral`, `sage`

Important: prompt vs instructions
For speech, `prompt` is the literal text to speak. Style guidance goes in `instructions`:

    # CORRECT: prompt = text to speak, instructions = how to speak it
    generate_media(
        prompt="Welcome to the annual report presentation.",
        mode="audio",
        voice="alloy",
        instructions="warm, reflective tone with measured pacing",
        backend_type="openai"
    )

    # WRONG: Don't put style instructions in prompt
    generate_media(prompt="Say this warmly: Welcome...", mode="audio")  # Bad!

`instructions` only works with OpenAI `gpt-4o-mini-tts`. ElevenLabs uses voice selection for tone.
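
One way to keep style guidance out of `prompt` is to build the call arguments from separate text and style inputs. The sketch below assumes a hypothetical helper, `speech_kwargs`, which is not part of the tool. It pins `backend_type="openai"` whenever a style is given, mirroring the correct example above.

```python
def speech_kwargs(text, style=None, voice=None):
    """Build generate_media keyword arguments, keeping style out of the prompt."""
    kwargs = {"prompt": text, "mode": "audio"}
    if voice is not None:
        kwargs["voice"] = voice
    if style is not None:
        # Style guidance is only honored by the OpenAI backend, so pin it.
        kwargs["instructions"] = style
        kwargs["backend_type"] = "openai"
    return kwargs


kwargs = speech_kwargs(
    "Welcome to the annual report presentation.",
    style="warm, reflective tone with measured pacing",
    voice="alloy",
)
# The result would then be passed on as: generate_media(**kwargs)
```

Without a style, the helper leaves the backend unset, so the default ElevenLabs-first selection still applies.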
Audio Understanding
Use `read_media` (not `generate_media`) to analyze existing audio:

    read_media(path="recording.mp3", prompt="Transcribe and summarize this audio")

Need More Control?
- Full ElevenLabs voice catalog (28+ voices): See references/voices.md
- Music and sound effects details: See references/music_and_sfx.md
- Advanced audio capabilities (voice conversion, cloning, isolation, dubbing): See references/advanced.md