ai-voice-cloning
Install the belt CLI skill: `npx skills add belt-sh/cli`
AI Voice Generation
Generate natural AI voices via inference.sh CLI.

Quick Start
Requires the inference.sh CLI (`belt`); see the install instructions.

```bash
belt login

# Generate speech
belt app run infsh/kokoro-tts --input '{
  "prompt": "Hello! This is an AI-generated voice that sounds natural and engaging.",
  "voice": "af_sarah"
}'
```
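Hand-writing the JSON inside single quotes breaks as soon as the text contains an apostrophe or a double quote. The sketch below is one way to build the `--input` payload from arbitrary text. The helper names `json_escape` and `kokoro_input` are hypothetical (they are not part of belt), and the escaping covers only backslashes, double quotes, and newlines, so other control characters would need extra handling.

```bash
# Hypothetical helper: escape a string for use inside a JSON string literal.
# Handles backslashes, double quotes, and newlines only.
json_escape() {
  printf '%s' "$1" \
    | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' \
    | awk 'BEGIN { ORS = "" } NR > 1 { print "\\n" } { print }'
}

# Hypothetical helper: build a kokoro-tts input from text ($1) and a voice ID ($2).
kokoro_input() {
  printf '{"prompt": "%s", "voice": "%s"}' "$(json_escape "$1")" "$(json_escape "$2")"
}

# Possible usage (not executed here):
#   belt app run infsh/kokoro-tts --input "$(kokoro_input "I'm your host." am_adam)"
```

Because the payload is assembled in double-quoted shell strings, apostrophes in the text no longer end the shell quoting early.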

Available Models
| Model | App ID | Best For |
|---|---|---|
| Inworld TTS-2 | `inworld/text-to-speech-2` | 100+ languages, emotion/non-verbal steering, delivery modes |
| Inworld TTS 1.5 Max | | Low latency (<200ms), 15 languages |
| Inworld TTS 1.5 Mini | `inworld/text-to-speech-1-5-mini` | Ultra-low latency (~120ms), 15 languages, real-time |
| ElevenLabs TTS | | Premium quality, 22+ voices, 32 languages |
| ElevenLabs Voice Changer | | Transform existing voice recordings |
| Kokoro TTS | `infsh/kokoro-tts` | Natural, multiple voices |
| DIA | `infsh/dia-tts` | Conversational, expressive |
| Chatterbox | | Casual, entertainment |
| Higgs | | Professional narration |
| VibeVoice | | Emotional range |
Kokoro Voice Library
American English
| Voice ID | Gender | Style |
|---|---|---|
|  | Female | Warm, friendly |
|  | Female | Professional |
|  | Female | Youthful |
|  | Male | Authoritative |
|  | Male | Conversational |
|  | Male | Clear, neutral |
British English
| Voice ID | Gender | Style |
|---|---|---|
|  | Female | Refined |
|  | Female | Warm |
|  | Male | Classic |
|  | Male | Modern |
Inworld TTS — Character & Emotion Voices
Inworld TTS-2 is purpose-built for character voices, gaming, and expressive speech. Use inline `[brackets]` for emotion, non-verbals, and delivery control:

```bash
# Expressive character voice with emotion steering
belt app run inworld/text-to-speech-2 --input '{
  "text": "[excited] Oh wow, you actually found the ancient artifact! [gasp] I cannot believe it... [whisper] We need to keep this between us.",
  "voice_id": "Sarah",
  "delivery_mode": "CREATIVE"
}'

# Calm narrator with stable delivery
belt app run inworld/text-to-speech-2 --input '{
  "text": "The sun set behind the mountains, casting long shadows across the valley. A new chapter was about to begin.",
  "voice_id": "Sarah",
  "delivery_mode": "STABLE"
}'
```

**Delivery modes:** `STABLE` (consistent, narration), `BALANCED` (natural, default), `CREATIVE` (expressive, characters)

**Steering examples:** `[laugh]`, `[sigh]`, `[whisper]`, `[excited]`, `[sad]`, `[angry]`, `[pause]`, `[gasp]`

**Built-in voices** (271+ across 15 languages): `Sarah`, `Alex`, `Ashley`, `Dennis`, `Hana`, `Blake`, `Luna`, `Clive`, and many more. Browse all at the [Inworld TTS Playground](https://platform.inworld.ai/tts-playground).
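The three delivery modes can be checked before a request is sent. This is a minimal sketch, assuming a hypothetical helper named `inworld_input` and text that contains no double quotes or backslashes (it is placed into the JSON unescaped). The mode defaults to `BALANCED`, matching the default noted above.

```bash
# Hypothetical helper: build an Inworld TTS-2 input and reject unknown delivery modes.
# $1 = text (may contain inline [tags]), $2 = voice ID, $3 = delivery mode (optional)
inworld_input() {
  mode="${3:-BALANCED}"
  case "$mode" in
    STABLE|BALANCED|CREATIVE) ;;
    *) echo "unknown delivery mode: $mode" >&2; return 1 ;;
  esac
  # Assumption: the text has no double quotes or backslashes, so no escaping is done.
  printf '{"text": "%s", "voice_id": "%s", "delivery_mode": "%s"}' "$1" "$2" "$mode"
}

# Possible usage (not executed here):
#   belt app run inworld/text-to-speech-2 --input "$(inworld_input '[whisper] Stay close.' Sarah CREATIVE)"
```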

Low-Latency for Real-Time / Conversational AI
```bash
# Ultra-fast response for chatbots & game NPCs (~120ms)
belt app run inworld/text-to-speech-1-5-mini --input '{
  "text": "Welcome, traveler. What brings you to our village?",
  "voice_id": "Clive",
  "speaking_rate": 0.9
}'
```

Voice Generation Examples
Professional Narration
```bash
belt app run infsh/kokoro-tts --input '{
  "prompt": "Welcome to our quarterly earnings call. Today we will discuss the financial performance and strategic initiatives for the past quarter.",
  "voice": "am_michael",
  "speed": 1.0
}'
```

Conversational Style
```bash
belt app run infsh/dia-tts --input '{
  "text": "Hey, so I was thinking about that project we discussed. What if we tried a different approach?",
  "voice": "conversational"
}'
```

Audiobook Narration
```bash
belt app run infsh/kokoro-tts --input '{
  "prompt": "Chapter One. The morning mist hung low over the valley as Sarah made her way down the winding path. She had been walking for hours.",
  "voice": "bf_emma",
  "speed": 0.9
}'
```

Video Voiceover
```bash
belt app run infsh/kokoro-tts --input '{
  "prompt": "Introducing the next generation of productivity. Work smarter, not harder.",
  "voice": "af_nicole",
  "speed": 1.1
}'
```

Podcast Host
```bash
belt app run infsh/kokoro-tts --input '{
  "prompt": "Welcome back to Tech Talk! I am your host, and today we are diving deep into the world of artificial intelligence.",
  "voice": "am_adam"
}'
```

Multi-Voice Conversation
```bash
# Generate dialogue between two speakers
# Speaker 1
belt app run infsh/kokoro-tts --input '{
  "prompt": "Have you seen the latest AI developments? It is incredible how fast things are moving.",
  "voice": "am_michael"
}' > speaker1.json

# Speaker 2
belt app run infsh/kokoro-tts --input '{
  "prompt": "I know, right? Just last week I tried that new image generator and was blown away.",
  "voice": "af_sarah"
}' > speaker2.json

# Merge conversation (replace the placeholders with the audio URLs returned by the two runs above)
belt app run infsh/media-merger --input '{
  "audio_files": ["<speaker1-url>", "<speaker2-url>"],
  "crossfade_ms": 300
}'
```

Long-Form Content
Chunked Processing
For content over 5,000 characters, split the text into chunks:

```bash
# Process long text in chunks
TEXT="Your very long text here..."

# Split and generate
# Chunk 1
belt app run infsh/kokoro-tts --input '{
  "prompt": "<chunk-1>",
  "voice": "bf_emma"
}' > chunk1.json

# Chunk 2
belt app run infsh/kokoro-tts --input '{
  "prompt": "<chunk-2>",
  "voice": "bf_emma"
}' > chunk2.json

# Merge chunks
belt app run infsh/media-merger --input '{
  "audio_files": ["<chunk1-url>", "<chunk2-url>"],
  "crossfade_ms": 100
}'
```
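One way to do the splitting is to cut at sentence boundaries, so that no chunk ends mid-sentence. The sketch below assumes a hypothetical helper `split_chunks` built on `tr` and `awk`. It packs whole sentences greedily up to the limit and prints one chunk per line. A single sentence longer than the limit is emitted as is, and `length` may count bytes rather than characters in some awk implementations.

```bash
# Hypothetical helper: split text into chunks of at most N characters at sentence ends.
# $1 = text, $2 = maximum characters per chunk (defaults to 5000)
split_chunks() {
  printf '%s' "$1" | tr '\n' ' ' | awk -v max="${2:-5000}" '
    {
      # Put a line break after each sentence end, then split into sentences.
      gsub(/[.!?] +/, "&\n")
      n = split($0, sentences, "\n")
      chunk = ""
      for (i = 1; i <= n; i++) {
        s = sentences[i]
        sub(/ +$/, "", s)                      # trim trailing spaces
        if (s == "") continue
        if (chunk == "") {
          chunk = s                            # start a new chunk
        } else if (length(chunk) + 1 + length(s) <= max) {
          chunk = chunk " " s                  # sentence still fits
        } else {
          print chunk                          # chunk is full, emit it
          chunk = s
        }
      }
      if (chunk != "") print chunk
    }'
}

# Dry run: list the chunks that would each become one kokoro-tts call.
split_chunks "First sentence. Second sentence. Third one!" 32 | while IFS= read -r chunk; do
  echo "chunk: $chunk"
done
```

Each printed chunk could then be passed as the `prompt` of its own run, with the resulting audio URLs handed to the merge step.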

Voice + Video Workflow
Add Voiceover to Video
```bash
# 1. Generate voiceover
belt app run infsh/kokoro-tts --input '{
  "prompt": "This stunning footage shows the beauty of nature in its purest form.",
  "voice": "am_michael"
}' > voiceover.json

# 2. Merge with video
belt app run infsh/media-merger --input '{
  "video_url": "https://your-video.mp4",
  "audio_url": "<voiceover-url>"
}'
```

Create Talking Head
```bash
# 1. Generate speech
belt app run infsh/kokoro-tts --input '{
  "prompt": "Hi, I am excited to share some updates with you today.",
  "voice": "af_sarah"
}' > speech.json

# 2. Animate with avatar
belt app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "<speech-url>"
}'
```

Speed and Pacing
| Speed | Effect | Use For |
|---|---|---|
| 0.8 | Slow, deliberate | Audiobooks, meditation |
| 0.9 | Slightly slow | Education, tutorials |
| 1.0 | Normal | General purpose |
| 1.1 | Slightly fast | Commercials, energy |
| 1.2 | Fast | Quick announcements |

```bash
# Slow narration
belt app run infsh/kokoro-tts --input '{
  "prompt": "Take a deep breath. Let yourself relax.",
  "voice": "bf_emma",
  "speed": 0.8
}'
```
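The speed table can be turned into a small lookup so that scripts pick a speed by content type. The category labels (`audiobook`, `tutorial`, and so on) are hypothetical names chosen for this sketch; only the speed values come from the table.

```bash
# Hypothetical helper: map a content type to a speed value from the table above.
speed_for() {
  case "$1" in
    audiobook|meditation)  echo 0.8 ;;
    education|tutorial)    echo 0.9 ;;
    commercial)            echo 1.1 ;;
    announcement)          echo 1.2 ;;
    *)                     echo 1.0 ;;   # general purpose
  esac
}

# Print a kokoro-tts input for a meditation script.
printf '{"prompt": "Take a deep breath.", "voice": "bf_emma", "speed": %s}\n' "$(speed_for meditation)"
# prints {"prompt": "Take a deep breath.", "voice": "bf_emma", "speed": 0.8}
```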

Punctuation for Pacing
Use punctuation to control speech rhythm:
| Punctuation | Effect |
|---|---|
| Period `.` | Full pause |
| Comma `,` | Brief pause |
| `...` | Extended pause |
| `!` | Emphasis |
| `?` | Question intonation |
|  | Quick break |

```bash
belt app run infsh/kokoro-tts --input '{
  "prompt": "Wait... Did you hear that? Something is coming. Something big!",
  "voice": "am_adam"
}'
```

Best Practices
- Match voice to content - Professional voice for business, casual for social
- Use punctuation - Control pacing with periods and commas
- Keep sentences short - Easier to generate and sounds more natural
- Test different voices - Same text sounds different across voices
- Adjust speed - Slightly slower often sounds more natural
- Break long content - Process in chunks for consistency
Use Cases
- Voiceovers - Video narration, commercials
- Audiobooks - Full book narration
- Podcasts - AI hosts and guests
- E-learning - Course narration
- Accessibility - Screen reader content
- IVR - Phone system messages
- Content localization - Translate and voice
Related Skills
```bash
# ElevenLabs TTS (premium, 22+ voices)
npx skills add inference-sh/skills@elevenlabs-tts

# ElevenLabs voice changer (transform recordings)
npx skills add inference-sh/skills@elevenlabs-voice-changer

# All TTS models
npx skills add inference-sh/skills@text-to-speech

# Podcast creation
npx skills add inference-sh/skills@ai-podcast-creation

# AI avatars
npx skills add inference-sh/skills@ai-avatar-video

# Video generation
npx skills add inference-sh/skills@ai-video-generation

# Full platform skill
npx skills add inference-sh/skills@infsh-cli
```

Browse audio apps: `belt app store --category audio`