ai-voice-cloning

Install the belt CLI skill:

```bash
npx skills add belt-sh/cli
```

AI Voice Generation

Generate natural-sounding AI voices with the inference.sh CLI.

Quick Start

Requires the inference.sh CLI (`belt`). Install it, then log in:

```bash
belt login

# Generate speech
belt app run infsh/kokoro-tts --input '{ "prompt": "Hello! This is an AI-generated voice that sounds natural and engaging.", "voice": "af_sarah" }'
```
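Every `--input` payload in this document is JSON written inside a single-quoted shell string. An apostrophe in the text (I'm, it's) therefore ends the shell quote, and a stray double quote breaks the JSON. The following is a minimal sketch of a helper that escapes the text first. `json_escape` and `kokoro_input` are hypothetical names and are not part of the belt CLI. The `prompt`/`voice`/`speed` fields are copied from the Kokoro examples here.

```bash
# json_escape: hypothetical helper (not part of belt) that escapes a string for
# use inside a JSON double-quoted value. Quoted replacements are taken literally.
json_escape() {
  local s=$1
  s=${s//'\'/'\\'}        # backslash       -> \\
  s=${s//'"'/'\"'}        # double quote    -> \"
  s=${s//$'\n'/'\n'}      # newline         -> \n
  s=${s//$'\r'/'\r'}      # carriage return -> \r
  s=${s//$'\t'/'\t'}      # tab             -> \t
  printf '%s' "$s"
}

# kokoro_input: hypothetical helper that builds the --input JSON used by the
# infsh/kokoro-tts examples: prompt, voice (defaults to af_sarah, an arbitrary
# choice), and an optional numeric speed.
kokoro_input() {
  local prompt voice=${2:-af_sarah} speed=${3:-}
  prompt=$(json_escape "$1")
  if [[ -n $speed ]]; then
    printf '{ "prompt": "%s", "voice": "%s", "speed": %s }' "$prompt" "$voice" "$speed"
  else
    printf '{ "prompt": "%s", "voice": "%s" }' "$prompt" "$voice"
  fi
}

# Usage (the double-quoted argument may contain apostrophes):
#   belt app run infsh/kokoro-tts --input "$(kokoro_input "Hi, I'm excited to share some updates." af_sarah)"
```

Building the JSON in a function means the prompt can be passed as an ordinary double-quoted shell argument. You no longer need to hand-escape quotes inside a single-quoted literal.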

Available Models

| Model | App ID | Best For |
|---|---|---|
| Inworld TTS-2 | `inworld/text-to-speech-2` | 100+ languages, emotion/non-verbal steering, delivery modes |
| Inworld TTS 1.5 Max | `inworld/text-to-speech-1-5-max` | Low latency (<200ms), 15 languages |
| Inworld TTS 1.5 Mini | `inworld/text-to-speech-1-5-mini` | Ultra-low latency (~120ms), 15 languages, real-time |
| ElevenLabs TTS | `elevenlabs/tts` | Premium quality, 22+ voices, 32 languages |
| ElevenLabs Voice Changer | `elevenlabs/voice-changer` | Transforming existing voice recordings |
| Kokoro TTS | `infsh/kokoro-tts` | Natural, multiple voices |
| DIA | `infsh/dia-tts` | Conversational, expressive |
| Chatterbox | `infsh/chatterbox` | Casual, entertainment |
| Higgs | `infsh/higgs-tts` | Professional narration |
| VibeVoice | `infsh/vibevoice` | Emotional range |

Kokoro Voice Library

American English

| Voice ID | Gender | Style |
|---|---|---|
| `af_sarah` | Female | Warm, friendly |
| `af_nicole` | Female | Professional |
| `af_sky` | Female | Youthful |
| `am_michael` | Male | Authoritative |
| `am_adam` | Male | Conversational |
| `am_echo` | Male | Clear, neutral |

British English

| Voice ID | Gender | Style |
|---|---|---|
| `bf_emma` | Female | Refined |
| `bf_isabella` | Female | Warm |
| `bm_george` | Male | Classic |
| `bm_lewis` | Male | Modern |
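The same text can sound noticeably different from one voice to another. It can therefore help to audition a sample line with every voice in the tables above before choosing one. The following is a minimal sketch. `audition_voices` and the `audition-<voice>.json` file names are hypothetical. As in this document's other examples, the CLI's stdout is simply redirected into a file.

```bash
# Kokoro voice IDs from the American and British English tables above.
KOKORO_VOICES=(af_sarah af_nicole af_sky am_michael am_adam am_echo
               bf_emma bf_isabella bm_george bm_lewis)

# audition_voices: hypothetical helper that renders the same sentence with
# every voice and saves each CLI response to audition-<voice>.json.
audition_voices() {
  local text=$1 voice input
  text=${text//'\'/'\\'}   # escape backslashes so the text stays valid JSON
  text=${text//'"'/'\"'}   # escape double quotes
  for voice in "${KOKORO_VOICES[@]}"; do
    printf -v input '{ "prompt": "%s", "voice": "%s" }' "$text" "$voice"
    belt app run infsh/kokoro-tts --input "$input" > "audition-$voice.json" || return
  done
}

# Usage:
#   audition_voices "Welcome back to Tech Talk. Today we're talking about AI."
```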

Inworld TTS — Character & Emotion Voices

Inworld TTS-2 is purpose-built for character voices, gaming, and expressive speech. Use inline `[bracket]` tags to steer emotion, non-verbal sounds, and delivery:

```bash
# Expressive character voice with emotion steering
belt app run inworld/text-to-speech-2 --input '{ "text": "[excited] Oh wow, you actually found the ancient artifact! [gasp] I cannot believe it... [whisper] We need to keep this between us.", "voice_id": "Sarah", "delivery_mode": "CREATIVE" }'

# Calm narrator with stable delivery
belt app run inworld/text-to-speech-2 --input '{ "text": "The sun set behind the mountains, casting long shadows across the valley. A new chapter was about to begin.", "voice_id": "Sarah", "delivery_mode": "STABLE" }'
```

**Delivery modes:** `STABLE` (consistent, for narration), `BALANCED` (natural, the default), `CREATIVE` (expressive, for characters)

**Steering examples:** `[laugh]`, `[sigh]`, `[whisper]`, `[excited]`, `[sad]`, `[angry]`, `[pause]`, `[gasp]`

**Built-in voices** (271+ across 15 languages): `Sarah`, `Alex`, `Ashley`, `Dennis`, `Hana`, `Blake`, `Luna`, `Clive`, and many more. Browse all at the [Inworld TTS Playground](https://platform.inworld.ai/tts-playground).
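When you build Inworld TTS-2 requests in a script, it can help to reject a misspelled `delivery_mode` before any request is sent. The following is a minimal sketch. It assumes the three modes listed above are the only valid values. `inworld_tts2_input` is a hypothetical helper, and its field names are copied from the examples above.

```bash
# inworld_tts2_input: hypothetical helper that builds the --input JSON for
# inworld/text-to-speech-2 from text, voice_id (default Sarah), and
# delivery_mode (default BALANCED, the default listed above).
# Inline tags such as [whisper] are plain text and pass through unchanged;
# only backslashes and double quotes are escaped.
inworld_tts2_input() {
  local text=$1 voice=${2:-Sarah} mode=${3:-BALANCED}
  case $mode in
    STABLE|BALANCED|CREATIVE) ;;
    *) printf 'unknown delivery_mode: %s\n' "$mode" >&2; return 1 ;;
  esac
  text=${text//'\'/'\\'}
  text=${text//'"'/'\"'}
  printf '{ "text": "%s", "voice_id": "%s", "delivery_mode": "%s" }' "$text" "$voice" "$mode"
}

# Usage (belt runs only if the input was built successfully):
#   input=$(inworld_tts2_input "[whisper] Don't tell anyone." Sarah CREATIVE) &&
#     belt app run inworld/text-to-speech-2 --input "$input"
```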

Low-Latency for Real-Time / Conversational AI

```bash
# Ultra-fast response for chatbots & game NPCs (~120ms)
belt app run inworld/text-to-speech-1-5-mini --input '{ "text": "Welcome, traveler. What brings you to our village?", "voice_id": "Clive", "speaking_rate": 0.9 }'
```

Voice Generation Examples

Professional Narration

```bash
belt app run infsh/kokoro-tts --input '{
  "prompt": "Welcome to our quarterly earnings call. Today we will discuss the financial performance and strategic initiatives for the past quarter.",
  "voice": "am_michael",
  "speed": 1.0
}'
```

Conversational Style

```bash
belt app run infsh/dia-tts --input '{
  "text": "Hey, so I was thinking about that project we discussed. What if we tried a different approach?",
  "voice": "conversational"
}'
```

Audiobook Narration

```bash
belt app run infsh/kokoro-tts --input '{
  "prompt": "Chapter One. The morning mist hung low over the valley as Sarah made her way down the winding path. She had been walking for hours.",
  "voice": "bf_emma",
  "speed": 0.9
}'
```

Video Voiceover

```bash
belt app run infsh/kokoro-tts --input '{
  "prompt": "Introducing the next generation of productivity. Work smarter, not harder.",
  "voice": "af_nicole",
  "speed": 1.1
}'
```

Podcast Host

```bash
# '\'' ends the single-quoted string, adds a literal apostrophe, and reopens it
belt app run infsh/kokoro-tts --input '{
  "prompt": "Welcome back to Tech Talk! I'\''m your host, and today we are diving deep into the world of artificial intelligence.",
  "voice": "am_adam"
}'
```

Multi-Voice Conversation

```bash
# Generate dialogue between two speakers

# Speaker 1
belt app run infsh/kokoro-tts --input '{ "prompt": "Have you seen the latest AI developments? It'\''s incredible how fast things are moving.", "voice": "am_michael" }' > speaker1.json

# Speaker 2
belt app run infsh/kokoro-tts --input '{ "prompt": "I know, right? Just last week I tried that new image generator and was blown away.", "voice": "af_sarah" }' > speaker2.json

# Merge conversation (use the audio URLs returned in speaker1.json and speaker2.json)
belt app run infsh/media-merger --input '{ "audio_files": ["<speaker1-url>", "<speaker2-url>"], "crossfade_ms": 300 }'
```
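For dialogues with more than two turns, repeating the per-speaker command by hand gets tedious. The following is a minimal sketch that renders a whole script in speaking order. `render_dialogue`, the `voice|line` input format, and the `line-NN.json` names are all hypothetical. Merging still uses the audio URLs returned in each JSON file, as in the `infsh/media-merger` step above.

```bash
# render_dialogue: hypothetical helper that reads "voice|line" pairs from stdin
# and renders each line with Kokoro into line-01.json, line-02.json, ...
render_dialogue() {
  local n=0 voice text input
  # "|| [[ -n $voice ]]" also handles a final line without a trailing newline.
  while IFS='|' read -r voice text || [[ -n $voice ]]; do
    [[ -z $voice ]] && continue              # skip blank lines
    n=$((n + 1))
    text=${text//'\'/'\\'}                   # escape backslashes for JSON
    text=${text//'"'/'\"'}                   # escape double quotes for JSON
    printf -v input '{ "prompt": "%s", "voice": "%s" }' "$text" "$voice"
    # </dev/null stops the CLI from reading the rest of the script, in case it reads stdin.
    belt app run infsh/kokoro-tts --input "$input" \
      > "$(printf 'line-%02d.json' "$n")" < /dev/null || return
  done
}

# Usage: one "voice|line" pair per line, in speaking order.
#   render_dialogue <<'EOF'
#   am_michael|Have you seen the latest AI developments? It's incredible how fast things are moving.
#   af_sarah|I know, right? Just last week I tried that new image generator and was blown away.
#   EOF
```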

Long-Form Content

Chunked Processing

For content over 5,000 characters, split the text into chunks and generate each chunk separately:

```bash
# Process long text in chunks
TEXT="Your very long text here..."

# Split and generate
# Chunk 1
belt app run infsh/kokoro-tts --input '{ "prompt": "<chunk-1>", "voice": "bf_emma" }' > chunk1.json

# Chunk 2
belt app run infsh/kokoro-tts --input '{ "prompt": "<chunk-2>", "voice": "bf_emma" }' > chunk2.json

# Merge chunks
belt app run infsh/media-merger --input '{ "audio_files": ["<chunk1-url>", "<chunk2-url>"], "crossfade_ms": 100 }'
```
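The chunk boundaries above are left as placeholders. One way to produce them is to pack whole sentences into chunks under the 5,000-character limit mentioned above. The following is a minimal bash sketch. `split_into_chunks` and `_emit_chunk` are hypothetical helpers. Breaking at sentence ends is an assumption about where a seam sounds least noticeable, not something the CLI requires.

```bash
# split_into_chunks: hypothetical helper that prints TEXT as chunks of at most
# MAX characters (default 5000), one chunk per line, packing whole sentences.
# Any run of . ! ? may end a piece. The original spacing is kept, so "3.14" is
# never rewritten, although a chunk boundary can in rare cases fall inside it.
# A single sentence longer than MAX becomes its own (oversized) chunk.
split_into_chunks() {
  local rest=${1//$'\n'/ } max=${2:-5000} chunk="" piece
  # A piece is text up to a run of . ! ? plus any whitespace that follows it.
  local re='^([^.!?]*[.!?]+[[:space:]]*)(.*)$'
  while [[ -n $rest ]]; do
    if [[ $rest =~ $re ]]; then
      piece=${BASH_REMATCH[1]}
      rest=${BASH_REMATCH[2]}
    else
      piece=$rest                     # trailing text without closing punctuation
      rest=""
    fi
    if [[ -n $chunk ]] && (( ${#chunk} + ${#piece} > max )); then
      _emit_chunk "$chunk"
      chunk=""
    fi
    chunk+=$piece
  done
  if [[ -n $chunk ]]; then _emit_chunk "$chunk"; fi
}

# _emit_chunk: print one chunk with leading and trailing whitespace removed.
_emit_chunk() {
  local c=$1
  c=${c#"${c%%[![:space:]]*}"}
  c=${c%"${c##*[![:space:]]}"}
  if [[ -n $c ]]; then printf '%s\n' "$c"; fi
}

# Usage: render each chunk with the same voice (chunk1.json, chunk2.json, ...).
#   n=0
#   while IFS= read -r chunk; do
#     n=$((n + 1))
#     chunk=${chunk//'\'/'\\'}; chunk=${chunk//'"'/'\"'}   # JSON-escape
#     belt app run infsh/kokoro-tts --input "{ \"prompt\": \"$chunk\", \"voice\": \"bf_emma\" }" \
#       > "chunk$n.json" < /dev/null
#   done < <(split_into_chunks "$TEXT")
```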

Voice + Video Workflow

Add Voiceover to Video

```bash
# 1. Generate voiceover
belt app run infsh/kokoro-tts --input '{ "prompt": "This stunning footage shows the beauty of nature in its purest form.", "voice": "am_michael" }' > voiceover.json

# 2. Merge with video
belt app run infsh/media-merger --input '{ "video_url": "https://your-video.mp4", "audio_url": "<voiceover-url>" }'
```

Create Talking Head

```bash
# 1. Generate speech
belt app run infsh/kokoro-tts --input '{ "prompt": "Hi, I'\''m excited to share some updates with you today.", "voice": "af_sarah" }' > speech.json

# 2. Animate with avatar
belt app run bytedance/omnihuman-1-5 --input '{ "image_url": "https://portrait.jpg", "audio_url": "<speech-url>" }'
```

Speed and Pacing

| Speed | Effect | Use For |
|---|---|---|
| 0.8 | Slow, deliberate | Audiobooks, meditation |
| 0.9 | Slightly slow | Education, tutorials |
| 1.0 | Normal | General purpose |
| 1.1 | Slightly fast | Commercials, energetic content |
| 1.2 | Fast | Quick announcements |

```bash
# Slow narration
belt app run infsh/kokoro-tts --input '{ "prompt": "Take a deep breath. Let yourself relax.", "voice": "bf_emma", "speed": 0.8 }'
```
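To hear the differences in the table before you pick a value, you can render the same line at each listed speed. The following is a minimal sketch. `speed_sweep` and the `speed-<value>.json` file names are hypothetical. The `speed` field is the one the Kokoro examples in this document use.

```bash
# speed_sweep: hypothetical helper that renders one line with a Kokoro voice
# (default bf_emma) at each speed from the table, writing speed-<value>.json.
speed_sweep() {
  local text=$1 voice=${2:-bf_emma} speed input
  text=${text//'\'/'\\'}     # escape backslashes for JSON
  text=${text//'"'/'\"'}     # escape double quotes for JSON
  for speed in 0.8 0.9 1.0 1.1 1.2; do
    printf -v input '{ "prompt": "%s", "voice": "%s", "speed": %s }' "$text" "$voice" "$speed"
    belt app run infsh/kokoro-tts --input "$input" > "speed-$speed.json" || return
  done
}

# Usage:
#   speed_sweep "Take a deep breath. Let yourself relax." bf_emma
```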

Punctuation for Pacing

Use punctuation to control speech rhythm:

| Punctuation | Effect |
|---|---|
| Period (`.`) | Full pause |
| Comma (`,`) | Brief pause |
| Ellipsis (`...`) | Extended pause |
| Exclamation mark (`!`) | Emphasis |
| Question mark (`?`) | Question intonation |
| Dash (`-`) | Quick break |

```bash
belt app run infsh/kokoro-tts --input '{
  "prompt": "Wait... Did you hear that? Something is coming. Something big!",
  "voice": "am_adam"
}'
```

Best Practices

  1. Match voice to content - Professional voice for business, casual for social
  2. Use punctuation - Control pacing with periods and commas
  3. Keep sentences short - They are easier to generate and sound more natural
  4. Test different voices - Same text sounds different across voices
  5. Adjust speed - Slightly slower often sounds more natural
  6. Break long content - Process in chunks for consistency

Use Cases

  • Voiceovers - Video narration, commercials
  • Audiobooks - Full book narration
  • Podcasts - AI hosts and guests
  • E-learning - Course narration
  • Accessibility - Screen reader content
  • IVR - Phone system messages
  • Content localization - Translate and voice

Related Skills

```bash
# ElevenLabs TTS (premium, 22+ voices)
npx skills add inference-sh/skills@elevenlabs-tts

# ElevenLabs voice changer (transform recordings)
npx skills add inference-sh/skills@elevenlabs-voice-changer

# All TTS models
npx skills add inference-sh/skills@text-to-speech

# Podcast creation
npx skills add inference-sh/skills@ai-podcast-creation

# AI avatars
npx skills add inference-sh/skills@ai-avatar-video

# Video generation
npx skills add inference-sh/skills@ai-video-generation

# Full platform skill
npx skills add inference-sh/skills@infsh-cli
```

Browse audio apps: `belt app store --category audio`