Install the belt CLI skill:
npx skills add belt-sh/cli

AI Voice Generation

Generate natural AI voices via inference.sh CLI.

Quick Start

Requires the inference.sh CLI (`belt`); see its install instructions.

belt login

Generate speech

belt app run infsh/kokoro-tts --input '{ "prompt": "Hello! This is an AI-generated voice that sounds natural and engaging.", "voice": "af_sarah" }'

Available Models

| Model | App ID | Best For |
| --- | --- | --- |
| Inworld TTS-2 | `inworld/text-to-speech-2` | 100+ languages, emotion/non-verbal steering, delivery modes |
| Inworld TTS 1.5 Max | `inworld/text-to-speech-1-5-max` | Low latency (<200ms), 15 languages |
| Inworld TTS 1.5 Mini | `inworld/text-to-speech-1-5-mini` | Ultra-low latency (~120ms), 15 languages, real-time |
| ElevenLabs TTS | `elevenlabs/tts` | Premium quality, 22+ voices, 32 languages |
| ElevenLabs Voice Changer | `elevenlabs/voice-changer` | Transform existing voice recordings |
| Kokoro TTS | `infsh/kokoro-tts` | Natural, multiple voices |
| DIA | `infsh/dia-tts` | Conversational, expressive |
| Chatterbox | `infsh/chatterbox` | Casual, entertainment |
| Higgs | `infsh/higgs-tts` | Professional narration |
| VibeVoice | `infsh/vibevoice` | Emotional range |

Kokoro Voice Library

American English

| Voice ID | Gender | Style |
| --- | --- | --- |
| `af_sarah` | Female | Warm, friendly |
| `af_nicole` | Female | Professional |
| `af_sky` | Female | Youthful |
| `am_michael` | Male | Authoritative |
| `am_adam` | Male | Conversational |
| `am_echo` | Male | Clear, neutral |

British English

| Voice ID | Gender | Style |
| --- | --- | --- |
| `bf_emma` | Female | Refined |
| `bf_isabella` | Female | Warm |
| `bm_george` | Male | Classic |
| `bm_lewis` | Male | Modern |

Inworld TTS — Character & Emotion Voices

Inworld TTS-2 is purpose-built for character voices, gaming, and expressive speech. Use `[brackets]` inline for emotion, non-verbals, and delivery control:

Expressive character voice with emotion steering

belt app run inworld/text-to-speech-2 --input '{ "text": "[excited] Oh wow, you actually found the ancient artifact! [gasp] I cannot believe it... [whisper] We need to keep this between us.", "voice_id": "Sarah", "delivery_mode": "CREATIVE" }'

Calm narrator with stable delivery

belt app run inworld/text-to-speech-2 --input '{ "text": "The sun set behind the mountains, casting long shadows across the valley. A new chapter was about to begin.", "voice_id": "Sarah", "delivery_mode": "STABLE" }'


**Delivery modes:** `STABLE` (consistent, narration), `BALANCED` (natural, default), `CREATIVE` (expressive, characters)

**Steering examples:** `[laugh]`, `[sigh]`, `[whisper]`, `[excited]`, `[sad]`, `[angry]`, `[pause]`, `[gasp]`

**Built-in voices** (271+ across 15 languages): `Sarah`, `Alex`, `Ashley`, `Dennis`, `Hana`, `Blake`, `Luna`, `Clive`, and many more. Browse all at the [Inworld TTS Playground](https://platform.inworld.ai/tts-playground).

Low-Latency for Real-Time / Conversational AI

Ultra-fast response for chatbots & game NPCs (~120ms)

belt app run inworld/text-to-speech-1-5-mini --input '{ "text": "Welcome, traveler. What brings you to our village?", "voice_id": "Clive", "speaking_rate": 0.9 }'

Voice Generation Examples

Professional Narration

belt app run infsh/kokoro-tts --input '{
  "prompt": "Welcome to our quarterly earnings call. Today we will discuss the financial performance and strategic initiatives for the past quarter.",
  "voice": "am_michael",
  "speed": 1.0
}'

Conversational Style

belt app run infsh/dia-tts --input '{
  "text": "Hey, so I was thinking about that project we discussed. What if we tried a different approach?",
  "voice": "conversational"
}'

Audiobook Narration

belt app run infsh/kokoro-tts --input '{
  "prompt": "Chapter One. The morning mist hung low over the valley as Sarah made her way down the winding path. She had been walking for hours.",
  "voice": "bf_emma",
  "speed": 0.9
}'

Video Voiceover

belt app run infsh/kokoro-tts --input '{
  "prompt": "Introducing the next generation of productivity. Work smarter, not harder.",
  "voice": "af_nicole",
  "speed": 1.1
}'

Podcast Host

belt app run infsh/kokoro-tts --input '{
  "prompt": "Welcome back to Tech Talk! I am your host, and today we are diving deep into the world of artificial intelligence.",
  "voice": "am_adam"
}'
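
The examples in this section wrap the JSON payload in single quotes, so a prompt that contains an apostrophe or a double quote would break the command. Below is a minimal sketch of one way around that. It assumes two hypothetical helper functions, `json_escape` and `kokoro_input`, which are not part of the belt CLI; they only build the `prompt`, `voice`, and `speed` fields already used in this document.

```shell
# Hypothetical helper (not part of belt): escape a string for use inside
# a JSON string literal. Only backslashes and double quotes are handled;
# newlines and control characters are out of scope for this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: build the --input payload for infsh/kokoro-tts.
# Usage: kokoro_input TEXT VOICE [SPEED]
# SPEED defaults to 1.0 (an assumption, matching "Normal" in the speed table).
kokoro_input() {
  printf '{ "prompt": "%s", "voice": "%s", "speed": %s }' \
    "$(json_escape "$1")" "$(json_escape "$2")" "${3:-1.0}"
}

# Example (requires belt and a completed `belt login`):
# belt app run infsh/kokoro-tts --input "$(kokoro_input "I'm your host." am_adam)"
```

Because the payload is passed in double quotes via command substitution, apostrophes in the prompt no longer need to be avoided.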

Multi-Voice Conversation

Generate dialogue between two speakers

Speaker 1

belt app run infsh/kokoro-tts --input '{ "prompt": "Have you seen the latest AI developments? It is incredible how fast things are moving.", "voice": "am_michael" }' > speaker1.json

Speaker 2

belt app run infsh/kokoro-tts --input '{ "prompt": "I know, right? Just last week I tried that new image generator and was blown away.", "voice": "af_sarah" }' > speaker2.json

Merge conversation

belt app run infsh/media-merger --input '{ "audio_files": ["<speaker1-url>", "<speaker2-url>"], "crossfade_ms": 300 }'
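
With more than two speakers, writing the `audio_files` array by hand gets error-prone. The sketch below assumes a hypothetical helper, `merge_input` (not part of the belt CLI), that assembles the same `audio_files` and `crossfade_ms` fields from its arguments. The source does not show which field of the saved `.json` output holds the audio URL, so the URLs are taken as plain arguments here.

```shell
# Hypothetical helper (not part of belt): build the --input payload for
# infsh/media-merger from a crossfade value and one or more audio URLs.
# Usage: merge_input CROSSFADE_MS URL...
# URLs are inserted as-is, so they must not contain double quotes or
# backslashes. Variables are global because POSIX sh has no `local`.
merge_input() {
  crossfade=$1
  shift
  list=""
  for url in "$@"; do
    # Append each URL as a quoted JSON string, comma-separated.
    list="${list:+$list, }\"$url\""
  done
  printf '{ "audio_files": [%s], "crossfade_ms": %s }' "$list" "$crossfade"
}

# Example (requires belt; the URLs are placeholders):
# belt app run infsh/media-merger --input "$(merge_input 300 "<speaker1-url>" "<speaker2-url>")"
```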

Long-Form Content

Chunked Processing

For content over 5000 characters, split into chunks:

Process long text in chunks

TEXT="Your very long text here..."

Split and generate

Chunk 1

belt app run infsh/kokoro-tts --input '{ "prompt": "<chunk-1>", "voice": "bf_emma" }' > chunk1.json

Chunk 2

belt app run infsh/kokoro-tts --input '{ "prompt": "<chunk-2>", "voice": "bf_emma" }' > chunk2.json

Merge chunks

belt app run infsh/media-merger --input '{ "audio_files": ["<chunk1-url>", "<chunk2-url>"], "crossfade_ms": 100 }'
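
The steps above leave the actual splitting to the reader. One possible approach, sketched below, uses the standard `fold` utility to break text at spaces into chunks no longer than the 5000-character limit mentioned earlier. The function name `split_chunks` is hypothetical, and the source does not prescribe any particular splitting method.

```shell
# Hypothetical helper (not part of belt): read text from stdin and print
# chunks of at most MAX characters, one chunk per line, breaking at spaces
# where possible. Newlines in the input are flattened to spaces first, and
# trailing spaces are trimmed from each chunk.
# Caveats: a single word longer than MAX is split mid-word, and some fold
# implementations count bytes, so multibyte text may give shorter chunks.
split_chunks() {
  max=${1:-5000}
  tr '\n' ' ' | fold -s -w "$max" | sed 's/ *$//'
}

# Example: write one chunk per line to chunks.txt
# printf '%s' "$TEXT" | split_chunks 5000 > chunks.txt
```

Each output line can then be passed as the `prompt` of a separate `infsh/kokoro-tts` run. Splitting at sentence boundaries instead of spaces would likely sound smoother at the joins, at the cost of a more involved script.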

Voice + Video Workflow

Add Voiceover to Video

1. Generate voiceover

belt app run infsh/kokoro-tts --input '{ "prompt": "This stunning footage shows the beauty of nature in its purest form.", "voice": "am_michael" }' > voiceover.json

2. Merge with video

belt app run infsh/media-merger --input '{ "video_url": "https://your-video.mp4", "audio_url": "<voiceover-url>" }'

Create Talking Head

1. Generate speech

belt app run infsh/kokoro-tts --input '{ "prompt": "Hi, I am excited to share some updates with you today.", "voice": "af_sarah" }' > speech.json

2. Animate with avatar

belt app run bytedance/omnihuman-1-5 --input '{ "image_url": "https://portrait.jpg", "audio_url": "<speech-url>" }'

Speed and Pacing

| Speed | Effect | Use For |
| --- | --- | --- |
| 0.8 | Slow, deliberate | Audiobooks, meditation |
| 0.9 | Slightly slow | Education, tutorials |
| 1.0 | Normal | General purpose |
| 1.1 | Slightly fast | Commercials, energy |
| 1.2 | Fast | Quick announcements |

Slow narration

belt app run infsh/kokoro-tts --input '{ "prompt": "Take a deep breath. Let yourself relax.", "voice": "bf_emma", "speed": 0.8 }'
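
For scripts that generate several kinds of content, the speed table can be encoded as a small lookup. This is only a sketch: the function name `speed_for` and the content-type keywords are hypothetical, while the speed values are the ones listed in the table.

```shell
# Hypothetical helper (not part of belt): map a content type to the speed
# suggested in the table. Unknown types fall back to 1.0 (general purpose).
speed_for() {
  case "$1" in
    audiobook|meditation) echo 0.8 ;;
    education|tutorial)   echo 0.9 ;;
    commercial)           echo 1.1 ;;
    announcement)         echo 1.2 ;;
    *)                    echo 1.0 ;;
  esac
}

# Example: look up the speed for a meditation script
# speed_for meditation
```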

Punctuation for Pacing

Use punctuation to control speech rhythm:

| Punctuation | Effect |
| --- | --- |
| Period `.` | Full pause |
| Comma `,` | Brief pause |
| `...` | Extended pause |
| `!` | Emphasis |
| `?` | Question intonation |
| `-` | Quick break |

belt app run infsh/kokoro-tts --input '{
  "prompt": "Wait... Did you hear that? Something is coming. Something big!",
  "voice": "am_adam"
}'

Best Practices

- **Match voice to content:** professional voice for business, casual for social
- **Use punctuation:** control pacing with periods and commas
- **Keep sentences short:** easier to generate and sounds more natural
- **Test different voices:** the same text sounds different across voices
- **Adjust speed:** slightly slower often sounds more natural
- **Break long content:** process in chunks for consistency

Use Cases

- **Voiceovers:** video narration, commercials
- **Audiobooks:** full book narration
- **Podcasts:** AI hosts and guests
- **E-learning:** course narration
- **Accessibility:** screen reader content
- **IVR:** phone system messages
- **Content localization:** translate and voice

Related Skills

ElevenLabs TTS (premium, 22+ voices)

npx skills add inference-sh/skills@elevenlabs-tts

ElevenLabs voice changer (transform recordings)

npx skills add inference-sh/skills@elevenlabs-voice-changer

All TTS models

npx skills add inference-sh/skills@text-to-speech

Podcast creation

npx skills add inference-sh/skills@ai-podcast-creation

AI avatars

npx skills add inference-sh/skills@ai-avatar-video

Video generation

npx skills add inference-sh/skills@ai-video-generation

Full platform skill

npx skills add inference-sh/skills@infsh-cli


Browse audio apps: `belt app store --category audio`
