inworld
Inworld AI
Text-to-Speech platform with voice cloning, audio markups, and timestamp alignment.
Quick Navigation
| Topic | Reference |
|---|---|
| Installation | installation.md |
| Voice Cloning | cloning.md |
| Voice Control | voice-control.md |
| API Reference | api.md |
When to Use
- Text-to-speech audio generation
- Voice cloning from 5-15 seconds of audio
- Emotion-controlled speech ([happy], [sad], etc.)
- Word/phoneme timestamps for lip sync
- Custom pronunciation with IPA
Models
| Model | ID | Latency | Price |
|---|---|---|---|
| TTS 1.5 Max | inworld-tts-1.5-max | ~200ms | $10/1M chars |
| TTS 1.5 Mini | | ~120ms | $5/1M chars |
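
To make the pricing concrete, here is a minimal sketch that estimates the cost of one synthesis request from the per-character prices in the table above. The helper name `estimate_cost_usd` is hypothetical, and the sketch assumes every character of the input string is billed, which the table does not state.

```python
# Prices in USD per one million characters, copied from the Models table.
PRICE_PER_MILLION_CHARS = {
    "TTS 1.5 Max": 10.0,
    "TTS 1.5 Mini": 5.0,
}


def estimate_cost_usd(text: str, model: str) -> float:
    """Hypothetical helper: estimated cost of synthesizing `text` once."""
    # Assumption: every character of the input string is billed.
    return len(text) * PRICE_PER_MILLION_CHARS[model] / 1_000_000


if __name__ == "__main__":
    script = "Hello!" * 1000  # 6,000 characters
    for model in PRICE_PER_MILLION_CHARS:
        print(model, estimate_cost_usd(script, model))
```

At these rates the Mini model costs half as much per character as Max, so the choice mainly trades price and latency against whatever quality difference the two models have.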
Minimal Example
```python
import base64
import os
import requests

# Request speech for a short text; the API key is read from the environment.
response = requests.post(
    "https://api.inworld.ai/tts/v1/voice",
    headers={"Authorization": f"Basic {os.getenv('INWORLD_API_KEY')}"},
    json={"text": "Hello!", "voiceId": "Ashley", "modelId": "inworld-tts-1.5-max"},
)
response.raise_for_status()  # stop early on an HTTP error
# The audio arrives base64-encoded in the "audioContent" field.
audio = base64.b64decode(response.json()["audioContent"])
```
Key Features
- 15 languages: en, zh, ja, ko, ru, it, es, pt, fr, de, pl, nl, hi, he, ar
- Instant cloning: 5-15 seconds of audio, no training
- Audio markups: [happy], [laughing], [sigh] (English only)
- Timestamps: word, phoneme, and viseme timing for lip sync
- Streaming: /voice:stream endpoint
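
As a rough sketch of how the language list and the streaming endpoint might be used from client code, the helpers below pick a request URL and check a language code before sending anything. The function names are hypothetical. The streaming URL is an assumption formed by appending `:stream` to the URL from the minimal example, so confirm it against the API reference before relying on it.

```python
# Non-streaming URL, taken from the minimal example.
BASE_URL = "https://api.inworld.ai/tts/v1/voice"

# The 15 language codes listed under Key Features.
SUPPORTED_LANGUAGES = {
    "en", "zh", "ja", "ko", "ru", "it", "es", "pt",
    "fr", "de", "pl", "nl", "hi", "he", "ar",
}


def endpoint_url(stream: bool = False) -> str:
    """Hypothetical helper: choose the request URL."""
    # Assumption: the streaming path is the base path plus ":stream".
    return (BASE_URL + ":stream") if stream else BASE_URL


def is_supported(language_code: str) -> bool:
    """Hypothetical helper: True if the code is one of the 15 listed."""
    return language_code.lower() in SUPPORTED_LANGUAGES


if __name__ == "__main__":
    print(endpoint_url(stream=True))
    print(is_supported("ja"), is_supported("sv"))
```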
Prohibitions
- Audio markups work only in English
- Use only one emotion markup, placed at the beginning of the text
- Match voice language to text language
- Instant cloning may not work for children's voices or unique accents
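
The markup rules above can be checked before a request is sent. The following is a minimal sketch, not part of the Inworld API: `check_markup_rules` is a hypothetical helper, the bracket pattern is inferred from the examples in this document, and the emotion set contains only the tags named here, so the real set may be larger.

```python
import re

# Assumption: an audio markup is a lowercase word in square brackets,
# like the [happy], [sad], [laughing], and [sigh] examples in this document.
MARKUP_PATTERN = re.compile(r"\[[a-z]+\]")

# Assumption: only the emotion tags named in this document are listed here.
EMOTION_MARKUPS = {"[happy]", "[sad]"}


def check_markup_rules(text: str, language: str) -> list[str]:
    """Hypothetical pre-flight check; returns a list of rule violations."""
    problems = []
    markups = MARKUP_PATTERN.findall(text)
    if markups and language != "en":
        problems.append("audio markups work only in English")
    emotions = [m for m in markups if m in EMOTION_MARKUPS]
    if len(emotions) > 1:
        problems.append("use only one emotion markup")
    if emotions and not text.lstrip().startswith(emotions[0]):
        problems.append("emotion markup must be at the beginning of the text")
    return problems


if __name__ == "__main__":
    print(check_markup_rules("[happy] Nice to see you!", "en"))
    print(check_markup_rules("Nice to [happy] see you!", "en"))
```

An empty list means the text passes all three checks. The voice-language and cloning caveats are not covered, because they depend on the voice itself rather than on the text.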