Inworld AI

Text-to-Speech platform with voice cloning, audio markups, and timestamp alignment.

Quick Navigation

Topic	Reference
Installation	installation.md
Voice Cloning	cloning.md
Voice Control	voice-control.md
API Reference	api.md

When to Use

Text-to-speech audio generation
Voice cloning from 5-15 seconds of audio
Emotion-controlled speech (`[happy]`, `[sad]`, etc.)
Word/phoneme timestamps for lip sync
Custom pronunciation with IPA

Models

Model	ID	Latency	Price
TTS 1.5 Max	`inworld-tts-1.5-max`	~200ms	$10/1M chars
TTS 1.5 Mini	`inworld-tts-1.5-mini`	~120ms	$5/1M chars
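
The per-character prices above make cost estimates a one-line calculation. The sketch below assumes billing is linear in `len(text)`; the source does not say how characters are counted (for example, whether markup tags or whitespace are billed), so treat the result as a rough estimate. `estimate_cost_usd` is a hypothetical helper, not part of any Inworld SDK.

```python
# Prices per 1M characters, copied from the Models table.
PRICE_PER_MILLION_CHARS_USD = {
    "inworld-tts-1.5-max": 10.0,
    "inworld-tts-1.5-mini": 5.0,
}


def estimate_cost_usd(text: str, model_id: str) -> float:
    """Estimate synthesis cost, assuming billing is linear in len(text)."""
    return len(text) * PRICE_PER_MILLION_CHARS_USD[model_id] / 1_000_000
```

Under that assumption, a 2,000-character script costs about $0.02 on TTS 1.5 Max and about $0.01 on TTS 1.5 Mini.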

Minimal Example

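```python
import base64
import os

import requests

# INWORLD_API_KEY is read from the environment; a missing key raises KeyError here.
response = requests.post(
    "https://api.inworld.ai/tts/v1/voice",
    headers={"Authorization": f"Basic {os.environ['INWORLD_API_KEY']}"},
    json={"text": "Hello!", "voiceId": "Ashley", "modelId": "inworld-tts-1.5-max"},
)
response.raise_for_status()  # surface HTTP errors before parsing the body

# The synthesized audio arrives base64-encoded in the `audioContent` field.
audio = base64.b64decode(response.json()["audioContent"])
```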
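
The example above leaves the decoded bytes in memory. A minimal sketch of writing them to disk follows; `save_audio` is a hypothetical helper, and because the source does not state the audio container format, the output filename and extension are left to the caller.

```python
import base64
from pathlib import Path


def save_audio(response_json: dict, path: str) -> int:
    """Decode the base64 `audioContent` field and write it to `path`.

    Returns the number of bytes written.
    """
    if "audioContent" not in response_json:
        raise KeyError("response has no 'audioContent' field")
    audio = base64.b64decode(response_json["audioContent"])
    Path(path).write_bytes(audio)
    return len(audio)
```

With the request from the minimal example, this would be called as `save_audio(response.json(), "hello.bin")`, renaming the file once the actual format is known.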

Key Features

15 languages: en, zh, ja, ko, ru, it, es, pt, fr, de, pl, nl, hi, he, ar
Instant cloning: 5-15 seconds of audio, no training
Audio markups: `[happy]`, `[laughing]`, `[sigh]` (English only)
Timestamps: word, phoneme, and viseme timing for lip sync
Streaming: `/voice:stream` endpoint

Prohibitions

Audio markups work only in English
Use only ONE emotion markup, placed at the beginning of the text
Match the voice's language to the text's language
Instant cloning may not work for children's voices or unique accents
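
The first two rules can be enforced before a request is sent. The sketch below is a hypothetical client-side helper, not an Inworld API: it prepends a single emotion tag and rejects non-English text or text that already starts with a bracketed tag. It is deliberately conservative (it also rejects a leading non-emotion tag such as `[sigh]`), and the space after the tag is an assumption.

```python
import re

# Matches any bracketed tag, such as [happy], at the start of the text.
_LEADING_TAG = re.compile(r"\s*\[[^\]\n]+\]")


def with_emotion(text: str, emotion: str, language: str = "en") -> str:
    """Prepend one emotion markup, e.g. with_emotion("Hi", "happy")."""
    if language != "en":
        raise ValueError("audio markups work only in English")
    if _LEADING_TAG.match(text):
        raise ValueError("text already starts with a markup tag")
    return f"[{emotion}] {text}"
```

The returned string can be passed as the `text` field of the request shown in the minimal example.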
