Inworld AI

Text-to-Speech platform with voice cloning, audio markups, and timestamp alignment.

Quick Navigation

| Topic | Reference |
| --- | --- |
| Installation | installation.md |
| Voice Cloning | cloning.md |
| Voice Control | voice-control.md |
| API Reference | api.md |

When to Use

  • Text-to-speech audio generation
  • Voice cloning from 5-15 seconds of audio
  • Emotion-controlled speech ([happy], [sad], etc.)
  • Word/phoneme timestamps for lip sync
  • Custom pronunciation with IPA

Models

| Model | ID | Latency | Price |
| --- | --- | --- | --- |
| TTS 1.5 Max | inworld-tts-1.5-max | ~200ms | $10/1M chars |
| TTS 1.5 Mini | inworld-tts-1.5-mini | ~120ms | $5/1M chars |
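
To make the pricing concrete, here is a minimal sketch of a cost estimate based on the per-character prices in the table. The function name `estimate_cost_usd` is hypothetical, and the sketch assumes every character of the input (including whitespace and markups) is billed, which the source does not state.

```python
# Prices per 1M characters, copied from the Models table.
PRICE_PER_MILLION_USD = {
    "inworld-tts-1.5-max": 10.0,
    "inworld-tts-1.5-mini": 5.0,
}


def estimate_cost_usd(text: str, model_id: str) -> float:
    """Rough cost estimate; assumes every character in `text` is billed."""
    if model_id not in PRICE_PER_MILLION_USD:
        raise ValueError(f"unknown model ID: {model_id}")
    return len(text) / 1_000_000 * PRICE_PER_MILLION_USD[model_id]
```

At these rates, the same text costs half as much on the Mini model as on the Max model.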

Minimal Example

python
import base64
import os

import requests

# The API key is read from the environment and sent in a Basic authorization header.
response = requests.post(
    "https://api.inworld.ai/tts/v1/voice",
    headers={"Authorization": f"Basic {os.environ['INWORLD_API_KEY']}"},
    json={"text": "Hello!", "voiceId": "Ashley", "modelId": "inworld-tts-1.5-max"},
    timeout=30,
)
response.raise_for_status()  # fail loudly on HTTP errors

# The response JSON carries the audio as a Base64 string.
audio = base64.b64decode(response.json()["audioContent"])
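
The decoded bytes usually need to be written somewhere. The following is a small sketch of that step; `save_audio` is a hypothetical helper, and the only thing it takes from the source is the `audioContent` field name. The audio container format is not stated in the source, so the file extension is left to the caller.

```python
import base64
from pathlib import Path


def save_audio(payload: dict, path: str) -> int:
    """Decode the Base64 `audioContent` field and write it to `path`.

    Returns the number of bytes written.
    """
    if "audioContent" not in payload:
        raise KeyError("response JSON has no 'audioContent' field")
    audio = base64.b64decode(payload["audioContent"])
    return Path(path).write_bytes(audio)
```

With the request above, this would be called as `save_audio(response.json(), "hello.bin")`.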

Key Features

  • 15 languages: en, zh, ja, ko, ru, it, es, pt, fr, de, pl, nl, hi, he, ar
  • Instant cloning: 5-15 seconds of audio, no training
  • Audio markups: [happy], [laughing], [sigh] (English only)
  • Timestamps: word, phoneme, and viseme timing for lip sync
  • Streaming: /voice:stream endpoint

Prohibitions

  • Audio markups work only with English text
  • Use only one emotion markup, placed at the beginning of the text
  • Match the voice's language to the text's language
  • Instant cloning may not work for children's voices or unique accents
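
The markup rules above can be checked before a request is sent. This is a rough sketch, not part of the Inworld API: `check_emotion_markup` is a hypothetical helper, the emotion set contains only the two tags the source names ([happy] and [sad]), and any bracketed lowercase word is treated as an audio markup.

```python
import re

# Assumption: only the two emotion tags named in the source; extend as needed.
EMOTION_TAGS = {"happy", "sad"}
# Assumption: any bracketed lowercase word counts as an audio markup.
TAG_PATTERN = re.compile(r"\[([a-z]+)\]")


def check_emotion_markup(text: str, language: str = "en") -> list[str]:
    """Return a list of rule violations; an empty list means the text passes."""
    problems = []
    tags = list(TAG_PATTERN.finditer(text))
    emotions = [m for m in tags if m.group(1) in EMOTION_TAGS]

    if tags and language != "en":
        problems.append("audio markups work only in English")
    if len(emotions) > 1:
        problems.append("use only one emotion markup")
    # The first emotion tag must open the text (leading whitespace is ignored).
    leading_ws = len(text) - len(text.lstrip())
    if emotions and emotions[0].start() != leading_ws:
        problems.append("emotion markup must be at the beginning of the text")
    return problems
```

For example, `check_emotion_markup("[happy] Great to see you!")` returns an empty list, while a tag in the middle of the sentence is reported as misplaced.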

Links
