Text-to-Speech (HeyGen Starfish)
Generate speech audio files from text using HeyGen's in-house Starfish TTS model. This skill is for standalone audio generation — separate from video creation.
Authentication
All requests require the `X-Api-Key` header. Set the `HEYGEN_API_KEY` environment variable.

```bash
curl -X GET "https://api.heygen.com/v1/audio/voices" \
  -H "X-Api-Key: $HEYGEN_API_KEY"
```

Tool Selection
If HeyGen MCP tools are available (`mcp__heygen__*`), prefer them over direct HTTP API calls.

| Task | MCP Tool | Fallback (Direct API) |
|---|---|---|
| List TTS voices | `mcp__heygen__list_audio_voices` | `GET /v1/audio/voices` |
| Generate speech audio | `mcp__heygen__text_to_speech` | `POST /v1/audio/text_to_speech` |

Default Workflow
- List voices with `mcp__heygen__list_audio_voices` (or `GET /v1/audio/voices`)
- Pick a voice matching the desired language, gender, and features
- Call `mcp__heygen__text_to_speech` (or `POST /v1/audio/text_to_speech`) with `text` and `voice_id`
- Use the returned `audio_url` to download or play the audio
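
The voice-selection step of the workflow above can be sketched in Python. This is a minimal illustration rather than part of the HeyGen API: `pick_voice` is a hypothetical helper, the sample voice IDs are placeholders, and the `language` and `gender` keys are assumed to match the voice objects returned by `GET /v1/audio/voices`.

```python
from __future__ import annotations

from typing import Optional


def pick_voice(
    voices: list[dict], language: str, gender: Optional[str] = None
) -> Optional[dict]:
    """Return the first voice matching the language (and gender, if given)."""
    wanted = language.lower()
    for voice in voices:
        # Case-insensitive substring match on the language name.
        if wanted not in voice.get("language", "").lower():
            continue
        # Gender is only checked when the caller asked for one.
        if gender is not None and voice.get("gender") != gender:
            continue
        return voice
    return None


# Placeholder data shaped like the `voices` array of the list response.
sample_voices = [
    {"voice_id": "voice-a", "name": "Voice A", "language": "Portuguese", "gender": "female"},
    {"voice_id": "voice-b", "name": "Voice B", "language": "English", "gender": "male"},
]

chosen = pick_voice(sample_voices, "english", gender="male")
```

The `voice_id` of the returned voice is what the text-to-speech call expects; a `None` result means no listed voice matched.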
List TTS Voices
Retrieve voices compatible with the Starfish TTS model.
Note: This uses `GET /v1/audio/voices`, a different endpoint from the video voices API (`GET /v2/voices`). Not all video voices support Starfish TTS.
curl

```bash
curl -X GET "https://api.heygen.com/v1/audio/voices" \
  -H "X-Api-Key: $HEYGEN_API_KEY"
```

TypeScript

```typescript
interface TTSVoice {
  voice_id: string;
  language: string;
  gender: "female" | "male" | "unknown";
  name: string;
  preview_audio_url: string | null;
  support_pause: boolean;
  support_locale: boolean;
  type: string;
}

interface TTSVoicesResponse {
  error: null | string;
  data: {
    voices: TTSVoice[];
  };
}

// Fetch the voices that can be used with the Starfish TTS endpoint.
async function listTTSVoices(): Promise<TTSVoice[]> {
  const response = await fetch("https://api.heygen.com/v1/audio/voices", {
    headers: { "X-Api-Key": process.env.HEYGEN_API_KEY! },
  });

  const json: TTSVoicesResponse = await response.json();

  // A non-null `error` field signals a failed request.
  if (json.error) {
    throw new Error(json.error);
  }

  return json.data.voices;
}
```

Python

```python
import os

import requests


def list_tts_voices() -> list:
    """Return the voices that can be used with the Starfish TTS endpoint."""
    response = requests.get(
        "https://api.heygen.com/v1/audio/voices",
        headers={"X-Api-Key": os.environ["HEYGEN_API_KEY"]},
    )

    data = response.json()
    # A non-null "error" field signals a failed request.
    if data.get("error"):
        raise Exception(data["error"])

    return data["data"]["voices"]
```

Response Format

```json
{
  "error": null,
  "data": {
    "voices": [
      {
        "voice_id": "f38a635bee7a4d1f9b0a654a31d050d2",
        "name": "Chill Brian",
        "language": "English",
        "gender": "male",
        "preview_audio_url": "https://resource.heygen.ai/text_to_speech/WpSDQvmLGXEqXZVZQiVeg6.mp3",
        "support_pause": true,
        "support_locale": false,
        "type": "public"
      }
    ]
  }
}
```

Generate Speech Audio
Convert text to speech audio using a specified voice.
Endpoint
`POST https://api.heygen.com/v1/audio/text_to_speech`
Request Fields

| Field | Type | Req | Description |
|---|---|---|---|
| `text` | string | Y | Text content to convert to speech |
| `voice_id` | string | Y | Voice ID from `GET /v1/audio/voices` |
| `speed` | number | N | Speech speed, 0.5-1.5 (default: 1) |
| `pitch` | integer | N | Voice pitch, -50 to 50 (default: 0) |
| `locale` | string | N | Accent/locale for multilingual voices (e.g., `pt-BR`) |
| `elevenlabs_settings` | object | N | Advanced settings for ElevenLabs voices |
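
A client-side check of the documented ranges might look like the sketch below. `build_tts_payload` is a hypothetical helper; the field names (`text`, `voice_id`, `speed`, `pitch`, `locale`) are taken from the request types used in this document's code examples. Whether the API itself rejects out-of-range values is not stated here, so validating before sending is an assumption.

```python
from __future__ import annotations

from typing import Optional


def build_tts_payload(
    text: str,
    voice_id: str,
    speed: float = 1.0,
    pitch: int = 0,
    locale: Optional[str] = None,
) -> dict:
    """Build a request body, enforcing the ranges from the Request Fields table."""
    if not text:
        raise ValueError("text is required")
    if not voice_id:
        raise ValueError("voice_id is required")
    if not 0.5 <= speed <= 1.5:
        raise ValueError("speed must be between 0.5 and 1.5")
    if not -50 <= pitch <= 50:
        raise ValueError("pitch must be between -50 and 50")

    payload = {"text": text, "voice_id": voice_id, "speed": speed, "pitch": pitch}
    # locale is optional, so it is only included when provided.
    if locale is not None:
        payload["locale"] = locale
    return payload


payload = build_tts_payload("Hello!", "YOUR_VOICE_ID", speed=1.1)
```

The resulting dictionary can be passed as the JSON body of the `POST /v1/audio/text_to_speech` request.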
ElevenLabs Settings (optional)

| Field | Type | Description |
|---|---|---|
| `model` | string | Model selection |
| `similarity_boost` | number | Voice similarity, 0.0-1.0 |
| `stability` | number | Output consistency, 0.0-1.0 |
| `style` | number | Style intensity, 0.0-1.0 |
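
The optional settings object could be assembled as in the sketch below. `build_elevenlabs_settings` is a hypothetical helper, and the key names (`model`, `similarity_boost`, `stability`, `style`) are taken from the `elevenlabs_settings` type in this document's TypeScript example. The range check mirrors the 0.0-1.0 limits listed above and is a client-side assumption.

```python
from __future__ import annotations

from typing import Optional


def build_elevenlabs_settings(
    model: Optional[str] = None,
    similarity_boost: Optional[float] = None,
    stability: Optional[float] = None,
    style: Optional[float] = None,
) -> dict:
    """Collect only the settings that were provided, checking the 0.0-1.0 range."""
    settings: dict = {}
    if model is not None:
        settings["model"] = model

    numeric = {
        "similarity_boost": similarity_boost,
        "stability": stability,
        "style": style,
    }
    for name, value in numeric.items():
        if value is None:
            continue  # unset fields are left out of the request
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
        settings[name] = value
    return settings


settings = build_elevenlabs_settings(stability=0.5, style=0.2)
```

The result would be sent under the `elevenlabs_settings` key of the request body.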
curl

```bash
curl -X POST "https://api.heygen.com/v1/audio/text_to_speech" \
  -H "X-Api-Key: $HEYGEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello! Welcome to our product demo.",
    "voice_id": "YOUR_VOICE_ID",
    "speed": 1.0
  }'
```

TypeScript

```typescript
interface TTSRequest {
  text: string;
  voice_id: string;
  speed?: number;
  pitch?: number;
  locale?: string;
  elevenlabs_settings?: {
    model?: string;
    similarity_boost?: number;
    stability?: number;
    style?: number;
  };
}

interface WordTimestamp {
  word: string;
  start: number;
  end: number;
}

interface TTSResponse {
  error: null | string;
  data: {
    audio_url: string;
    duration: number;
    request_id: string;
    word_timestamps: WordTimestamp[];
  };
}

// Send the text and voice settings, then return the `data` object of the response.
async function textToSpeech(request: TTSRequest): Promise<TTSResponse["data"]> {
  const response = await fetch(
    "https://api.heygen.com/v1/audio/text_to_speech",
    {
      method: "POST",
      headers: {
        "X-Api-Key": process.env.HEYGEN_API_KEY!,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(request),
    }
  );

  const json: TTSResponse = await response.json();

  // A non-null `error` field signals a failed request.
  if (json.error) {
    throw new Error(json.error);
  }

  return json.data;
}
```

Python

```python
import os

import requests


def text_to_speech(
    text: str,
    voice_id: str,
    speed: float = 1.0,
    pitch: int = 0,
    locale: str | None = None,
) -> dict:
    """Generate speech audio and return the "data" object of the response."""
    payload = {
        "text": text,
        "voice_id": voice_id,
        "speed": speed,
        "pitch": pitch,
    }

    # locale is optional, so it is only sent when provided.
    if locale:
        payload["locale"] = locale

    response = requests.post(
        "https://api.heygen.com/v1/audio/text_to_speech",
        headers={
            "X-Api-Key": os.environ["HEYGEN_API_KEY"],
            "Content-Type": "application/json",
        },
        json=payload,
    )

    data = response.json()
    # A non-null "error" field signals a failed request.
    if data.get("error"):
        raise Exception(data["error"])

    return data["data"]
```

Response Format

```json
{
  "error": null,
  "data": {
    "audio_url": "https://resource2.heygen.ai/text_to_speech/.../id=365d46bb.wav",
    "duration": 5.526,
    "request_id": "p38QJ52hfgNlsYKZZmd9",
    "word_timestamps": [
      { "word": "<start>", "start": 0.0, "end": 0.0 },
      { "word": "Hey", "start": 0.079, "end": 0.219 },
      { "word": "there,", "start": 0.239, "end": 0.459 },
      { "word": "<end>", "start": 5.526, "end": 5.526 }
    ]
  }
}
```

Usage Examples
Basic TTS

```typescript
const result = await textToSpeech({
  text: "Welcome to our quarterly earnings call.",
  voice_id: "YOUR_VOICE_ID",
});

console.log(`Audio URL: ${result.audio_url}`);
console.log(`Duration: ${result.duration}s`);
```

With Speed Adjustment

```typescript
const result = await textToSpeech({
  text: "We're thrilled to announce our newest feature!",
  voice_id: "YOUR_VOICE_ID",
  speed: 1.1,
});
```

With Locale for Multilingual Voices

```typescript
const result = await textToSpeech({
  text: "Bem-vindo ao nosso produto.",
  voice_id: "MULTILINGUAL_VOICE_ID",
  locale: "pt-BR",
});
```

Find a Voice and Generate Audio

```typescript
async function generateSpeech(text: string, language: string): Promise<string> {
  const voices = await listTTSVoices();

  const voice = voices.find(
    (v) => v.language.toLowerCase().includes(language.toLowerCase())
  );

  if (!voice) {
    throw new Error(`No TTS voice found for language: ${language}`);
  }

  const result = await textToSpeech({
    text,
    voice_id: voice.voice_id,
  });

  return result.audio_url;
}

const audioUrl = await generateSpeech("Hello and welcome!", "english");
```

Pauses with Break Tags
Use SSML-style break tags in your text for pauses:

`word <break time="1s"/> word`

Rules:
- Use seconds with the `s` suffix: `<break time="1.5s"/>`
- Must have spaces before and after the tag
- Self-closing tag format
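
The rules above could be applied programmatically as in this sketch. `break_tag` and `join_with_pause` are hypothetical helper names, not part of any HeyGen SDK; they only format strings according to the three rules listed.

```python
def break_tag(seconds: float) -> str:
    """Format a self-closing break tag with the duration in seconds."""
    if seconds <= 0:
        raise ValueError("pause duration must be positive")
    # The "g" format drops a trailing ".0", so 1.0 becomes "1" and 1.5 stays "1.5".
    return f'<break time="{seconds:g}s"/>'


def join_with_pause(before: str, after: str, seconds: float) -> str:
    """Join two text fragments with a break tag surrounded by single spaces."""
    return f"{before.rstrip()} {break_tag(seconds)} {after.lstrip()}"


text = join_with_pause("Hello.", "Welcome back.", 1.5)
# text is: Hello. <break time="1.5s"/> Welcome back.
```

The resulting string is passed as the `text` field of the request.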
Best Practices
- Use `GET /v1/audio/voices` to find compatible voices; not all voices from `GET /v2/voices` support Starfish TTS
- Check `support_locale` before setting a `locale`; only multilingual voices support locale selection
- Keep speed between 0.8-1.2 for natural-sounding output
- Preview voices using the `preview_audio_url` before generating (may be null for some voices)
- Use `word_timestamps` in the response for caption syncing or timed text overlays
- Use SSML break tags in your text for pauses: `word <break time="1s"/> word`
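
As an illustration of the `word_timestamps` tip, the sketch below turns the timestamp list into simple caption cues. `caption_cues` is a hypothetical helper. Treating the `<start>` and `<end>` entries as markers to skip is an assumption based on the sample response in this document, and the times are assumed to be in seconds.

```python
from __future__ import annotations


def caption_cues(word_timestamps: list[dict]) -> list[tuple[float, float, str]]:
    """Return (start, end, word) tuples, skipping the <start>/<end> markers."""
    cues = []
    for entry in word_timestamps:
        if entry["word"] in ("<start>", "<end>"):
            continue  # boundary markers carry no spoken word
        cues.append((entry["start"], entry["end"], entry["word"]))
    return cues


# Timestamps copied from the sample response shown in this document.
sample = [
    {"word": "<start>", "start": 0.0, "end": 0.0},
    {"word": "Hey", "start": 0.079, "end": 0.219},
    {"word": "there,", "start": 0.239, "end": 0.459},
    {"word": "<end>", "start": 5.526, "end": 5.526},
]

cues = caption_cues(sample)
# cues is: [(0.079, 0.219, "Hey"), (0.239, 0.459, "there,")]
```

Each cue can then be mapped to whatever caption or overlay format the player expects.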