elevenlabs

ElevenLabs AI Audio Platform

Complete guide to ElevenLabs' audio AI capabilities: speech synthesis, transcription, voice cloning, sound effects, music generation, dubbing, and conversational voice agents.

Quick Reference

Capability	API/Tool	Use Case
Text-to-Speech	`text_to_speech`	Generate lifelike speech from text
Speech-to-Text	`speech_to_text`	Transcribe audio with Scribe v2
Voice Cloning	`voice_clone`	Clone voices from audio samples
Voice Design	`text_to_voice`	Create voices from text descriptions
Sound Effects	`text_to_sound_effects`	Generate SFX from prompts
Music	`compose_music`	Generate studio-grade music
Dubbing	Dubbing API	Translate video/audio (32 languages)
Voice Changer	`speech_to_speech`	Transform voice while preserving emotion
Voice Isolator	`isolate_audio`	Remove background noise
Voice Agents	Agents CLI/API	Build conversational AI agents

Setup

API Key

bash

# Environment variable
export ELEVENLABS_API_KEY="your-api-key"

# Or in .env file
ELEVENLABS_API_KEY=your-api-key
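The key can also be read at runtime instead of being hard-coded. The following is a minimal sketch, assuming a simple KEY=value layout for the .env file; `load_api_key` and its `env_path` parameter are hypothetical helpers, not part of the ElevenLabs SDK.

```python
import os
from pathlib import Path


def load_api_key(env_path: str = ".env", name: str = "ELEVENLABS_API_KEY") -> str:
    """Return the API key from the environment, falling back to a .env file."""
    value = os.environ.get(name)
    if value:
        return value
    path = Path(env_path)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip blank lines, comments, and lines without an assignment
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, raw = line.partition("=")
            if key.strip() == name:
                # Drop optional surrounding quotes
                return raw.strip().strip('"').strip("'")
    raise RuntimeError(f"{name} is not set in the environment or in {env_path}")
```

The returned string can then be passed as `api_key` when constructing the client.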

SDK Installation

bash

# Python
pip install elevenlabs

# TypeScript/Node
npm install elevenlabs

MCP Server (for Claude Code, Cursor, etc.)

json

{
  "mcpServers": {
    "ElevenLabs": {
      "command": "uvx",
      "args": ["elevenlabs-mcp"],
      "env": {
        "ELEVENLABS_API_KEY": "your-api-key"
      }
    }
  }
}

Text-to-Speech (TTS)

Convert text to lifelike speech. See references/tts-models.md for model details.

Python SDK

python

from elevenlabs.client import ElevenLabs
from elevenlabs import play

client = ElevenLabs(api_key="your-api-key")

audio = client.text_to_speech.convert(
    text="Hello world!",
    voice_id="JBFqnCBsd6RMkjVDRZzb",  # George
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128"
)
play(audio)
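To write the result to disk instead of playing it, the audio can be collected into a file. This sketch assumes `convert` returns the audio as an iterable of byte chunks (worth checking against the SDK version in use); `save_audio` is a hypothetical helper, not an SDK function.

```python
from pathlib import Path
from typing import Iterable, Union


def save_audio(audio: Union[bytes, Iterable[bytes]], path: str) -> int:
    """Write audio bytes (or an iterable of byte chunks) to path; return bytes written."""
    chunks = [audio] if isinstance(audio, (bytes, bytearray)) else audio
    written = 0
    with Path(path).open("wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                written += f.write(chunk)
    return written
```

With the example above, this would be called as `save_audio(audio, "hello.mp3")` in place of `play(audio)`.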

MCP Tool

mcp__ElevenLabs__text_to_speech
- text: "Your text here"
- voice_name: "Rachel" (or voice_id)
- model_id: "eleven_multilingual_v2"
- stability: 0.5, similarity_boost: 0.75
- speed: 1.0 (range: 0.7-1.2)
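Out-of-range values are easier to catch before a request is sent. The sketch below validates the three numeric settings listed above. The speed range comes from that list; the 0.0 to 1.0 range for `stability` and `similarity_boost` is an assumption, and `TTSSettings` is a hypothetical class, not an SDK type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTSSettings:
    stability: float = 0.5
    similarity_boost: float = 0.75
    speed: float = 1.0

    def __post_init__(self) -> None:
        # Assumed range: 0.0-1.0 for stability and similarity_boost
        for name in ("stability", "similarity_boost"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
        # Speed range taken from the parameter list above
        if not 0.7 <= self.speed <= 1.2:
            raise ValueError(f"speed must be between 0.7 and 1.2, got {self.speed}")
```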

Model Selection

Model	Latency	Languages	Best For
`eleven_multilingual_v2`	~500ms	29	High quality, long-form
`eleven_flash_v2_5`	~75ms	32	Real-time, agents
`eleven_turbo_v2_5`	~250ms	32	Balanced quality/speed
`eleven_v3` (alpha)	Higher	70+	Emotional, dramatic
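The table can be turned into a simple selection rule. This sketch picks a model for a given latency budget using the approximate figures above. The heuristic (prefer the slowest model that still fits, on the assumption that it is the higher-quality tier) and the `pick_model` helper are illustrative assumptions; `eleven_v3` is excluded from automatic selection because the table gives no latency figure for it.

```python
from typing import NamedTuple, Optional


class TTSModel(NamedTuple):
    model_id: str
    latency_ms: Optional[int]  # None where the table gives no figure
    languages: int


# Figures copied from the table above; eleven_v3 lists "Higher" latency and "70+" languages
MODELS = [
    TTSModel("eleven_flash_v2_5", 75, 32),
    TTSModel("eleven_turbo_v2_5", 250, 32),
    TTSModel("eleven_multilingual_v2", 500, 29),
    TTSModel("eleven_v3", None, 70),
]


def pick_model(max_latency_ms: int) -> str:
    """Return the slowest model with a known latency that fits the budget."""
    candidates = [
        m for m in MODELS
        if m.latency_ms is not None and m.latency_ms <= max_latency_ms
    ]
    if not candidates:
        raise ValueError(f"no model meets a {max_latency_ms} ms budget")
    return max(candidates, key=lambda m: m.latency_ms).model_id
```

For example, `pick_model(300)` returns `eleven_turbo_v2_5`, while a 100 ms budget leaves only `eleven_flash_v2_5`.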

Speech-to-Text (Scribe)

Transcribe audio with 90+ language support. See references/stt-scribe.md for details.

Python SDK

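python

from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key="your-api-key")

# Open the file in a context manager so the handle is closed after the upload
with open("audio.mp3", "rb") as audio_file:
    result = client.speech_to_text.convert(
        file=audio_file,
        model_id="scribe_v2",
        diarize=True,  # Speaker detection
    )
print(result.text)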

MCP Tool

mcp__ElevenLabs__speech_to_text
- input_file_path: "/path/to/audio.mp3"
- diarize: true (speaker detection)
- language_code: "eng" (or auto-detect)

Features

90+ languages with word-level timestamps
Speaker diarization (up to 48 speakers)
Keyterm prompting (bias toward specific words)
Entity detection (names, numbers, dates)
Realtime mode (~150ms latency)
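Word-level timestamps combined with diarization make it straightforward to rebuild speaker turns. The sketch below assumes each word is available as a dict with `text`, `start`, `end`, and `speaker_id` keys; that shape and the sample data are assumptions for illustration, so adapt the field access to the actual response object.

```python
from typing import Dict, List, Tuple


def speaker_turns(words: List[Dict]) -> List[Tuple[str, float, float, str]]:
    """Merge consecutive words from the same speaker into (speaker, start, end, text) turns."""
    turns: List[Tuple[str, float, float, str]] = []
    for w in words:
        if turns and turns[-1][0] == w["speaker_id"]:
            # Same speaker as the previous word: extend the current turn
            speaker, start, _, text = turns[-1]
            turns[-1] = (speaker, start, w["end"], f"{text} {w['text']}")
        else:
            turns.append((w["speaker_id"], w["start"], w["end"], w["text"]))
    return turns


# Hypothetical sample shaped like a diarized, word-timestamped transcript
sample = [
    {"text": "Hi", "start": 0.0, "end": 0.3, "speaker_id": "speaker_0"},
    {"text": "there.", "start": 0.35, "end": 0.7, "speaker_id": "speaker_0"},
    {"text": "Hello!", "start": 1.0, "end": 1.4, "speaker_id": "speaker_1"},
]
for speaker, start, end, text in speaker_turns(sample):
    print(f"[{start:.2f}-{end:.2f}] {speaker}: {text}")
```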

Voice Cloning

Instant Voice Clone (MCP)

mcp__ElevenLabs__voice_clone
- name: "My Voice"
- files: ["/path/to/sample1.mp3", "/path/to/sample2.mp3"]
- description: "Professional male voice"

Requirements

Instant: 30+ seconds of clean audio
Professional: 30+ minutes for hyper-realistic clones
Creator plan or higher required
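A quick local check can tell whether a set of samples meets these minimums before uploading. This sketch makes two assumptions: the samples are PCM WAV files (so the standard-library `wave` module can read them), and the minimums apply to the total duration across all samples. `clone_tier` is a hypothetical helper.

```python
import wave
from typing import Iterable, Optional

INSTANT_MIN_SECONDS = 30            # "30+ seconds" above
PROFESSIONAL_MIN_SECONDS = 30 * 60  # "30+ minutes" above


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file, computed from frame count and sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def clone_tier(sample_paths: Iterable[str]) -> Optional[str]:
    """Return the most demanding clone type the samples have enough audio for."""
    total = sum(wav_duration_seconds(p) for p in sample_paths)
    if total >= PROFESSIONAL_MIN_SECONDS:
        return "professional"
    if total >= INSTANT_MIN_SECONDS:
        return "instant"
    return None
```

Duration is only one of the requirements; audio quality and the plan requirement still have to be checked separately.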

Voice Design

Create entirely new voices from text descriptions.

MCP Tool

mcp__ElevenLabs__text_to_voice
- voice_description: "A warm, friendly male voice with a slight British accent,
  perfect for audiobook narration"

Creates 3 voice previews to choose from. Use create_voice_from_preview to save.

Sound Effects

Generate cinematic sound effects from text. See references/sound-effects.md.

MCP Tool

mcp__ElevenLabs__text_to_sound_effects
- text: "Heavy wooden door creaking open slowly"
- duration_seconds: 3.0 (0.5-30 seconds)
- loop: false

Prompting Tips

Simple: "Glass shattering on concrete"
Sequences: "Footsteps on gravel, then a metallic door opens"
Musical: "90s hip-hop drum loop, 90 BPM"
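Sequence prompts follow a regular "A, then B" pattern, so they can be assembled from a list of events. The sketch below builds such a prompt and bundles it with the `duration_seconds` and `loop` parameters from the MCP tool listing; both helper functions are hypothetical and only prepare the arguments, they do not call any API.

```python
from typing import Sequence


def sequence_prompt(events: Sequence[str]) -> str:
    """Join ordered events into one prompt of the form 'A, then B'."""
    cleaned = [e.strip() for e in events if e.strip()]
    if not cleaned:
        raise ValueError("at least one event is required")
    return ", then ".join(cleaned)


def sound_effect_params(events: Sequence[str], duration_seconds: float = 3.0,
                        loop: bool = False) -> dict:
    """Bundle a sequence prompt with duration and loop settings."""
    # Range taken from the tool listing: 0.5-30 seconds
    if not 0.5 <= duration_seconds <= 30:
        raise ValueError(f"duration_seconds must be within 0.5-30, got {duration_seconds}")
    return {"text": sequence_prompt(events), "duration_seconds": duration_seconds, "loop": loop}


print(sequence_prompt(["Footsteps on gravel", "a metallic door opens"]))
# prints: Footsteps on gravel, then a metallic door opens
```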

Music Generation

Generate studio-grade music. See references/music-generation.md.

MCP Tool

mcp__ElevenLabs__compose_music
- prompt: "Upbeat electronic track with driving synths, 120 BPM"
- music_length_ms: 60000 (10s-5min)
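Because `music_length_ms` is expressed in milliseconds, a small conversion helper avoids unit mistakes. This is a sketch using the 10 s to 5 min range stated above; `music_length_ms` as a Python function is a hypothetical helper named after the parameter.

```python
MIN_MS = 10_000      # 10 s, lower bound given above
MAX_MS = 5 * 60_000  # 5 min, upper bound given above


def music_length_ms(minutes: float = 0, seconds: float = 0) -> int:
    """Convert minutes and seconds to milliseconds and check the allowed range."""
    total = round((minutes * 60 + seconds) * 1000)
    if not MIN_MS <= total <= MAX_MS:
        raise ValueError(f"length must be between {MIN_MS} and {MAX_MS} ms, got {total}")
    return total
```

For instance, `music_length_ms(minutes=1)` gives the 60000 used in the example call.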

Features

Complete control over genre, style, structure
Vocals or instrumental
Multilingual lyrics
Edit sections individually

Dubbing

Translate audio/video while preserving speaker identity. See references/dubbing.md.

32 languages supported
Preserves emotion, timing, tone
Speaker separation (up to 9 speakers)
Files up to 1GB / 2.5 hours via API
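Checking a file against the size and duration limits locally can save a failed upload. In this sketch, "1GB" is read as 10^9 bytes, which is an assumption, and the duration is supplied by the caller because reading it from arbitrary media needs a third-party tool; `dubbing_precheck` is a hypothetical helper.

```python
import os
from typing import List

MAX_BYTES = 1_000_000_000    # "1GB", read here as 10**9 bytes (assumption)
MAX_SECONDS = 2.5 * 60 * 60  # 2.5 hours


def dubbing_precheck(path: str, duration_seconds: float) -> List[str]:
    """Return a list of problems; an empty list means the file fits the limits."""
    problems = []
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, limit is {MAX_BYTES}")
    if duration_seconds > MAX_SECONDS:
        problems.append(f"duration is {duration_seconds:.0f} s, limit is {MAX_SECONDS:.0f} s")
    return problems
```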

Voice Changer (Speech-to-Speech)

Transform any voice while preserving performance nuances.

MCP Tool

mcp__ElevenLabs__speech_to_speech
- input_file_path: "/path/to/recording.mp3"
- voice_id: "target_voice_id"

Preserves whispers, laughs, emotional cues
29 languages supported
Billed at 1000 chars/minute
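The billing rate translates directly into a character-equivalent cost per recording. The sketch below applies the 1000 chars/minute rate proportionally and rounds up; the proportional billing and the rounding rule are assumptions, and `billed_characters` is a hypothetical helper.

```python
import math

CHARS_PER_MINUTE = 1000  # rate stated above


def billed_characters(duration_seconds: float) -> int:
    """Character-equivalent cost of a recording, rounded up (assumed rounding)."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must not be negative")
    return math.ceil(duration_seconds * CHARS_PER_MINUTE / 60)
```

A 90-second recording would therefore count as 1500 characters under these assumptions.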

Voice Isolator

Remove background noise from recordings.

MCP Tool

mcp__ElevenLabs__isolate_audio
- input_file_path: "/path/to/noisy_audio.mp3"

Supports audio and video files
Files up to 500MB / 1 hour

Conversational Voice Agents

Build and deploy voice-enabled AI agents. See references/voice-agents.md for a comprehensive guide.

CLI Quick Start

bash

# Install
npm install -g @elevenlabs/cli

# Initialize and authenticate
elevenlabs agents init
elevenlabs auth login

# Create agent
elevenlabs agents add "Support Bot" --template customer-service

# Deploy
elevenlabs agents push

Templates

Template	Use Case
`customer-service`	Professional support, low temp
`assistant`	General purpose, balanced
`voice-only`	Voice interactions only
`text-only`	Text conversations only
`minimal`	Quick prototyping

Agent Tools

Server Tools: Webhook API calls
Client Tools: Frontend events
MCP Tools: Model Context Protocol servers
System Tools: transfer_to_number, agent_transfer, end_call

Voice Library

Search Voices (MCP)

mcp__ElevenLabs__search_voices
- search: "professional narrator"
- sort: "name" | "created_at_unix"

Search Public Library

mcp__ElevenLabs__search_voice_library
- search: "deep male"
- page_size: 10

Popular Voice IDs

Voice	ID	Style
Rachel	21m00Tcm4TlvDq8ikWAM	Neutral, professional
Adam	pNInz6obpgDQGcFmaJgB	Deep, warm
Bella	EXAVITQu4vr4xnSDxMaL	Soft, gentle

Account & Billing

Check Subscription

mcp__ElevenLabs__check_subscription

List Models

mcp__ElevenLabs__list_models

Reference Documentation

Topic	File
TTS Models & Parameters	references/tts-models.md
Speech-to-Text (Scribe)	references/stt-scribe.md
Sound Effects Prompting	references/sound-effects.md
Music Generation	references/music-generation.md
Voice Agents (CLI/API)	references/voice-agents.md
Agent Prompting Guide	references/agent-prompting.md
Dubbing Guide	references/dubbing.md

Pricing & Limits

TTS: Per character (Flash models 50% cheaper)
STT: Per hour of audio
Sound Effects: 40 credits/second when duration specified
Music: Per generation
See: elevenlabs.io/pricing
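These rules can be combined into a rough credit estimate. In the sketch below, the sound-effects rate (40 credits/second) and the Flash discount (50%) come from the list above, while the base rate of 1 credit per character for non-Flash TTS models is an assumption; treat the pricing page as authoritative. Both functions are hypothetical helpers.

```python
SFX_CREDITS_PER_SECOND = 40  # stated above, applies when a duration is specified
FLASH_DISCOUNT = 0.5         # "Flash models 50% cheaper"


def sfx_credits(duration_seconds: float) -> float:
    """Credits for a sound effect with an explicit duration."""
    return SFX_CREDITS_PER_SECOND * duration_seconds


def tts_credits(text: str, model_id: str, base_credits_per_char: float = 1.0) -> float:
    """Credits for a TTS request; the base per-character rate is an assumed default."""
    rate = base_credits_per_char
    if model_id.startswith("eleven_flash"):
        rate *= 1 - FLASH_DISCOUNT
    return len(text) * rate
```

Under these assumptions, a 3-second sound effect costs 120 credits, and a 5-character text costs 2.5 credits on a Flash model.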

Concurrency Limits (by plan)

Plan	Multilingual v2	Flash/Turbo	STT
Free	2	4	8
Starter	3	6	12
Creator	5	10	20
Pro	10	20	40
Scale	15	30	60
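Client code can stay under these limits by capping the number of requests in flight. The sketch below uses an `asyncio.Semaphore` sized from the table; the `run_limited` and `limit_for` helpers and the column keys are hypothetical, and each job stands in for whatever coroutine actually issues the request.

```python
import asyncio

# Limits copied from the table above: plan -> (Multilingual v2, Flash/Turbo, STT)
LIMITS = {
    "free": (2, 4, 8),
    "starter": (3, 6, 12),
    "creator": (5, 10, 20),
    "pro": (10, 20, 40),
    "scale": (15, 30, 60),
}
COLUMN = {"multilingual_v2": 0, "flash_turbo": 1, "stt": 2}


def limit_for(plan: str, kind: str) -> int:
    """Look up the concurrency limit for a plan and request kind."""
    return LIMITS[plan.lower()][COLUMN[kind]]


async def run_limited(jobs, plan: str, kind: str):
    """Run zero-argument coroutine functions with at most limit_for(plan, kind) in flight."""
    semaphore = asyncio.Semaphore(limit_for(plan, kind))

    async def guarded(job):
        async with semaphore:
            return await job()

    # gather preserves the order of the input jobs in its results
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

A call such as `asyncio.run(run_limited(jobs, "creator", "flash_turbo"))` would keep at most 10 jobs running at once.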
