speech-engine
ElevenLabs Speech Engine
Add a real-time voice interface to a custom LLM-backed agent. ElevenLabs handles microphone audio, speech-to-text, turn-taking, text-to-speech, and browser playback; your server exposes a Speech Engine WebSocket endpoint and streams the LLM response text back.
Setup: See Installation Guide. For JavaScript, use `@elevenlabs/*` packages only. For deeper SDK details, read JavaScript SDK Reference or Python SDK Reference.
When to Use
Use Speech Engine when the user wants to:
- Add voice to an existing chat app or custom LLM pipeline
- Add voice to OpenClaw, Hermes, or a similar agent runtime while keeping agent logic on the developer-owned server
- Build a developer-hosted WebSocket server for ElevenLabs voice conversations
- Stream OpenAI, Anthropic, Gemini, or custom LLM responses back as spoken audio
- Handle user interruptions while an LLM response is still streaming
- Build a browser client with `@elevenlabs/react` or `@elevenlabs/client` using a server-issued conversation token
Use the `agents` skill instead when the user is creating or configuring a hosted ElevenLabs Conversational AI agent with platform-managed prompts, tools, workflows, phone numbers, or widgets.
How It Works
Each Speech Engine WebSocket connection represents one conversation.
- The browser sends user audio to ElevenLabs.
- ElevenLabs transcribes speech and sends the full transcript to your server.
- Your server calls the LLM with that conversation history.
- Your server streams text back through the SDK.
- ElevenLabs converts the response to speech and plays it in the browser.
The SDK manages WebSocket routing, request verification, session lifecycle, ping/pong, turn-taking, and interruption handling. `sendResponse()` / `send_response()` accepts a string, an async iterable, or provider streams from OpenAI, Anthropic, or Google Gemini.
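Because `send_response()` accepts an async iterable, a handler can reshape or synthesize text before it is spoken. The following is a minimal sketch under stated assumptions: `sentence_chunks` and `FakeSession` are hypothetical names introduced here for illustration, and `FakeSession` only stands in for the real session object so the snippet runs without the SDK. How the real SDK consumes the iterable internally is not shown in this guide.

```python
import asyncio
import re
from typing import AsyncIterator


async def sentence_chunks(text: str) -> AsyncIterator[str]:
    """Yield text one sentence at a time (hypothetical helper)."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if sentence:
            yield sentence + " "
            await asyncio.sleep(0)  # give other tasks a chance to run


class FakeSession:
    """Stand-in for the SDK session (assumption); it only records chunks."""

    def __init__(self) -> None:
        self.spoken: list[str] = []

    async def send_response(self, chunks: AsyncIterator[str]) -> None:
        async for chunk in chunks:
            self.spoken.append(chunk)


async def demo() -> list[str]:
    session = FakeSession()
    await session.send_response(sentence_chunks("Hi there. How can I help?"))
    return session.spoken
```

In a real handler, the same generator would be passed to the session object that the SDK supplies to `on_transcript`.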
Implementation Flow
- Install server dependencies and configure `ELEVENLABS_API_KEY` plus the LLM provider key.
- Expose your Speech Engine server through a public HTTPS URL for local development, for example with `ngrok http 3001`.
- Create a Speech Engine resource with `ws_url` / `wsUrl` pointing at the public URL plus `/ws`.
- Store the returned Speech Engine ID, for example in `ELEVENLABS_SPEECH_ENGINE_ID`.
- Start a Speech Engine server with `engine.serve(...)` in Python or `speechEngine.attach(...)` in TypeScript.
- Issue browser conversation tokens from a server endpoint. Never put `ELEVENLABS_API_KEY` in browser code.
- Start the client session with `conversationToken`; optionally set `overrides.agent.firstMessage` if the agent should greet first.
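The steps above depend on a few environment variables and on a WebSocket URL derived from the public URL. A small pre-flight check can catch a missing value before the server starts. This is only a sketch: `missing_vars` and `build_ws_url` are hypothetical helpers, not part of the SDK, and `OPENAI_API_KEY` is assumed as the LLM provider key because the examples in this guide use OpenAI.

```python
import os
from typing import Mapping, Sequence

# Variable names taken from this guide; OPENAI_API_KEY is an assumption
# that applies only when OpenAI is the LLM provider.
REQUIRED_VARS = (
    "ELEVENLABS_API_KEY",
    "ELEVENLABS_SPEECH_ENGINE_ID",
    "OPENAI_API_KEY",
)


def missing_vars(
    env: Mapping[str, str], required: Sequence[str] = REQUIRED_VARS
) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not env.get(name)]


def build_ws_url(public_url: str, path: str = "/ws") -> str:
    """Join the public URL and the WebSocket path without doubling slashes."""
    return public_url.rstrip("/") + "/" + path.lstrip("/")


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    print(build_ws_url("https://example.ngrok.app"))
```

The URL helper keeps the `https` scheme, matching the example value of `PUBLIC_WS_URL` shown in this guide.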
Create a Speech Engine
Python
```python
import asyncio
import os

from dotenv import load_dotenv
from elevenlabs import AsyncElevenLabs

load_dotenv()

elevenlabs = AsyncElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))


async def main():
    # Register the public WebSocket URL that ElevenLabs will connect to.
    engine = await elevenlabs.speech_engine.create(
        name="My Speech Engine",
        speech_engine={"ws_url": os.environ["PUBLIC_WS_URL"]},
    )
    # Store this ID, for example in ELEVENLABS_SPEECH_ENGINE_ID.
    print(engine.engine_id)


asyncio.run(main())
```
TypeScript
```typescript
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import "dotenv/config";

const elevenlabs = new ElevenLabsClient({
  apiKey: process.env.ELEVENLABS_API_KEY,
});

// Register the public WebSocket URL that ElevenLabs will connect to.
const engine = await elevenlabs.speechEngine.create({
  name: "My Speech Engine",
  speechEngine: { wsUrl: process.env.PUBLIC_WS_URL! },
});

// Store this ID, for example in ELEVENLABS_SPEECH_ENGINE_ID.
console.log(engine.engineId);
```
Locally, `PUBLIC_WS_URL` should look like `https://example.ngrok.app/ws`; in deployment, it is your production WebSocket route.
The create request can also configure `tts`, `asr`, `turn`, `speech_engine.request_headers` / `speechEngine.requestHeaders`, and `privacy` for custom voices, transcription keywords, turn-taking, server auth headers, and recording behavior. See the SDK reference files for expanded examples.
Server Examples
Python
```python
import asyncio
import os

from dotenv import load_dotenv
from elevenlabs import AsyncElevenLabs
from openai import AsyncOpenAI

load_dotenv()

elevenlabs = AsyncElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))
openai = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))


async def on_transcript(transcript, session):
    # Called with the full conversation history for the current turn.
    stream = await openai.responses.create(
        model=os.environ["OPENAI_MODEL"],
        instructions="You are a concise, conversational voice assistant.",
        input=[
            {
                # Speech Engine uses "agent"; OpenAI expects "assistant".
                "role": "assistant" if message.role == "agent" else message.role,
                "content": message.content,
            }
            for message in transcript
        ],
        stream=True,
    )
    # Stream the LLM output back to be spoken.
    await session.send_response(stream)


async def main():
    engine = await elevenlabs.speech_engine.get(os.environ["ELEVENLABS_SPEECH_ENGINE_ID"])
    await engine.serve(
        port=3001,
        path="/ws",
        debug=True,
        on_transcript=on_transcript,
    )


asyncio.run(main())
```
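The role mapping inside `on_transcript` can be factored into a small helper so it is easy to test on its own. This is a sketch, not SDK code: `TranscriptMessage` is a hypothetical stand-in for the transcript items, assumed only to expose `role` and `content` as the handler above does, and `to_llm_messages` is a name introduced here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TranscriptMessage:
    """Hypothetical stand-in for a transcript item (role + content)."""

    role: str
    content: str


def to_llm_messages(transcript: Iterable[TranscriptMessage]) -> list[dict]:
    """Convert transcript items to OpenAI-style role/content dicts."""
    return [
        {
            # Speech Engine's "agent" role becomes "assistant"; others pass through.
            "role": "assistant" if message.role == "agent" else message.role,
            "content": message.content,
        }
        for message in transcript
    ]


history = [
    TranscriptMessage(role="agent", content="Hello! How can I help you today?"),
    TranscriptMessage(role="user", content="What's the weather like?"),
]
messages = to_llm_messages(history)
```

The resulting list can be passed as the `input` argument in the handler above.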
TypeScript
```typescript
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { createServer } from "node:http";
import OpenAI from "openai";
import "dotenv/config";

const elevenlabs = new ElevenLabsClient({
  apiKey: process.env.ELEVENLABS_API_KEY,
});
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const httpServer = createServer();

// Attach the Speech Engine WebSocket handler to the HTTP server at /ws.
await elevenlabs.speechEngine.attach(
  process.env.ELEVENLABS_SPEECH_ENGINE_ID!,
  httpServer,
  "/ws",
  {
    debug: true,
    async onTranscript(transcript, signal, session) {
      const response = await openai.responses.create(
        {
          model: process.env.OPENAI_MODEL!,
          instructions: "You are a concise, conversational voice assistant.",
          input: transcript.map((message) => ({
            // Speech Engine uses "agent"; OpenAI expects "assistant".
            role: message.role === "agent" ? "assistant" : message.role,
            content: message.content,
          })),
          stream: true,
        },
        // Abort the in-flight LLM request if the user interrupts.
        { signal },
      );
      session.sendResponse(response);
    },
  },
);

httpServer.listen(3001);
```
In TypeScript, pass the `AbortSignal` from `onTranscript` to the LLM request so user interruptions cancel the in-flight response. In Python, the SDK cancels the previous transcript handler when a newer transcript arrives.
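To make the Python behavior concrete, the sketch below simulates a handler being cancelled mid-response using plain `asyncio` tasks. It does not use the SDK and makes no claim about how the SDK implements cancellation; `handle_turn` and `demo` are hypothetical names, and the `asyncio.sleep` call stands in for waiting on an LLM stream. The point it illustrates is that a handler can catch `asyncio.CancelledError` to clean up, provided it re-raises.

```python
import asyncio


async def handle_turn(turn: str, log: list[str], delay: float) -> None:
    """Simulated transcript handler (hypothetical)."""
    try:
        log.append(f"{turn}: started")
        await asyncio.sleep(delay)  # stands in for awaiting the LLM stream
        log.append(f"{turn}: finished")
    except asyncio.CancelledError:
        # Release resources here (close streams, stop timers), then re-raise
        # so the cancellation is not swallowed.
        log.append(f"{turn}: cancelled")
        raise


async def demo() -> list[str]:
    log: list[str] = []
    first = asyncio.create_task(handle_turn("turn 1", log, delay=60))
    await asyncio.sleep(0)  # let the first handler start
    first.cancel()  # simulates a newer transcript arriving
    second = asyncio.create_task(handle_turn("turn 2", log, delay=0))
    await asyncio.gather(first, second, return_exceptions=True)
    return log


if __name__ == "__main__":
    for line in asyncio.run(demo()):
        print(line)
```

Swallowing `CancelledError` without re-raising would leave the interrupted turn running, so cleanup code should stay short and always propagate the exception.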
Browser Client
Create a server-side token endpoint and have the browser request a token before starting the microphone session. Keep the Speech Engine ID and API key on the server.
```typescript
import express from "express";
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import "dotenv/config";

const app = express();
const elevenlabs = new ElevenLabsClient();

app.get("/api/token", async (_req, res) => {
  // The Speech Engine ID is passed as agentId and never leaves the server.
  const response = await elevenlabs.conversationalAi.conversations.getWebrtcToken({
    agentId: process.env.ELEVENLABS_SPEECH_ENGINE_ID!,
  });
  res.json({ token: response.token });
});
```
React clients can use `@elevenlabs/react`:
```tsx
import { useConversation } from "@elevenlabs/react";

export function VoiceControls() {
  const conversation = useConversation({
    onConnect: () => console.log("connected"),
    onDisconnect: () => console.log("disconnected"),
    onError: (error) => console.error(error),
  });

  async function startConversation() {
    // Ask for microphone permission before starting the session.
    await navigator.mediaDevices.getUserMedia({ audio: true });
    const { token } = await fetch("/api/token").then((res) => res.json());
    await conversation.startSession({
      conversationToken: token,
      overrides: {
        agent: { firstMessage: "Hello! How can I help you today?" },
      },
    });
  }

  return <button onClick={startConversation}>Start conversation</button>;
}
```
If a WebRTC browser session stalls or logs `/rtc/v1` 404s, `v1 RTC path not found`, or `could not establish pc connection`, pin `livekit-client` to `2.16.1` in the app's `package.json` until the upstream LiveKit compatibility issue is resolved:
```json
{
  "overrides": {
    "livekit-client": "2.16.1"
  }
}
```
Callbacks and Events
| Event | TypeScript callback | Python callback | Notes |
|---|---|---|---|
| | `onTranscript` | `on_transcript` | Full conversation history for the current turn |
| | | | Conversation ID becomes available |
| | | | Clean disconnect |
| | | | Unexpected WebSocket drop |
| | | | Protocol or WebSocket error |
Transcript messages use role `"user"` or `"agent"`. Map `"agent"` to `"assistant"` when passing history to LLM APIs that expect OpenAI-style roles.
References
- Installation Guide
- JavaScript SDK Reference
- Python SDK Reference