livekit-agents

LiveKit Agents Development for LiveKit Cloud

This skill provides opinionated guidance for building voice AI agents with LiveKit Cloud. It assumes you are using LiveKit Cloud (the recommended path) and encodes how to approach agent development, not API specifics. All factual information about APIs, methods, and configurations must come from live documentation.
This skill is for LiveKit Cloud developers. If you're self-hosting LiveKit, some recommendations (particularly around LiveKit Inference) won't apply directly.

MANDATORY: Read This Checklist Before Starting

Before writing ANY code, complete this checklist:
  1. Read this entire skill document - Do not skip sections even if MCP is available
  2. Ensure your LiveKit Cloud project is connected - You need LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET from your Cloud project
  3. Set up documentation access - Use MCP if available, otherwise use web search
  4. Plan to write tests - Every agent implementation MUST include tests (see testing section below)
  5. Verify all APIs against live docs - Never rely on model memory for LiveKit APIs
This checklist applies regardless of whether MCP is available. MCP provides documentation access but does NOT replace the guidance in this skill.

LiveKit Cloud Setup

LiveKit Cloud is the fastest way to get a voice agent running. It provides:
  • Managed infrastructure (no servers to deploy)
  • LiveKit Inference for AI models (no separate API keys needed)
  • Built-in noise cancellation, turn detection, and other voice features
  • Simple credential management

Connect to Your Cloud Project

  1. Sign up at cloud.livekit.io if you haven't already
  2. Create a project (or use an existing one)
  3. Get your credentials from the project settings:
    • LIVEKIT_URL - Your project's WebSocket URL (e.g., wss://your-project.livekit.cloud)
    • LIVEKIT_API_KEY - API key for authentication
    • LIVEKIT_API_SECRET - API secret for authentication
  4. Set these as environment variables (typically in .env.local):
bash
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret
The LiveKit CLI can automate credential setup. Consult the CLI documentation for current commands.
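A startup check can catch missing credentials before the agent tries to connect. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, it assumes the variables have already been loaded into the process environment, and accepting `ws://` alongside `wss://` is an assumption rather than something this document specifies.

```python
import os

# The three variables named in the setup steps above.
REQUIRED_VARS = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")


def missing_livekit_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def check_livekit_env(env=None):
    """Raise a clear error if credentials are missing or the URL looks wrong."""
    env = os.environ if env is None else env
    missing = missing_livekit_vars(env)
    if missing:
        raise RuntimeError("Missing LiveKit credentials: " + ", ".join(missing))
    # Assumption: the project URL is a WebSocket URL, as in the example value.
    if not env["LIVEKIT_URL"].startswith(("ws://", "wss://")):
        raise RuntimeError("LIVEKIT_URL should start with ws:// or wss://")
```

Calling `check_livekit_env()` once at process start fails fast with a readable message instead of an authentication error later in the session.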

Use LiveKit Inference for AI Models

LiveKit Inference is the recommended way to use AI models with LiveKit Cloud. It provides access to leading AI model providers—all through your LiveKit credentials with no separate API keys needed.
Benefits of LiveKit Inference:
  • No separate API keys to manage for each AI provider
  • Billing consolidated through your LiveKit Cloud account
  • Optimized for voice AI workloads
Consult the documentation for available models, supported providers, and current usage patterns. The documentation always has the most up-to-date information.

Critical Rule: Never Trust Model Memory for LiveKit APIs

LiveKit Agents is a fast-evolving SDK. Model training data is outdated the moment it's created. When working with LiveKit:
  • Never assume API signatures, method names, or configuration options from memory
  • Never guess SDK behavior or default values
  • Always verify against live documentation before writing code
  • Always cite the documentation source when implementing features
This rule applies even when confident about an API. Verify anyway.

REQUIRED: Use LiveKit MCP Server for Documentation

Before writing any LiveKit code, ensure access to the LiveKit documentation MCP server. This provides current, verified API information and prevents reliance on stale model knowledge.

Check for MCP Availability

Look for livekit-docs MCP tools. If available, use them for all documentation lookups:
  • Search documentation before implementing any feature
  • Verify API signatures and method parameters
  • Look up configuration options and their valid values
  • Find working examples for the specific task at hand

If MCP Is Not Available

If the LiveKit MCP server is not configured, inform the user and recommend installation. Installation instructions for all supported platforms are available at:
Fetch the installation instructions appropriate for the user's coding agent from that page.

Fallback When MCP Unavailable

If MCP cannot be installed in the current session:
  1. Inform the user immediately that documentation cannot be verified in real-time
  2. Use web search to fetch current documentation from docs.livekit.io
  3. Explicitly mark all LiveKit-specific code with a comment like # UNVERIFIED: Please check docs.livekit.io for current API
  4. State clearly when you cannot verify something: "I cannot verify this API signature against current documentation"
  5. Recommend the user verify against https://docs.livekit.io before using the code
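Once code carries UNVERIFIED markers, it helps to be able to list them so none are forgotten before release. The sketch below is one possible way to do that with the standard library; the helper name is hypothetical, and it assumes the marker is written as a `#` comment as in step 3.

```python
import re

# Matches the marker comment from step 3, tolerating extra spaces after '#'.
UNVERIFIED = re.compile(r"#\s*UNVERIFIED\b")


def unverified_lines(source):
    """Return 1-based line numbers in `source` that carry an UNVERIFIED marker."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if UNVERIFIED.search(line)
    ]
```

A pre-release check could read each agent source file, pass its text to `unverified_lines`, and refuse to proceed while any markers remain.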

Voice Agent Architecture Principles

Voice AI agents have fundamentally different requirements from text-based agents or traditional software. Internalize these principles:

Latency Is Critical

Voice conversations are real-time. Users expect responses within hundreds of milliseconds, not seconds. Every architectural decision should consider latency impact:
  • Minimize LLM context size to reduce inference time
  • Avoid unnecessary tool calls during active conversation
  • Prefer streaming responses over batch responses
  • Design for the unhappy path (network delays, API timeouts)
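The preference for streaming can be made concrete by comparing time to first token against time to full response. This is an illustrative sketch only: `fake_token_stream` is a stand-in with an arbitrary per-token delay, not a LiveKit or model-provider API, and the helper names are hypothetical.

```python
import asyncio
import time


async def fake_token_stream(tokens, delay_s=0.01):
    """Stand-in for a streaming model response: yields one token at a time."""
    for token in tokens:
        await asyncio.sleep(delay_s)
        yield token


async def time_to_first_token(stream):
    """Seconds until the first token arrives; speech playback could start here."""
    start = time.perf_counter()
    try:
        async for _ in stream:
            return time.perf_counter() - start
        return None  # the stream produced nothing
    finally:
        await stream.aclose()  # release the generator we stopped consuming


async def time_to_full_response(stream):
    """Seconds until every token has arrived, as a batch consumer would wait."""
    start = time.perf_counter()
    async for _ in stream:
        pass
    return time.perf_counter() - start


async def compare(tokens):
    """Measure both timings on two identical simulated streams."""
    first = await time_to_first_token(fake_token_stream(tokens))
    full = await time_to_full_response(fake_token_stream(tokens))
    return first, full
```

Running `asyncio.run(compare(["Sure,", "I", "can", "help", "."]))` returns both timings; the gap between them grows with response length, which is the delay a batch design adds before the user hears anything.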

Context Bloat Kills Performance

Large system prompts and extensive tool lists directly increase latency. A voice agent with 50 tools and a 10,000-token system prompt will feel sluggish regardless of model speed.
Design agents with minimal viable context:
  • Include only tools relevant to the current conversation phase
  • Keep system prompts focused and concise
  • Remove tools and context that aren't actively needed
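One way to keep context minimal is to declare, per conversation phase, exactly which instructions and tools are exposed. The sketch below is framework-agnostic plain Python; the phase names follow the greeting, intake, and resolution example used in this document, while the tool names and the `context_for` helper are hypothetical. How tools are actually registered with an agent must come from the LiveKit documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseConfig:
    """Instructions and tool names that one conversation phase needs."""
    instructions: str
    tool_names: tuple


# Hypothetical phases and tools, for illustration only.
PHASES = {
    "greeting": PhaseConfig("Greet the caller and ask how you can help.", ()),
    "intake": PhaseConfig(
        "Collect the caller's account number and issue.",
        ("lookup_account",),
    ),
    "resolution": PhaseConfig(
        "Resolve the issue or open a ticket.",
        ("lookup_account", "create_ticket"),
    ),
}


def context_for(phase):
    """Return only the instructions and tools relevant to `phase`."""
    config = PHASES[phase]
    return {"instructions": config.instructions, "tools": list(config.tool_names)}
```

With this layout the greeting phase carries no tools at all, and adding a tool forces an explicit decision about which phases need it.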

Users Don't Read, They Listen

Voice interface constraints differ from text:
  • Long responses frustrate users—keep outputs concise
  • Users cannot scroll back—ensure clarity on first delivery
  • Interruptions are normal—design for graceful handling
  • Silence feels broken—acknowledge processing when needed

Workflow Architecture: Handoffs and Tasks

Complex voice agents should not be monolithic. LiveKit Agents supports structured workflows that maintain low latency while handling sophisticated use cases.

The Problem with Monolithic Agents

A single agent handling an entire conversation flow accumulates:
  • Tools for every possible action (bloated tool list)
  • Instructions for every conversation phase (bloated context)
  • State management for all scenarios (complexity)
This creates latency and reduces reliability.

Handoffs: Agent-to-Agent Transitions

Handoffs allow one agent to transfer control to another. Use handoffs to:
  • Separate distinct conversation phases (greeting → intake → resolution)
  • Isolate specialized capabilities (general support → billing specialist)
  • Manage context boundaries (each agent has only what it needs)
Design handoffs around natural conversation boundaries where context can be summarized rather than transferred wholesale.
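Summarizing context at a handoff can be as simple as passing a small structured record instead of the full transcript. The following sketch shows the idea in plain Python; `HandoffSummary` and its fields are hypothetical and are not part of the LiveKit Agents API, whose handoff mechanics should be taken from the documentation.

```python
from dataclasses import dataclass, field


@dataclass
class HandoffSummary:
    """Compact state handed to the next agent instead of the whole transcript."""
    caller_goal: str
    facts: dict = field(default_factory=dict)

    def to_prompt(self):
        """Render the summary as a few short lines for the next agent's context."""
        lines = [f"Caller goal: {self.caller_goal}"]
        # Sort keys so the rendered text is stable across runs.
        lines += [f"{key}: {value}" for key, value in sorted(self.facts.items())]
        return "\n".join(lines)
```

For example, a general support agent might hand a billing specialist `HandoffSummary("dispute a charge", {"account_id": "A-1"})`, so the specialist starts with the caller's goal and the collected facts rather than every earlier turn.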

Tasks: Scoped Operations

Tasks are tightly-scoped prompts designed to achieve a specific outcome. Use tasks for:
  • Discrete operations that don't require full agent capabilities
  • Situations where a focused prompt outperforms a general-purpose agent
  • Reducing context when only a specific capability is needed
Consult the documentation for implementation details on handoffs and tasks.

REQUIRED: Write Tests for Agent Behavior

Voice agent behavior is code. Every agent implementation MUST include tests. Shipping an agent without tests is shipping untested code.

Mandatory Testing Workflow

When building or modifying a LiveKit agent:
  1. Create a tests/ directory if one doesn't exist
  2. Write at least one test before considering the implementation complete
  3. Test the core behavior the user requested
  4. Run the tests to verify they pass
A minimal test file structure:
project/
├── agent.py (or src/agent.py)
└── tests/
    └── test_agent.py

Test-Driven Development Process

When modifying agent behavior—instructions, tool descriptions, workflows—begin by writing tests for the desired behavior:
  1. Define what the agent should do in specific scenarios
  2. Write test cases that verify this behavior
  3. Implement the feature
  4. Iterate until tests pass
This approach prevents shipping agents that "seem to work" but fail in production.
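As a small illustration of the test-first loop, the sketch below tests the deterministic logic behind a tool using plain pytest-style functions. It assumes a hypothetical helper, `normalize_order_id`, that a tool might call on transcribed speech; it does not use LiveKit's testing framework, which is the right place to test conversational behavior and should be looked up in the documentation.

```python
def normalize_order_id(spoken):
    """Hypothetical tool helper: turn a transcribed order ID into canonical form."""
    # Keep letters and digits only, since transcripts add spaces and dashes.
    cleaned = "".join(ch for ch in spoken.upper() if ch.isalnum())
    if not cleaned:
        raise ValueError("no order ID heard")
    return cleaned


def test_normalizes_spoken_order_id():
    # Desired behavior, written before the helper was implemented.
    assert normalize_order_id("a b 1 2-3") == "AB123"


def test_rejects_empty_input():
    # Silence or noise should fail loudly rather than produce an empty ID.
    try:
        normalize_order_id("   ")
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")
```

Placed in tests/test_agent.py, these functions are collected by pytest automatically because their names start with `test_`.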

What Every Agent Test Should Cover

At minimum, write tests for:
  • Basic conversation flow: Agent responds appropriately to a greeting
  • Tool invocation (if tools exist): Tools are called with correct parameters
  • Error handling: Agent handles unexpected input gracefully
Focus tests on:
  • Tool invocation: Does the agent call the right tools with correct parameters?
  • Response quality: Does the agent produce appropriate responses for given inputs?
  • Workflow transitions: Do handoffs and tasks trigger correctly?
  • Edge cases: How does the agent handle unexpected input, interruptions, silence?

Test Implementation Pattern

Use LiveKit's testing framework. Consult the testing documentation via MCP for current patterns:
search: "livekit agents testing"
The framework supports:
  • Simulated user input
  • Verification of agent responses
  • Tool call assertions
  • Workflow transition testing

Why This Is Non-Negotiable

Agents that "seem to work" in manual testing frequently fail in production:
  • Prompt changes silently break behavior
  • Tool descriptions affect when tools are called
  • Model updates change response patterns
Tests catch these issues before users do.

Skipping Tests

If a user explicitly requests no tests, proceed without tests but tell the user:
"I've built the agent without tests as requested. I strongly recommend adding tests before deploying to production. Voice agents are difficult to verify manually and tests prevent silent regressions."

Common Mistakes to Avoid

Overloading the Initial Agent

The mistake is starting with one agent that "does everything" and adding tools and instructions over time. Instead, design the workflow structure upfront, even if the initial implementation is simple.

Ignoring Latency Until It's a Problem

Latency issues compound. An agent that feels "a bit slow" in development becomes unusable in production with real network conditions. Measure and optimize latency continuously.
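Continuous measurement can start with a simple report over recorded response times. The sketch below uses only the standard library; `latency_report` is a hypothetical helper, the budget is whatever value the team chooses, and how the samples are collected from a running agent is outside its scope.

```python
import statistics


def latency_report(samples_ms, budget_ms):
    """Summarize response latencies and flag whether the p95 fits the budget."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=20) returns 19 cut points; the last is the 95th percentile.
    p95 = statistics.quantiles(samples_ms, n=20)[-1]
    return {
        "median_ms": statistics.median(samples_ms),
        "p95_ms": p95,
        "within_budget": p95 <= budget_ms,
    }
```

Tracking the 95th percentile rather than the average matters here because a handful of slow turns is what makes a conversation feel broken, even when the typical turn is fast.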

Copying Examples Without Understanding

Examples in documentation demonstrate specific patterns. Copying code without understanding its purpose leads to bloated, poorly-structured agents. Understand what each component does before including it.

Skipping Tests Because "It's Just Prompts"

Agent behavior is code. Prompt changes affect behavior as much as code changes. Test agent behavior with the same rigor as traditional software. Never deliver an agent implementation without at least one test file.

Assuming Model Knowledge Is Current

Reiterating the critical rule: never trust model memory for LiveKit APIs. The SDK evolves faster than model training cycles. Verify everything.

When to Consult Documentation

Always consult documentation for:
  • API method signatures and parameters
  • Configuration options and their valid values
  • SDK version-specific features or changes
  • Deployment and infrastructure setup
  • Model provider integration details
  • CLI commands and flags
This skill provides guidance on:
  • Architectural approach and design principles
  • Workflow structure decisions
  • Testing strategy
  • Common pitfalls to avoid
The distinction matters: this skill tells you how to think about building voice agents. The documentation tells you how to implement specific features.

Feedback Loop

When using LiveKit documentation via MCP, note any gaps, outdated information, or confusing content. Reporting documentation issues helps improve the ecosystem for all developers.

Summary

To build effective voice agents with LiveKit Cloud:
  1. Use LiveKit Cloud + LiveKit Inference as the foundation—it's the fastest path to production
  2. Verify everything against live documentation—never trust model memory
  3. Minimize latency at every architectural decision point
  4. Structure workflows using handoffs and tasks to manage complexity
  5. Test behavior before and after changes—never ship without tests
  6. Keep context minimal—only include what's needed for the current phase
These principles remain valid regardless of SDK version or API changes. For all implementation specifics, consult the LiveKit documentation via MCP.