agents-py

LiveKit Agents Python SDK

Build voice AI agents with LiveKit's Python Agents SDK.

LiveKit MCP server tools

LiveKit MCP服务器工具

This skill works alongside the LiveKit MCP server, which provides direct access to the latest LiveKit documentation, code examples, and changelogs. Use these tools when you need up-to-date information that may have changed since this skill was created.
Available MCP tools:
  • docs_search
    - Search the LiveKit docs site
  • get_pages
    - Fetch specific documentation pages by path
  • get_changelog
    - Get recent releases and updates for LiveKit packages
  • code_search
    - Search LiveKit repositories for code examples
  • get_python_agent_example
    - Browse 100+ Python agent examples
When to use MCP tools:
  • You need the latest API documentation or feature updates
  • You're looking for recent examples or code patterns
  • You want to check if a feature has been added in recent releases
  • The local references don't cover a specific topic
When to use local references:
  • You need quick access to core concepts covered in this skill
  • You're working offline or want faster access to common patterns
  • The information in the references is sufficient for your needs
Use MCP tools and local references together for the best experience.

References

Consult these resources as needed:
  • ./references/livekit-overview.md -- LiveKit ecosystem overview and how these skills work together
  • ./references/agent-session.md -- AgentSession lifecycle, events, and configuration
  • ./references/tools.md -- Function tools, RunContext, and tool results
  • ./references/models.md -- STT, LLM, TTS model strings and plugin configuration
  • ./references/workflows.md -- Multi-agent handoffs, Tasks, TaskGroups, and pipeline nodes

Installation

bash
uv add "livekit-agents[silero,turn-detector]~=1.3" \
  "livekit-plugins-noise-cancellation~=0.2" \
  "python-dotenv"

Environment variables

Use the LiveKit CLI to load your credentials into a .env.local file:
bash
lk app env -w
Or manually create a .env.local file:
bash
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
LIVEKIT_URL=wss://your-project.livekit.cloud
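load_dotenv(".env.local") (used in the examples below) reads these KEY=value lines into the process environment. As a rough illustration of the file format only — python-dotenv itself also handles quoting, comments, and export prefixes — a minimal stdlib parser:

```python
def parse_env_file(path: str) -> dict[str, str]:
    """Sketch of the KEY=value format used by .env.local.

    Illustrative only: use python-dotenv's load_dotenv in real code.
    """
    env: dict[str, str] = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env
```

The agent examples below expect LIVEKIT_API_KEY, LIVEKIT_API_SECRET, and LIVEKIT_URL to be present in the environment after loading.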

Quick start

Basic agent with STT-LLM-TTS pipeline

python
from dotenv import load_dotenv
from livekit import agents, rtc
from livekit.agents import AgentSession, Agent, AgentServer, room_io
from livekit.plugins import noise_cancellation, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

load_dotenv(".env.local")

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful voice AI assistant.
            Keep responses concise, 1-3 sentences. No markdown or emojis.""",
        )

server = AgentServer()

@server.rtc_session()
async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        stt="assemblyai/universal-streaming:en",
        llm="openai/gpt-4.1-mini",
        tts="cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_options=room_io.RoomOptions(
            audio_input=room_io.AudioInputOptions(
                noise_cancellation=lambda params: noise_cancellation.BVCTelephony()
                    if params.participant.kind == rtc.ParticipantKind.PARTICIPANT_KIND_SIP
                    else noise_cancellation.BVC(),
            ),
        ),
    )

    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )

if __name__ == "__main__":
    agents.cli.run_app(server)

Basic agent with realtime model

python
from dotenv import load_dotenv
from livekit import agents, rtc
from livekit.agents import AgentSession, Agent, AgentServer, room_io
from livekit.plugins import openai, noise_cancellation

load_dotenv(".env.local")

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a helpful voice AI assistant."
        )

server = AgentServer()

@server.rtc_session()
async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        llm=openai.realtime.RealtimeModel(voice="coral")
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_options=room_io.RoomOptions(
            audio_input=room_io.AudioInputOptions(
                noise_cancellation=lambda params: noise_cancellation.BVCTelephony()
                    if params.participant.kind == rtc.ParticipantKind.PARTICIPANT_KIND_SIP
                    else noise_cancellation.BVC(),
            ),
        ),
    )

    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )

if __name__ == "__main__":
    agents.cli.run_app(server)

Core concepts

Agent class

Define agent behavior by subclassing Agent:
python
from livekit.agents import Agent, function_tool

class MyAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="Your system prompt here",
        )

    async def on_enter(self) -> None:
        """Called when agent becomes active."""
        await self.session.generate_reply(
            instructions="Greet the user"
        )

    async def on_exit(self) -> None:
        """Called before agent hands off to another agent."""
        pass

    @function_tool()
    async def my_tool(self, param: str) -> str:
        """Tool description for the LLM."""
        return f"Result: {param}"

AgentSession

The session orchestrates the voice pipeline:
python
session = AgentSession(
    stt="assemblyai/universal-streaming:en",
    llm="openai/gpt-4.1-mini",
    tts="cartesia/sonic-3:voice_id",
    vad=silero.VAD.load(),
    turn_detection=MultilingualModel(),
)
Key methods:
  • session.start(room, agent)
    - Start the session
  • session.say(text)
    - Speak text directly
  • session.generate_reply(instructions)
    - Generate LLM response
  • session.interrupt()
    - Stop current speech
  • session.update_agent(new_agent)
    - Switch to different agent

Function tools

Use the @function_tool decorator:
python
from livekit.agents import function_tool, RunContext

@function_tool()
async def get_weather(self, context: RunContext, location: str) -> str:
    """Get the current weather for a location."""
    return f"Weather in {location}: Sunny, 72°F"
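The decorator exposes the function to the LLM, broadly drawing on its name, docstring, and typed parameters — so descriptive docstrings and precise type hints matter. A stdlib sketch of that kind of metadata extraction (illustrative only, not the SDK's actual implementation; describe_tool is a hypothetical helper):

```python
import inspect

def describe_tool(fn) -> dict:
    """Illustrative only: gather the metadata a tool schema is built from."""
    sig = inspect.signature(fn)
    params = {
        name: (p.annotation.__name__
               if p.annotation is not inspect.Parameter.empty else "any")
        for name, p in sig.parameters.items()
        if name not in ("self", "context")  # injected at runtime, hidden from the LLM
    }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": params,
    }

async def get_weather(context, location: str) -> str:
    """Get the current weather for a location."""
    return f"Weather in {location}: Sunny, 72°F"

schema = describe_tool(get_weather)
# schema["parameters"] == {"location": "str"}
```

Note how the docstring becomes the tool description the LLM sees, which is why the examples above keep it short and action-oriented.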

Running the agent


Development mode with auto-reload

bash
uv run agent.py dev

Console mode (local testing)

bash
uv run agent.py console

Production mode

bash
uv run agent.py start

Download required model files

bash
uv run agent.py download-files

LiveKit Inference model strings

Use model strings for simple configuration without API keys:
STT (Speech-to-Text):
  • "assemblyai/universal-streaming:en"
    - AssemblyAI streaming
  • "deepgram/nova-3:en"
    - Deepgram Nova
  • "cartesia/ink"
    - Cartesia STT
LLM (Large Language Model):
  • "openai/gpt-4.1-mini"
    - GPT-4.1 mini (recommended)
  • "openai/gpt-4.1"
    - GPT-4.1
  • "openai/gpt-5"
    - GPT-5
  • "gemini/gemini-3-flash"
    - Gemini 3 Flash
  • "gemini/gemini-2.5-flash"
    - Gemini 2.5 Flash
TTS (Text-to-Speech):
  • "cartesia/sonic-3:{voice_id}"
    - Cartesia Sonic 3
  • "elevenlabs/eleven_turbo_v2_5:{voice_id}"
    - ElevenLabs
  • "deepgram/aura:{voice}"
    - Deepgram Aura

Best practices

  1. Always use LiveKit Inference model strings as the default for STT, LLM, and TTS. This eliminates the need to manage individual provider API keys. Only use plugins when you specifically need custom models, voice cloning, Anthropic Claude, or self-hosted models.
  2. Use adaptive noise cancellation with a lambda to detect SIP participants and apply appropriate noise cancellation (BVCTelephony for phone calls, BVC for standard participants).
  3. Use MultilingualModel turn detection for natural conversation flow.
  4. Structure prompts with Identity, Output rules, Tools, Goals, and Guardrails sections.
  5. Test with console mode before deploying to LiveKit Cloud.
  6. Use lk app env -w to load LiveKit Cloud credentials into your environment.
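Item 4 can be made concrete as a sectioned instructions string passed to Agent's instructions parameter. All contents here are placeholders — the persona, tool name, and rules are illustrative, not prescribed by the SDK:

```python
# Illustrative prompt skeleton following the Identity / Output rules /
# Tools / Goals / Guardrails structure. Replace every line with your own.
INSTRUCTIONS = """\
# Identity
You are a voice assistant for Acme Support (placeholder persona).

# Output rules
Keep responses to 1-3 sentences. No markdown or emojis.

# Tools
Use get_weather when the user asks about the weather (placeholder tool).

# Goals
Resolve the caller's issue or route them to a human.

# Guardrails
Never share account details without verification.
"""
```

The string would then be supplied as instructions=INSTRUCTIONS when constructing the Agent subclass shown earlier.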