agents-ts

LiveKit Agents TypeScript SDK

Build voice AI agents with LiveKit's TypeScript/Node.js Agents SDK.

LiveKit MCP server tools

LiveKit MCP服务器工具

This skill works alongside the LiveKit MCP server, which provides direct access to the latest LiveKit documentation, code examples, and changelogs. Use these tools when you need up-to-date information that may have changed since this skill was created.
Available MCP tools:
  • docs_search
    - Search the LiveKit docs site
  • get_pages
    - Fetch specific documentation pages by path
  • get_changelog
    - Get recent releases and updates for LiveKit packages
  • code_search
    - Search LiveKit repositories for code examples
  • get_python_agent_example
    - Browse 100+ Python agent examples
When to use MCP tools:
  • You need the latest API documentation or feature updates
  • You're looking for recent examples or code patterns
  • You want to check if a feature has been added in recent releases
  • The local references don't cover a specific topic
When to use local references:
  • You need quick access to core concepts covered in this skill
  • You're working offline or want faster access to common patterns
  • The information in the references is sufficient for your needs
Use MCP tools and local references together for the best experience.

References


Consult these resources as needed:
  • ./references/livekit-overview.md -- LiveKit ecosystem overview and how these skills work together
  • ./references/agent-session.md -- AgentSession lifecycle, events, and configuration
  • ./references/tools.md -- Function tools with zod schemas
  • ./references/models.md -- STT, LLM, TTS plugins and realtime models

Installation


bash
pnpm add @livekit/agents@1.x \
    @livekit/agents-plugin-silero@1.x \
    @livekit/agents-plugin-livekit@1.x \
    @livekit/noise-cancellation-node@0.x \
    dotenv

Environment variables


Use the LiveKit CLI to load your credentials into a .env.local file:
bash
lk app env -w
Or manually create a .env.local file:
bash
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
LIVEKIT_URL=wss://your-project.livekit.cloud

Quick start


Basic agent with STT-LLM-TTS pipeline


typescript
import {
  type JobContext,
  type JobProcess,
  WorkerOptions,
  cli,
  defineAgent,
  voice,
} from '@livekit/agents';
import * as livekit from '@livekit/agents-plugin-livekit';
import * as silero from '@livekit/agents-plugin-silero';
import { BackgroundVoiceCancellation } from '@livekit/noise-cancellation-node';
import { fileURLToPath } from 'node:url';
import dotenv from 'dotenv';

dotenv.config({ path: '.env.local' });

export default defineAgent({
  prewarm: async (proc: JobProcess) => {
    proc.userData.vad = await silero.VAD.load();
  },
  entry: async (ctx: JobContext) => {
    const vad = ctx.proc.userData.vad! as silero.VAD;
    
    const assistant = new voice.Agent({
      instructions: `You are a helpful voice AI assistant.
        Keep responses concise, 1-3 sentences. No markdown or emojis.`,
    });

    const session = new voice.AgentSession({
      vad,
      stt: "assemblyai/universal-streaming:en",
      llm: "openai/gpt-4.1-mini",
      tts: "cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
      turnDetection: new livekit.turnDetector.MultilingualModel(),
    });

    await session.start({
      agent: assistant,
      room: ctx.room,
      inputOptions: {
        // For standard web/mobile participants use BackgroundVoiceCancellation()
        // For telephony/SIP applications use TelephonyBackgroundVoiceCancellation()
        noiseCancellation: BackgroundVoiceCancellation(),
      },
    });

    await ctx.connect();

    const handle = session.generateReply({
      instructions: 'Greet the user and offer your assistance.',
    });
    await handle.waitForPlayout();
  },
});

cli.runApp(new WorkerOptions({ agent: fileURLToPath(import.meta.url) }));

Basic agent with realtime model


typescript
import {
  type JobContext,
  WorkerOptions,
  cli,
  defineAgent,
  voice,
} from '@livekit/agents';
import * as openai from '@livekit/agents-plugin-openai';
import { BackgroundVoiceCancellation } from '@livekit/noise-cancellation-node';
import { fileURLToPath } from 'node:url';
import dotenv from 'dotenv';

dotenv.config({ path: '.env.local' });

export default defineAgent({
  entry: async (ctx: JobContext) => {
    const assistant = new voice.Agent({
      instructions: 'You are a helpful voice AI assistant.',
    });

    const session = new voice.AgentSession({
      llm: new openai.realtime.RealtimeModel({
        voice: 'coral',
      }),
    });

    await session.start({
      agent: assistant,
      room: ctx.room,
      inputOptions: {
        // For standard web/mobile participants use BackgroundVoiceCancellation()
        // For telephony/SIP applications use TelephonyBackgroundVoiceCancellation()
        noiseCancellation: BackgroundVoiceCancellation(),
      },
    });

    await ctx.connect();

    const handle = session.generateReply({
      instructions: 'Greet the user and offer your assistance.',
    });
    await handle.waitForPlayout();
  },
});

cli.runApp(new WorkerOptions({ agent: fileURLToPath(import.meta.url) }));

Core concepts

defineAgent

The entry point for defining your agent:
typescript
import { defineAgent, type JobContext, type JobProcess } from '@livekit/agents';
import * as silero from '@livekit/agents-plugin-silero';

export default defineAgent({
  // Optional: Preload models before jobs start
  prewarm: async (proc: JobProcess) => {
    proc.userData.vad = await silero.VAD.load();
  },
  
  // Required: Main entry point for each job
  entry: async (ctx: JobContext) => {
    // Your agent logic here
  },
});

voice.Agent

Define agent behavior. You can use the voice.Agent constructor directly or extend the class:
typescript
import { voice, llm } from '@livekit/agents';
import { z } from 'zod';

// Option 1: Direct instantiation
const assistant = new voice.Agent({
  instructions: 'Your system prompt here',
  tools: {
    getWeather: llm.tool({
      description: 'Get the current weather for a location',
      parameters: z.object({
        location: z.string().describe('The city name'),
      }),
      execute: async ({ location }) => {
        return `The weather in ${location} is sunny and 72°F`;
      },
    }),
  },
});

// Option 2: Class extension (recommended for complex agents)
class Assistant extends voice.Agent {
  constructor() {
    super({
      instructions: 'Your system prompt here',
      tools: {
        getWeather: llm.tool({
          description: 'Get the current weather for a location',
          parameters: z.object({
            location: z.string().describe('The city name'),
          }),
          execute: async ({ location }) => {
            return `The weather in ${location} is sunny and 72°F`;
          },
        }),
      },
    });
  }
}

voice.AgentSession

The session orchestrates the voice pipeline:
typescript
const session = new voice.AgentSession({
  stt: "assemblyai/universal-streaming:en",
  llm: "openai/gpt-4.1-mini",
  tts: "cartesia/sonic-3:voice_id",
  vad: await silero.VAD.load(),
  turnDetection: new livekit.turnDetector.MultilingualModel(),
});
Key methods:
  • session.start({ agent, room })
    - Start the session
  • session.say(text)
    - Speak text directly
  • session.generateReply({ instructions })
    - Generate LLM response
  • session.interrupt()
    - Stop current speech
  • session.updateAgent(newAgent)
    - Switch to different agent
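To make the control flow of these methods concrete, here is a minimal sketch. The SessionSketch class below is a stand-in for illustration only (it is not the real voice.AgentSession and mirrors only the method names listed above), recording calls so the typical order is visible: start the session, generate a reply, wait for playout, and interrupt on user barge-in.

```typescript
// Stand-in for illustration only -- NOT the real voice.AgentSession.
// It records each call so the typical control flow is easy to follow.
class SessionSketch {
  readonly calls: string[] = [];

  async start(): Promise<void> {
    this.calls.push('start');
  }

  async say(text: string): Promise<void> {
    this.calls.push(`say(${text})`);
  }

  generateReply(_instructions: string): { waitForPlayout: () => Promise<void> } {
    this.calls.push('generateReply');
    return {
      waitForPlayout: async () => {
        this.calls.push('waitForPlayout');
      },
    };
  }

  interrupt(): void {
    this.calls.push('interrupt');
  }
}

async function demo(): Promise<string[]> {
  const session = new SessionSketch();
  await session.start();                                  // start the pipeline
  const handle = session.generateReply('Greet the user'); // LLM-generated greeting
  await handle.waitForPlayout();                          // block until audio finishes
  session.interrupt();                                    // e.g. the user barged in
  return session.calls;
}
```

With the real SDK, the same sequence runs against the session created from voice.AgentSession and a connected room.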

Running the agent


Add scripts to package.json:
json
{
  "scripts": {
    "dev": "tsx agent.ts dev",
    "build": "tsc",
    "start": "node agent.js start",
    "download-files": "tsc && node agent.js download-files"
  }
}

Development mode with auto-reload:
pnpm dev

Production mode:
pnpm build && pnpm start

Download required model files:
pnpm download-files

LiveKit Inference model strings


Use model strings for simple configuration without API keys:
STT (Speech-to-Text):
  • "assemblyai/universal-streaming:en"
    - AssemblyAI streaming
  • "deepgram/nova-3:en"
    - Deepgram Nova
  • "cartesia/ink"
    - Cartesia STT
LLM (Large Language Model):
  • "openai/gpt-4.1-mini"
    - GPT-4.1 mini (recommended)
  • "openai/gpt-4.1"
    - GPT-4.1
  • "openai/gpt-5"
    - GPT-5
  • "gemini/gemini-3-flash"
    - Gemini 3 Flash
TTS (Text-to-Speech):
  • "cartesia/sonic-3:{voice_id}"
    - Cartesia Sonic 3
  • "elevenlabs/eleven_turbo_v2_5:{voice_id}"
    - ElevenLabs
  • "deepgram/aura:{voice}"
    - Deepgram Aura
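The strings above all follow a provider/model[:variant] shape, where the optional variant is a language code for STT or a voice ID for TTS. As an illustration of that format only (this helper is hypothetical, not part of @livekit/agents), it can be parsed like this:

```typescript
// Hypothetical helper -- not part of the SDK -- illustrating the
// "provider/model[:variant]" shape of LiveKit Inference model strings.
interface ModelSpec {
  provider: string;
  model: string;
  variant?: string; // language code (STT) or voice ID (TTS)
}

function parseModelString(spec: string): ModelSpec {
  const slash = spec.indexOf('/');
  if (slash === -1) {
    throw new Error(`Expected "provider/model[:variant]", got: ${spec}`);
  }
  const provider = spec.slice(0, slash);
  const rest = spec.slice(slash + 1);
  const colon = rest.indexOf(':');
  if (colon === -1) {
    return { provider, model: rest };
  }
  return { provider, model: rest.slice(0, colon), variant: rest.slice(colon + 1) };
}

// parseModelString('assemblyai/universal-streaming:en')
//   -> { provider: 'assemblyai', model: 'universal-streaming', variant: 'en' }
```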

Package structure


@livekit/agents                    # Core framework
@livekit/agents-plugin-openai      # OpenAI (LLM, STT, TTS, Realtime)
@livekit/agents-plugin-deepgram    # Deepgram (STT, TTS)
@livekit/agents-plugin-elevenlabs  # ElevenLabs (TTS)
@livekit/agents-plugin-silero      # Silero (VAD)
@livekit/agents-plugin-livekit     # Turn detector
@livekit/agents-plugin-gemini      # Google Gemini
@livekit/agents-plugin-groq        # Groq
@livekit/noise-cancellation-node   # Noise cancellation

Best practices


  1. Always use LiveKit Inference model strings as the default for STT, LLM, and TTS. This eliminates the need to manage individual provider API keys. Only use plugins when you specifically need custom models, voice cloning, or self-hosted models.
  2. Use the defineAgent pattern for proper lifecycle management.
  3. Prewarm VAD models in the prewarm function for faster job startup.
  4. Use the appropriate noise cancellation for your use case:
    • BackgroundVoiceCancellation() for standard web/mobile participants
    • TelephonyBackgroundVoiceCancellation() for SIP/telephony applications
  5. Call ctx.connect() after session.start() to connect to the room.
  6. Await generateReply with waitForPlayout() when you need to wait for the greeting to complete.
  7. Use lk app env -w to load LiveKit Cloud credentials into your environment.