streaming-stt-deepgram

Deepgram Streaming STT

Use this skill when the user needs real-time speech-to-text transcription with the lowest possible latency. Deepgram's WebSocket API provides sub-300 ms interim transcripts using the Nova-2 model.

Prefer this provider over file-based Whisper when the agent needs live voice input during a conversation, or when speaker identification (diarization) is required without a separate processing step.

Setup

Set `DEEPGRAM_API_KEY` in the environment or agent secrets store before starting a voice session.
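
A pre-flight check for the key could look like the sketch below. `require_deepgram_key` is a hypothetical helper introduced here for illustration; the skill itself may validate the key differently, and a secrets store lookup would replace the environment read.

```python
import os
from typing import Mapping, Optional


def require_deepgram_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Deepgram API key, or raise if it is missing or blank."""
    # Hypothetical helper: reads from the process environment unless a mapping is passed.
    source = os.environ if env is None else env
    key = source.get("DEEPGRAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DEEPGRAM_API_KEY is not set; configure it before starting a voice session"
        )
    return key
```

Failing early like this surfaces a missing key before any audio is captured, rather than as a WebSocket authentication error mid-session.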

Configuration

```json
{
  "voice": {
    "stt": "deepgram"
  }
}
```

To enable diarization and keyword boosting:

```json
{
  "voice": {
    "stt": "deepgram",
    "providerOptions": {
      "model": "nova-2",
      "diarize": true,
      "keywords": ["AgentOS:2"],
      "endpointing": 300
    }
  }
}
```
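
To make the options more concrete, the sketch below shows how they could map onto Deepgram's streaming endpoint, which accepts `model`, `diarize`, `keywords`, and `endpointing` as query parameters. That the provider forwards `providerOptions` one-to-one in this way is an assumption, and `build_listen_url` is a hypothetical name.

```python
from urllib.parse import urlencode

DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(provider_options: dict) -> str:
    """Turn a providerOptions mapping into a streaming URL (assumed 1:1 mapping)."""
    params = []
    for name, value in provider_options.items():
        # List values (e.g. keywords) become one repeated query parameter per item.
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, bool):
                item = "true" if item else "false"  # lowercase booleans in the query string
            params.append((name, str(item)))
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


url = build_listen_url({
    "model": "nova-2",
    "diarize": True,
    "keywords": ["AgentOS:2"],
    "endpointing": 300,
})
print(url)
```

In `"AgentOS:2"`, the part after the colon is Deepgram's keyword intensifier, so the term is boosted with a weight of 2.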

Provider Rules

Use `nova-2` as the default model; it offers the highest accuracy on Deepgram's current tier.
Enable `diarize: true` when the conversation involves multiple speakers; word-level `speaker` labels are included in the transcript events.
Tune `endpointing` (milliseconds of silence before a transcript is finalized) to balance responsiveness against over-splitting. The default of 300 ms suits most conversations.
The provider auto-reconnects on WebSocket drops using exponential back-off (100 ms → 5 s cap).
Use `providerOptions.keywords` to boost domain-specific terms (e.g. product names, abbreviations).
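
The reconnect schedule described above can be illustrated with a short sketch. The source states only the bounds (100 ms initial delay, 5 s cap); the doubling factor and the absence of jitter are assumptions, and `backoff_delays_ms` is a hypothetical name.

```python
from itertools import islice
from typing import Iterator


def backoff_delays_ms(base_ms: int = 100, cap_ms: int = 5000) -> Iterator[int]:
    """Yield reconnect delays that double each attempt and never exceed cap_ms."""
    delay = base_ms
    while True:
        yield min(delay, cap_ms)
        delay = min(delay * 2, cap_ms)  # assumed growth factor of 2


first_eight = list(islice(backoff_delays_ms(), 8))
print(first_eight)  # [100, 200, 400, 800, 1600, 3200, 5000, 5000]
```

Under these assumptions the cap is reached on the seventh attempt, after which every retry waits 5 s.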

Events

| Event | Description |
| --- | --- |
| `transcript` | Every hypothesis (interim + final) |
| `interim_transcript` | Non-final hypothesis |
| `final_transcript` | Stable, final hypothesis |
| `speech_start` | First non-empty word in an utterance |
| `speech_end` | Deepgram `speech_final` flag raised |
| `error` | Unrecoverable provider error |
| `close` | Session fully terminated |
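
One way to read the table is as a mapping from a single Deepgram `Results` message to the events it would trigger. The sketch below is inferred from the descriptions above rather than taken from the provider's code; the fields `type`, `is_final`, `speech_final`, and `channel.alternatives[].transcript` come from Deepgram's streaming responses, while `events_for`, the `utterance_open` flag, and the emission order are assumptions.

```python
from typing import List


def events_for(message: dict, utterance_open: bool) -> List[str]:
    """List the skill events one Deepgram message would raise (inferred mapping)."""
    if message.get("type") != "Results":
        return []  # metadata and other message types carry no transcript
    alternatives = message.get("channel", {}).get("alternatives", [])
    text = alternatives[0].get("transcript", "") if alternatives else ""

    events = []
    if text.strip() and not utterance_open:
        events.append("speech_start")  # first non-empty text of a new utterance
    events.append("transcript")  # raised for every hypothesis
    events.append("final_transcript" if message.get("is_final") else "interim_transcript")
    if message.get("speech_final"):
        events.append("speech_end")
    return events


interim = {
    "type": "Results",
    "is_final": False,
    "speech_final": False,
    "channel": {"alternatives": [{"transcript": "hello"}]},
}
print(events_for(interim, utterance_open=False))
```

A caller would track `utterance_open` itself, setting it after `speech_start` and clearing it after `speech_end`.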

Examples

"Start a live voice session using Deepgram for transcription."
"Enable speaker diarization for this multi-person meeting transcription."
"Use Deepgram with keyword boosting for AgentOS and Wunderland terms."

Constraints

Requires `DEEPGRAM_API_KEY`. Free tier available at console.deepgram.com.
Audio must be streamed as PCM/WebSocket-compatible frames.
Diarization adds slight latency; disable if single-speaker performance is the priority.
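
As an illustration of the framing constraint, the sketch below slices a raw PCM buffer into fixed-duration frames ready to send as binary WebSocket messages. The audio format (16 kHz, mono, 16-bit linear PCM) and the 20 ms frame size are assumptions chosen for the example, not requirements stated by the skill; `pcm_frames` is a hypothetical name.

```python
from typing import Iterator


def pcm_frames(pcm: bytes, sample_rate: int = 16000, frame_ms: int = 20,
               bytes_per_sample: int = 2) -> Iterator[bytes]:
    """Split mono PCM audio into frames of frame_ms each; the last may be shorter."""
    frame_bytes = sample_rate * frame_ms // 1000 * bytes_per_sample
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


one_second_of_silence = bytes(16000 * 2)  # 16,000 samples x 2 bytes each
frames = list(pcm_frames(one_second_of_silence))
print(len(frames), len(frames[0]))  # 50 640
```

Smaller frames reduce the delay before audio reaches the recognizer at the cost of more WebSocket messages per second.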
