api

Deepgram API

Build with Deepgram's speech-to-text, text-to-speech, voice agent, and audio intelligence APIs.

Getting Started

All API requests require authentication via API key or JWT:

API Key:
```
Authorization: Token <API_KEY>
```
JWT:
```
Authorization: Bearer <JWT>
```

Base servers:

REST & STT/TTS WebSocket:
```
https://api.deepgram.com
```
Voice Agent WebSocket:
```
https://agent.deepgram.com
```
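The following is a minimal Python sketch of the two header forms and of a REST pre-recorded request, built with the standard library's `urllib.request`. The helper names (`auth_header`, `prerecorded_request`), the `audio/wav` content type, and the choice of `model=nova-3` are illustrative assumptions. They are not part of an official SDK.

```python
from __future__ import annotations

import urllib.request

API_BASE = "https://api.deepgram.com"  # REST and STT/TTS WebSocket host


def auth_header(credential: str, *, is_jwt: bool = False) -> dict[str, str]:
    """API keys use the `Token` scheme; short-lived JWTs use `Bearer`."""
    scheme = "Bearer" if is_jwt else "Token"
    return {"Authorization": f"{scheme} {credential}"}


def prerecorded_request(
    audio: bytes, api_key: str, content_type: str = "audio/wav"
) -> urllib.request.Request:
    """Build (but do not send) a REST transcription request for a local file.

    Options go in the query string; the body is the raw audio bytes.
    """
    headers = {**auth_header(api_key), "Content-Type": content_type}
    return urllib.request.Request(
        f"{API_BASE}/v1/listen?model=nova-3",
        data=audio,
        headers=headers,
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen(req)`. On success, the response body is JSON.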

How Deepgram's APIs Fit Together

                   ┌──────────────────────────────┐
                   │       api.deepgram.com        │
                   └──────────────────────────────┘
                                │
  ┌──────────────┬──────────────┼──────────────┬──────────────┐
  ▼              ▼              ▼              ▼              ▼
/v1/listen   /v2/listen     /v1/speak      /v1/read    /v1/projects/*
 Nova — ASR   Flux — conv.   TTS            Text AI     Management
REST or WSS   WSS only       REST or WSS    REST only   REST only

                   ┌──────────────────────────────┐
                   │      agent.deepgram.com       │
                   └──────────────────────────────┘
                                │
                                ▼
                   /v1/agent/converse
                   WebSocket only
                   audio ──▶ STT ──▶ LLM ──▶ TTS ──▶ audio
                   (Deepgram orchestrates the full pipeline)

Which API Should I Use?

Audio → text (transcription)?
├─ General-purpose transcription (captions, batch, call logs, live streams with custom turn logic)
│  └─ Nova models via /v1/listen
│     ├─ Pre-recorded file    →  REST  POST https://api.deepgram.com/v1/listen?model=nova-3
│     └─ Live stream          →  WSS   wss://api.deepgram.com/v1/listen?model=nova-3
│
└─ Conversational audio / voice-agent-style turn detection
   └─ Flux models via /v2/listen
      └─ Live stream          →  WSS   wss://api.deepgram.com/v2/listen?model=flux-general-en

Text → audio?
├─ One-shot                   →  REST POST /v1/speak
└─ Low-latency stream         →  WSS  wss://api.deepgram.com/v1/speak

Full conversational voice agent (audio in, audio out)?
└─ WSS wss://agent.deepgram.com/v1/agent/converse
   Deepgram handles STT + your configured LLM + TTS internally

Analyze text for insights?
└─ REST POST /v1/read
   (summaries, sentiment, topics, intents)
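The decision tree above can be encoded as a small routing function. This is a minimal sketch. The `task` labels (`"transcribe"`, `"speak"`, `"agent"`, `"analyze_text"`) and the `live`/`turn_taking` flags are hypothetical names chosen for illustration. The URLs come directly from the tree.

```python
def choose_endpoint(task: str, *, live: bool = False, turn_taking: bool = False) -> str:
    """Map a use case to an endpoint URL, following the decision tree.

    `task` is a hypothetical label: "transcribe", "speak", "agent", or "analyze_text".
    """
    if task == "transcribe":
        if turn_taking:  # Flux: conversational turn detection, WebSocket only
            return "wss://api.deepgram.com/v2/listen?model=flux-general-en"
        scheme = "wss" if live else "https"  # Nova: REST for files, WSS for streams
        return f"{scheme}://api.deepgram.com/v1/listen?model=nova-3"
    if task == "speak":  # one-shot over REST, low-latency stream over WSS
        return "wss://api.deepgram.com/v1/speak" if live else "https://api.deepgram.com/v1/speak"
    if task == "agent":  # always WebSocket, on the agent host
        return "wss://agent.deepgram.com/v1/agent/converse"
    if task == "analyze_text":  # REST only
        return "https://api.deepgram.com/v1/read"
    raise ValueError(f"unknown task: {task!r}")
```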

Speech-to-Text: Nova (`/v1/listen`) vs Flux (`/v2/listen`)

Both model families are actively maintained and industry-leading. They solve different problems — pick the one that matches your use case.

| | Nova (`/v1/listen`) | Flux (`/v2/listen`) |
|---|---|---|
| Endpoint | `/v1/listen` | `/v2/listen` |
| Available models | `nova-3`, `nova-2`, `nova`, `enhanced`, `base` | `flux-general-en` |
| Best for | General transcription: captions, subtitles, call logs, batch | Conversational audio: voice agents, interactive assistants, turn-taking UIs |
| Output | Continuous transcript stream | Structured turn events + transcripts (built-in turn state machine) |
| Turn detection | Manual (`utterance_end_ms`, VAD events) | Built-in (EOT, eager-EOT, `turn_index`) |
| Transports | REST + WebSocket | WebSocket only |
| Intelligence overlays | Yes: `summarize`, `sentiment`, `topics`, `intents`, `diarize`, `redact`, etc. | No: smaller, focused param set; no `smart_format`, `diarize`, or `punctuate` |
| Mid-session reconfig | No (reconnect to change) | Yes (a `Configure` message updates EOT thresholds and keyterms live) |

Pick Nova (`/v1/listen`, `model=nova-3`) when:

- You're generating captions, subtitles, or transcripts for recorded media
- You're running batch transcription over files (REST)
- You need analytics overlays (`summarize`, `sentiment`, `topics`, `intents`, `diarize`, `redact`)
- You want WebSocket streaming with your own turn-detection logic

Pick Flux (`/v2/listen`, `model=flux-general-en`) when:

- You're building an interactive voice agent or assistant
- You want end-of-turn detection handled for you
- You need low-latency turn signals and barge-in support
- You want to update EOT thresholds or keyterms mid-session without reconnecting

Migrating from Nova 3 to Flux? See the official Nova 3 → Flux migration guide.

API Domains

| Domain | REST | WebSocket | Reference |
|---|---|---|---|
| Listen v1 (STT, Nova models) | `POST /v1/listen` | `wss://api.deepgram.com/v1/listen` | listen.md |
| Listen v2 (STT, Flux, conversational) | n/a | `wss://api.deepgram.com/v2/listen` | listen.md |
| Speak (TTS) | `POST /v1/speak` | `wss://api.deepgram.com/v1/speak` | speak.md |
| Voice Agent | `GET /v1/agent/settings/think/models` | `wss://agent.deepgram.com/v1/agent/converse` | agent.md |
| Read (Intelligence) | `POST /v1/read` | n/a | read.md |
| Models | `GET /v1/models` | n/a | models.md |
| Projects | `/v1/projects/*` | n/a | projects.md |
| Auth | `POST /v1/auth/grant` | n/a | auth.md |
| Self-Hosted | `/v1/projects//selfhosted/` | n/a | self-hosted.md |

Common Mistakes to Avoid

All APIs

- Feature flags are query params, except for Voice Agent and Flux mid-session updates. For `/v1/listen`, `/v2/listen`, and `/v1/speak`, initial options go on the URL. The request body carries only the payload: audio data (REST) or audio frames (WebSocket) for the listen endpoints, and the text to synthesize for `/v1/speak`. There are two exceptions:
  - `/v1/agent/converse` has no URL query params at all; all configuration goes in the `Settings` message.
  - `/v2/listen` supports a `Configure` message after connection to update EOT thresholds and keyterms mid-session.

  Also note that `/v2/listen` has a much smaller param set than `/v1/listen`. Flags like `smart_format`, `diarize`, and `punctuate` are not available.
- Rate limits apply to concurrent connections, not total requests. A 429 means too many simultaneous open connections, not too high a request volume. Diarization and other compute-heavy features reduce your concurrency allowance further.
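The following is a minimal Python sketch of both rules, assuming you open one connection per job. `with_options` keeps every flag on the URL, so the body stays pure payload. `run_bounded` uses an `asyncio.Semaphore` to cap how many connections are open at once. The function names and the `limit` value are hypothetical. Your actual concurrency allowance depends on your account and on which features, such as diarization, you enable.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable, Iterable
from typing import TypeVar
from urllib.parse import urlencode

T = TypeVar("T")


def with_options(base_url: str, **options: object) -> str:
    """Append feature flags to the URL; booleans are sent as lowercase 'true'/'false'."""
    encoded = {
        key: str(value).lower() if isinstance(value, bool) else str(value)
        for key, value in options.items()
    }
    return f"{base_url}?{urlencode(encoded)}" if encoded else base_url


async def run_bounded(jobs: Iterable[Callable[[], Awaitable[T]]], limit: int) -> list[T]:
    """Run job factories with at most `limit` of them (i.e. open connections) in flight."""
    gate = asyncio.Semaphore(limit)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        async with gate:  # wait here instead of opening one connection too many
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))
```

For example, `with_options("https://api.deepgram.com/v1/listen", model="nova-3", diarize=True)` yields `https://api.deepgram.com/v1/listen?model=nova-3&diarize=true`.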

STT WebSocket (`/v1/listen`)

- Send KeepAlive as a text frame, not binary. The connection closes after 10 seconds without audio. During silence, send `{"type":"KeepAlive"}` as a text (JSON) frame every 3-5 seconds. Sending it as a binary frame is not a silent no-op: the audio pipeline chokes on it and transcription is delayed.
- Never send empty byte payloads. A zero-length binary frame sent to `/v1/listen` is treated as a close and terminates the connection. Always check that an audio packet is non-empty before sending it.
- `encoding` must match the actual audio format. If you set `encoding=linear16` but send Opus, you'll get a DATA-0000 error or garbled output. Omit `encoding` entirely when sending containerized formats (MP3, WAV, Ogg); Deepgram detects them automatically.
- Timestamps reset on reconnect. Each new WebSocket connection restarts timestamps at 00:00:00. For real-time apps, maintain a timestamp offset across reconnections. Otherwise, you'll silently corrupt your transcript timeline.
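The following is a minimal, transport-agnostic Python sketch of these four rules. It assumes a client library such as `websockets`, where sending a `str` produces a text frame and sending `bytes` produces a binary frame. The helper names, the 4-second interval, and the requirement that raw audio also carries `sample_rate` are illustrative assumptions.

```python
from __future__ import annotations

import json

KEEPALIVE_INTERVAL_S = 4.0  # inside the 3-5 s window; the server closes after 10 s without audio
CONTAINER_FORMATS = {"mp3", "wav", "ogg"}  # self-describing; Deepgram detects these


def keepalive_frame() -> str:
    """KeepAlive payload as a str, so the client sends it as a *text* frame."""
    return json.dumps({"type": "KeepAlive"})


def audio_frame_or_none(chunk: bytes) -> bytes | None:
    """Drop empty chunks: a zero-length binary frame closes the /v1/listen stream."""
    return chunk if chunk else None


def format_params(audio_format: str, sample_rate: int | None = None) -> dict[str, str]:
    """Query params describing the audio; omit `encoding` for containerized formats."""
    if audio_format in CONTAINER_FORMATS:
        return {}
    if sample_rate is None:  # assumption: raw audio also needs its sample rate
        raise ValueError(f"raw {audio_format} audio needs a sample_rate")
    return {"encoding": audio_format, "sample_rate": str(sample_rate)}


class TimelineOffset:
    """Maps per-connection timestamps onto one continuous session timeline."""

    def __init__(self) -> None:
        self.offset_s = 0.0  # audio duration covered by earlier connections
        self._sent_s = 0.0  # audio duration sent on the current connection

    def audio_sent(self, seconds: float) -> None:
        self._sent_s += seconds

    def reconnect(self) -> None:
        """Call when a new connection opens; its timestamps restart at zero."""
        self.offset_s += self._sent_s
        self._sent_s = 0.0

    def to_global(self, t: float) -> float:
        return self.offset_s + t
```

Counting audio sent, rather than wall-clock time, keeps the offset aligned with the audio itself. Audio lost in flight during a drop would still shift later timestamps slightly.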

TTS WebSocket (`/v1/speak`)

- Don't send empty text. A `Speak` message with an empty `text` field returns a 400 error. Always validate input before sending.
- Character rate limiting (DATA-0001) means slow down, not retry. If you hit it, reduce how fast you submit text chunks. Retrying immediately only compounds the problem.
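The following is a minimal Python sketch of both rules. The `Speak` message shape follows the description above. The helper names, the backoff constants, and how your client surfaces a DATA-0001 error are assumptions. Before each chunk, wait `pacer.delay_s` seconds, for example with `asyncio.sleep`.

```python
from __future__ import annotations

import json


def speak_message(text: str) -> str:
    """Build a Speak message, refusing empty input (the server answers 400)."""
    if not text.strip():  # assumption: treat whitespace-only text as empty too
        raise ValueError("refusing to send an empty Speak message")
    return json.dumps({"type": "Speak", "text": text})


class ChunkPacer:
    """Hypothetical pacing policy: on DATA-0001, widen the gap between text chunks."""

    def __init__(self, initial_s: float = 0.25, max_s: float = 2.0) -> None:
        self.initial_s = initial_s
        self.max_s = max_s
        self.delay_s = 0.0  # pause before submitting the next chunk

    def on_rate_limited(self) -> float:
        """Double the pause (capped) instead of resubmitting immediately."""
        self.delay_s = min(self.max_s, max(self.initial_s, self.delay_s * 2))
        return self.delay_s

    def on_chunk_accepted(self) -> None:
        """Ease back toward full speed once chunks go through again."""
        self.delay_s /= 2
```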

Voice Agent (`/v1/agent/converse`)

Send the `Settings` message before any audio. The agent ignores everything until it has received and acknowledged the `Settings` configuration, so message ordering is strictly enforced.
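The following is a minimal Python sketch of a client-side ordering guard. It buffers audio until `Settings` has been sent and acknowledged, then flushes the buffered audio in order. The acknowledgement type name `SettingsApplied` is an assumption; check the agent reference for the exact event. `send_text` and `send_binary` stand in for whatever your WebSocket client provides.

```python
from __future__ import annotations

import json
from collections import deque
from typing import Callable


class AgentOrderGuard:
    """Holds back audio until the agent has acknowledged the Settings message."""

    def __init__(
        self,
        send_text: Callable[[str], None],
        send_binary: Callable[[bytes], None],
        ack_type: str = "SettingsApplied",  # assumption: name of the ack event
    ) -> None:
        self._send_text = send_text
        self._send_binary = send_binary
        self._ack_type = ack_type
        self._settings_sent = False
        self._ready = False
        self._pending = deque()  # audio chunks received before the ack

    def send_settings(self, settings: dict) -> None:
        if settings.get("type") != "Settings":
            raise ValueError("the first message must have type 'Settings'")
        self._send_text(json.dumps(settings))
        self._settings_sent = True

    def send_audio(self, chunk: bytes) -> None:
        if not chunk:
            return  # skip empty chunks
        if self._ready:
            self._send_binary(chunk)
        else:
            self._pending.append(chunk)

    def on_server_message(self, raw: str) -> None:
        msg = json.loads(raw)
        if self._settings_sent and msg.get("type") == self._ack_type:
            self._ready = True
            while self._pending:  # flush buffered audio in arrival order
                self._send_binary(self._pending.popleft())
```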

Flux model

- Use `/v2/listen` with `model=flux-general-en`. `/v1/listen` does not support Flux, and `model=flux` alone is not a valid value. Do not include `language` or `encoding` params for containerized audio.
- Use `Configure` to update EOT thresholds and keyterms mid-session. Unlike `/v1/listen`, Flux supports live reconfiguration after connection, so you can change turn-detection sensitivity or boost new keyterms without reconnecting:

  ```json
  {
    "type": "Configure",
    "thresholds": { "eot_threshold": "0.8", "eot_timeout_ms": "3000" },
    "keyterms": ["Deepgram"]
  }
  ```

  The server responds with `ConfigureSuccess` (echoing back the applied values) or `ConfigureFailure`. Omitted threshold fields keep their current values.
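The following is a minimal Python sketch of the Flux rules. The URL pins `model=flux-general-en` on `/v2/listen` and rejects the `/v1/listen`-only flags. The `Configure` builder sends only the fields you set, so omitted thresholds keep their current values. Encoding the threshold values as strings mirrors the example above. The helper names are hypothetical.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode

FLUX_URL = "wss://api.deepgram.com/v2/listen"
V1_ONLY_FLAGS = {"smart_format", "diarize", "punctuate"}  # not accepted by /v2/listen


def flux_url(**params: str) -> str:
    """Build the Flux WebSocket URL; the model is always flux-general-en."""
    if "model" in params:
        raise ValueError("model is fixed to flux-general-en")
    rejected = V1_ONLY_FLAGS.intersection(params)
    if rejected:
        raise ValueError(f"not supported on /v2/listen: {sorted(rejected)}")
    return f"{FLUX_URL}?{urlencode({'model': 'flux-general-en', **params})}"


def configure_message(
    eot_threshold: float | None = None,
    eot_timeout_ms: int | None = None,
    keyterms: list[str] | None = None,
) -> str:
    """Mid-session Configure message containing only the fields being changed."""
    thresholds = {}
    if eot_threshold is not None:
        thresholds["eot_threshold"] = str(eot_threshold)
    if eot_timeout_ms is not None:
        thresholds["eot_timeout_ms"] = str(eot_timeout_ms)
    msg: dict[str, object] = {"type": "Configure"}
    if thresholds:
        msg["thresholds"] = thresholds
    if keyterms is not None:
        msg["keyterms"] = list(keyterms)
    return json.dumps(msg)


def configure_succeeded(raw: str) -> bool:
    """Interpret the server's reply to a Configure message."""
    kind = json.loads(raw).get("type")
    if kind == "ConfigureSuccess":
        return True
    if kind == "ConfigureFailure":
        return False
    raise ValueError(f"not a Configure reply: {kind!r}")
```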

Authentication

JWT TTL applies only to the initial handshake. Tokens expire after 30 seconds by default. Once the WebSocket connection is established, token expiry does not close it, because the token is needed only for the upgrade request.
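The following is a minimal Python sketch of the pattern this implies. It mints a short-lived token server-side via `POST /v1/auth/grant`, then presents it as `Bearer` on the WebSocket upgrade only. The `ttl_seconds` request field and the `access_token` response field are assumptions to check against the auth reference. The request is built but not sent.

```python
from __future__ import annotations

import json
import urllib.request

GRANT_URL = "https://api.deepgram.com/v1/auth/grant"


def grant_request(api_key: str, ttl_seconds: int | None = None) -> urllib.request.Request:
    """Server-side request that exchanges an API key for a short-lived JWT."""
    body = {} if ttl_seconds is None else {"ttl_seconds": ttl_seconds}  # assumed field name
    return urllib.request.Request(
        GRANT_URL,
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Token {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def upgrade_headers(grant_response: str) -> dict[str, str]:
    """Headers for the WebSocket handshake; the token is not needed after it succeeds."""
    token = json.loads(grant_response)["access_token"]  # assumed field name
    return {"Authorization": f"Bearer {token}"}
```

Because the TTL gates only the handshake, a 30-second token is enough even for long-lived streams. Mint a fresh token for each new connection or reconnect.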

SDK-Specific Skills

This `api` skill covers the product contracts (endpoints, query params, message shapes) that are identical across SDKs. For language-idiomatic code (imports, async patterns, builder APIs, common errors), install the SDK-specific skills. Each Deepgram SDK publishes 7 product skills named `deepgram-{lang}-{product}` (e.g. `deepgram-python-speech-to-text`, `deepgram-js-voice-agent`) plus a maintainer skill, `deepgram-{lang}-maintaining-sdk`. The `deepgram-{lang}-` prefix avoids collisions when you install skills from multiple SDKs.

```bash
# Install all skills from a specific SDK
npx skills add deepgram/deepgram-python-sdk    # Python
npx skills add deepgram/deepgram-js-sdk        # JavaScript / TypeScript
npx skills add deepgram/deepgram-java-sdk      # Java
npx skills add deepgram/deepgram-go-sdk        # Go
npx skills add deepgram/deepgram-rust-sdk      # Rust
npx skills add deepgram/deepgram-swift-sdk     # Swift
npx skills add deepgram/deepgram-kotlin-sdk    # Kotlin
npx skills add deepgram/deepgram-dotnet-sdk    # C# / .NET
npx skills add deepgram/deepgram-browser-sdk   # Browser TypeScript

# Or install a specific product skill from one SDK (note the deepgram-{lang}- prefix)
npx skills add deepgram/deepgram-python-sdk --skill deepgram-python-speech-to-text
npx skills add deepgram/deepgram-js-sdk --skill deepgram-js-voice-agent
```

Related Deepgram skills

| Skill | Purpose |
|---|---|
| `recipes` | Minimal runnable snippets per feature per language |
| `examples` | Full integration examples with third-party platforms (Twilio, LiveKit, etc.) |
| `starters` | Runnable starter apps (framework × feature matrix) |
| `docs` | Navigate Deepgram documentation |
| `setup-mcp` | Install the Deepgram MCP server |
