Deepgram API
Build with Deepgram's speech-to-text, text-to-speech, voice agent, and audio intelligence APIs.
Getting Started
All API requests require authentication with either an API key or a JWT:
- API key: `Authorization: Token <API_KEY>`
- JWT: `Authorization: Bearer <JWT>`

Base servers:
- REST and STT/TTS WebSocket: `https://api.deepgram.com` (WebSocket connections use `wss://api.deepgram.com`)
- Voice Agent WebSocket: `https://agent.deepgram.com` (connect with `wss://agent.deepgram.com`)
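A minimal Python sketch of the two header forms and the host split above. The helper names are hypothetical, and routing every `/v1/agent/...` path to `agent.deepgram.com` is an assumption drawn from the Voice Agent endpoint listed in this document.

```python
from typing import Optional

# Hosts from "Base servers" above. Sending every /v1/agent/... path to the
# agent host is an assumption based on the Voice Agent endpoint shown here.
API_HOST = "api.deepgram.com"
AGENT_HOST = "agent.deepgram.com"


def auth_headers(api_key: Optional[str] = None, jwt: Optional[str] = None) -> dict:
    """Build the Authorization header for exactly one credential type."""
    if (api_key is None) == (jwt is None):
        raise ValueError("pass exactly one of api_key or jwt")
    if api_key is not None:
        return {"Authorization": f"Token {api_key}"}  # API key scheme
    return {"Authorization": f"Bearer {jwt}"}  # JWT scheme


def endpoint_url(path: str, websocket: bool) -> str:
    """Join scheme, host, and path, e.g. '/v1/listen' -> https://api.deepgram.com/v1/listen."""
    host = AGENT_HOST if path.startswith("/v1/agent/") else API_HOST
    scheme = "wss" if websocket else "https"
    return f"{scheme}://{host}{path}"
```

For example, `endpoint_url("/v1/agent/converse", websocket=True)` yields `wss://agent.deepgram.com/v1/agent/converse`.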
How Deepgram's APIs Fit Together
```
                       ┌──────────────────────────────┐
                       │       api.deepgram.com       │
                       └──────────────────────────────┘
                                      │
        ┌──────────────┬──────────────┼──────────────┬──────────────┐
        ▼              ▼              ▼              ▼              ▼
   /v1/listen     /v2/listen      /v1/speak      /v1/read    /v1/projects/*
   Nova (ASR)    Flux (conv.)        TTS          Text AI      Management
   REST or WSS     WSS only      REST or WSS     REST only      REST only

                       ┌──────────────────────────────┐
                       │      agent.deepgram.com      │
                       └──────────────────────────────┘
                                      │
                                      ▼
                              /v1/agent/converse
                                WebSocket only

                   audio ──▶ STT ──▶ LLM ──▶ TTS ──▶ audio
                  (Deepgram orchestrates the full pipeline)
```

Which API Should I Use?
```
Audio → text (transcription)?
├─ General-purpose transcription (captions, batch, call logs, live streams with custom turn logic)
│   └─ Nova models via /v1/listen
│       ├─ Pre-recorded file → REST POST https://api.deepgram.com/v1/listen?model=nova-3
│       └─ Live stream       → WSS wss://api.deepgram.com/v1/listen?model=nova-3
│
└─ Conversational audio / voice-agent-style turn detection
    └─ Flux models via /v2/listen
        └─ Live stream → WSS wss://api.deepgram.com/v2/listen?model=flux-general-en

Text → audio?
├─ One-shot           → REST POST /v1/speak
└─ Low-latency stream → WSS wss://api.deepgram.com/v1/speak

Full conversational voice agent (audio in, audio out)?
└─ WSS wss://agent.deepgram.com/v1/agent/converse
   Deepgram handles STT + your configured LLM + TTS internally

Analyze text for insights?
└─ REST POST /v1/read
   (summaries, sentiment, topics, intents)
```
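The tree can be encoded as a small lookup table. This is a sketch: the task labels (`transcribe`, `conversational`, and so on) and `choose_endpoint` are hypothetical, while the URLs and default models come from the tree. Extra keyword arguments become URL query params, matching the rule that feature flags go on the URL; the agent route refuses them because `/v1/agent/converse` takes no query params.

```python
from urllib.parse import urlencode

# (task, live) -> (transport, origin, path, default query params).
# Task labels are hypothetical; URLs and default models come from the tree above.
ROUTES = {
    ("transcribe", False): ("REST POST", "https://api.deepgram.com", "/v1/listen", {"model": "nova-3"}),
    ("transcribe", True): ("WSS", "wss://api.deepgram.com", "/v1/listen", {"model": "nova-3"}),
    ("conversational", True): ("WSS", "wss://api.deepgram.com", "/v2/listen", {"model": "flux-general-en"}),
    ("speak", False): ("REST POST", "https://api.deepgram.com", "/v1/speak", {}),
    ("speak", True): ("WSS", "wss://api.deepgram.com", "/v1/speak", {}),
    ("agent", True): ("WSS", "wss://agent.deepgram.com", "/v1/agent/converse", {}),
    ("analyze_text", False): ("REST POST", "https://api.deepgram.com", "/v1/read", {}),
}


def choose_endpoint(task: str, live: bool, **flags: str) -> tuple:
    """Return (transport, url); extra feature flags are appended as query params."""
    if task == "agent" and flags:
        # /v1/agent/converse takes no query params; config goes in Settings.
        raise ValueError("configure the agent via the Settings message, not the URL")
    try:
        transport, origin, path, defaults = ROUTES[(task, live)]
    except KeyError:
        # e.g. Flux and the Voice Agent have no REST route
        raise ValueError(f"no route for task={task!r}, live={live}") from None
    params = {**defaults, **flags}
    query = f"?{urlencode(params)}" if params else ""
    return transport, f"{origin}{path}{query}"
```

For example, `choose_endpoint("transcribe", live=True, smart_format="true")` returns `("WSS", "wss://api.deepgram.com/v1/listen?model=nova-3&smart_format=true")`.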
Speech-to-Text: Nova (`/v1/listen`) vs Flux (`/v2/listen`)
Both model families are actively maintained and industry-leading. They solve different problems, so pick the one that matches your use case.
|  | Nova (`/v1/listen`) | Flux (`/v2/listen`) |
|---|---|---|
| Endpoint | `/v1/listen` | `/v2/listen` |
| Available models | `nova-3` (and earlier Nova versions) | `flux-general-en` |
| Best for | General transcription: captions, subtitles, call logs, batch | Conversational audio: voice agents, interactive assistants, turn-taking UIs |
| Output | Continuous transcript stream | Structured turn events + transcripts (built-in turn state machine) |
| Turn detection | Manual (your own turn-detection logic) | Built-in (EOT, eager-EOT, `turn_index`) |
| Transports | REST + WebSocket | WebSocket only |
| Intelligence overlays | Yes: `summarize`, `sentiment`, `topics`, `intents`, `diarize`, `redact` | No: smaller, focused param set (no `smart_format`, `diarize`, or `punctuate`) |
| Mid-session reconfig | No (reconnect to change) | Yes (`Configure` message) |
Pick Nova (`/v1/listen`, `model=nova-3`) when:
- You're generating captions, subtitles, or transcripts for recorded media
- You're running batch transcription over files (REST)
- You need analytics overlays (`summarize`, `sentiment`, `topics`, `intents`, `diarize`, `redact`)
- You want WebSocket streaming with your own turn-detection logic

Pick Flux (`/v2/listen`, `model=flux-general-en`) when:
- You're building an interactive voice agent or assistant
- You want end-of-turn detection handled for you
- You need low-latency turn signals and barge-in support
- You want to update EOT thresholds or keyterms mid-session without reconnecting

Migrating from Nova 3 to Flux? See the official Nova 3 → Flux migration guide.
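A hypothetical pre-flight check that encodes the Nova/Flux rules from this document. `FLUX_UNAVAILABLE_FLAGS` lists only the three flags named here (`smart_format`, `diarize`, `punctuate`); the real set of parameters Flux rejects may be larger, so treat this as a sketch rather than a complete validator.

```python
# Rules taken from this section and the Flux notes; the flag set is only the
# three flags this document names, not necessarily everything Flux rejects.
FLUX_UNAVAILABLE_FLAGS = {"smart_format", "diarize", "punctuate"}


def check_listen_request(path: str, params: dict, websocket: bool) -> list:
    """Return human-readable problems; an empty list means the request looks OK."""
    problems = []
    model = params.get("model", "")
    if model == "flux":
        problems.append("model=flux is not valid; use model=flux-general-en")
    if path == "/v1/listen" and model.startswith("flux"):
        problems.append("/v1/listen does not support Flux; use /v2/listen")
    if path == "/v2/listen":
        if not websocket:
            problems.append("/v2/listen is WebSocket only")
        # intersection() accepts any iterable; iterating a dict yields its keys
        for flag in sorted(FLUX_UNAVAILABLE_FLAGS.intersection(params)):
            problems.append(f"{flag} is not available on /v2/listen")
    return problems
```

Running the check before opening a connection turns a confusing server error into a readable message, at the cost of keeping the rule list in sync with the API yourself.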
API Domains
| Domain | REST | WebSocket | Reference |
|---|---|---|---|
| Listen v1 (STT, Nova models) | Yes | Yes | listen.md |
| Listen v2 (STT, Flux, conversational) | No | Yes | listen.md |
| Speak (TTS) | Yes | Yes | speak.md |
| Voice Agent | Yes | Yes | agent.md |
| Read (Intelligence) | Yes | No | read.md |
| Models | Yes | No | models.md |
| Projects | Yes | No | projects.md |
| Auth | Yes | No | auth.md |
| Self-Hosted | Yes | No | self-hosted.md |
Common Mistakes to Avoid
All APIs
- Feature flags are query params, except for Voice Agent and Flux mid-session updates. For `/v1/listen`, `/v2/listen`, and `/v1/speak`, initial options go on the URL. The request body carries only the payload: audio data (REST) or audio frames (WebSocket) for the listen endpoints, and the text to synthesize for `/v1/speak`. Two exceptions: `/v1/agent/converse` takes no URL query params at all (all configuration goes in the `Settings` message), and `/v2/listen` accepts a `Configure` message after connection to update EOT thresholds and keyterms mid-session. Also note that `/v2/listen` has a much smaller param set than `/v1/listen`: flags such as `smart_format`, `diarize`, and `punctuate` are not available.
- Rate limits apply to concurrent connections, not total requests. A 429 means too many simultaneous open connections, not too high a request volume. Diarization and other compute-heavy features reduce your concurrency allowance further.
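Because the limit is on simultaneous connections, capping in-flight work on the client is more useful than throttling request rate. A minimal asyncio sketch follows; `fake_transcription` is a hypothetical stand-in for a real request, and the right `limit` depends on your plan (this document gives no number), so lower it when diarization or other compute-heavy features are enabled.

```python
import asyncio


async def run_with_cap(jobs, limit: int):
    """Run coroutine factories with at most `limit` in flight; return (results, peak)."""
    sem = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def guarded(job):
        nonlocal active, peak
        async with sem:  # one permit = one open connection or request
            active += 1
            peak = max(peak, active)
            try:
                return await job()
            finally:
                active -= 1

    # gather() returns results in the same order as the jobs were given
    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, peak


async def fake_transcription(i: int) -> int:
    # Hypothetical stand-in for opening a connection and transcribing one file.
    await asyncio.sleep(0.01)
    return i
```

Usage: build `jobs = [lambda i=i: fake_transcription(i) for i in range(10)]` and call `asyncio.run(run_with_cap(jobs, limit=3))`; no more than three jobs are ever active at once.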
STT WebSocket (`/v1/listen`)
- Send KeepAlive as a text frame, not binary. The connection closes after 10 seconds without audio. During silence, send `{"type":"KeepAlive"}` as a text (JSON) frame every 3–5 seconds. Sent as a binary frame, it is not a silent no-op: it causes transcription delays because the audio pipeline chokes on it.
- Never send empty byte payloads. A zero-length binary frame sent to `/v1/listen` is treated as a close and terminates the connection. Always check that an audio packet is non-empty before sending it.
- `encoding` must match the actual audio format. If you set `encoding=linear16` but send Opus, you'll get a DATA-0000 error or garbled output. Omit `encoding` entirely when sending containerized formats (MP3, WAV, Ogg); Deepgram detects them automatically.
- Timestamps reset on reconnect. Each new WebSocket connection restarts timestamps at 00:00:00. For real-time apps, maintain a timestamp offset across reconnections, or you'll silently corrupt your transcript timeline.
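The four rules above can be folded into one small sender-side helper. This is a sketch under stated assumptions: `send_text` and `send_binary` stand in for whatever your WebSocket library provides, the audio is raw linear16 (so duration is bytes / (sample_rate × channels × 2)), and the 4-second KeepAlive interval is simply a value inside the documented 3–5 second window.

```python
import json
from typing import Callable

# KeepAlive must go out as a TEXT frame; as binary it would enter the audio pipeline.
KEEPALIVE_FRAME = json.dumps({"type": "KeepAlive"})


class ListenStream:
    """Client-side guards for a /v1/listen WebSocket (a sketch, not an SDK API).

    send_text / send_binary are callables from your WebSocket library. Duration
    math assumes raw linear16 audio at the given sample rate and channel count.
    """

    def __init__(self, send_text: Callable[[str], None],
                 send_binary: Callable[[bytes], None],
                 sample_rate: int = 16000, channels: int = 1,
                 keepalive_every: float = 4.0):
        self.send_text = send_text
        self.send_binary = send_binary
        self.bytes_per_second = sample_rate * channels * 2  # 2 bytes per linear16 sample
        self.keepalive_every = keepalive_every  # inside the 3-5 s window
        self.last_sent = 0.0
        self.bytes_this_connection = 0
        self.offset = 0.0  # seconds of audio streamed on earlier connections

    def send_audio(self, chunk: bytes, now: float) -> bool:
        if not chunk:  # a zero-length binary frame would close the connection
            return False
        self.send_binary(chunk)
        self.bytes_this_connection += len(chunk)
        self.last_sent = now
        return True

    def tick(self, now: float) -> bool:
        """Call periodically; sends KeepAlive once nothing has gone out for a while."""
        if now - self.last_sent >= self.keepalive_every:
            self.send_text(KEEPALIVE_FRAME)
            self.last_sent = now
            return True
        return False

    def on_reconnect(self, now: float) -> None:
        # The new connection restarts timestamps at 0, so carry audio sent so far forward.
        self.offset += self.bytes_this_connection / self.bytes_per_second
        self.bytes_this_connection = 0
        self.last_sent = now

    def absolute_time(self, server_seconds: float) -> float:
        """Map a timestamp from the current connection onto the session timeline."""
        return self.offset + server_seconds
```

Measuring the offset in audio bytes rather than wall-clock time keeps it aligned with server timestamps, which count audio time, not time spent idle.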
TTS WebSocket (`/v1/speak`)
- Don't send empty text. A `Speak` message with an empty `text` field returns a 400 error. Always validate input before sending.
- Character rate limiting (DATA-0001) means slow down, not retry. If you hit it, reduce how fast you submit text chunks; retrying immediately only compounds the problem.
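A sketch of both TTS rules. The `{"type": "Speak", "text": ...}` shape is inferred from the message and field names above and may carry more fields in practice; `ChunkPacer`, its default intervals, and the doubling policy are hypothetical choices for backing off after DATA-0001 instead of retrying immediately.

```python
import json
import time


def speak_message(text: str) -> str:
    """Serialize a Speak message, refusing text the server would reject."""
    # Assumed wire shape: {"type": "Speak", "text": "..."}. Empty text returns 400;
    # rejecting whitespace-only text too is a conservative client-side choice.
    if not text.strip():
        raise ValueError("refusing to send an empty Speak message")
    return json.dumps({"type": "Speak", "text": text})


class ChunkPacer:
    """Spaces out text submissions and widens the gap after DATA-0001."""

    def __init__(self, min_interval: float = 0.05, max_interval: float = 2.0):
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.interval = min_interval

    def on_rate_limited(self) -> None:
        # Slow down (double the gap, capped) rather than resending right away.
        self.interval = min(self.interval * 2, self.max_interval)

    def reset(self) -> None:
        # Return to the base pace, e.g. after a sustained period without DATA-0001.
        self.interval = self.min_interval

    def wait(self, sleep=time.sleep) -> None:
        # Call before submitting each chunk; `sleep` is injectable for tests.
        sleep(self.interval)
```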
Voice Agent (`/v1/agent/converse`)
- Send the `Settings` message before any audio. The agent ignores everything until it receives and acknowledges the `Settings` configuration, so message ordering is strictly required.
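One way to enforce the ordering is to hold audio in a buffer until the server acknowledges `Settings`. In this sketch, the acknowledgement message type (`SettingsApplied`) is an assumption, `send_text`/`send_binary` are placeholders for your WebSocket library, and the contents of `Settings` are left to you.

```python
import json
from typing import Callable, List


class AgentSession:
    """Holds back audio until Settings has been sent and acknowledged.

    Assumptions (not from this doc): the server acknowledges with a message whose
    type is "SettingsApplied", and send_text / send_binary are callables supplied
    by whatever WebSocket library you use. The Settings body itself is up to you.
    """

    def __init__(self, send_text: Callable[[str], None],
                 send_binary: Callable[[bytes], None], settings: dict):
        self.send_text = send_text
        self.send_binary = send_binary
        self.settings = {**settings, "type": "Settings"}
        self.settings_sent = False
        self.ready = False
        self.pending: List[bytes] = []

    def start(self) -> None:
        # Settings must be the first message on the connection.
        self.send_text(json.dumps(self.settings))
        self.settings_sent = True

    def send_audio(self, chunk: bytes) -> None:
        if not chunk:
            return
        if self.ready:
            self.send_binary(chunk)
        else:
            self.pending.append(chunk)  # the agent would ignore it right now

    def on_server_message(self, raw: str) -> None:
        msg = json.loads(raw)
        if msg.get("type") == "SettingsApplied" and self.settings_sent:
            self.ready = True
            for chunk in self.pending:  # flush buffered audio in order
                self.send_binary(chunk)
            self.pending.clear()
```

Buffering suits file playback; for a live microphone you may prefer to discard audio captured before the acknowledgement rather than replay it late.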
Flux model
- Use `/v2/listen` with `model=flux-general-en`. `/v1/listen` does not support Flux, and `model=flux` alone is not a valid value. Do not include `language` or `encoding` params for containerized audio.
- Use `Configure` to update EOT thresholds and keyterms mid-session. Unlike `/v1/listen`, Flux supports live reconfiguration after connection, so there is no need to reconnect to change turn-detection sensitivity or boost new keyterms:

  ```json
  {
    "type": "Configure",
    "thresholds": {
      "eot_threshold": "0.8",
      "eot_timeout_ms": "3000"
    },
    "keyterms": ["Deepgram"]
  }
  ```

  The server responds with `ConfigureSuccess` (echoing back the applied values) or `ConfigureFailure`. Omitted threshold fields keep their current values.
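A small builder that emits only the fields you set, so unchanged thresholds are left out and keep their current values. Values are passed through as given (the example above uses strings), and the helper names are hypothetical.

```python
import json
from typing import Optional, Sequence, Union

ThresholdValue = Union[str, int, float]  # passed through unchanged


def configure_message(eot_threshold: Optional[ThresholdValue] = None,
                      eot_timeout_ms: Optional[ThresholdValue] = None,
                      keyterms: Optional[Sequence[str]] = None) -> str:
    """Build a Flux Configure message containing only the fields you pass."""
    msg: dict = {"type": "Configure"}
    thresholds = {}
    if eot_threshold is not None:
        thresholds["eot_threshold"] = eot_threshold
    if eot_timeout_ms is not None:
        thresholds["eot_timeout_ms"] = eot_timeout_ms
    if thresholds:
        # Omitted threshold fields keep their current server-side values.
        msg["thresholds"] = thresholds
    if keyterms is not None:
        msg["keyterms"] = list(keyterms)
    return json.dumps(msg)


def configure_succeeded(raw_reply: str) -> bool:
    """Map a ConfigureSuccess / ConfigureFailure reply to True / False."""
    kind = json.loads(raw_reply).get("type")
    if kind == "ConfigureSuccess":
        return True
    if kind == "ConfigureFailure":
        return False
    raise ValueError(f"not a Configure reply: {kind!r}")
```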
Authentication
- JWT TTL applies only to the initial handshake. Tokens expire after 30 seconds by default. Once the WebSocket connection is established, token expiry does not close it; the token is needed only for the upgrade request.
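To decide whether a cached token is still good for the upgrade request, you can read its `exp` claim. This sketch assumes a standard three-part JWT carrying `exp` in Unix seconds; it decodes without verifying the signature, so use it only for scheduling, and the 2-second safety margin is an arbitrary choice.

```python
import base64
import json
import time
from typing import Optional


def jwt_seconds_left(token: str, now: Optional[float] = None) -> float:
    """Seconds until the token's `exp` claim. Decodes only; does NOT verify the signature."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # JWTs strip base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    current = time.time() if now is None else now
    return claims["exp"] - current


def usable_for_handshake(token: str, now: Optional[float] = None, margin: float = 2.0) -> bool:
    """True if the token should still be valid when the WebSocket upgrade is sent."""
    return jwt_seconds_left(token, now) > margin
```

Only the moment of connecting matters: check just before opening the socket and fetch a fresh token if this returns `False`.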
SDK-Specific Skills
This skill covers the product contracts (endpoints, query params, message shapes), which are identical across SDKs. For language-idiomatic code (imports, async patterns, builder APIs, common errors), install the SDK-specific skills. Each Deepgram SDK publishes 7 product skills named `deepgram-{lang}-{product}` (e.g. `deepgram-python-speech-to-text`, `deepgram-js-voice-agent`) plus a maintainer skill, `deepgram-{lang}-maintaining-sdk`. The `deepgram-{lang}-` prefix avoids collisions when you install skills from multiple SDKs.
```bash
# Install all skills from a specific SDK
npx skills add deepgram/deepgram-python-sdk   # Python
npx skills add deepgram/deepgram-js-sdk       # JavaScript / TypeScript
npx skills add deepgram/deepgram-java-sdk     # Java
npx skills add deepgram/deepgram-go-sdk       # Go
npx skills add deepgram/deepgram-rust-sdk     # Rust
npx skills add deepgram/deepgram-swift-sdk    # Swift
npx skills add deepgram/deepgram-kotlin-sdk   # Kotlin
npx skills add deepgram/deepgram-dotnet-sdk   # C# / .NET
npx skills add deepgram/deepgram-browser-sdk  # Browser TypeScript

# Or install a specific product skill from one SDK (note the deepgram-{lang}- prefix)
npx skills add deepgram/deepgram-python-sdk --skill deepgram-python-speech-to-text
npx skills add deepgram/deepgram-js-sdk --skill deepgram-js-voice-agent
```

Related Deepgram skills
| Skill | Purpose |
|---|---|
|  | Minimal runnable snippets per feature per language |
|  | Full integration examples with third-party platforms (Twilio, LiveKit, etc.) |
|  | Runnable starter apps (framework × feature matrix) |
|  | Navigate Deepgram documentation |
|  | Install the Deepgram MCP server |