Deepgram API reference for speech-to-text, text-to-speech, voice agents, audio intelligence, and account management. Use whenever building with Deepgram APIs — REST or WebSocket. Covers authentication, all endpoints, query parameters, request/response schemas, and WebSocket message formats. Reference files are organized by domain: listen (STT), speak (TTS), agent (voice agents), read (text/audio intelligence), models, projects, auth, and self-hosted.
Install: `npx skill4agent add deepgram/skills api`

Authentication headers:
`Authorization: Token <API_KEY>`
`Authorization: Bearer <JWT>`

Base URLs:
`https://api.deepgram.com`
`https://agent.deepgram.com`

┌──────────────────────────────┐
│ api.deepgram.com │
└──────────────────────────────┘
│
┌──────────────┬──────────────┼──────────────┬──────────────┐
▼ ▼ ▼ ▼ ▼
/v1/listen /v2/listen /v1/speak /v1/read /v1/projects/*
Nova — ASR Flux — conv. TTS Text AI Management
REST or WSS WSS only REST or WSS REST only REST only
┌──────────────────────────────┐
│ agent.deepgram.com │
└──────────────────────────────┘
│
▼
/v1/agent/converse
WebSocket only
audio ──▶ STT ──▶ LLM ──▶ TTS ──▶ audio
(Deepgram orchestrates the full pipeline)

Audio → text (transcription)?
├─ General-purpose transcription (captions, batch, call logs, live streams with custom turn logic)
│ └─ Nova models via /v1/listen
│ ├─ Pre-recorded file → REST POST https://api.deepgram.com/v1/listen?model=nova-3
│ └─ Live stream → WSS wss://api.deepgram.com/v1/listen?model=nova-3
│
└─ Conversational audio / voice-agent-style turn detection
└─ Flux models via /v2/listen
└─ Live stream → WSS wss://api.deepgram.com/v2/listen?model=flux-general-en
Text → audio?
├─ One-shot → REST POST /v1/speak
└─ Low-latency stream → WSS wss://api.deepgram.com/v1/speak
Full conversational voice agent (audio in, audio out)?
└─ WSS wss://agent.deepgram.com/v1/agent/converse
Deepgram handles STT + your configured LLM + TTS internally
Analyze text for insights?
└─ REST POST /v1/read
(summaries, sentiment, topics, intents)

`/v1/listen` (Nova) vs `/v2/listen` (Flux)

| | Nova | Flux |
|---|---|---|
| Endpoint | `/v1/listen` | `/v2/listen` |
| Available models | | |
| Best for | General transcription — captions, subtitles, call logs, batch | Conversational audio — voice agents, interactive assistants, turn-taking UIs |
| Output | Continuous transcript stream | Structured turn events + transcripts (built-in turn state machine) |
| Turn detection | Manual | Built-in (EOT, eager-EOT, turn_index) |
| Transports | REST + WebSocket | WebSocket only |
| Intelligence overlays | Yes | No (smaller, focused parameter set) |
| Mid-session reconfig | No (reconnect to change) | Yes (`Configure` message) |
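
The transport row of the table can be enforced in code before a connection is ever opened. The following is a minimal sketch, not part of any Deepgram SDK: `listen_url` and the `TRANSPORTS` mapping are hypothetical names introduced here, and only the hosts, paths, and model names come from this document.

```python
from urllib.parse import urlencode

API_HOST = "api.deepgram.com"

# Transports per endpoint, as listed in the comparison table.
TRANSPORTS = {
    "/v1/listen": {"rest", "websocket"},  # Nova
    "/v2/listen": {"websocket"},          # Flux is WebSocket only
}


def listen_url(path: str, transport: str, **params: str) -> str:
    """Build a listen URL, rejecting transports the endpoint does not offer.

    Hypothetical helper for illustration; it only assembles a string.
    """
    if path not in TRANSPORTS:
        raise ValueError(f"unknown listen endpoint: {path}")
    if transport not in TRANSPORTS[path]:
        raise ValueError(f"{path} does not support {transport}")
    scheme = "https" if transport == "rest" else "wss"
    query = urlencode(params)
    return f"{scheme}://{API_HOST}{path}" + (f"?{query}" if query else "")


print(listen_url("/v1/listen", "rest", model="nova-3"))
print(listen_url("/v2/listen", "websocket", model="flux-general-en"))
```

Failing early on `listen_url("/v2/listen", "rest")` keeps a Flux request from being sent over a transport the table marks as unavailable.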
`/v1/listen` (`model=nova-3`): `summarize`, `sentiment`, `topics`, `intents`, `diarize`, `redact`
`/v2/listen` (`model=flux-general-en`)

| Domain | REST | WebSocket | Reference |
|---|---|---|---|
| Listen v1 — STT, Nova models | | | listen.md |
| Listen v2 — STT, Flux (conversational) | — | | listen.md |
| Speak (TTS) | | | speak.md |
| Voice Agent | | | agent.md |
| Read (Intelligence) | | — | read.md |
| Models | | — | models.md |
| Projects | | — | projects.md |
| Auth | | — | auth.md |
| Self-Hosted | | — | self-hosted.md |
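
For the REST column, a request combines one of the two `Authorization` header forms with the `https://api.deepgram.com` base URL. The sketch below prepares, but does not send, a pre-recorded transcription request using only the Python standard library. The helper names are hypothetical, and the JSON body shape `{"url": ...}` is an assumption that should be checked against listen.md.

```python
import json
import urllib.request

API_BASE = "https://api.deepgram.com"


def auth_header(credential: str, *, is_jwt: bool = False) -> dict[str, str]:
    """Return the Authorization header: Token for API keys, Bearer for JWTs."""
    scheme = "Bearer" if is_jwt else "Token"
    return {"Authorization": f"{scheme} {credential}"}


def prerecorded_request(
    api_key: str, audio_url: str, model: str = "nova-3"
) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /v1/listen for a hosted audio file."""
    # Assumption: the endpoint accepts a JSON body of the form {"url": ...}.
    body = json.dumps({"url": audio_url}).encode("utf-8")
    headers = {**auth_header(api_key), "Content-Type": "application/json"}
    return urllib.request.Request(
        f"{API_BASE}/v1/listen?model={model}",
        data=body,
        headers=headers,
        method="POST",
    )


req = prerecorded_request("YOUR_API_KEY", "https://example.com/audio.wav")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Passing the prepared object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the header and URL easy to inspect without network access.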

`/v1/listen` `/v2/listen` `/v1/speak` `/v1/agent/converse` `Settings` `/v2/listen` `Configure` `/v2/listen`

`/v1/listen` `smart_format` `diarize` `punctuate`

`/v1/listen` `{"type":"KeepAlive"}`

`/v1/listen` `encoding` `encoding=linear16` `encoding`

`/v1/speak` `Speak` `text`

`/v1/agent/converse` `Settings`

`/v2/listen` `model=flux-general-en` `/v1/listen` `model=flux` `language` `encoding`

`Configure` `/v1/listen`

`{ "type": "Configure", "thresholds": { "eot_threshold": "0.8", "eot_timeout_ms": "3000" }, "keyterms": ["Deepgram"] }`

`ConfigureSuccess` `ConfigureFailure`

`api` `deepgram-{lang}-{product}` `deepgram-python-speech-to-text` `deepgram-js-voice-agent` `deepgram-{lang}-maintaining-sdk` `deepgram-{lang}-`

# Install all skills from a specific SDK
npx skills add deepgram/deepgram-python-sdk # Python
npx skills add deepgram/deepgram-js-sdk # JavaScript / TypeScript
npx skills add deepgram/deepgram-java-sdk # Java
npx skills add deepgram/deepgram-go-sdk # Go
npx skills add deepgram/deepgram-rust-sdk # Rust
npx skills add deepgram/deepgram-swift-sdk # Swift
npx skills add deepgram/deepgram-kotlin-sdk # Kotlin
npx skills add deepgram/deepgram-dotnet-sdk # C# / .NET
npx skills add deepgram/deepgram-browser-sdk # Browser TypeScript
# Or install a specific product skill from one SDK (note the deepgram-{lang}- prefix)
npx skills add deepgram/deepgram-python-sdk --skill deepgram-python-speech-to-text
npx skills add deepgram/deepgram-js-sdk --skill deepgram-js-voice-agent

| Skill | Purpose |
|---|---|
| | Minimal runnable snippets per feature per language |
| | Full integration examples with third-party platforms (Twilio, LiveKit, etc.) |
| | Runnable starter apps (framework × feature matrix) |
| | Navigate Deepgram documentation |
| | Install the Deepgram MCP server |