AssemblyAI Speech-to-Text and Voice AI
AssemblyAI provides speech-to-text APIs, audio intelligence models, and an LLM Gateway for applying language models to transcripts. This skill corrects common mistakes that training data gets wrong — deprecated APIs, discontinued SDKs, and non-obvious auth patterns.
Authentication
All endpoints use the same header:
Authorization: YOUR_API_KEY
NOT Authorization: Bearer ...
— just the raw API key, no Bearer prefix. This is the #1 mistake.
Base URLs
| Service | US | EU |
|---|
| REST API | https://api.assemblyai.com
| https://api.eu.assemblyai.com
|
| LLM Gateway | https://llm-gateway.assemblyai.com/v1
| https://llm-gateway.eu.assemblyai.com/v1
|
| Streaming v3 | wss://streaming.assemblyai.com/v3/ws
| wss://streaming.eu.assemblyai.com/v3/ws
|
| Streaming v2 (legacy) | wss://api.assemblyai.com/v2/realtime/ws
| — |
SDKs
| Language | Package | Status |
|---|
| Python | | Active |
| JavaScript/TypeScript | | Active |
| Ruby | gem | Active |
| Java | | Discontinued April 2025 |
| Go | | Discontinued April 2025 |
| C# .NET | NuGet | Discontinued April 2025 |
Only Python, JS/TS, and Ruby SDKs are maintained. For Java, Go, or C#, use the REST API directly.
Speech-to-Text Models
Pre-Recorded
| Model | Languages | Best For |
|---|
| Universal-3 Pro | 6 (en, es, de, fr, pt, it) | Highest accuracy, promptable transcription |
| Universal-2 | 99 | Broadest language coverage |
Use
as a priority list with fallback:
["universal-3-pro", "universal-2"]
.
Streaming
| Model | Languages | Best For |
|---|
| universal-streaming-english | 6 | Voice agents, ~300ms latency |
| universal-streaming-multilingual | 6 | Per-utterance language detection |
| whisper-rt | 99+ | Broadest streaming language support, auto-detect only |
| u3-rt-pro | 6 | Voice agents — punctuation-based turn detection, promptable |
Prompting (Universal-3 Pro only)
Two mutually exclusive customization parameters:
- (string, up to 1500 words): Natural language instructions for transcription style
- (string[], up to 1000 terms): Domain vocabulary for proper nouns, brands, technical terms
Prompting best practices:
- Use positive, authoritative instructions — NEVER use negative phrasing ("Don't", "Avoid", "Never") as the model gets confused
- Limit to 3-6 instructions for best results
- Prefix critical instructions with "Non-negotiable:" or "Required:"
LeMUR is Deprecated
LeMUR is deprecated (sunset March 31, 2026). Use the LLM Gateway instead. The LLM Gateway is an OpenAI-compatible API. Key difference: you pass transcript text directly in messages (no
). Transcribe first, then include
in your prompt.
See
references/llm-gateway.md
for models, tool calling, structured outputs, and examples.
Key Gotchas
| Gotcha | Details |
|---|
| + | Mutually exclusive — use one or the other |
| / | Deprecated. Use LLM Gateway instead (transcribe → send text to LLM) |
| PII redaction scope | Only redacts words in — other feature outputs (entities, summaries) may still expose sensitive data |
| Upload key scoping | Files uploaded with one API key project cannot be transcribed with a different project's key |
| Structured outputs | NOT supported by Claude models through LLM Gateway — only OpenAI and Gemini |
| U3 Pro turn detection | Uses punctuation ( ), NOT confidence thresholds — end_of_turn_confidence_threshold
has no effect |
| Negative prompts | Never use "Don't" or "Avoid" in prompts — rephrase as positive instructions |
| PII audio redaction method | override_audio_redaction_method: "silence"
replaces PII with silence instead of default beep |
| Language detection | Requires minimum 15 seconds of spoken audio for reliable results |
| LLM Gateway EU region | Only Anthropic Claude and Google Gemini models available — OpenAI models are NOT supported in EU |
| Disfluencies | works on Universal-2 only; for U3 Pro, use prompting instead |
Common Mistakes
| Mistake | Correction |
|---|
Authorization: Bearer KEY
| (no Bearer prefix) |
| Using LeMUR API | Deprecated. Use LLM Gateway instead |
| Using or | Deprecated. Use LLM Gateway instead (transcribe then summarize via LLM) |
| LeMUR with LLM Gateway | Pass transcript text in messages, not IDs |
| model IDs | No provider prefix: claude-sonnet-4-5-20250929
not anthropic/claude-sonnet-4-5-20250929
|
| Using Java/Go/C# SDKs | Discontinued. Use Python, JS/TS, Ruby, or raw API |
| parameter | Use instead |
| Hardcoding v2 streaming URL | v3 () is current; v2 still works but is legacy |
| Not using | Specify model priority list: ["universal-3-pro", "universal-2"]
|
Reference Files
Read the relevant reference file based on what the user needs:
| File | When to read |
|---|
| Python SDK patterns and examples |
| JavaScript/TypeScript SDK patterns |
| Real-time/streaming STT, v3 protocol, temp tokens, error codes |
references/voice-agents.md
| Voice agent integrations: LiveKit, Pipecat, turn detection, latency optimization |
references/llm-gateway.md
| Applying LLMs to transcripts, tool calling, available models |
references/speech-understanding.md
| Translation, speaker identification, custom formatting |
references/audio-intelligence.md
| PII redaction, diarization, summarization, sentiment, chapters |
references/api-reference.md
| Full parameter list, export endpoints, webhooks, upload, PII policies |
API Spec Source of Truth