assemblyai
Original:🇺🇸 English
Translated
Use when implementing speech-to-text, audio transcription, real-time streaming STT, audio intelligence features, or voice AI using AssemblyAI APIs or SDKs. Use when user mentions AssemblyAI, voice agents, transcription, speaker diarization, PII redaction of audio, LLM Gateway for audio understanding, or applying LLMs to transcripts. Also use when building voice agents with LiveKit or Pipecat that need speech-to-text, or when the user is working with any audio/video processing pipeline that could benefit from transcription, even if they don't mention AssemblyAI by name.
8installs
Added on
NPX Install
npx skill4agent add assemblyai/assemblyai-skill assemblyaiTags
Translated version includes tags in frontmatterSKILL.md Content
View Translation Comparison →AssemblyAI Speech-to-Text and Voice AI
AssemblyAI provides speech-to-text APIs, audio intelligence models, and an LLM Gateway for applying language models to transcripts. This skill corrects common mistakes that training data gets wrong — deprecated APIs, discontinued SDKs, and non-obvious auth patterns.
Authentication
All endpoints use the same header:
Authorization: YOUR_API_KEYNOT — just the raw API key, no Bearer prefix. This is the #1 mistake.
Authorization: Bearer ...Base URLs
| Service | US | EU |
|---|---|---|
| REST API | | |
| LLM Gateway | | |
| Streaming v3 | | |
| Streaming v2 (legacy) | | — |
SDKs
| Language | Package | Status |
|---|---|---|
| Python | | Active |
| JavaScript/TypeScript | | Active |
| Ruby | | Active |
| Java | | Discontinued April 2025 |
| Go | | Discontinued April 2025 |
| C# .NET | | Discontinued April 2025 |
Only Python, JS/TS, and Ruby SDKs are maintained. For Java, Go, or C#, use the REST API directly.
Speech-to-Text Models
Pre-Recorded
| Model | Languages | Best For |
|---|---|---|
| Universal-3 Pro | 6 (en, es, de, fr, pt, it) | Highest accuracy, promptable transcription |
| Universal-2 | 99 | Broadest language coverage |
Use as a priority list with fallback: .
speech_models["universal-3-pro", "universal-2"]Streaming
| Model | Languages | Best For |
|---|---|---|
| universal-streaming-english | 6 | Voice agents, ~300ms latency |
| universal-streaming-multilingual | 6 | Per-utterance language detection |
| whisper-rt | 99+ | Broadest streaming language support, auto-detect only |
| u3-rt-pro | 6 | Voice agents — punctuation-based turn detection, promptable |
Prompting (Universal-3 Pro only)
Two mutually exclusive customization parameters:
- (string, up to 1500 words): Natural language instructions for transcription style
prompt - (string[], up to 1000 terms): Domain vocabulary for proper nouns, brands, technical terms
keyterms_prompt
Prompting best practices:
- Use positive, authoritative instructions — NEVER use negative phrasing ("Don't", "Avoid", "Never") as the model gets confused
- Limit to 3-6 instructions for best results
- Prefix critical instructions with "Non-negotiable:" or "Required:"
LeMUR is Deprecated
LeMUR is deprecated (sunset March 31, 2026). Use the LLM Gateway instead. The LLM Gateway is an OpenAI-compatible API. Key difference: you pass transcript text directly in messages (no ). Transcribe first, then include in your prompt.
transcript_idstranscript.textSee for models, tool calling, structured outputs, and examples.
references/llm-gateway.mdKey Gotchas
| Gotcha | Details |
|---|---|
| Mutually exclusive — use one or the other |
| Deprecated. Use LLM Gateway instead (transcribe → send text to LLM) |
| PII redaction scope | Only redacts words in |
| Upload key scoping | Files uploaded with one API key project cannot be transcribed with a different project's key |
| Structured outputs | NOT supported by Claude models through LLM Gateway — only OpenAI and Gemini |
| U3 Pro turn detection | Uses punctuation ( |
| Negative prompts | Never use "Don't" or "Avoid" in prompts — rephrase as positive instructions |
| PII audio redaction method | |
| Language detection | Requires minimum 15 seconds of spoken audio for reliable results |
| LLM Gateway EU region | Only Anthropic Claude and Google Gemini models available — OpenAI models are NOT supported in EU |
| Disfluencies | |
Common Mistakes
| Mistake | Correction |
|---|---|
| |
| Using LeMUR API | Deprecated. Use LLM Gateway instead |
Using | Deprecated. Use LLM Gateway instead (transcribe then summarize via LLM) |
LeMUR | Pass transcript text in messages, not IDs |
| No provider prefix: |
| Using Java/Go/C# SDKs | Discontinued. Use Python, JS/TS, Ruby, or raw API |
| Use |
| Hardcoding v2 streaming URL | v3 ( |
Not using | Specify model priority list: |
Reference Files
Read the relevant reference file based on what the user needs:
| File | When to read |
|---|---|
| Python SDK patterns and examples |
| JavaScript/TypeScript SDK patterns |
| Real-time/streaming STT, v3 protocol, temp tokens, error codes |
| Voice agent integrations: LiveKit, Pipecat, turn detection, latency optimization |
| Applying LLMs to transcripts, tool calling, available models |
| Translation, speaker identification, custom formatting |
| PII redaction, diarization, summarization, sentiment, chapters |
| Full parameter list, export endpoints, webhooks, upload, PII policies |