Loading...
Loading...
Use when implementing speech-to-text, audio transcription, real-time streaming STT, audio intelligence features, or voice AI using AssemblyAI APIs or SDKs. Use when user mentions AssemblyAI, voice agents, transcription, speaker diarization, PII redaction of audio, LLM Gateway for audio understanding, or applying LLMs to transcripts. Also use when building voice agents with LiveKit or Pipecat that need speech-to-text, or when the user is working with any audio/video processing pipeline that could benefit from transcription, even if they don't mention AssemblyAI by name.
npx skill4agent add assemblyai/assemblyai-skill assemblyaiAuthorization: YOUR_API_KEYAuthorization: Bearer ...| Service | US | EU |
|---|---|---|
| REST API | | |
| LLM Gateway | | |
| Streaming v3 | | |
| Streaming v2 (legacy) | | — |
| Language | Package | Status |
|---|---|---|
| Python | | Active |
| JavaScript/TypeScript | | Active |
| Ruby | | Active |
| Java | | Discontinued April 2025 |
| Go | | Discontinued April 2025 |
| C# .NET | | Discontinued April 2025 |
| Model | Languages | Best For |
|---|---|---|
| Universal-3 Pro | 6 (en, es, de, fr, pt, it) | Highest accuracy, promptable transcription |
| Universal-2 | 99 | Broadest language coverage |
speech_models["universal-3-pro", "universal-2"]| Model | Languages | Best For |
|---|---|---|
| universal-streaming-english | 6 | Voice agents, ~300ms latency |
| universal-streaming-multilingual | 6 | Per-utterance language detection |
| whisper-rt | 99+ | Broadest streaming language support, auto-detect only |
| u3-rt-pro | 6 | Voice agents — punctuation-based turn detection, promptable |
promptkeyterms_prompttranscript_idstranscript.textreferences/llm-gateway.md| Gotcha | Details |
|---|---|
| Mutually exclusive — use one or the other |
| Deprecated. Use LLM Gateway instead (transcribe → send text to LLM) |
| PII redaction scope | Only redacts words in |
| Upload key scoping | Files uploaded with one API key project cannot be transcribed with a different project's key |
| Structured outputs | NOT supported by Claude models through LLM Gateway — only OpenAI and Gemini |
| U3 Pro turn detection | Uses punctuation ( |
| Negative prompts | Never use "Don't" or "Avoid" in prompts — rephrase as positive instructions |
| PII audio redaction method | |
| Language detection | Requires minimum 15 seconds of spoken audio for reliable results |
| LLM Gateway EU region | Only Anthropic Claude and Google Gemini models available — OpenAI models are NOT supported in EU |
| Disfluencies | |
| Mistake | Correction |
|---|---|
| |
| Using LeMUR API | Deprecated. Use LLM Gateway instead |
Using | Deprecated. Use LLM Gateway instead (transcribe then summarize via LLM) |
LeMUR | Pass transcript text in messages, not IDs |
| No provider prefix: |
| Using Java/Go/C# SDKs | Discontinued. Use Python, JS/TS, Ruby, or raw API |
| Use |
| Hardcoding v2 streaming URL | v3 ( |
Not using | Specify model priority list: |
| File | When to read |
|---|---|
| Python SDK patterns and examples |
| JavaScript/TypeScript SDK patterns |
| Real-time/streaming STT, v3 protocol, temp tokens, error codes |
| Voice agent integrations: LiveKit, Pipecat, turn detection, latency optimization |
| Applying LLMs to transcripts, tool calling, available models |
| Translation, speaker identification, custom formatting |
| PII redaction, diarization, summarization, sentiment, chapters |
| Full parameter list, export endpoints, webhooks, upload, PII policies |