assemblyai
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseAssemblyAI Speech-to-Text and Voice AI
AssemblyAI 语音转文字与语音AI
AssemblyAI provides speech-to-text APIs, audio intelligence models, and an LLM Gateway for applying language models to transcripts. This skill corrects common mistakes that training data gets wrong — deprecated APIs, discontinued SDKs, and non-obvious auth patterns.
AssemblyAI提供语音转文字API、音频智能模型,以及可将大语言模型应用于转录内容的LLM Gateway。本技能可修正训练数据常犯的错误——包括废弃API、已停止维护的SDK,以及不易察觉的鉴权模式。
Authentication
身份验证
All endpoints use the same header:
Authorization: YOUR_API_KEYNOT — just the raw API key, no Bearer prefix. This is the #1 mistake.
Authorization: Bearer ...所有端点使用统一的请求头:
Authorization: YOUR_API_KEY不要使用格式,直接传入原始API密钥即可,无需Bearer前缀,这是最常见的错误。
Authorization: Bearer ...Base URLs
基础URL
| Service | US | EU |
|---|---|---|
| REST API | | |
| LLM Gateway | | |
| Streaming v3 | | |
| Streaming v2 (legacy) | | — |
| 服务 | 美国区 | 欧盟区 |
|---|---|---|
| REST API | | |
| LLM Gateway | | |
| 流式API v3 | | |
| 流式API v2( legacy版本) | | — |
SDKs
SDKs
| Language | Package | Status |
|---|---|---|
| Python | | Active |
| JavaScript/TypeScript | | Active |
| Ruby | | Active |
| Java | | Discontinued April 2025 |
| Go | | Discontinued April 2025 |
| C# .NET | | Discontinued April 2025 |
Only Python, JS/TS, and Ruby SDKs are maintained. For Java, Go, or C#, use the REST API directly.
| 开发语言 | 安装包 | 维护状态 |
|---|---|---|
| Python | | 活跃维护 |
| JavaScript/TypeScript | | 活跃维护 |
| Ruby | | 活跃维护 |
| Java | | 2025年4月停止维护 |
| Go | | 2025年4月停止维护 |
| C# .NET | | 2025年4月停止维护 |
仅Python、JS/TS和Ruby SDK仍在维护。 若使用Java、Go或C#,请直接调用REST API。
Speech-to-Text Models
语音转文字模型
Pre-Recorded
预录音频场景
| Model | Languages | Best For |
|---|---|---|
| Universal-3 Pro | 6 (en, es, de, fr, pt, it) | Highest accuracy, promptable transcription |
| Universal-2 | 99 | Broadest language coverage |
Use as a priority list with fallback: .
speech_models["universal-3-pro", "universal-2"]| 模型 | 支持语言 | 适用场景 |
|---|---|---|
| Universal-3 Pro | 6种(en, es, de, fr, pt, it) | 最高准确率,支持提示词自定义转录 |
| Universal-2 | 99种 | 最广语言覆盖范围 |
建议将设置为带降级的优先级列表:。
speech_models["universal-3-pro", "universal-2"]Streaming
流式音频场景
| Model | Languages | Best For |
|---|---|---|
| universal-streaming-english | 6 | Voice agents, ~300ms latency |
| universal-streaming-multilingual | 6 | Per-utterance language detection |
| whisper-rt | 99+ | Broadest streaming language support, auto-detect only |
| u3-rt-pro | 6 | Voice agents — punctuation-based turn detection, promptable |
| 模型 | 支持语言 | 适用场景 |
|---|---|---|
| universal-streaming-english | 6种 | 语音代理,延迟约300ms |
| universal-streaming-multilingual | 6种 | 按语句自动检测语言 |
| whisper-rt | 99种以上 | 最广流式语言支持,仅支持自动检测 |
| u3-rt-pro | 6种 | 语音代理——基于标点的话轮检测,支持提示词 |
Prompting (Universal-3 Pro only)
提示词功能(仅Universal-3 Pro支持)
Two mutually exclusive customization parameters:
- (string, up to 1500 words): Natural language instructions for transcription style
prompt - (string[], up to 1000 terms): Domain vocabulary for proper nouns, brands, technical terms
keyterms_prompt
Prompting best practices:
- Use positive, authoritative instructions — NEVER use negative phrasing ("Don't", "Avoid", "Never") as the model gets confused
- Limit to 3-6 instructions for best results
- Prefix critical instructions with "Non-negotiable:" or "Required:"
两个互斥的自定义参数:
- (字符串,最长1500词):针对转录风格的自然语言指令
prompt - (字符串数组,最多1000个术语):专有名词、品牌、技术术语等领域词汇
keyterms_prompt
提示词最佳实践:
- 使用肯定、明确的指令——绝对不要使用否定表述("Don't"、"Avoid"、"Never"),否则会导致模型混淆
- 最多保留3-6条指令效果最佳
- 关键指令前加上"Non-negotiable:"或"Required:"前缀
LeMUR is Deprecated
LeMUR已废弃
LeMUR is deprecated (sunset March 31, 2026). Use the LLM Gateway instead. The LLM Gateway is an OpenAI-compatible API. Key difference: you pass transcript text directly in messages (no ). Transcribe first, then include in your prompt.
transcript_idstranscript.textSee for models, tool calling, structured outputs, and examples.
references/llm-gateway.mdLeMUR已废弃(2026年3月31日下线)。 请改用LLM Gateway。LLM Gateway是兼容OpenAI规范的API。核心差异:你需要直接在消息中传入转录文本(无需)。先完成转录,再将包含到你的提示词中。
transcript_idstranscript.text可查看了解支持的模型、工具调用、结构化输出以及示例。
references/llm-gateway.mdKey Gotchas
常见陷阱
| Gotcha | Details |
|---|---|
| Mutually exclusive — use one or the other |
| Deprecated. Use LLM Gateway instead (transcribe → send text to LLM) |
| PII redaction scope | Only redacts words in |
| Upload key scoping | Files uploaded with one API key project cannot be transcribed with a different project's key |
| Structured outputs | NOT supported by Claude models through LLM Gateway — only OpenAI and Gemini |
| U3 Pro turn detection | Uses punctuation ( |
| Negative prompts | Never use "Don't" or "Avoid" in prompts — rephrase as positive instructions |
| PII audio redaction method | |
| Language detection | Requires minimum 15 seconds of spoken audio for reliable results |
| LLM Gateway EU region | Only Anthropic Claude and Google Gemini models available — OpenAI models are NOT supported in EU |
| Disfluencies | |
| 陷阱 | 详情 |
|---|---|
| 互斥参数——二者只能选其一 |
| 已废弃。请改用LLM Gateway(先转录→再将文本发送给LLM处理) |
| PII脱敏范围 | 仅会脱敏 |
| 上传密钥权限隔离 | 用一个API密钥项目上传的文件,无法用其他项目的密钥进行转录 |
| 结构化输出 | LLM Gateway的Claude模型不支持结构化输出——仅OpenAI和Gemini模型支持 |
| U3 Pro话轮检测 | 基于标点( |
| 否定提示词 | 不要在提示词中使用"Don't"或"Avoid"这类表述——请改写为肯定指令 |
| PII音频脱敏方式 | 设置 |
| 语言检测 | 需要至少15秒的语音音频才能得到可靠的检测结果 |
| LLM Gateway欧盟区 | 仅支持Anthropic Claude和Google Gemini模型——不支持OpenAI模型 |
| 非流畅语气词 | |
Common Mistakes
常见错误
| Mistake | Correction |
|---|---|
| |
| Using LeMUR API | Deprecated. Use LLM Gateway instead |
Using | Deprecated. Use LLM Gateway instead (transcribe then summarize via LLM) |
LeMUR | Pass transcript text in messages, not IDs |
| No provider prefix: |
| Using Java/Go/C# SDKs | Discontinued. Use Python, JS/TS, Ruby, or raw API |
| Use |
| Hardcoding v2 streaming URL | v3 ( |
Not using | Specify model priority list: |
| 错误 | 修正方案 |
|---|---|
| |
| 使用LeMUR API | 已废弃,请改用LLM Gateway |
使用 | 已废弃,请改用LLM Gateway(先转录再通过LLM生成摘要) |
在LLM Gateway中使用LeMUR的 | 直接在消息中传入转录文本,不要传ID |
使用 | 不要加服务商前缀:使用 |
| 使用Java/Go/C# SDK | 已停止维护,请使用Python、JS/TS、Ruby SDK或是直接调用原生API |
使用 | 请改用 |
| 硬编码v2版本的流式URL | 当前最新版本是v3( |
未设置 | 指定模型优先级列表: |
Reference Files
参考文件
Read the relevant reference file based on what the user needs:
| File | When to read |
|---|---|
| Python SDK patterns and examples |
| JavaScript/TypeScript SDK patterns |
| Real-time/streaming STT, v3 protocol, temp tokens, error codes |
| Voice agent integrations: LiveKit, Pipecat, turn detection, latency optimization |
| Applying LLMs to transcripts, tool calling, available models |
| Translation, speaker identification, custom formatting |
| PII redaction, diarization, summarization, sentiment, chapters |
| Full parameter list, export endpoints, webhooks, upload, PII policies |
根据用户需求阅读对应的参考文件:
| 文件路径 | 适用场景 |
|---|---|
| Python SDK使用规范与示例 |
| JavaScript/TypeScript SDK使用规范 |
| 实时/流式STT、v3协议、临时令牌、错误码 |
| 语音代理集成:LiveKit、Pipecat、话轮检测、延迟优化 |
| 将LLM应用于转录内容、工具调用、可用模型 |
| 翻译、说话人识别、自定义格式化 |
| PII脱敏、说话人分箱、摘要、情感分析、章节拆分 |
| 完整参数列表、导出端点、Webhook、上传、PII政策 |