assemblyai

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

AssemblyAI Speech-to-Text and Voice AI

AssemblyAI 语音转文字与语音AI

AssemblyAI provides speech-to-text APIs, audio intelligence models, and an LLM Gateway for applying language models to transcripts. This skill corrects common mistakes that training data gets wrong — deprecated APIs, discontinued SDKs, and non-obvious auth patterns.
AssemblyAI提供语音转文字API、音频智能模型,以及可将大语言模型应用于转录内容的LLM Gateway。本技能可修正训练数据常犯的错误——包括废弃API、已停止维护的SDK,以及不易察觉的鉴权模式。

Authentication

身份验证

All endpoints use the same header:
Authorization: YOUR_API_KEY
NOT
Authorization: Bearer ...
— just the raw API key, no Bearer prefix. This is the #1 mistake.
所有端点使用统一的请求头:
Authorization: YOUR_API_KEY
不要使用
Authorization: Bearer ...
格式,直接传入原始API密钥即可,无需Bearer前缀,这是最常见的错误。

Base URLs

基础URL

ServiceUSEU
REST API
https://api.assemblyai.com
https://api.eu.assemblyai.com
LLM Gateway
https://llm-gateway.assemblyai.com/v1
https://llm-gateway.eu.assemblyai.com/v1
Streaming v3
wss://streaming.assemblyai.com/v3/ws
wss://streaming.eu.assemblyai.com/v3/ws
Streaming v2 (legacy)
wss://api.assemblyai.com/v2/realtime/ws
服务美国区欧盟区
REST API
https://api.assemblyai.com
https://api.eu.assemblyai.com
LLM Gateway
https://llm-gateway.assemblyai.com/v1
https://llm-gateway.eu.assemblyai.com/v1
流式API v3
wss://streaming.assemblyai.com/v3/ws
wss://streaming.eu.assemblyai.com/v3/ws
流式API v2( legacy版本)
wss://api.assemblyai.com/v2/realtime/ws

SDKs

SDKs

LanguagePackageStatus
Python
pip install assemblyai
Active
JavaScript/TypeScript
npm i assemblyai
Active
Ruby
assemblyai
gem
Active
Java
assemblyai-java-sdk
Discontinued April 2025
Go
assemblyai-go-sdk
Discontinued April 2025
C# .NET
AssemblyAI
NuGet
Discontinued April 2025
Only Python, JS/TS, and Ruby SDKs are maintained. For Java, Go, or C#, use the REST API directly.
开发语言安装包维护状态
Python
pip install assemblyai
活跃维护
JavaScript/TypeScript
npm i assemblyai
活跃维护
Ruby
assemblyai
gem
活跃维护
Java
assemblyai-java-sdk
2025年4月停止维护
Go
assemblyai-go-sdk
2025年4月停止维护
C# .NET
AssemblyAI
NuGet
2025年4月停止维护
仅Python、JS/TS和Ruby SDK仍在维护。 若使用Java、Go或C#,请直接调用REST API。

Speech-to-Text Models

语音转文字模型

Pre-Recorded

预录音频场景

ModelLanguagesBest For
Universal-3 Pro6 (en, es, de, fr, pt, it)Highest accuracy, promptable transcription
Universal-299Broadest language coverage
Use
speech_models
as a priority list with fallback:
["universal-3-pro", "universal-2"]
.
模型支持语言适用场景
Universal-3 Pro6种(en, es, de, fr, pt, it)最高准确率,支持提示词自定义转录
Universal-299种最广语言覆盖范围
建议将
speech_models
设置为带降级的优先级列表:
["universal-3-pro", "universal-2"]

Streaming

流式音频场景

ModelLanguagesBest For
universal-streaming-english6Voice agents, ~300ms latency
universal-streaming-multilingual6Per-utterance language detection
whisper-rt99+Broadest streaming language support, auto-detect only
u3-rt-pro6Voice agents — punctuation-based turn detection, promptable
模型支持语言适用场景
universal-streaming-english6种语音代理,延迟约300ms
universal-streaming-multilingual6种按语句自动检测语言
whisper-rt99种以上最广流式语言支持,仅支持自动检测
u3-rt-pro6种语音代理——基于标点的话轮检测,支持提示词

Prompting (Universal-3 Pro only)

提示词功能(仅Universal-3 Pro支持)

Two mutually exclusive customization parameters:
  • prompt
    (string, up to 1500 words): Natural language instructions for transcription style
  • keyterms_prompt
    (string[], up to 1000 terms): Domain vocabulary for proper nouns, brands, technical terms
Prompting best practices:
  • Use positive, authoritative instructions — NEVER use negative phrasing ("Don't", "Avoid", "Never") as the model gets confused
  • Limit to 3-6 instructions for best results
  • Prefix critical instructions with "Non-negotiable:" or "Required:"
两个互斥的自定义参数:
  • prompt
    (字符串,最长1500词):针对转录风格的自然语言指令
  • keyterms_prompt
    (字符串数组,最多1000个术语):专有名词、品牌、技术术语等领域词汇
提示词最佳实践:
  • 使用肯定、明确的指令——绝对不要使用否定表述("Don't"、"Avoid"、"Never"),否则会导致模型混淆
  • 最多保留3-6条指令效果最佳
  • 关键指令前加上"Non-negotiable:"或"Required:"前缀

LeMUR is Deprecated

LeMUR已废弃

LeMUR is deprecated (sunset March 31, 2026). Use the LLM Gateway instead. The LLM Gateway is an OpenAI-compatible API. Key difference: you pass transcript text directly in messages (no
transcript_ids
). Transcribe first, then include
transcript.text
in your prompt.
See
references/llm-gateway.md
for models, tool calling, structured outputs, and examples.
LeMUR已废弃(2026年3月31日下线)。 请改用LLM Gateway。LLM Gateway是兼容OpenAI规范的API。核心差异:你需要直接在消息中传入转录文本(无需
transcript_ids
)。先完成转录,再将
transcript.text
包含到你的提示词中。
可查看
references/llm-gateway.md
了解支持的模型、工具调用、结构化输出以及示例。

Key Gotchas

常见陷阱

GotchaDetails
prompt
+
keyterms_prompt
Mutually exclusive — use one or the other
summarization
/
auto_chapters
Deprecated. Use LLM Gateway instead (transcribe → send text to LLM)
PII redaction scopeOnly redacts words in
text
— other feature outputs (entities, summaries) may still expose sensitive data
Upload key scopingFiles uploaded with one API key project cannot be transcribed with a different project's key
Structured outputsNOT supported by Claude models through LLM Gateway — only OpenAI and Gemini
U3 Pro turn detectionUses punctuation (
.
?
!
), NOT confidence thresholds —
end_of_turn_confidence_threshold
has no effect
Negative promptsNever use "Don't" or "Avoid" in prompts — rephrase as positive instructions
PII audio redaction method
override_audio_redaction_method: "silence"
replaces PII with silence instead of default beep
Language detectionRequires minimum 15 seconds of spoken audio for reliable results
LLM Gateway EU regionOnly Anthropic Claude and Google Gemini models available — OpenAI models are NOT supported in EU
Disfluencies
disfluencies: true
works on Universal-2 only; for U3 Pro, use prompting instead
陷阱详情
prompt
+
keyterms_prompt
互斥参数——二者只能选其一
summarization
/
auto_chapters
已废弃。请改用LLM Gateway(先转录→再将文本发送给LLM处理)
PII脱敏范围仅会脱敏
text
字段中的词汇——其他功能输出(实体、摘要)可能仍会暴露敏感数据
上传密钥权限隔离用一个API密钥项目上传的文件,无法用其他项目的密钥进行转录
结构化输出LLM Gateway的Claude模型不支持结构化输出——仅OpenAI和Gemini模型支持
U3 Pro话轮检测基于标点(
.
?
!
)实现,而非置信度阈值——
end_of_turn_confidence_threshold
参数无效
否定提示词不要在提示词中使用"Don't"或"Avoid"这类表述——请改写为肯定指令
PII音频脱敏方式设置
override_audio_redaction_method: "silence"
可将PII替换为静音,而非默认的蜂鸣声
语言检测需要至少15秒的语音音频才能得到可靠的检测结果
LLM Gateway欧盟区仅支持Anthropic Claude和Google Gemini模型——不支持OpenAI模型
非流畅语气词
disfluencies: true
仅在Universal-2模型生效;U3 Pro请使用提示词实现该需求

Common Mistakes

常见错误

MistakeCorrection
Authorization: Bearer KEY
Authorization: KEY
(no Bearer prefix)
Using LeMUR APIDeprecated. Use LLM Gateway instead
Using
summarization
or
auto_chapters
Deprecated. Use LLM Gateway instead (transcribe then summarize via LLM)
LeMUR
transcript_ids
with LLM Gateway
Pass transcript text in messages, not IDs
anthropic/claude-...
model IDs
No provider prefix:
claude-sonnet-4-5-20250929
not
anthropic/claude-sonnet-4-5-20250929
Using Java/Go/C# SDKsDiscontinued. Use Python, JS/TS, Ruby, or raw API
word_boost
parameter
Use
keyterms_prompt
instead
Hardcoding v2 streaming URLv3 (
/v3/ws
) is current; v2 still works but is legacy
Not using
speech_models
Specify model priority list:
["universal-3-pro", "universal-2"]
错误修正方案
Authorization: Bearer KEY
Authorization: KEY
(无需Bearer前缀)
使用LeMUR API已废弃,请改用LLM Gateway
使用
summarization
auto_chapters
已废弃,请改用LLM Gateway(先转录再通过LLM生成摘要)
在LLM Gateway中使用LeMUR的
transcript_ids
直接在消息中传入转录文本,不要传ID
使用
anthropic/claude-...
格式的模型ID
不要加服务商前缀:使用
claude-sonnet-4-5-20250929
而非
anthropic/claude-sonnet-4-5-20250929
使用Java/Go/C# SDK已停止维护,请使用Python、JS/TS、Ruby SDK或是直接调用原生API
使用
word_boost
参数
请改用
keyterms_prompt
硬编码v2版本的流式URL当前最新版本是v3(
/v3/ws
);v2仍可使用但属于legacy版本
未设置
speech_models
指定模型优先级列表:
["universal-3-pro", "universal-2"]

Reference Files

参考文件

Read the relevant reference file based on what the user needs:
FileWhen to read
references/python-sdk.md
Python SDK patterns and examples
references/js-sdk.md
JavaScript/TypeScript SDK patterns
references/streaming.md
Real-time/streaming STT, v3 protocol, temp tokens, error codes
references/voice-agents.md
Voice agent integrations: LiveKit, Pipecat, turn detection, latency optimization
references/llm-gateway.md
Applying LLMs to transcripts, tool calling, available models
references/speech-understanding.md
Translation, speaker identification, custom formatting
references/audio-intelligence.md
PII redaction, diarization, summarization, sentiment, chapters
references/api-reference.md
Full parameter list, export endpoints, webhooks, upload, PII policies
根据用户需求阅读对应的参考文件:
文件路径适用场景
references/python-sdk.md
Python SDK使用规范与示例
references/js-sdk.md
JavaScript/TypeScript SDK使用规范
references/streaming.md
实时/流式STT、v3协议、临时令牌、错误码
references/voice-agents.md
语音代理集成:LiveKit、Pipecat、话轮检测、延迟优化
references/llm-gateway.md
将LLM应用于转录内容、工具调用、可用模型
references/speech-understanding.md
翻译、说话人识别、自定义格式化
references/audio-intelligence.md
PII脱敏、说话人分箱、摘要、情感分析、章节拆分
references/api-reference.md
完整参数列表、导出端点、Webhook、上传、PII政策

API Spec Source of Truth

API规范权威来源