assemblyai

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

AssemblyAI Speech-to-Text and Voice AI

AssemblyAI 语音转文字与语音AI

AssemblyAI provides speech-to-text APIs, audio intelligence models, and an LLM Gateway for applying language models to transcripts. This skill corrects common mistakes that training data gets wrong — deprecated APIs, discontinued SDKs, and non-obvious auth patterns.

AssemblyAI提供语音转文字API、音频智能模型，以及可将大语言模型应用于转录内容的LLM Gateway。本技能可修正训练数据常犯的错误——包括废弃API、已停止维护的SDK，以及不易察觉的鉴权模式。

Authentication

身份验证

All endpoints use the same header:

Authorization: YOUR_API_KEY

NOT

Authorization: Bearer ...

— just the raw API key, no Bearer prefix. This is the #1 mistake.

所有端点使用统一的请求头：

Authorization: YOUR_API_KEY

不要使用

Authorization: Bearer ...

格式，直接传入原始API密钥即可，无需Bearer前缀，这是最常见的错误。

Base URLs

基础URL

Service	US	EU
REST API	`https://api.assemblyai.com`	`https://api.eu.assemblyai.com`
LLM Gateway	`https://llm-gateway.assemblyai.com/v1`	`https://llm-gateway.eu.assemblyai.com/v1`
Streaming v3	`wss://streaming.assemblyai.com/v3/ws`	`wss://streaming.eu.assemblyai.com/v3/ws`
Streaming v2 (legacy)	`wss://api.assemblyai.com/v2/realtime/ws`	—

服务	美国区	欧盟区
REST API	`https://api.assemblyai.com`	`https://api.eu.assemblyai.com`
LLM Gateway	`https://llm-gateway.assemblyai.com/v1`	`https://llm-gateway.eu.assemblyai.com/v1`
流式API v3	`wss://streaming.assemblyai.com/v3/ws`	`wss://streaming.eu.assemblyai.com/v3/ws`
流式API v2（ legacy版本）	`wss://api.assemblyai.com/v2/realtime/ws`	—

SDKs

Language	Package	Status
Python	`pip install assemblyai`	Active
JavaScript/TypeScript	`npm i assemblyai`	Active
Ruby	`assemblyai` gem	Active
Java	`assemblyai-java-sdk`	Discontinued April 2025
Go	`assemblyai-go-sdk`	Discontinued April 2025
C# .NET	`AssemblyAI` NuGet	Discontinued April 2025

Only Python, JS/TS, and Ruby SDKs are maintained. For Java, Go, or C#, use the REST API directly.

开发语言	安装包	维护状态
Python	`pip install assemblyai`	活跃维护
JavaScript/TypeScript	`npm i assemblyai`	活跃维护
Ruby	`assemblyai` gem	活跃维护
Java	`assemblyai-java-sdk`	2025年4月停止维护
Go	`assemblyai-go-sdk`	2025年4月停止维护
C# .NET	`AssemblyAI` NuGet	2025年4月停止维护

仅Python、JS/TS和Ruby SDK仍在维护。 若使用Java、Go或C#，请直接调用REST API。

Speech-to-Text Models

语音转文字模型

Pre-Recorded

预录音频场景

Model	Languages	Best For
Universal-3 Pro	6 (en, es, de, fr, pt, it)	Highest accuracy, promptable transcription
Universal-2	99	Broadest language coverage

Use

speech_models

as a priority list with fallback:

["universal-3-pro", "universal-2"]

模型	支持语言	适用场景
Universal-3 Pro	6种（en, es, de, fr, pt, it）	最高准确率，支持提示词自定义转录
Universal-2	99种	最广语言覆盖范围

建议将

speech_models

设置为带降级的优先级列表：

["universal-3-pro", "universal-2"]

。

Streaming

流式音频场景

Model	Languages	Best For
universal-streaming-english	6	Voice agents, ~300ms latency
universal-streaming-multilingual	6	Per-utterance language detection
whisper-rt	99+	Broadest streaming language support, auto-detect only
u3-rt-pro	6	Voice agents — punctuation-based turn detection, promptable

模型	支持语言	适用场景
universal-streaming-english	6种	语音代理，延迟约300ms
universal-streaming-multilingual	6种	按语句自动检测语言
whisper-rt	99种以上	最广流式语言支持，仅支持自动检测
u3-rt-pro	6种	语音代理——基于标点的话轮检测，支持提示词

Prompting (Universal-3 Pro only)

提示词功能（仅Universal-3 Pro支持）

Two mutually exclusive customization parameters:

prompt
(string, up to 1500 words): Natural language instructions for transcription style
keyterms_prompt
(string[], up to 1000 terms): Domain vocabulary for proper nouns, brands, technical terms

Prompting best practices:

Use positive, authoritative instructions — NEVER use negative phrasing ("Don't", "Avoid", "Never") as the model gets confused
Limit to 3-6 instructions for best results
Prefix critical instructions with "Non-negotiable:" or "Required:"

两个互斥的自定义参数：

prompt
（字符串，最长1500词）：针对转录风格的自然语言指令
keyterms_prompt
（字符串数组，最多1000个术语）：专有名词、品牌、技术术语等领域词汇

提示词最佳实践：

使用肯定、明确的指令——绝对不要使用否定表述（"Don't"、"Avoid"、"Never"），否则会导致模型混淆
最多保留3-6条指令效果最佳
关键指令前加上"Non-negotiable:"或"Required:"前缀

LeMUR is Deprecated

LeMUR已废弃

LeMUR is deprecated (sunset March 31, 2026). Use the LLM Gateway instead. The LLM Gateway is an OpenAI-compatible API. Key difference: you pass transcript text directly in messages (no

transcript_ids

). Transcribe first, then include

transcript.text

in your prompt.

See

references/llm-gateway.md

for models, tool calling, structured outputs, and examples.

LeMUR已废弃（2026年3月31日下线）。 请改用LLM Gateway。LLM Gateway是兼容OpenAI规范的API。核心差异：你需要直接在消息中传入转录文本（无需

transcript_ids

）。先完成转录，再将

transcript.text

包含到你的提示词中。

可查看

references/llm-gateway.md

了解支持的模型、工具调用、结构化输出以及示例。

Key Gotchas

常见陷阱

Gotcha	Details
`prompt` + `keyterms_prompt`	Mutually exclusive — use one or the other
`summarization` / `auto_chapters`	Deprecated. Use LLM Gateway instead (transcribe → send text to LLM)
PII redaction scope	Only redacts words in `text` — other feature outputs (entities, summaries) may still expose sensitive data
Upload key scoping	Files uploaded with one API key project cannot be transcribed with a different project's key
Structured outputs	NOT supported by Claude models through LLM Gateway — only OpenAI and Gemini
U3 Pro turn detection	Uses punctuation ( `.` `?` `!` ), NOT confidence thresholds — `end_of_turn_confidence_threshold` has no effect
Negative prompts	Never use "Don't" or "Avoid" in prompts — rephrase as positive instructions
PII audio redaction method	`override_audio_redaction_method: "silence"` replaces PII with silence instead of default beep
Language detection	Requires minimum 15 seconds of spoken audio for reliable results
LLM Gateway EU region	Only Anthropic Claude and Google Gemini models available — OpenAI models are NOT supported in EU
Disfluencies	`disfluencies: true` works on Universal-2 only; for U3 Pro, use prompting instead

陷阱	详情
`prompt` + `keyterms_prompt`	互斥参数——二者只能选其一
`summarization` / `auto_chapters`	已废弃。请改用LLM Gateway（先转录→再将文本发送给LLM处理）
PII脱敏范围	仅会脱敏 `text` 字段中的词汇——其他功能输出（实体、摘要）可能仍会暴露敏感数据
上传密钥权限隔离	用一个API密钥项目上传的文件，无法用其他项目的密钥进行转录
结构化输出	LLM Gateway的Claude模型不支持结构化输出——仅OpenAI和Gemini模型支持
U3 Pro话轮检测	基于标点（ `.` `?` `!` ）实现，而非置信度阈值—— `end_of_turn_confidence_threshold` 参数无效
否定提示词	不要在提示词中使用"Don't"或"Avoid"这类表述——请改写为肯定指令
PII音频脱敏方式	设置 `override_audio_redaction_method: "silence"` 可将PII替换为静音，而非默认的蜂鸣声
语言检测	需要至少15秒的语音音频才能得到可靠的检测结果
LLM Gateway欧盟区	仅支持Anthropic Claude和Google Gemini模型——不支持OpenAI模型
非流畅语气词	`disfluencies: true` 仅在Universal-2模型生效；U3 Pro请使用提示词实现该需求

Common Mistakes

常见错误

Mistake	Correction
`Authorization: Bearer KEY`	`Authorization: KEY` (no Bearer prefix)
Using LeMUR API	Deprecated. Use LLM Gateway instead
Using `summarization` or `auto_chapters`	Deprecated. Use LLM Gateway instead (transcribe then summarize via LLM)
LeMUR `transcript_ids` with LLM Gateway	Pass transcript text in messages, not IDs
`anthropic/claude-...` model IDs	No provider prefix: `claude-sonnet-4-5-20250929` not `anthropic/claude-sonnet-4-5-20250929`
Using Java/Go/C# SDKs	Discontinued. Use Python, JS/TS, Ruby, or raw API
`word_boost` parameter	Use `keyterms_prompt` instead
Hardcoding v2 streaming URL	v3 ( `/v3/ws` ) is current; v2 still works but is legacy
Not using `speech_models`	Specify model priority list: `["universal-3-pro", "universal-2"]`

错误	修正方案
`Authorization: Bearer KEY`	`Authorization: KEY` （无需Bearer前缀）
使用LeMUR API	已废弃，请改用LLM Gateway
使用 `summarization` 或 `auto_chapters`	已废弃，请改用LLM Gateway（先转录再通过LLM生成摘要）
在LLM Gateway中使用LeMUR的 `transcript_ids`	直接在消息中传入转录文本，不要传ID
使用 `anthropic/claude-...` 格式的模型ID	不要加服务商前缀：使用 `claude-sonnet-4-5-20250929` 而非 `anthropic/claude-sonnet-4-5-20250929`
使用Java/Go/C# SDK	已停止维护，请使用Python、JS/TS、Ruby SDK或是直接调用原生API
使用 `word_boost` 参数	请改用 `keyterms_prompt`
硬编码v2版本的流式URL	当前最新版本是v3（ `/v3/ws` ）；v2仍可使用但属于legacy版本
未设置 `speech_models`	指定模型优先级列表： `["universal-3-pro", "universal-2"]`

Reference Files

参考文件

Read the relevant reference file based on what the user needs:

File	When to read
`references/python-sdk.md`	Python SDK patterns and examples
`references/js-sdk.md`	JavaScript/TypeScript SDK patterns
`references/streaming.md`	Real-time/streaming STT, v3 protocol, temp tokens, error codes
`references/voice-agents.md`	Voice agent integrations: LiveKit, Pipecat, turn detection, latency optimization
`references/llm-gateway.md`	Applying LLMs to transcripts, tool calling, available models
`references/speech-understanding.md`	Translation, speaker identification, custom formatting
`references/audio-intelligence.md`	PII redaction, diarization, summarization, sentiment, chapters
`references/api-reference.md`	Full parameter list, export endpoints, webhooks, upload, PII policies

根据用户需求阅读对应的参考文件：

文件路径	适用场景
`references/python-sdk.md`	Python SDK使用规范与示例
`references/js-sdk.md`	JavaScript/TypeScript SDK使用规范
`references/streaming.md`	实时/流式STT、v3协议、临时令牌、错误码
`references/voice-agents.md`	语音代理集成：LiveKit、Pipecat、话轮检测、延迟优化
`references/llm-gateway.md`	将LLM应用于转录内容、工具调用、可用模型
`references/speech-understanding.md`	翻译、说话人识别、自定义格式化
`references/audio-intelligence.md`	PII脱敏、说话人分箱、摘要、情感分析、章节拆分
`references/api-reference.md`	完整参数列表、导出端点、Webhook、上传、PII政策

API Spec Source of Truth

API规范权威来源

https://github.com/AssemblyAI/assemblyai-api-spec