pre-recorded-transcription
Pre-Recorded Transcription
Gladia's pre-recorded API transcribes audio and video files asynchronously.
SDK-first: always use the official SDK — see sdk-integration for policy, setup, and fallback criteria.
When to Use
- User has an existing audio or video file (local file, URL, YouTube/social video) to transcribe
- Batch or async transcription workflows — processing recordings after they are captured
- Need audio intelligence features: speaker diarization, PII redaction, subtitles (SRT/VTT), summarization, translation, NER, chapterization, audio-to-LLM
- File-based uploads from disk, cloud storage, or user-submitted content
When NOT to use: If the user needs real-time / live transcription of a stream, microphone, or ongoing audio feed, use the live-transcription skill instead. Live transcription uses WebSocket sessions, not the pre-recorded API.
References
Consult these resources as needed:
- ./references/transcription-options.md -- Full transcription options with JS/Python code examples
- ./references/audio-intelligence.md -- Detailed configuration for all audio intelligence features
- ../sdk-integration/SKILL.md -- SDK setup, client initialization, error handling, retry/timeout config, and SDK vs raw API decision guide
- ../sdk-integration/references/sdk-versions.md -- Current SDK versions (auto-synced by CI)
- ../troubleshooting/SKILL.md -- Common errors, gotchas, and verification checklist
API Endpoints (reference — prefer SDK methods instead)
The SDK wraps all these endpoints. Use them directly only when falling back to raw REST.
| Endpoint | Method | SDK equivalent |
|---|---|---|
| `/v2/upload` | POST | |
| `/v2/pre-recorded` | POST | |
| `/v2/pre-recorded/:id` | GET | |
| | DELETE | |
| | GET | |
Workflow
Recommended (SDK)
The SDK `transcribe()` method handles upload, job creation, and polling in one call. This is the default approach; use it unless you have a specific reason not to. For SDK installation and client initialization, see the sdk-integration skill.

```typescript
// `client` is an initialized SDK client (see the sdk-integration skill).
const result = await client.preRecorded().transcribe("./audio.mp3", {
  language_config: { languages: ["en"] },
  diarization: true,
});
console.log(result.result?.transcription?.full_transcript);
```

```python
# `client` is an initialized SDK client (see the sdk-integration skill).
result = client.prerecorded().transcribe(
    "audio.mp3",
    {"language_config": {"languages": ["en"]}, "diarization": True},
)
print(result.result.transcription.full_transcript)
```

Audio input can be a local file path, HTTP(S) URL, YouTube/social video URL, or binary file object. For the full input types table, see the sdk-integration skill. YouTube and social video URLs are passed as `audio_url`; Gladia extracts audio server-side.
Fallback (raw REST, only when the SDK is not feasible)
Use this path only when the SDK cannot satisfy the requirement (e.g., custom HTTP client, language without an SDK, or explicit user request for raw calls).
- Upload (if local file): `POST /v2/upload` with multipart form data → get `audio_url`
- Create job: `POST /v2/pre-recorded` with `audio_url` and config → get `id`
- Poll: `GET /v2/pre-recorded/:id` until `status: "done"` (or use webhooks/callbacks)
- Parse results: Extract `transcription`, `diarization`, `translation`, etc. from response
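
The create-and-poll steps above can be sketched in Python as follows. This is a minimal illustration, not the SDK's implementation. The base URL, the `x-gladia-key` header name, and the `"error"` status value are assumptions that this document does not state, and the helper names `http_json` and `transcribe_url` are hypothetical. The upload step is left out, so the sketch assumes the file is already reachable at a URL.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.gladia.io"  # assumption: base URL is not given in this document


def http_json(method, path, api_key, body=None):
    """Send one request and decode the JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("x-gladia-key", api_key)  # assumption: auth header name
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def transcribe_url(audio_url, config, api_key, request=http_json,
                   sleep=time.sleep, poll_seconds=5.0, max_polls=120):
    """Create a job for an already-hosted file, then poll until it is done."""
    job = request("POST", "/v2/pre-recorded", api_key,
                  {"audio_url": audio_url, **config})
    for _ in range(max_polls):
        status = request("GET", f"/v2/pre-recorded/{job['id']}", api_key)
        if status.get("status") == "done":
            return status["result"]
        if status.get("status") == "error":  # assumption: failure status value
            raise RuntimeError(f"job {job['id']} failed")
        sleep(poll_seconds)
    raise TimeoutError(f"job {job['id']} not done after {max_polls} polls")
```

The `request` and `sleep` parameters exist only so the flow can be exercised without network access; in normal use the defaults apply.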
Transcription Options
All options are passed as the second argument to `transcribe()`. Key options:
| Option | Description |
|---|---|
| `language_config` | Expected languages, code switching |
| `diarization` | Speaker identification (pre-recorded only) |
| | Translate to target languages |
| | Generate bullet points or paragraph summary |
| | Generate SRT/VTT files |
| | Redact PII (pre-recorded only) |
| | Run custom LLM prompts on transcript |
| | Async webhook delivery |
For the full options reference with JS/Python code examples, see ./references/transcription-options.md. For detailed audio intelligence feature configuration, see ./references/audio-intelligence.md. For client-level config (retry, timeouts), see sdk-integration.
Response Structure
```json
{
  "id": "job-uuid",
  "status": "done",
  "result": {
    "transcription": {
      "full_transcript": "Hello, welcome to the meeting...",
      "utterances": [
        {
          "text": "Hello, welcome to the meeting",
          "language": "en",
          "start": 0.5,
          "end": 2.1,
          "speaker": 0,
          "words": [{ "word": "Hello", "start": 0.5, "end": 0.8, "confidence": 0.98 }]
        }
      ]
    },
    "diarization": { ... },
    "translation": { ... },
    "summarization": { ... },
    "sentiment_analysis": { ... }
  }
}
```
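
As an illustration of reading this structure, the sketch below merges consecutive utterances from the same speaker into turns. It relies only on the `utterances` fields shown above (`text`, `start`, `end`, `speaker`). The function name `speaker_turns` is hypothetical, and treating a missing or null `result` as "no utterances yet" is an assumption.

```python
def speaker_turns(response):
    """Merge consecutive utterances by the same speaker into single turns."""
    result = response.get("result") or {}
    utterances = (result.get("transcription") or {}).get("utterances") or []
    turns = []
    for utt in utterances:
        speaker = utt.get("speaker")
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + utt["text"]
            turns[-1]["end"] = utt["end"]
        else:
            turns.append({
                "speaker": speaker,
                "start": utt["start"],
                "end": utt["end"],
                "text": utt["text"],
            })
    return turns
```

Each turn keeps the start time of its first utterance and the end time of its last, which is usually enough to render a readable speaker-labelled transcript.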
Limits and Specifications
| Constraint | Value |
|---|---|
| Max file size | 1000 MB |
| Max duration | 135 minutes (120 min for YouTube) |
| Enterprise max duration | 4h15 |
| Supported audio formats | AAC, AC3, FLAC, M4A, MP2, MP3, OGG, Opus, WAV |
| Supported video formats | MP4, MOV, AVI, FLV, WebM, Matroska, 3GP |
| Online platforms | YouTube, TikTok, Instagram, Facebook, Vimeo, LinkedIn |
| Concurrency (paid) | 25 concurrent jobs |
| Concurrency (free) | 3 concurrent jobs |
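
A pre-flight check against these limits might look like the sketch below. The function and constant names are hypothetical, and reading "1000 MB" as decimal megabytes is an assumption; the caller picks which duration limit applies.

```python
MAX_BYTES = 1000 * 1000 * 1000  # assumption: "1000 MB" read as decimal megabytes

# Duration limits in minutes, taken from the table above (4h15 = 255 minutes).
LIMIT_MINUTES = {"standard": 135, "youtube": 120, "enterprise": 255}


def check_limits(size_bytes, duration_minutes, plan="standard"):
    """Return a list of limit violations; an empty list means the input fits."""
    problems = []
    if size_bytes is not None and size_bytes > MAX_BYTES:
        problems.append("file larger than 1000 MB")
    limit = LIMIT_MINUTES[plan]
    if duration_minutes > limit:
        problems.append(f"duration over {limit} minutes")
    return problems
```

Running a check like this before submitting avoids spending an upload on a file that would be rejected or fail later.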
Polling Best Practices
The SDK handles polling automatically: `transcribe()` polls until the job completes, with configurable `interval` and `timeout`:

```typescript
// `audio` and `options` are the same arguments as in the basic transcribe() call.
const result = await client.preRecorded().transcribe(audio, options, {
  interval: 5000, // Poll every 5s
  timeout: 600000, // Timeout after 10 minutes
});
```

If using raw REST instead of the SDK:
- Use webhooks or callbacks instead of polling when possible
- If polling, implement exponential backoff (start at 3s, max 30s)
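
The backoff recommendation above can be sketched as a delay generator plus a polling loop. The doubling factor and the 600-second overall budget are assumptions (the text only fixes the 3s start and 30s cap), and `fetch_status` is a hypothetical callable that returns the job JSON.

```python
import time


def backoff_delays(start=3.0, cap=30.0, factor=2.0):
    """Yield poll delays in seconds, growing by `factor` up to `cap`."""
    delay = start
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def poll_until_done(fetch_status, sleep=time.sleep, max_wait=600.0):
    """Call fetch_status() with growing pauses until status is "done"."""
    waited = 0.0
    for delay in backoff_delays():
        job = fetch_status()
        if job.get("status") == "done":
            return job
        if waited + delay > max_wait:
            raise TimeoutError(f"job not done after {waited:.0f}s of waiting")
        sleep(delay)
        waited += delay
```

With the defaults the pauses run 3, 6, 12, 24, then 30 seconds repeatedly, so a long job costs far fewer requests than fixed short-interval polling.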
Webhooks and Callbacks
Callback (sent to `callback_url` in request body):
- `transcription.success`: job completed successfully
- `transcription.error`: job failed
Webhook (configured in dashboard → Account → Webhooks):
- `transcription.created`: job queued
- `transcription.success`: job done
- `transcription.error`: job failed
Webhooks are powered by Svix with signed requests for verification.
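
A receiver for these events could route on the event name roughly as sketched below. The payload shape is an assumption: this document does not show it, so the sketch supposes the name arrives in a top-level `event` field. The handler names are hypothetical, and signature verification is not shown.

```python
def handle_event(body, on_success, on_error, on_created=None):
    """Route a decoded callback/webhook body to the matching handler."""
    event = body.get("event")  # assumption: event name is a top-level field
    if event == "transcription.success":
        return on_success(body)
    if event == "transcription.error":
        return on_error(body)
    if event == "transcription.created" and on_created is not None:
        return on_created(body)
    return None  # unknown or unhandled event: ignore rather than fail
```

Callbacks only ever deliver the success and error events, so `on_created` is optional and matters only for dashboard-configured webhooks.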
Common Mistakes
- Code switching without language list: enabling `code_switching: true` with empty `languages` triggers 100+ language evaluation. Always provide 3-5 expected languages.
- Exceeding duration limits: files over 135 minutes may fail silently. Split into ~60 min chunks.
- Custom vocabulary intensity too high: values above 0.6 cause false positives. Keep at 0.4-0.6.
- Polling without backoff: rapid polling wastes requests and may trigger 429s. The SDK handles this; for raw REST, use webhooks or exponential backoff.
- Expecting pre-recorded-only features in live mode: diarization, PII redaction, and subtitles are available only for pre-recorded jobs, not in live mode.
For the full list of gotchas and diagnostics, see the troubleshooting skill.
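
Two of the mistakes above can be caught before a job is submitted, as in the sketch below. Both function names are hypothetical, and placing `code_switching` inside `language_config` next to `languages` is an assumption about the config layout.

```python
def plan_chunks(duration_minutes, chunk_minutes=60, limit_minutes=135):
    """Return (start, end) minute offsets; a single chunk if within the limit."""
    if duration_minutes <= limit_minutes:
        return [(0, duration_minutes)]
    chunks = []
    start = 0
    while start < duration_minutes:
        end = min(start + chunk_minutes, duration_minutes)
        chunks.append((start, end))
        start = end
    return chunks


def lint_config(config):
    """Warn when code switching is enabled without an explicit language list."""
    warnings = []
    lang = config.get("language_config") or {}  # assumption: nesting of the flag
    if lang.get("code_switching") and not lang.get("languages"):
        warnings.append("code_switching enabled with no languages listed")
    return warnings
```

Fixed-offset cuts can land mid-word, so a real splitter would likely snap each boundary to a nearby silence before cutting the audio.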