Pre-Recorded Transcription

Gladia's pre-recorded API transcribes audio and video files asynchronously.

SDK-first: always use the official SDK — see sdk-integration for policy, setup, and fallback criteria.

When to Use

User has an existing audio or video file (local file, URL, YouTube/social video) to transcribe
Batch or async transcription workflows — processing recordings after they are captured
Need audio intelligence features: speaker diarization, PII redaction, subtitles (SRT/VTT), summarization, translation, NER, chapterization, audio-to-LLM
File-based uploads from disk, cloud storage, or user-submitted content

When NOT to use: If the user needs real-time / live transcription of a stream, microphone, or ongoing audio feed, use the live-transcription skill instead. Live transcription uses WebSocket sessions, not the pre-recorded API.

References

Consult these resources as needed:

./references/transcription-options.md -- Full transcription options with JS/Python code examples
./references/audio-intelligence.md -- Detailed configuration for all audio intelligence features
../sdk-integration/SKILL.md -- SDK setup, client initialization, error handling, retry/timeout config, and SDK vs raw API decision guide
../sdk-integration/references/sdk-versions.md -- Current SDK versions (auto-synced by CI)
../troubleshooting/SKILL.md -- Common errors, gotchas, and verification checklist

API Endpoints (reference — prefer SDK methods instead)

The SDK wraps all these endpoints. Use them directly only when falling back to raw REST.

Endpoint	Method	SDK equivalent
`/v2/upload`	POST	`transcribe()` auto-uploads local files
`/v2/pre-recorded`	POST	`create()` / `transcribe()`
`/v2/pre-recorded/:id`	GET	`get()` / `poll()` / `transcribe()`
`/v2/pre-recorded/:id`	DELETE	`delete()`
`/v2/pre-recorded/:id/audio`	GET	`getFile()`

Workflow

Recommended (SDK)

Fallback (raw REST — only when SDK is not feasible)

Use this path only when the SDK cannot satisfy the requirement (e.g., custom HTTP client, language without an SDK, or explicit user request for raw calls).

Upload (if local file): `POST /v2/upload` with multipart form data → get `audio_url`
Create job: `POST /v2/pre-recorded` with `audio_url` and config → get `id`
Poll: `GET /v2/pre-recorded/:id` until `status: "done"` (or use webhooks/callbacks)
Parse results: Extract `transcription`, `diarization`, `translation`, etc. from response
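
The fallback steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the official client: the `Send` transport, the `HttpRequest` shape, and all function names are hypothetical; the multipart field name `audio` and the `"error"` status value are assumptions; authentication is left to the transport because this document does not describe it.

```typescript
// Hypothetical transport: performs the HTTP call (for example with fetch),
// attaches credentials, and resolves with the parsed JSON body.
interface HttpRequest {
  method: "GET" | "POST";
  path: string;
  json?: unknown; // JSON request body, if any
  multipart?: Record<string, Uint8Array>; // form fields for a file upload
}
type Send = (req: HttpRequest) => Promise<any>;

// Step 1: upload a local file; the response is expected to carry audio_url.
function uploadRequest(file: Uint8Array): HttpRequest {
  // "audio" as the form field name is an assumption.
  return { method: "POST", path: "/v2/upload", multipart: { audio: file } };
}

// Step 2: create the job from an audio_url plus configuration.
function createJobRequest(
  audioUrl: string,
  config: Record<string, unknown> = {},
): HttpRequest {
  return {
    method: "POST",
    path: "/v2/pre-recorded",
    json: { ...config, audio_url: audioUrl },
  };
}

// Step 3: fetch the job by id.
function getJobRequest(id: string): HttpRequest {
  return { method: "GET", path: `/v2/pre-recorded/${encodeURIComponent(id)}` };
}

// Steps 1-4 chained. Polls at a fixed interval to keep the sketch short.
async function transcribeViaRest(
  send: Send,
  source: { file: Uint8Array } | { url: string },
  config: Record<string, unknown> = {},
  pollMs = 3000,
): Promise<unknown> {
  const audioUrl =
    "url" in source
      ? source.url
      : (await send(uploadRequest(source.file))).audio_url;
  const { id } = await send(createJobRequest(audioUrl, config));
  for (;;) {
    const job = await send(getJobRequest(id));
    if (job.status === "done") return job.result; // step 4 starts from here
    if (job.status === "error") throw new Error(`job ${id} failed`); // assumed status value
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
}
```

Keeping the request builders separate from the transport makes each step easy to unit-test and lets the caller plug in whatever HTTP client forced the fallback in the first place.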

Transcription Options

All options are passed as the second argument to `transcribe()`. Key options:

Option	Description
`language_config`	Expected languages, code switching
`diarization`	Speaker identification (pre-recorded only)
`translation`	Translate to target languages
`summarization`	Generate bullet points or paragraph summary
`subtitles`	Generate SRT/VTT files
`pii_redaction`	Redact PII (pre-recorded only)
`audio_to_llm`	Run custom LLM prompts on transcript
`callback_url`	Async webhook delivery

For the full options reference with JS/Python code examples, see ./references/transcription-options.md. For detailed audio intelligence feature configuration, see ./references/audio-intelligence.md. For client-level config (retry, timeouts), see sdk-integration.
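
As a rough illustration, an options object combining a few of these keys might look like the sketch below. The top-level keys come from the table; the value shapes (the nested `languages` and `code_switching` fields, and plain boolean flags) are assumptions, so treat ./references/transcription-options.md as the authority on the real schema.

```typescript
// Top-level keys mirror the options table; value shapes are assumptions.
interface TranscriptionOptionsSketch {
  language_config?: { languages: string[]; code_switching?: boolean };
  diarization?: boolean;
  subtitles?: boolean;
  callback_url?: string;
}

const options: TranscriptionOptionsSketch = {
  // A short list of expected languages keeps code switching bounded.
  language_config: { languages: ["en", "fr", "es"], code_switching: true },
  diarization: true,
  subtitles: true,
  callback_url: "https://example.com/hooks/transcription", // hypothetical endpoint
};

// Passed as the second argument, as described above:
// const result = await client.preRecorded().transcribe(audio, options);
```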

Response Structure

```json
{
  "id": "job-uuid",
  "status": "done",
  "result": {
    "transcription": {
      "full_transcript": "Hello, welcome to the meeting...",
      "utterances": [
        {
          "text": "Hello, welcome to the meeting",
          "language": "en",
          "start": 0.5,
          "end": 2.1,
          "speaker": 0,
          "words": [{ "word": "Hello", "start": 0.5, "end": 0.8, "confidence": 0.98 }]
        }
      ]
    },
    "diarization": { ... },
    "translation": { ... },
    "summarization": { ... },
    "sentiment_analysis": { ... }
  }
}
```
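
To show how this structure might be consumed, the sketch below turns utterances into speaker-labelled lines. The types cover only the fields visible in the sample response; `formatUtterances` and the type names are hypothetical, and treating `speaker` as optional is an assumption.

```typescript
// Types cover only the fields shown in the sample response.
interface Word {
  word: string;
  start: number;
  end: number;
  confidence: number;
}
interface Utterance {
  text: string;
  language: string;
  start: number;
  end: number;
  speaker?: number; // assumed to be absent when diarization is off
  words: Word[];
}
interface JobResponse {
  id: string;
  status: string;
  result?: {
    transcription: { full_transcript: string; utterances: Utterance[] };
  };
}

// Render each utterance as "[start-end] Speaker N: text".
function formatUtterances(job: JobResponse): string[] {
  if (job.status !== "done" || !job.result) {
    throw new Error(`job ${job.id} is not finished (status: ${job.status})`);
  }
  return job.result.transcription.utterances.map((u) => {
    const who = u.speaker === undefined ? "Speaker ?" : `Speaker ${u.speaker}`;
    return `[${u.start.toFixed(1)}s-${u.end.toFixed(1)}s] ${who}: ${u.text}`;
  });
}

const sample: JobResponse = {
  id: "job-uuid",
  status: "done",
  result: {
    transcription: {
      full_transcript: "Hello, welcome to the meeting...",
      utterances: [
        {
          text: "Hello, welcome to the meeting",
          language: "en",
          start: 0.5,
          end: 2.1,
          speaker: 0,
          words: [{ word: "Hello", start: 0.5, end: 0.8, confidence: 0.98 }],
        },
      ],
    },
  },
};

console.log(formatUtterances(sample));
// prints [ '[0.5s-2.1s] Speaker 0: Hello, welcome to the meeting' ]
```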

Limits and Specifications

Constraint	Value
Max file size	1000 MB
Max duration	135 minutes (120 min for YouTube)
Enterprise max duration	4 h 15 min
Supported audio formats	AAC, AC3, FLAC, M4A, MP2, MP3, OGG, Opus, WAV
Supported video formats	MP4, MOV, AVI, FLV, WebM, Matroska, 3GP
Online platforms	YouTube, TikTok, Instagram, Facebook, Vimeo, LinkedIn
Concurrency (paid)	25 concurrent jobs
Concurrency (free)	3 concurrent jobs
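
A pre-flight check against these limits can catch oversized inputs before a job is submitted. The sketch below copies the values from the table; `checkLimits` and `MediaInfo` are hypothetical names, counting 1 MB as 1,000,000 bytes is an assumption, and so is letting the enterprise cap take precedence over the YouTube cap.

```typescript
// Values copied from the limits table.
const LIMITS = {
  maxBytes: 1000 * 1000 * 1000, // 1000 MB, assuming 1 MB = 1,000,000 bytes
  maxMinutes: 135,
  maxMinutesYouTube: 120,
  maxMinutesEnterprise: 255, // 4 h 15 min
};

interface MediaInfo {
  bytes: number;
  minutes: number;
  isYouTube?: boolean;
  enterprise?: boolean;
}

// Returns a list of problems; an empty list means the input fits the limits.
function checkLimits(media: MediaInfo): string[] {
  const problems: string[] = [];
  if (media.bytes > LIMITS.maxBytes) {
    problems.push("file exceeds 1000 MB");
  }
  // Assumption: the enterprise cap, when present, overrides the others.
  const cap = media.enterprise
    ? LIMITS.maxMinutesEnterprise
    : media.isYouTube
      ? LIMITS.maxMinutesYouTube
      : LIMITS.maxMinutes;
  if (media.minutes > cap) {
    problems.push(`duration exceeds ${cap} minutes`);
  }
  return problems;
}
```

For example, a 130-minute YouTube video passes the general 135-minute limit but not the 120-minute YouTube limit, so `checkLimits` reports it.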

Polling Best Practices

The SDK handles polling automatically: `transcribe()` polls until the job completes, with configurable `interval` and `timeout`:

```typescript
const result = await client.preRecorded().transcribe(audio, options, {
  interval: 5000, // Poll every 5s
  timeout: 600000, // Timeout after 10 minutes
});
```

If using raw REST instead of the SDK:

Use webhooks or callbacks instead of polling when possible
If polling, implement exponential backoff (start at 3s, max 30s)
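
For the raw REST case, the backoff rule above could be sketched like this. Only the 3 s start and 30 s cap come from the text; the doubling factor, the `"error"` status value, the 10-minute default timeout, and the names `backoffDelayMs` and `pollUntilDone` are assumptions made for illustration.

```typescript
// Delay before the next poll: 3s, 6s, 12s, 24s, then capped at 30s.
// The doubling factor is an assumption.
function backoffDelayMs(attempt: number, startMs = 3000, maxMs = 30000): number {
  return Math.min(maxMs, startMs * 2 ** attempt);
}

interface PolledJob {
  status: string;
  result?: unknown;
}

// Calls getJob (e.g. a wrapper around GET /v2/pre-recorded/:id) until the
// job finishes, sleeping with exponential backoff between attempts.
async function pollUntilDone(
  getJob: () => Promise<PolledJob>,
  timeoutMs = 600000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<PolledJob> {
  let waited = 0;
  for (let attempt = 0; ; attempt++) {
    const job = await getJob();
    if (job.status === "done") return job;
    if (job.status === "error") throw new Error("transcription failed"); // assumed status value
    const delay = backoffDelayMs(attempt);
    if (waited + delay > timeoutMs) throw new Error("polling timed out");
    await sleep(delay);
    waited += delay;
  }
}
```

Injecting `sleep` keeps the loop testable without real waiting; in production the default timer-based implementation is used.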

Webhooks and Callbacks

Callback (sent to `callback_url` in request body):

`transcription.success` -- job completed successfully
`transcription.error` -- job failed

Webhook (configured in dashboard → Account → Webhooks):

`transcription.created` -- job queued
`transcription.success` -- job done
`transcription.error` -- job failed

Webhooks are powered by Svix with signed requests for verification.
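
A receiving endpoint typically branches on the event name. The sketch below shows only that routing step: the event names come from the lists above, while the body shape (`event` and `id` fields) and the names `routeEvent` and `Action` are assumptions. Signature verification of webhook requests is not shown and should happen before any routing.

```typescript
// Assumed minimal body shape for both callbacks and webhooks.
interface EventBody {
  event: string;
  id?: string;
}

type Action = "track" | "fetch_result" | "report_failure" | "ignore";

// Map an incoming event to what the application should do next.
function routeEvent(body: EventBody): Action {
  switch (body.event) {
    case "transcription.created":
      return "track"; // webhook only: job queued
    case "transcription.success":
      return "fetch_result"; // job finished, result can be read
    case "transcription.error":
      return "report_failure";
    default:
      return "ignore"; // unknown events are skipped, not treated as errors
  }
}
```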

Common Mistakes

Code switching without language list: enabling `code_switching: true` with empty `languages` triggers evaluation against 100+ languages. Always provide 3-5 expected languages.
Exceeding duration limits: files over 135 minutes may fail silently. Split into ~60 min chunks.
Custom vocabulary intensity too high: values above 0.6 cause false positives. Keep at 0.4-0.6.
Polling without backoff: rapid polling wastes requests and may trigger 429s. The SDK handles this; for raw REST, use webhooks or exponential backoff.
Expecting pre-recorded-only features in live mode: diarization, PII redaction, and subtitles are available only for pre-recorded transcription, not in live mode.
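
Several of these mistakes can be caught before a request is sent. The sketch below is one possible guard: the thresholds (0.6 intensity, ~60-minute chunks) come from the list above, while `lintOptions`, `planChunks`, and the input field names are hypothetical and do not mirror the real option schema.

```typescript
// Hypothetical, flattened view of the settings discussed above.
interface OptionCheckInput {
  codeSwitching?: boolean;
  languages?: string[];
  vocabularyIntensity?: number;
}

// Returns human-readable warnings; an empty list means nothing was flagged.
function lintOptions(input: OptionCheckInput): string[] {
  const warnings: string[] = [];
  const languageCount = input.languages ? input.languages.length : 0;
  if (input.codeSwitching && languageCount === 0) {
    warnings.push("code switching enabled without languages; list 3-5 expected languages");
  }
  if (input.vocabularyIntensity !== undefined && input.vocabularyIntensity > 0.6) {
    warnings.push("custom vocabulary intensity above 0.6; keep it at 0.4-0.6");
  }
  return warnings;
}

// Split a long recording into [start, end) ranges of at most chunkMinutes.
function planChunks(totalMinutes: number, chunkMinutes = 60): Array<[number, number]> {
  const chunks: Array<[number, number]> = [];
  for (let start = 0; start < totalMinutes; start += chunkMinutes) {
    chunks.push([start, Math.min(start + chunkMinutes, totalMinutes)]);
  }
  return chunks;
}

console.log(planChunks(150));
// prints [ [ 0, 60 ], [ 60, 120 ], [ 120, 150 ] ]
```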

For the full list of gotchas and diagnostics, see the troubleshooting skill.
