pre-recorded-transcription

Pre-Recorded Transcription

Gladia's pre-recorded API transcribes audio and video files asynchronously.
SDK-first: always use the official SDK — see sdk-integration for policy, setup, and fallback criteria.

When to Use

  • User has an existing audio or video file (local file, URL, YouTube/social video) to transcribe
  • Batch or async transcription workflows — processing recordings after they are captured
  • Need audio intelligence features: speaker diarization, PII redaction, subtitles (SRT/VTT), summarization, translation, NER, chapterization, audio-to-LLM
  • File-based uploads from disk, cloud storage, or user-submitted content
When NOT to use: If the user needs real-time / live transcription of a stream, microphone, or ongoing audio feed, use the live-transcription skill instead. Live transcription uses WebSocket sessions, not the pre-recorded API.

References

Consult these resources as needed:
  • ./references/transcription-options.md -- Full transcription options with JS/Python code examples
  • ./references/audio-intelligence.md -- Detailed configuration for all audio intelligence features
  • ../sdk-integration/SKILL.md -- SDK setup, client initialization, error handling, retry/timeout config, and SDK vs raw API decision guide
  • ../sdk-integration/references/sdk-versions.md -- Current SDK versions (auto-synced by CI)
  • ../troubleshooting/SKILL.md -- Common errors, gotchas, and verification checklist

API Endpoints (reference — prefer SDK methods instead)

The SDK wraps all these endpoints. Use them directly only when falling back to raw REST.
| Endpoint | Method | SDK equivalent |
| --- | --- | --- |
| /v2/upload | POST | transcribe() auto-uploads local files |
| /v2/pre-recorded | POST | create() / transcribe() |
| /v2/pre-recorded/:id | GET | get() / poll() / transcribe() |
| /v2/pre-recorded/:id | DELETE | delete() |
| /v2/pre-recorded/:id/audio | GET | getFile() |

Workflow

Recommended (SDK)

The SDK transcribe() method handles upload, job creation, and polling in one call. This is the default approach; use it unless you have a specific reason not to. For SDK installation and client initialization, see the sdk-integration skill.
typescript
const result = await client.preRecorded().transcribe("./audio.mp3", {
  language_config: { languages: ["en"] },
  diarization: true,
});

console.log(result.result?.transcription?.full_transcript);
python
result = client.prerecorded().transcribe(
    "audio.mp3",
    {"language_config": {"languages": ["en"]}, "diarization": True},
)

print(result.result.transcription.full_transcript)
Audio input can be a local file path, HTTP(S) URL, YouTube/social video URL, or binary file object. For the full input types table, see the sdk-integration skill. YouTube and social video URLs are passed as audio_url; Gladia extracts audio server-side.

Fallback (raw REST — only when SDK is not feasible)

Use this path only when the SDK cannot satisfy the requirement (e.g., custom HTTP client, language without an SDK, or explicit user request for raw calls).
  1. Upload (if local file): POST /v2/upload with multipart form data → get audio_url
  2. Create job: POST /v2/pre-recorded with audio_url and config → get id
  3. Poll: GET /v2/pre-recorded/:id until status: "done" (or use webhooks/callbacks)
  4. Parse results: Extract transcription, diarization, translation, etc. from response
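The four fallback steps can be sketched in Python as follows. This is a minimal illustration, not Gladia's implementation: the `send` callable is a hypothetical transport that you would back with your own HTTP client (authentication and multipart encoding are left to it), and the `"error"` status value is an assumption, since the text above only names `"done"`.

```python
import time
from typing import Any, Callable, Iterable, Optional

# Hypothetical transport: send(method, path, payload) -> parsed JSON dict.
Send = Callable[[str, str, Any], dict]


def transcribe_via_rest(
    send: Send,
    audio_url: Optional[str] = None,
    local_file: Optional[bytes] = None,
    config: Optional[dict] = None,
    delays: Iterable[float] = (3, 6, 12, 24, 30),
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Run upload -> create -> poll and return the final job payload."""
    # Step 1: upload only when starting from a local file.
    if local_file is not None:
        audio_url = send("POST", "/v2/upload", local_file)["audio_url"]
    if audio_url is None:
        raise ValueError("provide audio_url or local_file")

    # Step 2: create the job and keep its id.
    body = {"audio_url": audio_url, **(config or {})}
    job_id = send("POST", "/v2/pre-recorded", body)["id"]

    # Step 3: poll, sleeping between attempts, until the job is done.
    for delay in delays:
        job = send("GET", f"/v2/pre-recorded/{job_id}", None)
        if job["status"] == "done":
            return job  # Step 4: the caller reads job["result"]
        if job["status"] == "error":  # assumed failure status value
            raise RuntimeError(f"job {job_id} failed")
        sleep(delay)
    raise TimeoutError(f"job {job_id} not done after all polling attempts")
```

Injecting `send` and `sleep` keeps the flow testable without network access; in production a webhook or callback would replace step 3 entirely.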

Transcription Options

All options are passed as the second argument to transcribe(). Key options:

| Option | Description |
| --- | --- |
| language_config | Expected languages, code switching |
| diarization | Speaker identification (pre-recorded only) |
| translation | Translate to target languages |
| summarization | Generate bullet points or paragraph summary |
| subtitles | Generate SRT/VTT files |
| pii_redaction | Redact PII (pre-recorded only) |
| audio_to_llm | Run custom LLM prompts on transcript |
| callback_url | Async webhook delivery |

For the full options reference with JS/Python code examples, see ./references/transcription-options.md. For detailed audio intelligence feature configuration, see ./references/audio-intelligence.md. For client-level config (retry, timeouts), see sdk-integration.

Response Structure

json
{
  "id": "job-uuid",
  "status": "done",
  "result": {
    "transcription": {
      "full_transcript": "Hello, welcome to the meeting...",
      "utterances": [
        {
          "text": "Hello, welcome to the meeting",
          "language": "en",
          "start": 0.5,
          "end": 2.1,
          "speaker": 0,
          "words": [{ "word": "Hello", "start": 0.5, "end": 0.8, "confidence": 0.98 }]
        }
      ]
    },
    "diarization": { ... },
    "translation": { ... },
    "summarization": { ... },
    "sentiment_analysis": { ... }
  }
}
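As an illustration of reading this structure, the sketch below groups utterance text by speaker and lists words under a confidence threshold. It relies only on the fields shown in the sample response; the helper names and the threshold are hypothetical, and the functions work on a plain dict (for example, a parsed raw REST response).

```python
from collections import defaultdict


def transcript_by_speaker(job: dict) -> dict:
    """Map each speaker index to the list of utterance texts they spoke."""
    utterances = job.get("result", {}).get("transcription", {}).get("utterances", [])
    grouped = defaultdict(list)
    for utterance in utterances:
        grouped[utterance.get("speaker")].append(utterance["text"])
    return dict(grouped)


def low_confidence_words(job: dict, threshold: float) -> list:
    """Return the words whose confidence falls below the given threshold."""
    utterances = job.get("result", {}).get("transcription", {}).get("utterances", [])
    return [
        word["word"]
        for utterance in utterances
        for word in utterance.get("words", [])
        if word["confidence"] < threshold
    ]


# Sample payload trimmed from the response structure above.
sample = {
    "id": "job-uuid",
    "status": "done",
    "result": {
        "transcription": {
            "full_transcript": "Hello, welcome to the meeting...",
            "utterances": [
                {
                    "text": "Hello, welcome to the meeting",
                    "language": "en",
                    "start": 0.5,
                    "end": 2.1,
                    "speaker": 0,
                    "words": [
                        {"word": "Hello", "start": 0.5, "end": 0.8, "confidence": 0.98}
                    ],
                }
            ],
        }
    },
}

print(transcript_by_speaker(sample))
print(low_confidence_words(sample, threshold=0.99))
```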

Limits and Specifications

| Constraint | Value |
| --- | --- |
| Max file size | 1000 MB |
| Max duration | 135 minutes (120 min for YouTube) |
| Enterprise max duration | 4 h 15 min |
| Supported audio formats | AAC, AC3, FLAC, M4A, MP2, MP3, OGG, Opus, WAV |
| Supported video formats | MP4, MOV, AVI, FLV, WebM, Matroska, 3GP |
| Online platforms | YouTube, TikTok, Instagram, Facebook, Vimeo, LinkedIn |
| Concurrency (paid) | 25 concurrent jobs |
| Concurrency (free) | 3 concurrent jobs |
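A pre-flight check against these limits can catch oversized inputs before a job is submitted. The sketch below is illustrative only: the helper name is hypothetical, the mapping from format names to file extensions is an assumption, "1000 MB" is read as decimal megabytes, and the enterprise limit is assumed not to apply to YouTube inputs.

```python
from pathlib import PurePath

MAX_BYTES = 1000 * 1000 * 1000  # "1000 MB", read as decimal megabytes (assumption)
MAX_MINUTES = 135
MAX_MINUTES_YOUTUBE = 120
MAX_MINUTES_ENTERPRISE = 255  # 4 h 15 min

# Assumed extension for each format in the table (e.g. Matroska -> .mkv).
SUPPORTED_EXTENSIONS = {
    ".aac", ".ac3", ".flac", ".m4a", ".mp2", ".mp3", ".ogg", ".opus", ".wav",
    ".mp4", ".mov", ".avi", ".flv", ".webm", ".mkv", ".3gp",
}


def check_limits(source, size_bytes, duration_min, youtube=False, enterprise=False):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if youtube:
        limit = MAX_MINUTES_YOUTUBE
    else:
        limit = MAX_MINUTES_ENTERPRISE if enterprise else MAX_MINUTES
        # Extension and size checks only make sense for actual files.
        suffix = PurePath(source).suffix.lower()
        if suffix not in SUPPORTED_EXTENSIONS:
            problems.append(f"unsupported format: {suffix or '(none)'}")
        if size_bytes > MAX_BYTES:
            problems.append("file exceeds 1000 MB")
    if duration_min > limit:
        problems.append(f"duration exceeds {limit} minutes")
    return problems


print(check_limits("talk.mp3", 10_000_000, 60))
print(check_limits("talk.xyz", 2_000_000_000, 200))
```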

Polling Best Practices

The SDK handles polling automatically: transcribe() polls until the job completes, with configurable interval and timeout:
typescript
const result = await client.preRecorded().transcribe(audio, options, {
  interval: 5000, // Poll every 5s
  timeout: 600000, // Timeout after 10 minutes
});
If using raw REST instead of the SDK:
  • Use webhooks or callbacks instead of polling when possible
  • If polling, implement exponential backoff (start at 3s, max 30s)
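For the raw REST case, the backoff schedule might look like the sketch below. The 3 s start and 30 s cap come from the guidance above; the doubling factor and the 600 s total budget (mirroring the 10-minute timeout in the SDK example) are assumptions, and `backoff_delays` is a hypothetical helper name.

```python
from typing import Iterator


def backoff_delays(
    start: float = 3.0,
    cap: float = 30.0,
    factor: float = 2.0,   # assumed growth factor
    budget: float = 600.0,  # assumed total wait budget in seconds
) -> Iterator[float]:
    """Yield sleep durations that grow from `start` up to `cap`.

    Stops once the next sleep would push total waiting past `budget`.
    """
    delay, spent = start, 0.0
    while spent + delay <= budget:
        yield delay
        spent += delay
        delay = min(delay * factor, cap)


# Sleep for each yielded delay between GET /v2/pre-recorded/:id requests.
print(list(backoff_delays(budget=100)))
```

Capping the delay keeps worst-case latency after job completion bounded, while the early short delays let short files return quickly.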

Webhooks and Callbacks

Callback (sent to callback_url in request body):
  • transcription.success -- job completed successfully
  • transcription.error -- job failed
Webhook (configured in dashboard → Account → Webhooks):
  • transcription.created -- job queued
  • transcription.success -- job done
  • transcription.error -- job failed
Webhooks are powered by Svix with signed requests for verification.
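A receiver typically branches on the event name. The sketch below shows one way to do that; the payload field names (`event`, `id`) are assumptions about the delivery body, not something this document specifies, and signature verification is deliberately left out and should run before any dispatch.

```python
def handle_event(payload: dict) -> str:
    """Decide what to do with a callback/webhook delivery.

    Returns a short action label so the caller can route the job.
    """
    event = payload.get("event")  # assumed field name
    job_id = payload.get("id")    # assumed field name

    if event == "transcription.created":
        return f"track:{job_id}"   # job queued, start tracking it
    if event == "transcription.success":
        return f"fetch:{job_id}"   # job done, fetch or store the result
    if event == "transcription.error":
        return f"alert:{job_id}"   # job failed, surface the error
    raise ValueError(f"unexpected event: {event!r}")


print(handle_event({"event": "transcription.success", "id": "job-uuid"}))
```

Raising on unknown events makes new or misspelled event names visible instead of silently dropping deliveries.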

Common Mistakes

  • Code switching without language list: enabling code_switching: true with an empty languages list triggers evaluation of 100+ languages. Always provide 3-5 expected languages.
  • Exceeding duration limits: files over 135 minutes may fail silently. Split into ~60 min chunks.
  • Custom vocabulary intensity too high: values above 0.6 cause false positives. Keep at 0.4-0.6.
  • Polling without backoff: rapid polling wastes requests and may trigger 429s. The SDK handles this; for raw REST, use webhooks or exponential backoff.
  • Expecting pre-recorded-only features in live mode: diarization, PII redaction, and subtitles are available only for pre-recorded transcription, not in live mode.
For the full list of gotchas and diagnostics, see the troubleshooting skill.
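Some of these mistakes can be caught before a request is sent. The following sketch is a hypothetical lint helper, not part of the SDK: it assumes code_switching sits inside language_config next to languages, takes the custom vocabulary intensity as a plain number because its location in the config is not shown here, and plans roughly 60-minute chunks for over-long files.

```python
import math
from typing import List, Optional, Tuple


def lint_options(options: dict, vocabulary_intensity: Optional[float] = None) -> List[str]:
    """Return warnings for the configuration mistakes listed above."""
    warnings = []
    language_config = options.get("language_config", {})
    # Assumed layout: code_switching lives next to languages.
    if language_config.get("code_switching") and not language_config.get("languages"):
        warnings.append("code_switching without languages: list 3-5 expected languages")
    if vocabulary_intensity is not None and vocabulary_intensity > 0.6:
        warnings.append("custom vocabulary intensity above 0.6: keep it at 0.4-0.6")
    return warnings


def chunk_plan(
    duration_min: float, chunk_min: float = 60, limit_min: float = 135
) -> List[Tuple[float, float]]:
    """Return (start, end) minute ranges; one range if the file fits the limit."""
    if duration_min <= limit_min:
        return [(0, duration_min)]
    count = math.ceil(duration_min / chunk_min)
    return [
        (i * chunk_min, min((i + 1) * chunk_min, duration_min))
        for i in range(count)
    ]


print(lint_options({"language_config": {"code_switching": True, "languages": []}}))
print(chunk_plan(150))
```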

Further Reading
