Pre-Recorded Transcription
Gladia's pre-recorded API transcribes audio and video files asynchronously.
SDK-first: always use the official SDK — see sdk-integration for policy, setup, and fallback criteria.
When to Use
- User has an existing audio or video file (local file, URL, YouTube/social video) to transcribe
- Batch or async transcription workflows — processing recordings after they are captured
- Need audio intelligence features: speaker diarization, PII redaction, subtitles (SRT/VTT), summarization, translation, NER, chapterization, audio-to-LLM
- File-based uploads from disk, cloud storage, or user-submitted content
When NOT to use: If the user needs real-time / live transcription of a stream, microphone, or ongoing audio feed, use the live-transcription skill instead. Live transcription uses WebSocket sessions, not the pre-recorded API.
References
Consult these resources as needed:
- ./references/transcription-options.md -- Full transcription options with JS/Python code examples
- ./references/audio-intelligence.md -- Detailed configuration for all audio intelligence features
- ../sdk-integration/SKILL.md -- SDK setup, client initialization, error handling, retry/timeout config, and SDK vs raw API decision guide
- ../sdk-integration/references/sdk-versions.md -- Current SDK versions (auto-synced by CI)
- ../troubleshooting/SKILL.md -- Common errors, gotchas, and verification checklist
API Endpoints (reference — prefer SDK methods instead)
The SDK wraps all these endpoints. Use them directly only when falling back to raw REST.
| Endpoint | Method | SDK equivalent |
|---|---|---|
| | POST | auto-uploads local files |
| | POST | / |
| | GET | / / |
| | DELETE | |
| /v2/pre-recorded/:id/audio | GET | |
Workflow
Recommended (SDK)
The SDK's transcribe() method handles upload, job creation, and polling in one call. This is the default approach; use it unless you have a specific reason not to. For SDK installation and client initialization, see the sdk-integration skill.
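typescript
// `client` is an already-initialized SDK client (see the sdk-integration skill).
const result = await client.preRecorded().transcribe("./audio.mp3", {
  language_config: { languages: ["en"] }, // expected language(s)
  diarization: true, // label each utterance with a speaker
});
console.log(result.result?.transcription?.full_transcript);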
python
# `client` is an already-initialized SDK client (see the sdk-integration skill).
result = client.prerecorded().transcribe(
    "audio.mp3",
    {"language_config": {"languages": ["en"]}, "diarization": True},
)
print(result.result.transcription.full_transcript)
Audio input can be a local file path, HTTP(S) URL, YouTube/social video URL, or binary file object. For the full input types table, see the sdk-integration skill. YouTube and social video URLs are passed as the audio input like any other URL; Gladia extracts the audio server-side.
Fallback (raw REST — only when SDK is not feasible)
Use this path only when the SDK cannot satisfy the requirement (e.g., custom HTTP client, language without an SDK, or explicit user request for raw calls).
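- Upload (if local file): send the file as multipart form data to the upload endpoint and keep the audio URL it returns
- Create job: send that audio URL together with the transcription config to the pre-recorded endpoint and keep the job ID it returns
- Poll: fetch the job until its status is "done" (or use webhooks/callbacks)
- Parse results: extract the full transcript, utterances, and any requested audio intelligence output from the response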
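The create, poll, and parse steps above can be sketched as follows. This is a minimal illustration, not the official client: the `send` transport is a hypothetical callable you would implement with your own HTTP client (including authentication), and the `/v2/pre-recorded` paths, the `audio_url` field name, and the `"error"` status are assumptions to check against the current API reference. The upload step is omitted, so the audio is assumed to be reachable by URL already.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: (method, path, json_body) -> decoded JSON response.
# Real code would wrap an HTTP client here and add authentication headers.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def transcribe_via_rest(
    send: Send,
    audio_url: str,
    config: Dict[str, Any],
    sleep: Callable[[float], None],
    poll_interval: float = 3.0,
    max_polls: int = 200,
) -> str:
    """Create a job, poll until it finishes, and return the full transcript."""
    # Create job. The path and the "audio_url" field name are assumptions.
    job = send("POST", "/v2/pre-recorded", {"audio_url": audio_url, **config})
    job_id = job["id"]

    for _ in range(max_polls):
        # Poll. The "/v2/pre-recorded/<id>" path is an assumption.
        job = send("GET", f"/v2/pre-recorded/{job_id}", None)
        if job["status"] == "done":
            # Parse results, following the layout under Response Structure.
            return job["result"]["transcription"]["full_transcript"]
        if job["status"] == "error":  # assumed failure status
            raise RuntimeError(f"job {job_id} failed")
        sleep(poll_interval)

    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Passing `send` and `sleep` in as arguments keeps the flow testable without network access; in production `sleep` would typically be `time.sleep`, ideally with a growing delay rather than a fixed one.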
Transcription Options
All options are passed as the second argument to transcribe(). Key options:
| Option | Description |
|---|---|
| language_config | Expected languages, code switching |
| diarization | Speaker identification (pre-recorded only) |
| | Translate to target languages |
| | Generate bullet points or paragraph summary |
| | Generate SRT/VTT files |
| | Redact PII (pre-recorded only) |
| | Run custom LLM prompts on transcript |
| | Async webhook delivery |
For the full options reference with JS/Python code examples, see ./references/transcription-options.md. For detailed audio intelligence feature configuration, see ./references/audio-intelligence.md. For client-level config (retry, timeouts), see sdk-integration.
Response Structure
json
{
  "id": "job-uuid",
  "status": "done",
  "result": {
    "transcription": {
      "full_transcript": "Hello, welcome to the meeting...",
      "utterances": [
        {
          "text": "Hello, welcome to the meeting",
          "language": "en",
          "start": 0.5,
          "end": 2.1,
          "speaker": 0,
          "words": [{ "word": "Hello", "start": 0.5, "end": 0.8, "confidence": 0.98 }]
        }
      ]
    },
    "diarization": { ... },
    "translation": { ... },
    "summarization": { ... },
    "sentiment_analysis": { ... }
  }
}
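As an illustration of reading this structure, the sketch below flattens a finished response into one line per utterance. It works on a plain dict shaped like the JSON above; `format_utterances` is a hypothetical helper name, and treating `speaker` as optional is an assumption, since the field may be absent when diarization is off.

```python
from typing import Any, Dict, List


def format_utterances(response: Dict[str, Any]) -> List[str]:
    """Turn a finished job response into one readable line per utterance."""
    if response.get("status") != "done":
        raise ValueError(f"job is not finished: status={response.get('status')!r}")

    transcription = response.get("result", {}).get("transcription", {})
    lines = []
    for utt in transcription.get("utterances", []):
        # Assumption: "speaker" may be missing, so fall back to a placeholder.
        speaker = utt.get("speaker")
        label = f"speaker {speaker}" if speaker is not None else "speaker ?"
        lines.append(f"[{utt['start']:.2f}-{utt['end']:.2f}] {label}: {utt['text']}")
    return lines


sample = {
    "id": "job-uuid",
    "status": "done",
    "result": {
        "transcription": {
            "full_transcript": "Hello, welcome to the meeting...",
            "utterances": [
                {"text": "Hello, welcome to the meeting", "language": "en",
                 "start": 0.5, "end": 2.1, "speaker": 0},
            ],
        }
    },
}
print(format_utterances(sample))
# ['[0.50-2.10] speaker 0: Hello, welcome to the meeting']
```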
Limits and Specifications
| Constraint | Value |
|---|---|
| Max file size | 1000 MB |
| Max duration | 135 minutes (120 minutes for YouTube) |
| Enterprise max duration | 4 h 15 min (255 minutes) |
| Supported audio formats | AAC, AC3, FLAC, M4A, MP2, MP3, OGG, Opus, WAV |
| Supported video formats | MP4, MOV, AVI, FLV, WebM, Matroska, 3GP |
| Online platforms | YouTube, TikTok, Instagram, Facebook, Vimeo, LinkedIn |
| Concurrency (paid) | 25 concurrent jobs |
| Concurrency (free) | 3 concurrent jobs |
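A pre-flight check against these limits can catch oversized inputs before a job is submitted. The sketch below is illustrative only: `check_limits` is a hypothetical helper, the mapping from format names to file extensions (for example Matroska to `.mkv`) is an assumption, and 1 MB is assumed to mean 10**6 bytes.

```python
from pathlib import PurePath
from typing import List

MAX_FILE_BYTES = 1000 * 10**6  # 1000 MB, assuming 1 MB = 10**6 bytes
MAX_DURATION_MIN = 135         # use 120 for YouTube sources

# Assumed extensions for the audio and video formats listed in the table.
SUPPORTED_EXTENSIONS = {
    ".aac", ".ac3", ".flac", ".m4a", ".mp2", ".mp3", ".ogg", ".opus", ".wav",
    ".mp4", ".mov", ".avi", ".flv", ".webm", ".mkv", ".3gp",
}


def check_limits(
    filename: str,
    size_bytes: int,
    duration_min: float,
    max_duration_min: float = MAX_DURATION_MIN,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    ext = PurePath(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes > {MAX_FILE_BYTES}")
    if duration_min > max_duration_min:
        problems.append(f"too long: {duration_min} min > {max_duration_min} min")
    return problems
```

Returning every problem at once, rather than raising on the first, lets a caller report all issues with a user-submitted file in a single pass.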
Polling Best Practices
The SDK handles polling automatically: transcribe() polls until the job completes, with a configurable interval and timeout:
typescript
const result = await client.preRecorded().transcribe(audio, options, {
  interval: 5000, // Poll every 5s
  timeout: 600000, // Timeout after 10 minutes
});
If using raw REST instead of the SDK:
- Use webhooks or callbacks instead of polling when possible
- If polling, implement exponential backoff (start at 3s, max 30s)
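For the raw REST case, the backoff schedule can be sketched as a small generator. The 3 s start and 30 s ceiling come from the guidance above; the doubling factor and the `backoff_delays` name are assumptions chosen for illustration.

```python
from itertools import islice
from typing import Iterator


def backoff_delays(
    start: float = 3.0, cap: float = 30.0, factor: float = 2.0
) -> Iterator[float]:
    """Yield poll delays in seconds that grow geometrically up to `cap`."""
    delay = start
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


print(list(islice(backoff_delays(), 6)))
# [3.0, 6.0, 12.0, 24.0, 30.0, 30.0]
```

A polling loop would sleep for each yielded delay between status checks and stop once the job reports completion or an overall timeout is reached.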
Webhooks and Callbacks
Callback (sent to the callback URL supplied in the request body) reports two events:
- job completed successfully
- job failed
Webhook (configured in dashboard → Account → Webhooks) reports three events:
- job queued
- job done
- job failed
Webhooks are powered by Svix with signed requests for verification.
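Because the webhooks are signed by Svix, a receiver can check each request before trusting it. The sketch below follows Svix's documented manual verification scheme (HMAC-SHA256 over `id.timestamp.body`, keyed with the base64-decoded secret). It assumes lowercase header names and omits the timestamp tolerance check that guards against replayed requests; `verify_svix_signature` is a hypothetical helper, and the official `svix` library is likely the better choice in production.

```python
import base64
import hashlib
import hmac
from typing import Mapping


def verify_svix_signature(secret: str, headers: Mapping[str, str], body: bytes) -> bool:
    """Return True if any v1 signature in the headers matches the raw body."""
    # Signing secrets are base64 text, usually prefixed with "whsec_".
    raw = secret[len("whsec_"):] if secret.startswith("whsec_") else secret
    key = base64.b64decode(raw)

    # Signed content is "<svix-id>.<svix-timestamp>.<raw request body>".
    signed = f"{headers['svix-id']}.{headers['svix-timestamp']}.".encode() + body
    expected = base64.b64encode(hmac.new(key, signed, hashlib.sha256).digest()).decode()

    # The header may carry several space-separated "version,signature" pairs.
    for candidate in headers["svix-signature"].split(" "):
        version, _, signature = candidate.partition(",")
        if version == "v1" and hmac.compare_digest(signature, expected):
            return True
    return False
```

Verification must run on the exact bytes received, before any JSON parsing or re-serialization, since even whitespace changes alter the signature.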
Common Mistakes
- Code switching without language list: enabling code switching with an empty languages list triggers evaluation across 100+ languages. Always provide 3-5 expected languages.
- Exceeding duration limits: files over 135 minutes may fail silently. Split into ~60 min chunks.
- Custom vocabulary intensity too high: values above 0.6 cause false positives. Keep at 0.4-0.6.
- Polling without backoff: rapid polling wastes requests and may trigger 429s. The SDK handles this; for raw REST, use webhooks or exponential backoff.
- Expecting pre-recorded-only features in live mode: diarization, PII redaction, and subtitles are available only for pre-recorded transcription, not in live mode.
For the full list of gotchas and diagnostics, see the troubleshooting skill.
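For the duration limit, the chunk boundaries for a long recording can be computed up front. This sketch only calculates start and end offsets in seconds; cutting the audio itself needs an external tool and is not shown. `chunk_ranges` is a hypothetical helper, and the 60-minute default follows the guidance above.

```python
from typing import List, Tuple


def chunk_ranges(
    total_seconds: float, chunk_seconds: float = 3600.0
) -> List[Tuple[float, float]]:
    """Split [0, total_seconds] into consecutive (start, end) ranges."""
    if total_seconds <= 0 or chunk_seconds <= 0:
        raise ValueError("durations must be positive")
    ranges = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        ranges.append((start, end))
        start = end
    return ranges


print(chunk_ranges(150 * 60))
# [(0.0, 3600.0), (3600.0, 7200.0), (7200.0, 9000.0)]
```

Fixed offsets can cut through a word or sentence, so a real pipeline may prefer to split at silences near each boundary and add each chunk's start offset back to the utterance timestamps when merging results.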
Further Reading