deepgram-transcription
Deepgram Transcription
Overview
This skill enables efficient transcription of audio and video files using the Deepgram API. It automatically handles large video files by extracting audio first, reducing upload time and API costs. Outputs include both full JSON responses with timestamps and clean text transcripts.
When to Use This Skill
Use this skill when:
- Transcribing audio files (mp3, wav, m4a, aac, etc.)
- Transcribing video files (mp4, mov, avi, mkv, etc.)
- Converting speech in media files to text
- Creating transcripts with or without timestamps
- Processing multiple recordings for documentation
Core Workflow
1. Determine Input Type
First, identify the input file type:
Audio files (mp3, wav, m4a, aac):
- Can be transcribed directly
- Smaller file sizes, faster uploads
Video files (mp4, mov, avi, etc.):
- Should extract audio first for files >50MB
- Reduces upload time significantly (e.g., 190MB video → 3MB audio)
- No quality loss for transcription purposes
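The type check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the provided script: the extension sets come from the lists in this document, and the function name `classify_input` is hypothetical.

```python
from pathlib import Path

# Extension sets taken from the lists in this document; extend as needed.
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".aac"}
VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv"}


def classify_input(path: str) -> str:
    """Return 'audio', 'video', or 'unknown' based on the file extension."""
    ext = Path(path).suffix.lower()  # case-insensitive match, e.g. ".MP4"
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    return "unknown"
```

Classifying by extension is only a heuristic; a container such as `.mp4` can hold audio-only content, so treat the result as a hint.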
2. Extract Audio (For Video Files)
For video files, especially those larger than 50MB, extract audio before transcription:
```bash
ffmpeg -i input_video.mp4 -vn -acodec aac -b:a 128k output_audio.m4a -y
```
Parameters:
- `-vn`: No video (audio only)
- `-acodec aac`: AAC audio codec
- `-b:a 128k`: 128 kbps bitrate (good quality, small size)
- `-y`: Overwrite output file
This reduces file size by ~98% while preserving speech quality.
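If the extraction needs to be driven from Python rather than the shell, the same ffmpeg invocation might be wrapped as below. This is a sketch under assumptions: the function names are hypothetical, ffmpeg is assumed to be on `PATH`, and `-y` is placed before the input, which is equivalent to the trailing position used in the command above.

```python
import subprocess


def build_ffmpeg_command(video_path: str, audio_path: str,
                         bitrate: str = "128k") -> list[str]:
    """Build the argument list for an audio-only AAC extraction."""
    return [
        "ffmpeg",
        "-y",               # overwrite the output file if it exists
        "-i", video_path,   # input video
        "-vn",              # drop the video stream
        "-acodec", "aac",   # encode audio as AAC
        "-b:a", bitrate,    # audio bitrate
        audio_path,
    ]


def extract_audio(video_path: str, audio_path: str, bitrate: str = "128k") -> None:
    """Run ffmpeg; raises subprocess.CalledProcessError if extraction fails."""
    subprocess.run(build_ffmpeg_command(video_path, audio_path, bitrate), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.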
3. Transcribe with Deepgram
Use the provided `scripts/transcribe.py` script for automated transcription:
```bash
scripts/transcribe.py input_file.mp4 \
  --api-key YOUR_DEEPGRAM_API_KEY \
  --output-dir ./transcripts \
  --extract-audio
```
Or use curl directly for manual control:
```bash
curl -X POST "https://api.deepgram.com/v1/listen?model=nova-2&smart_format=true" \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: audio/mp4" \
  --data-binary @audio_file.m4a \
  -o transcription.json
```
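For reference, the curl call can be mirrored with Python's standard library. The sketch below only builds the request; the endpoint, query parameters, and headers are copied from the curl example, while the function name `build_request` and its defaults are assumptions for illustration.

```python
import urllib.parse
import urllib.request

DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"


def build_request(audio_bytes: bytes, api_key: str,
                  model: str = "nova-2", smart_format: bool = True,
                  content_type: str = "audio/mp4") -> urllib.request.Request:
    """Build a POST request that sends raw audio bytes with Token auth."""
    query = urllib.parse.urlencode({
        "model": model,
        "smart_format": "true" if smart_format else "false",
    })
    return urllib.request.Request(
        f"{DEEPGRAM_URL}?{query}",
        data=audio_bytes,
        method="POST",
        headers={
            # Strip stray whitespace that often sneaks in when pasting keys.
            "Authorization": f"Token {api_key.strip()}",
            "Content-Type": content_type,
        },
    )
```

To send it, pass the request to `urllib.request.urlopen` with a generous `timeout` and decode the body with `json.load`; large uploads may need a longer timeout than the default.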
4. Extract and Save Results
The transcription response includes:
Full JSON (with timestamps, confidence scores, metadata):
```json
{
  "results": {
    "channels": [{
      "alternatives": [{
        "transcript": "Full text here...",
        "words": [
          {"word": "hello", "start": 0.5, "end": 0.9, "confidence": 0.99}
        ]
      }]
    }]
  }
}
```
Extract plain text transcript:
```bash
cat transcription.json | python3 -c "import json, sys; data=json.load(sys.stdin); print(data['results']['channels'][0]['alternatives'][0]['transcript'])" > transcript.txt
```
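The same extraction is easier to read as a small Python module. The sketch below assumes the response shape shown in the JSON sample above; the helper names `extract_transcript` and `timestamped_lines` are hypothetical, and only the first channel and first alternative are read.

```python
import json


def extract_transcript(response: dict) -> str:
    """Return the transcript of the first alternative on the first channel."""
    return response["results"]["channels"][0]["alternatives"][0]["transcript"]


def timestamped_lines(response: dict) -> list[str]:
    """Format each entry of the 'words' array as '[start-end] word'."""
    alternative = response["results"]["channels"][0]["alternatives"][0]
    return [
        f"[{w['start']:.2f}-{w['end']:.2f}] {w['word']}"
        for w in alternative.get("words", [])
    ]


# Sample response copied from the JSON shown in this section.
sample = json.loads("""
{"results": {"channels": [{"alternatives": [{
    "transcript": "Full text here...",
    "words": [{"word": "hello", "start": 0.5, "end": 0.9, "confidence": 0.99}]
}]}]}}
""")

print(extract_transcript(sample))
print(timestamped_lines(sample))
```

For a real file, replace `sample` with `json.load(open("transcription.json"))`.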
Using the Transcription Script
The `scripts/transcribe.py` script provides a complete workflow.
Basic Usage
```bash
# Transcribe a video file (auto-extracts audio)
scripts/transcribe.py video.mp4 --api-key YOUR_KEY --extract-audio

# Transcribe an audio file directly
scripts/transcribe.py audio.mp3 --api-key YOUR_KEY

# Specify output directory
scripts/transcribe.py video.mov --api-key YOUR_KEY --output-dir ./transcripts
```
Advanced Options
```bash
# Use a different Deepgram model
scripts/transcribe.py file.mp4 --api-key YOUR_KEY --model whisper-large

# Disable smart formatting
scripts/transcribe.py file.mp4 --api-key YOUR_KEY --no-smart-format

# Custom audio bitrate when extracting
scripts/transcribe.py file.mp4 --api-key YOUR_KEY --extract-audio --audio-bitrate 192k
```
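The source of `scripts/transcribe.py` is not shown here, so the following is a hypothetical reconstruction of how a script could declare the flags used in these examples with `argparse`. The flag names come from the commands above; the defaults (`nova-2`, smart formatting on, `128k`) follow the recommended settings in this document, and the `--output-dir` default is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the command-line flags used in the examples (illustrative only)."""
    parser = argparse.ArgumentParser(description="Transcribe media with Deepgram")
    parser.add_argument("input_file", help="audio or video file to transcribe")
    parser.add_argument("--api-key", required=True, help="Deepgram API key")
    parser.add_argument("--output-dir", default=".",  # assumed default
                        help="directory for the JSON and text outputs")
    parser.add_argument("--extract-audio", action="store_true",
                        help="extract audio with ffmpeg before uploading")
    parser.add_argument("--model", default="nova-2", help="Deepgram model name")
    parser.add_argument("--no-smart-format", dest="smart_format",
                        action="store_false", help="disable smart formatting")
    parser.add_argument("--audio-bitrate", default="128k",
                        help="bitrate used when extracting audio")
    return parser
```

Using `dest="smart_format"` with `store_false` keeps smart formatting enabled unless the flag is passed, which matches the documented default.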
Output Files
The script generates:
- `{filename}_transcription.json`: Full Deepgram response with timestamps
- `{filename}_transcript.txt`: Clean text transcript only
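The naming pattern can be expressed as a small helper. This sketch assumes `{filename}` means the input file name without its extension, which the document does not state explicitly; the function name `output_paths` is hypothetical.

```python
from pathlib import Path


def output_paths(input_file: str, output_dir: str = ".") -> tuple[Path, Path]:
    """Return (json_path, text_path) following the documented naming pattern."""
    stem = Path(input_file).stem  # assumed: file name without extension
    out = Path(output_dir)
    return out / f"{stem}_transcription.json", out / f"{stem}_transcript.txt"
```

For example, `output_paths("video.mov", "transcripts")` yields `transcripts/video_transcription.json` and `transcripts/video_transcript.txt`.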
Recommended Settings
Deepgram Model: `nova-2`
- Latest and most accurate model
- Good balance of speed and quality
- Handles various accents and audio quality
Smart Formatting: Enabled (default)
- Automatic punctuation
- Proper capitalization
- Number formatting
- Better readability
Audio Bitrate: 128kbps
- Excellent speech quality
- Small file size
- Fast uploads
Common Scenarios
Scenario 1: Single Video File
```bash
# User: "Transcribe this video recording"
scripts/transcribe.py recording.mp4 --api-key KEY --extract-audio
```
Scenario 2: Multiple Screen Recordings
```bash
# Extract audio from all videos first
for f in *.mov; do
  ffmpeg -i "$f" -vn -acodec aac -b:a 128k "${f%.mov}_audio.m4a" -y
done

# Transcribe all audio files
for f in *_audio.m4a; do
  scripts/transcribe.py "$f" --api-key KEY --output-dir ./transcripts
done
```
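The two loops can also be planned from Python, which makes it easier to log or retry individual files. The sketch below only builds the command lists and does not run them; the function name `batch_commands` is hypothetical, and the `scripts/transcribe.py` flags are those used in the loop above.

```python
from pathlib import Path


def batch_commands(videos: list[str], api_key: str,
                   output_dir: str = "./transcripts") -> list[list[str]]:
    """For each .mov file, return an ffmpeg command followed by a transcribe command."""
    commands: list[list[str]] = []
    for name in videos:
        video = Path(name)
        if video.suffix != ".mov":  # same filter as the *.mov glob
            continue
        # Same naming as "${f%.mov}_audio.m4a" in the shell loop.
        audio = video.with_name(f"{video.stem}_audio.m4a")
        commands.append(["ffmpeg", "-y", "-i", str(video), "-vn",
                         "-acodec", "aac", "-b:a", "128k", str(audio)])
        commands.append(["scripts/transcribe.py", str(audio),
                         "--api-key", api_key, "--output-dir", output_dir])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order.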
Scenario 3: Audio-Only File
```bash
# Direct transcription (no extraction needed)
scripts/transcribe.py podcast.mp3 --api-key KEY
```
Troubleshooting
File Access Issues
If encountering permission errors:
- Check file permissions: `ls -l filename`
- Ensure file exists: `file filename`
- Use absolute paths if needed
Large File Uploads Timing Out
For very large files:
- Always extract audio first (`--extract-audio`)
- Increase timeout in script if needed
- Consider splitting long recordings
API Key Issues
- Verify API key is correct
- Check Deepgram account has available credits
- Ensure no extra spaces in key
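A quick pre-flight check can catch the whitespace problem before any request is sent. This is a minimal sketch: reading the key from an environment variable named `DEEPGRAM_API_KEY` is an assumption (the script in this document takes `--api-key` instead), and `load_api_key` is a hypothetical helper.

```python
import os


def load_api_key(env_var: str = "DEEPGRAM_API_KEY") -> str:
    """Read the key from the environment and reject empty or malformed values."""
    key = os.environ.get(env_var, "").strip()  # drop leading/trailing whitespace
    if not key:
        raise ValueError(f"{env_var} is not set or is empty")
    if any(ch.isspace() for ch in key):
        raise ValueError("API key contains whitespace; copy it again")
    return key
```

This does not verify that the key is valid or that the account has credits; only a real API call can confirm that.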
File Size Guidelines
| Input Type | Size | Recommendation |
|---|---|---|
| Audio | Any | Transcribe directly |
| Video | < 50MB | Can transcribe directly |
| Video | 50-200MB | Extract audio first |
| Video | > 200MB | Must extract audio first |
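The table translates directly into a lookup function. The sketch below uses the thresholds from the table; treating exactly 50 MB and exactly 200 MB as part of the middle row is an assumption, since the table does not define the boundaries, and the function name `recommend` is hypothetical.

```python
def recommend(input_type: str, size_mb: float) -> str:
    """Map an input type and size in MB to the recommendation from the table."""
    if input_type == "audio":
        return "Transcribe directly"
    if input_type != "video":
        raise ValueError(f"unknown input type: {input_type!r}")
    if size_mb < 50:
        return "Can transcribe directly"
    if size_mb <= 200:  # assumed: both boundaries belong to the 50-200MB row
        return "Extract audio first"
    return "Must extract audio first"
```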
Resources
scripts/transcribe.py
Complete Python script handling:
- Audio extraction from video
- Deepgram API calls
- Response parsing
- Output file generation
Execute without loading into context for efficiency.
references/api_reference.md
Deepgram API documentation including:
- Available models and features
- API parameters and options
- Response format details
- Best practices
Load into context when needing detailed API information.