deepgram-transcription

Deepgram Transcription

Overview

This skill enables efficient transcription of audio and video files using the Deepgram API. It handles large video files by extracting the audio first, which greatly reduces upload size and time. Outputs include both the full JSON response with timestamps and a clean text transcript.

When to Use This Skill

Use this skill when:
  • Transcribing audio files (mp3, wav, m4a, aac, etc.)
  • Transcribing video files (mp4, mov, avi, mkv, etc.)
  • Converting speech in media files to text
  • Creating transcripts with or without timestamps
  • Processing multiple recordings for documentation

Core Workflow

1. Determine Input Type

First, identify the input file type:
Audio files (mp3, wav, m4a, aac):
  • Can be transcribed directly
  • Smaller file sizes, faster uploads
Video files (mp4, mov, avi, etc.):
  • Should extract audio first for files >50MB
  • Reduces upload time significantly (e.g., 190MB video → 3MB audio)
  • No meaningful loss for transcription purposes (only the video stream is dropped; the audio is re-encoded at a bitrate ample for speech)
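
As a rough illustration of this first step, the following Python sketch classifies a file by its extension alone. The function name `media_kind` and the extension sets (copied from the lists above) are illustrative assumptions, not part of `scripts/transcribe.py`; a more robust check might inspect the container with ffprobe instead.

```python
from pathlib import Path
from typing import Optional

# Extension sets copied from the examples above; extend them as needed.
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".aac"}
VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv"}


def media_kind(path: str) -> Optional[str]:
    """Return "audio", "video", or None, judging only by the file extension."""
    ext = Path(path).suffix.lower()  # ".MOV" and ".mov" are treated alike
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    return None  # unknown extension: inspect the file before transcribing


for name in ["interview.m4a", "demo.MOV", "notes.txt"]:
    print(name, media_kind(name))
```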

2. Extract Audio (For Video Files)

For video files, especially those larger than 50MB, extract audio before transcription:
```bash
ffmpeg -i input_video.mp4 -vn -acodec aac -b:a 128k output_audio.m4a -y
```
Parameters:
  • `-vn`: No video (audio only)
  • `-acodec aac`: AAC audio codec
  • `-b:a 128k`: 128kbps bitrate (good quality, small size)
  • `-y`: Overwrite the output file if it already exists
In the 190MB → 3MB example above, this cuts the file size by roughly 98% while preserving speech quality; the exact saving depends on the source video's bitrate.
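
If you drive the same extraction from Python rather than the shell, one option is to build the argument list and hand it to `subprocess.run`. This is a sketch: `ffmpeg_extract_cmd` and `extract_audio` are hypothetical helper names, and `-y` is moved to the front, where ffmpeg's documentation recommends placing global options.

```python
import subprocess
from typing import List


def ffmpeg_extract_cmd(src: str, dst: str, bitrate: str = "128k") -> List[str]:
    """Build the argument list for the audio-extraction command shown above."""
    return [
        "ffmpeg",
        "-y",               # overwrite dst if it exists (global option, listed first)
        "-i", src,          # input video
        "-vn",              # drop the video stream
        "-acodec", "aac",   # encode the audio track as AAC
        "-b:a", bitrate,    # target audio bitrate, e.g. "128k"
        dst,                # output audio file, e.g. "output_audio.m4a"
    ]


def extract_audio(src: str, dst: str, bitrate: str = "128k") -> None:
    # check=True raises subprocess.CalledProcessError if ffmpeg exits non-zero
    subprocess.run(ffmpeg_extract_cmd(src, dst, bitrate), check=True)
```

Calling `extract_audio("input_video.mp4", "output_audio.m4a")` runs the equivalent of the command above. Passing a list instead of a shell string avoids quoting problems with spaces in file names.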

3. Transcribe with Deepgram

Use the provided `scripts/transcribe.py` script for automated transcription:
```bash
scripts/transcribe.py input_file.mp4 \
  --api-key YOUR_DEEPGRAM_API_KEY \
  --output-dir ./transcripts \
  --extract-audio
```
Or use curl directly for manual control:
```bash
curl -X POST "https://api.deepgram.com/v1/listen?model=nova-2&smart_format=true" \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: audio/mp4" \
  --data-binary @audio_file.m4a \
  -o transcription.json
```
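
For callers that prefer Python over curl, the same request can be sketched with only the standard library. The endpoint, the `Authorization: Token ...` header, and the `model`/`smart_format` query parameters are the ones in the curl command; the helper names and the 300-second timeout are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_request(audio: bytes, api_key: str, model: str = "nova-2",
                         smart_format: bool = True,
                         content_type: str = "audio/mp4") -> urllib.request.Request:
    """POST raw audio bytes with options in the query string, as in the curl call."""
    query = urllib.parse.urlencode({
        "model": model,
        "smart_format": "true" if smart_format else "false",
    })
    return urllib.request.Request(
        f"{LISTEN_URL}?{query}",
        data=audio,
        headers={"Authorization": f"Token {api_key}",
                 "Content-Type": content_type},
        method="POST",
    )


def transcribe_file(path: str, api_key: str, timeout: float = 300.0) -> Dict[str, Any]:
    with open(path, "rb") as f:
        req = build_listen_request(f.read(), api_key)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

`transcribe_file("audio_file.m4a", api_key)` returns the parsed JSON that curl would have written to `transcription.json`. For other formats, pass a matching `content_type` (for example `audio/mpeg` for mp3).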

4. Extract and Save Results

The transcription response includes:
Full JSON (with timestamps, confidence scores, metadata):
```json
{
  "results": {
    "channels": [{
      "alternatives": [{
        "transcript": "Full text here...",
        "words": [
          {"word": "hello", "start": 0.5, "end": 0.9, "confidence": 0.99}
        ]
      }]
    }]
  }
}
```
Extract plain text transcript:
```bash
cat transcription.json | python3 -c "import json, sys; data=json.load(sys.stdin); print(data['results']['channels'][0]['alternatives'][0]['transcript'])" > transcript.txt
```
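
The same extraction is easier to extend as a small Python module, for example to also pull out the per-word timestamps shown in the JSON above. The function names are hypothetical, and both assume the `results.channels[0].alternatives[0]` layout from the example response.

```python
from typing import Any, Dict, List, Tuple


def first_alternative(response: Dict[str, Any]) -> Dict[str, Any]:
    """Top-ranked alternative of the first audio channel."""
    return response["results"]["channels"][0]["alternatives"][0]


def transcript_text(response: Dict[str, Any]) -> str:
    return first_alternative(response)["transcript"]


def word_timings(response: Dict[str, Any]) -> List[Tuple[str, float, float]]:
    """(word, start, end) triples; times are seconds from the start of the audio."""
    words = first_alternative(response).get("words", [])
    return [(w["word"], w["start"], w["end"]) for w in words]


sample = {"results": {"channels": [{"alternatives": [{
    "transcript": "hello",
    "words": [{"word": "hello", "start": 0.5, "end": 0.9, "confidence": 0.99}],
}]}]}}
print(transcript_text(sample))
print(word_timings(sample))
```

To use it on a saved response, load `transcription.json` with `json.load` and pass the resulting dict to either function.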

Using the Transcription Script

The `scripts/transcribe.py` script provides a complete workflow:

Basic Usage

```bash
# Transcribe a video file (auto-extracts audio)
scripts/transcribe.py video.mp4 --api-key YOUR_KEY --extract-audio

# Transcribe an audio file directly
scripts/transcribe.py audio.mp3 --api-key YOUR_KEY

# Specify output directory
scripts/transcribe.py video.mov --api-key YOUR_KEY --output-dir ./transcripts
```

Advanced Options

```bash
# Use a different Deepgram model
scripts/transcribe.py file.mp4 --api-key YOUR_KEY --model whisper-large

# Disable smart formatting
scripts/transcribe.py file.mp4 --api-key YOUR_KEY --no-smart-format

# Custom audio bitrate when extracting
scripts/transcribe.py file.mp4 --api-key YOUR_KEY --extract-audio --audio-bitrate 192k
```
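
The source of `scripts/transcribe.py` is not reproduced here. If you need to wrap or reimplement it, a hypothetical `argparse` parser accepting the same flags might look like this; the defaults for `--model` and `--audio-bitrate` follow Recommended Settings, while the `--output-dir` default of `.` is a guess.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented for transcribe.py."""
    p = argparse.ArgumentParser(description="Transcribe audio/video with Deepgram")
    p.add_argument("input_file", help="audio or video file to transcribe")
    p.add_argument("--api-key", required=True, help="Deepgram API key")
    p.add_argument("--output-dir", default=".", help="where to write results")
    p.add_argument("--extract-audio", action="store_true",
                   help="extract audio with ffmpeg before uploading")
    p.add_argument("--audio-bitrate", default="128k",
                   help="bitrate used when extracting audio")
    p.add_argument("--model", default="nova-2", help="Deepgram model name")
    p.add_argument("--no-smart-format", dest="smart_format", action="store_false",
                   help="turn off smart formatting (on by default)")
    return p
```

Because `--no-smart-format` stores `False` into `smart_format`, downstream code can test `args.smart_format` directly without negating a flag.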

Output Files

The script generates:
  1. `{filename}_transcription.json`: full Deepgram response with timestamps
  2. `{filename}_transcript.txt`: clean text transcript only
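
A sketch of how these two files could be derived and written, assuming `{filename}` means the input name without its extension (the script's exact naming rule isn't specified here). The helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Tuple


def output_paths(input_file: str, output_dir: str) -> Tuple[Path, Path]:
    # Assumption: {filename} is the input file's stem, e.g. "talk" for "talk.mp4"
    stem = Path(input_file).stem
    out = Path(output_dir)
    return out / f"{stem}_transcription.json", out / f"{stem}_transcript.txt"


def save_outputs(response: Dict[str, Any], input_file: str,
                 output_dir: str) -> Tuple[Path, Path]:
    json_path, txt_path = output_paths(input_file, output_dir)
    json_path.parent.mkdir(parents=True, exist_ok=True)  # create output dir if needed
    # Full response, pretty-printed, non-ASCII text kept as-is
    json_path.write_text(json.dumps(response, indent=2, ensure_ascii=False),
                         encoding="utf-8")
    # Plain transcript only
    text = response["results"]["channels"][0]["alternatives"][0]["transcript"]
    txt_path.write_text(text + "\n", encoding="utf-8")
    return json_path, txt_path
```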

Recommended Settings

Deepgram Model: nova-2
  • A recent, highly accurate general-purpose model (check Deepgram's model list for newer releases)
  • Good balance of speed and quality
  • Handles a variety of accents and audio quality levels
Smart Formatting: Enabled (default)
  • Automatic punctuation
  • Proper capitalization
  • Number formatting
  • Better readability
Audio Bitrate: 128kbps
  • Excellent speech quality
  • Small file size
  • Fast uploads
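
To see why 128kbps keeps files small, the encoded size is roughly bitrate × duration ÷ 8 bits per byte, ignoring container overhead. A quick sketch of that arithmetic, with a hypothetical helper name:

```python
def audio_size_bytes(duration_s: float, bitrate_kbps: float = 128) -> float:
    """Approximate encoded size: bits per second * seconds / 8 bits per byte."""
    return bitrate_kbps * 1000 * duration_s / 8


# One minute at 128 kbps: 128,000 * 60 / 8 = 960,000 bytes (about 0.96 MB)
print(audio_size_bytes(60))            # 960000.0
# One hour at 128 kbps, in decimal megabytes
print(audio_size_bytes(3600) / 1e6)    # 57.6
```

So an hour-long recording at this bitrate comes to roughly 58 MB of audio, far smaller than a typical screen-recording video of the same length.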

Common Scenarios

Scenario 1: Single Video File

```bash
# User: "Transcribe this video recording"
scripts/transcribe.py recording.mp4 --api-key KEY --extract-audio
```

Scenario 2: Multiple Screen Recordings

```bash
# Extract audio from all videos first
for f in *.mov; do
  ffmpeg -i "$f" -vn -acodec aac -b:a 128k "${f%.mov}_audio.m4a" -y
done

# Transcribe all audio files
for f in *_audio.m4a; do
  scripts/transcribe.py "$f" --api-key KEY --output-dir ./transcripts
done
```
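
The same batch job can be sketched in Python, which sidesteps shell-quoting pitfalls and stops on the first failing command. The helper names are hypothetical; the `_audio.m4a` naming mirrors the shell loop, and `scripts/transcribe.py` is invoked with the flags documented above.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List


def audio_path_for(video: Path) -> Path:
    """recording.mov -> recording_audio.m4a, matching the shell loop's naming."""
    return video.with_name(f"{video.stem}_audio.m4a")


def extract_all(folder: str = ".", pattern: str = "*.mov",
                bitrate: str = "128k") -> List[Path]:
    outputs = []
    for video in sorted(Path(folder).glob(pattern)):
        out = audio_path_for(video)
        # check=True raises CalledProcessError and stops the batch on failure
        subprocess.run(["ffmpeg", "-y", "-i", str(video), "-vn",
                        "-acodec", "aac", "-b:a", bitrate, str(out)], check=True)
        outputs.append(out)
    return outputs


def transcribe_all(audio_files: Iterable[Path], api_key: str,
                   output_dir: str = "./transcripts") -> None:
    for audio in audio_files:
        subprocess.run(["scripts/transcribe.py", str(audio), "--api-key", api_key,
                        "--output-dir", output_dir], check=True)
```

A full run would be `transcribe_all(extract_all("."), api_key)`.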

Scenario 3: Audio-Only File

```bash
# Direct transcription (no extraction needed)
scripts/transcribe.py podcast.mp3 --api-key KEY
```

Troubleshooting

File Access Issues

If you encounter permission errors:
  • Check file permissions: `ls -l filename`
  • Confirm the file exists and is a recognized media type: `file filename`
  • Use absolute paths if needed
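
These checks can also run before any upload. A minimal Python sketch (the `check_input` name is hypothetical) that resolves an absolute path and raises an error naming the specific problem:

```python
import os
from pathlib import Path


def check_input(path: str) -> Path:
    """Return an absolute Path, or raise an error naming the specific problem."""
    p = Path(path).expanduser().resolve()  # absolute path, as suggested above
    if not p.exists():
        raise FileNotFoundError(f"no such file: {p}")
    if not p.is_file():
        raise ValueError(f"not a regular file: {p}")
    if not os.access(p, os.R_OK):
        raise PermissionError(f"not readable by this user: {p}")
    return p
```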

Large File Uploads Timing Out

For very large files:
  1. Always extract audio first (`--extract-audio`)
  2. Increase the timeout in the script if needed
  3. Consider splitting long recordings
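
One way to split a long recording is ffmpeg's `segment` muxer, which cuts the audio into fixed-length pieces without re-encoding. The sketch below only builds the command; the 600-second length, the `part_%03d.m4a` pattern, and the helper names are arbitrary, hypothetical choices. Timestamps in each piece's transcript restart at zero, so shifting them by the piece's offset is needed when merging (approximate, since stream-copy cuts land on packet boundaries).

```python
import math
from typing import List, Tuple


def ffmpeg_split_cmd(src: str, out_pattern: str = "part_%03d.m4a",
                     segment_s: int = 600) -> List[str]:
    """Command that splits src into segment_s-second pieces without re-encoding."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-f", "segment",                  # use the segment muxer
        "-segment_time", str(segment_s),  # target length of each piece, seconds
        "-reset_timestamps", "1",         # each piece starts at timestamp 0
        "-c", "copy",                     # stream copy: no re-encoding
        out_pattern,
    ]


def segment_count(duration_s: float, segment_s: int = 600) -> int:
    """Expected number of pieces for a recording of duration_s seconds."""
    return math.ceil(duration_s / segment_s)


def shift_words(words: List[Tuple[str, float, float]],
                offset_s: float) -> List[Tuple[str, float, float]]:
    """Move (word, start, end) tuples from one piece onto the full timeline."""
    return [(w, s + offset_s, e + offset_s) for (w, s, e) in words]
```

For piece number `i` (starting at 0), the offset is `i * segment_s` seconds.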

API Key Issues

  • Verify the API key is correct
  • Check that your Deepgram account has available credits
  • Ensure the key has no leading, trailing, or embedded spaces
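
Stray whitespace often creeps in when a key is pasted into a file or environment variable. A small sketch that reads the key from an environment variable and strips it; `DEEPGRAM_API_KEY` is an assumed variable name (the examples above pass the key with `--api-key`), and `load_api_key` is hypothetical.

```python
import os


def load_api_key(env_var: str = "DEEPGRAM_API_KEY") -> str:
    """Read the key from an environment variable, dropping stray whitespace."""
    key = os.environ.get(env_var, "").strip()  # removes spaces and trailing newlines
    if not key:
        raise RuntimeError(f"{env_var} is not set or is empty")
    if any(ch.isspace() for ch in key):
        raise ValueError("API key contains whitespace inside it; re-copy the key")
    return key
```

The cleaned value can then be passed on, for example as the `--api-key` argument of `scripts/transcribe.py`.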

File Size Guidelines

| Input Type | Size | Recommendation |
|---|---|---|
| Audio | Any | Transcribe directly |
| Video | < 50MB | Can transcribe directly |
| Video | 50-200MB | Extract audio first |
| Video | > 200MB | Must extract audio first |
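
The table can be encoded as a small lookup, for example to decide automatically whether to pass `--extract-audio`. This is a sketch: the function name is hypothetical, and treating MB as 1,000,000 bytes is an assumption, since the table doesn't say whether it means MB or MiB.

```python
MB = 1_000_000  # assumption: the table's "MB" means decimal megabytes


def size_recommendation(kind: str, size_bytes: int) -> str:
    """Look up the File Size Guidelines row for an input of this type and size."""
    if kind == "audio":
        return "Transcribe directly"
    if kind != "video":
        raise ValueError(f"unknown input type: {kind!r}")
    if size_bytes < 50 * MB:
        return "Can transcribe directly"
    if size_bytes <= 200 * MB:
        return "Extract audio first"
    return "Must extract audio first"
```

The size argument would typically come from `os.path.getsize(path)`.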

Resources

scripts/transcribe.py

Complete Python script that handles:
  • Audio extraction from video
  • Deepgram API calls
  • Response parsing
  • Output file generation
Execute it directly; there is no need to load it into context.

references/api_reference.md

Deepgram API documentation including:
  • Available models and features
  • API parameters and options
  • Response format details
  • Best practices
Load it into context when you need detailed API information.