deepgram-transcription

Deepgram Transcription

Overview

This skill transcribes audio and video files with the Deepgram API. For large video files it extracts the audio track first, which shrinks the upload and shortens transfer time. It outputs both the full JSON response (with timestamps) and a clean text transcript.

When to Use This Skill

Use this skill when:

Transcribing audio files (mp3, wav, m4a, aac, etc.)
Transcribing video files (mp4, mov, avi, mkv, etc.)
Converting speech in media files to text
Creating transcripts with or without timestamps
Processing multiple recordings for documentation

Core Workflow

1. Determine Input Type

First, identify the input file type:

Audio files (mp3, wav, m4a, aac):

Can be transcribed directly
Smaller file sizes, faster uploads

Video files (mp4, mov, avi, etc.):

Extract audio first when the file is larger than 50MB
This reduces upload time significantly (e.g., 190MB video → 3MB audio)
No meaningful quality loss for transcription purposes
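
As a minimal sketch of this first step, the helper below classifies a file by its extension. The function name `classify_input` and the two extension sets are illustrative assumptions built from the formats listed above; they are not taken from `scripts/transcribe.py`.

```python
from pathlib import Path

# Extensions copied from the lists above; extend them as needed.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".aac"}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv"}


def classify_input(path: str) -> str:
    """Return "audio", "video", or "unknown" based on the file extension."""
    suffix = Path(path).suffix.lower()  # case-insensitive match
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return "unknown"
```

Matching on the extension alone is a shortcut; a container such as `.mp4` can also hold audio-only content, so treat the result as a hint rather than a guarantee.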

2. Extract Audio (For Video Files)

For video files, especially those larger than 50MB, extract audio before transcription:

bash

ffmpeg -i input_video.mp4 -vn -acodec aac -b:a 128k output_audio.m4a -y

Parameters:

`-vn`: No video (audio only)
`-acodec aac`: AAC audio codec
`-b:a 128k`: 128kbps bitrate (good quality, small size)
`-y`: Overwrite output file

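For the earlier example (190MB video → 3MB audio) this is a reduction of roughly 98% with speech quality preserved; the actual saving depends on the video's bitrate and duration.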
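
The same extraction can be driven from Python. The sketch below only wraps the ffmpeg command shown above in `subprocess`; the function names are hypothetical and it assumes `ffmpeg` is available on `PATH`.

```python
import subprocess
from typing import List


def build_extract_command(video_path: str, audio_path: str,
                          bitrate: str = "128k") -> List[str]:
    """Build the ffmpeg argument list used above (audio only, AAC)."""
    return [
        "ffmpeg",
        "-i", video_path,   # input video
        "-vn",              # drop the video stream
        "-acodec", "aac",   # encode audio as AAC
        "-b:a", bitrate,    # audio bitrate
        audio_path,
        "-y",               # overwrite the output file if it exists
    ]


def extract_audio(video_path: str, audio_path: str, bitrate: str = "128k") -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_extract_command(video_path, audio_path, bitrate), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.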

3. Transcribe with Deepgram

Use the provided `scripts/transcribe.py` script for automated transcription:

bash

scripts/transcribe.py input_file.mp4 \
  --api-key YOUR_DEEPGRAM_API_KEY \
  --output-dir ./transcripts \
  --extract-audio

Or use curl directly for manual control:

bash

curl -X POST "https://api.deepgram.com/v1/listen?model=nova-2&smart_format=true" \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: audio/mp4" \
  --data-binary @audio_file.m4a \
  -o transcription.json
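
For readers who prefer Python over curl, here is a rough standard-library equivalent of the request above. It mirrors the same URL, query parameters, and headers; the function names and the 300-second timeout are assumptions for illustration and do not describe how `scripts/transcribe.py` is implemented.

```python
import json
import urllib.request
from urllib.parse import urlencode

DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_request(audio_bytes: bytes, api_key: str,
                         model: str = "nova-2", smart_format: bool = True,
                         content_type: str = "audio/mp4") -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    query = urlencode({"model": model, "smart_format": str(smart_format).lower()})
    return urllib.request.Request(
        f"{DEEPGRAM_LISTEN_URL}?{query}",
        data=audio_bytes,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
        method="POST",
    )


def transcribe_file(audio_path: str, api_key: str, timeout: float = 300.0) -> dict:
    """Send the audio file and return the parsed JSON response."""
    with open(audio_path, "rb") as audio_file:
        request = build_listen_request(audio_file.read(), api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the URL and headers without spending API credits.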

4. Extract and Save Results

The transcription response includes:

Full JSON (with timestamps, confidence scores, metadata):

json

{
  "results": {
    "channels": [{
      "alternatives": [{
        "transcript": "Full text here...",
        "words": [
          {"word": "hello", "start": 0.5, "end": 0.9, "confidence": 0.99}
        ]
      }]
    }]
  }
}

Extract plain text transcript:

bash

cat transcription.json | python3 -c "import json, sys; data=json.load(sys.stdin); print(data['results']['channels'][0]['alternatives'][0]['transcript'])" > transcript.txt
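
The one-liner above can be expanded into small reusable functions. This sketch assumes the response has the shape shown in the JSON sample (first channel, first alternative); the function names are illustrative.

```python
import json
from typing import List


def extract_transcript(response: dict) -> str:
    """Return the plain transcript of the first channel's first alternative."""
    return response["results"]["channels"][0]["alternatives"][0]["transcript"]


def format_timestamped_words(response: dict) -> List[str]:
    """Return one "[start-end] word" line per word, using the word-level timings."""
    alternative = response["results"]["channels"][0]["alternatives"][0]
    return [
        f"[{w['start']:.2f}-{w['end']:.2f}] {w['word']}"
        for w in alternative.get("words", [])
    ]


def save_transcript(json_path: str, text_path: str) -> None:
    """Read a saved Deepgram response and write the plain transcript to a text file."""
    with open(json_path, encoding="utf-8") as source:
        response = json.load(source)
    with open(text_path, "w", encoding="utf-8") as target:
        target.write(extract_transcript(response))
```

For example, `save_transcript("transcription.json", "transcript.txt")` would produce the same text file as the shell command.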

Using the Transcription Script

The `scripts/transcribe.py` script provides a complete workflow:

Basic Usage

bash

# Transcribe a video file (auto-extracts audio)
scripts/transcribe.py video.mp4 --api-key YOUR_KEY --extract-audio

# Transcribe an audio file directly
scripts/transcribe.py audio.mp3 --api-key YOUR_KEY

# Specify output directory
scripts/transcribe.py video.mov --api-key YOUR_KEY --output-dir ./transcripts

Advanced Options

bash

# Use a different Deepgram model
scripts/transcribe.py file.mp4 --api-key YOUR_KEY --model whisper-large

# Disable smart formatting
scripts/transcribe.py file.mp4 --api-key YOUR_KEY --no-smart-format

# Custom audio bitrate when extracting
scripts/transcribe.py file.mp4 --api-key YOUR_KEY --extract-audio --audio-bitrate 192k

Output Files

The script generates:

`{filename}_transcription.json` - Full Deepgram response with timestamps
`{filename}_transcript.txt` - Clean text transcript only
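
To predict where the results will land, the naming pattern can be sketched as follows. This assumes that `{filename}` means the input file name without its extension, which the text does not state explicitly; `output_paths` is a hypothetical helper, not part of the script.

```python
from pathlib import Path
from typing import Tuple


def output_paths(input_file: str, output_dir: str = ".") -> Tuple[Path, Path]:
    """Return (json_path, text_path) following the naming pattern above.

    Assumption: {filename} is the input name without its extension.
    """
    stem = Path(input_file).stem
    directory = Path(output_dir)
    return (
        directory / f"{stem}_transcription.json",
        directory / f"{stem}_transcript.txt",
    )
```

With this assumption, `output_paths("recordings/video.mp4", "transcripts")` points at `video_transcription.json` and `video_transcript.txt` inside `transcripts`.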

Recommended Settings

Common Scenarios

Scenario 1: Single Video File

bash

# User: "Transcribe this video recording"
scripts/transcribe.py recording.mp4 --api-key KEY --extract-audio

Scenario 2: Multiple Screen Recordings

bash

# Extract audio from all videos first
for f in *.mov; do
  ffmpeg -i "$f" -vn -acodec aac -b:a 128k "${f%.mov}_audio.m4a" -y
done

# Transcribe all audio files
for f in *_audio.m4a; do
  scripts/transcribe.py "$f" --api-key KEY --output-dir ./transcripts
done
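
The batch naming used in this scenario (`clip.mov` → `clip_audio.m4a`) can also be planned in Python before running anything. Both helper names below are hypothetical; the sketch only lists the work to do and does not call ffmpeg or Deepgram.

```python
from pathlib import Path
from typing import List, Tuple


def audio_name_for(video: Path) -> Path:
    """Mirror the shell expansion "${f%.mov}_audio.m4a" for any video path."""
    return video.with_name(f"{video.stem}_audio.m4a")


def plan_batch(directory: str, pattern: str = "*.mov") -> List[Tuple[Path, Path]]:
    """Return (video, audio) path pairs for every matching file, sorted by name."""
    return [(video, audio_name_for(video))
            for video in sorted(Path(directory).glob(pattern))]
```

Listing the pairs first makes it easy to review the output names before starting a long batch.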

Scenario 3: Audio-Only File

bash

# Direct transcription (no extraction needed)
scripts/transcribe.py podcast.mp3 --api-key KEY

Troubleshooting

File Access Issues

If encountering permission errors:

Check file permissions: `ls -l filename`
Ensure file exists: `file filename`
Use absolute paths if needed
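
The same checks can be performed up front in Python, so that a missing or unreadable file fails with a clear message before any upload starts. This is a sketch with a hypothetical helper name, `check_input_file`.

```python
import os


def check_input_file(path: str) -> str:
    """Return the absolute path if the file exists and is readable, else raise."""
    absolute = os.path.abspath(path)  # sidesteps working-directory confusion
    if not os.path.isfile(absolute):
        raise FileNotFoundError(f"No such file: {absolute}")
    if not os.access(absolute, os.R_OK):
        raise PermissionError(f"File is not readable: {absolute}")
    return absolute
```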

Large File Uploads Timing Out

For very large files:

Always extract audio first (`--extract-audio`)
Increase timeout in script if needed
Consider splitting long recordings
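
One possible way to split a long recording is ffmpeg's segment muxer. The sketch below is an assumption about how splitting could be done, not something `scripts/transcribe.py` is known to provide; the helper name and the 600-second default are illustrative.

```python
import subprocess
from typing import List


def build_split_command(audio_path: str, output_pattern: str,
                        segment_seconds: int = 600) -> List[str]:
    """Build an ffmpeg command that cuts the audio into fixed-length chunks.

    output_pattern needs a counter placeholder, e.g. "part_%03d.m4a".
    """
    return [
        "ffmpeg",
        "-i", audio_path,
        "-f", "segment",                       # use the segment muxer
        "-segment_time", str(segment_seconds), # target chunk length in seconds
        "-reset_timestamps", "1",              # each chunk starts at time zero
        "-c", "copy",                          # no re-encoding
        output_pattern,
    ]


def split_audio(audio_path: str, output_pattern: str, segment_seconds: int = 600) -> None:
    """Run the split; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_split_command(audio_path, output_pattern, segment_seconds),
                   check=True)
```

Each chunk is then transcribed separately, so word timestamps are relative to the start of that chunk and need an offset added if they are merged into one timeline.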

API Key Issues

Verify API key is correct
Check Deepgram account has available credits
Ensure no extra spaces in key
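
Stray whitespace is easy to catch before the key is ever sent. The sketch below trims and validates a key; the environment variable name `DEEPGRAM_API_KEY` and both function names are assumptions for illustration, since the commands in this document pass the key via `--api-key`.

```python
import os


def clean_api_key(raw_key: str) -> str:
    """Strip surrounding whitespace and reject empty or space-containing keys."""
    key = raw_key.strip()
    if not key:
        raise ValueError("Deepgram API key is empty")
    if any(ch.isspace() for ch in key):
        raise ValueError("Deepgram API key contains whitespace")
    return key


def api_key_from_env(var_name: str = "DEEPGRAM_API_KEY") -> str:
    """Read the key from an environment variable (the variable name is an assumption)."""
    return clean_api_key(os.environ.get(var_name, ""))
```

Reading the key from the environment also keeps it out of shell history, which passing it on the command line does not.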

File Size Guidelines

Input Type	Size	Recommendation
Audio	Any	Transcribe directly
Video	< 50MB	Can transcribe directly
Video	50-200MB	Extract audio first
Video	> 200MB	Must extract audio first
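
The table can be expressed as a small lookup function. This sketch assumes 1MB = 1024 × 1024 bytes and that a file of exactly 200MB falls in the 50-200MB row; neither detail is specified in the table, and `recommend` is a hypothetical name.

```python
MB = 1024 * 1024  # assumption: binary megabytes


def recommend(input_type: str, size_bytes: int) -> str:
    """Return the recommendation from the table for an "audio" or "video" input."""
    if input_type == "audio":
        return "Transcribe directly"
    if input_type != "video":
        raise ValueError(f"Unknown input type: {input_type!r}")
    if size_bytes < 50 * MB:
        return "Can transcribe directly"
    if size_bytes <= 200 * MB:
        return "Extract audio first"
    return "Must extract audio first"
```

For a file on disk, the size can be obtained with `os.path.getsize(path)`.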

Resources

scripts/transcribe.py

Complete Python script handling:

Audio extraction from video
Deepgram API calls
Response parsing
Output file generation

Execute without loading into context for efficiency.

references/api_reference.md

Deepgram API documentation including:

Available models and features
API parameters and options
Response format details
Best practices

Load into context when needing detailed API information.
