transcribe-video

Transcribe Video

Extract transcript text from a local video file. The skill checks for embedded subtitles first (faster and more accurate), and only falls back to API-based speech recognition if none are found.

Step 1: Identify the video file

Confirm the video file path with the user. Supported formats: mp4, mkv, mov, avi, webm, and any format ffmpeg can handle.
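Before moving on, a quick preflight can confirm that the path points at an existing file and that the ffmpeg tools used in later steps are on `PATH`. This is a sketch with a hypothetical `preflight` helper (not part of the skill); it only checks existence, not whether ffmpeg can actually decode the container.

```python
import shutil
from pathlib import Path


def preflight(video_path: str, tools: tuple = ("ffprobe", "ffmpeg")) -> list:
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper: the skill itself does not define this function.
    """
    problems = []
    path = Path(video_path).expanduser()
    if not path.is_file():
        problems.append(f"not a file: {path}")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```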

Step 2: Check for embedded subtitles

```bash
ffprobe -v quiet -select_streams s -show_entries stream=index,codec_name:stream_tags=language,title -of json "<video_path>"
```

- If the output lists one or more subtitle streams → go to Step 3a (extract embedded subtitles)
- If the `streams` list is empty or absent → go to Step 3b (API transcription)
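The branch above can be scripted by parsing ffprobe's JSON output. A minimal sketch, assuming ffprobe is invoked exactly as in the command above; `subtitle_streams`, `choose_step`, and `probe` are illustrative names, not part of the skill.

```python
import json
import subprocess


def subtitle_streams(ffprobe_json: str) -> list:
    """Return the subtitle stream entries from ffprobe's JSON output."""
    data = json.loads(ffprobe_json or "{}")
    # With -select_streams s every entry in "streams" is a subtitle stream;
    # treat a missing key or an empty list as "no subtitles".
    return data.get("streams", [])


def choose_step(ffprobe_json: str) -> str:
    """'3a' if any embedded subtitle track exists, otherwise '3b'."""
    return "3a" if subtitle_streams(ffprobe_json) else "3b"


def probe(video_path: str) -> str:
    """Run the same ffprobe command as above and return its stdout."""
    cmd = [
        "ffprobe", "-v", "quiet", "-select_streams", "s",
        "-show_entries", "stream=index,codec_name:stream_tags=language,title",
        "-of", "json", video_path,
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

Usage would be `choose_step(probe("talk.mkv"))`; keeping the parsing separate from the subprocess call makes the decision easy to check against saved ffprobe output.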

Step 3a: Extract embedded subtitles

If multiple subtitle tracks exist, prefer the one matching the video's primary language or ask the user which track to use.

```bash
# Extract as SRT. "0:s:0" is the first subtitle track (counted among subtitle streams only,
# not the absolute "index" ffprobe reports); use 0:s:1 for the second, and so on.
# Only text-based tracks convert to SRT; image-based ones (e.g. hdmv_pgs_subtitle) cannot.
ffmpeg -i "<video_path>" -map 0:s:0 -c:s srt "<output_path>.srt" -y
```
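To follow the "prefer the primary language" rule, a script can pick a track from the ffprobe entries and build the matching `-map` argument. A sketch under assumptions: the caller supplies the primary language as a three-letter code such as `eng` (the form container `language` tags usually hold), and falling back to the first track is an illustrative choice; asking the user remains the other option. `pick_subtitle_position` and `extract_srt_command` are hypothetical names.

```python
def pick_subtitle_position(streams: list, preferred_lang: str = "") -> int:
    """Return the position N for `-map 0:s:N` (0 = first subtitle track).

    `streams` is the ffprobe "streams" list from Step 2, in ffprobe's order,
    which matches the subtitle-relative numbering used by `0:s:N`.
    """
    if not streams:
        raise ValueError("no subtitle streams; use Step 3b instead")
    if preferred_lang:
        for position, stream in enumerate(streams):
            lang = stream.get("tags", {}).get("language", "")
            if lang.lower() == preferred_lang.lower():
                return position
    return 0  # fallback: first subtitle track (illustrative choice)


def extract_srt_command(video_path: str, position: int, output_base: str) -> list:
    """Build the ffmpeg argv equivalent to the shell command above."""
    return [
        "ffmpeg", "-i", video_path,
        "-map", f"0:s:{position}",
        "-c:s", "srt", f"{output_base}.srt", "-y",
    ]
```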


After extraction, convert SRT to clean text:
- Remove sequence numbers
- Remove timestamp lines (lines matching `\d{2}:\d{2}:\d{2}`)
- Remove HTML-like tags (`<i>`, `</i>`, etc.)
- Join remaining non-empty lines
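The four cleanup rules map almost directly onto a few regular expressions. A minimal sketch; `srt_to_text` is a hypothetical name, and joining with newlines by default is an assumption (the rules only say "join"), chosen so that line counts in Step 4 stay meaningful.

```python
import re

SEQUENCE_RE = re.compile(r"^\d+$")               # cue numbers such as "12"
TIMESTAMP_RE = re.compile(r"\d{2}:\d{2}:\d{2}")  # timestamp rule from the list above
TAG_RE = re.compile(r"<[^>]+>")                  # <i>, </i>, <font color=...>, ...


def srt_to_text(srt: str, sep: str = "\n") -> str:
    """Apply the cleanup rules above and join the surviving lines with `sep`."""
    kept = []
    for raw in srt.splitlines():
        line = raw.strip().lstrip("\ufeff")  # tolerate a UTF-8 BOM on the first line
        if not line or SEQUENCE_RE.match(line) or TIMESTAMP_RE.search(line):
            continue
        line = TAG_RE.sub("", line).strip()
        if line:
            kept.append(line)
    return sep.join(kept)
```

Passing `sep=" "` produces a single paragraph instead, if that suits the user better.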

Save the clean transcript to `<video_name>.txt` next to the video file. Done — skip Step 3b.

Step 3b: API-based transcription

Use the bundled transcription script. It reads credentials from `~/.transcribe_video.env`.

Prerequisites check

Verify the env file exists:

```bash
test -f ~/.transcribe_video.env && echo "OK" || echo "MISSING"
```

If the output is `MISSING`, tell the user to create `~/.transcribe_video.env` with:

```
OPENAI_API_KEY=your-key-here
# Optional Base URL:
# OPENAI_API_BASE=https://<base-url>/v1/
# Optional Model Name:
# TRANSCRIBE_MODEL=gpt-4o-transcribe
```

Wait for the user to confirm before proceeding.
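For reference, a script could read that file roughly as follows. This is a sketch, not the bundled `transcribe.py`: it assumes python-dotenv's `dotenv_values`, treats an unset `OPENAI_API_BASE` as "use the client's default endpoint", and assumes `gpt-4o-transcribe` as the fallback model only because that is the value shown in the template. `resolve_config` and `load_config` are hypothetical names.

```python
from pathlib import Path

ENV_PATH = Path("~/.transcribe_video.env").expanduser()
DEFAULT_MODEL = "gpt-4o-transcribe"  # assumption: mirrors the commented template value


def resolve_config(values: dict) -> dict:
    """Turn raw key/value pairs from the env file into client settings."""
    api_key = (values.get("OPENAI_API_KEY") or "").strip()
    if not api_key:
        raise ValueError(f"OPENAI_API_KEY is missing in {ENV_PATH}")
    return {
        "api_key": api_key,
        # None lets the openai client use its own default
        # (the OPENAI_BASE_URL environment variable, else the public endpoint)
        "base_url": (values.get("OPENAI_API_BASE") or "").strip() or None,
        "model": (values.get("TRANSCRIBE_MODEL") or "").strip() or DEFAULT_MODEL,
    }


def load_config(path: Path = ENV_PATH) -> dict:
    """Read the env file with python-dotenv (imported lazily) and resolve it."""
    from dotenv import dotenv_values

    return resolve_config(dotenv_values(path))
```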

Verify dependencies:

```bash
python3 -c "from openai import OpenAI; from dotenv import load_dotenv; print('OK')" 2>&1
```

If the check fails, install the packages:

```bash
pip install openai python-dotenv
```
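The same check can be done without importing anything, reporting every missing package at once. A sketch; note that, unlike the one-liner, it only confirms the modules are installed, not that the installed `openai` is a 1.x release providing `OpenAI`. The import-name to pip-name mapping (`dotenv` is installed as `python-dotenv`) is the only non-obvious part.

```python
import importlib.util

# Import name -> pip package name for the dependencies listed above
REQUIRED = {"openai": "openai", "dotenv": "python-dotenv"}


def missing_packages(required: dict = REQUIRED) -> list:
    """Return pip package names whose top-level modules cannot be found."""
    return [pkg for module, pkg in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("OK" if not missing else "pip install " + " ".join(missing))
```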

Run transcription

```bash
python3 <skill_directory>/scripts/transcribe.py "<video_path>"
```

The script extracts audio (WAV, 16 kHz mono), sends it to the API, and saves the transcript to `<video_name>.txt` next to the video file.
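In outline, those three stages might look like the sketch below. It is not the bundled `transcribe.py`: the ffmpeg flags are standard ones for 16 kHz mono 16-bit PCM WAV, the upload uses the openai 1.x `client.audio.transcriptions.create` call, and the `config` dict shape (`api_key`, `base_url`, `model`) is hypothetical. The hosted OpenAI endpoint rejects uploads over 25 MB, which at 16 kHz mono 16-bit is roughly 13 minutes of audio, so long videos would need chunking that this sketch does not attempt.

```python
import subprocess
import tempfile
from pathlib import Path


def transcript_path(video_path: str) -> Path:
    """<video_name>.txt next to the video: same directory, extension swapped."""
    return Path(video_path).with_suffix(".txt")


def extract_audio_command(video_path: str, wav_path: str) -> list:
    """ffmpeg argv for 16 kHz mono 16-bit PCM WAV, with the video stream dropped."""
    return ["ffmpeg", "-y", "-i", video_path, "-vn",
            "-ac", "1", "-ar", "16000", "-c:a", "pcm_s16le", wav_path]


def transcribe(video_path: str, config: dict) -> Path:
    """Extract audio, send it to the API, write the transcript; return its path."""
    from openai import OpenAI  # openai>=1.0

    client = OpenAI(api_key=config["api_key"], base_url=config.get("base_url"))
    with tempfile.TemporaryDirectory() as tmp:
        wav = str(Path(tmp) / "audio.wav")
        subprocess.run(extract_audio_command(video_path, wav),
                       check=True, capture_output=True)
        with open(wav, "rb") as audio:
            result = client.audio.transcriptions.create(model=config["model"], file=audio)
    out = transcript_path(video_path)
    out.write_text(result.text, encoding="utf-8")
    return out
```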

Step 4: Report results

Tell the user:

- Where the transcript file was saved
- How many lines it has and its approximate word count
- Whether it came from embedded subtitles or API transcription
- The first few lines, as a preview
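The report can be assembled from the saved transcript's contents. A sketch with a hypothetical `summarize` helper; counting only non-empty lines and splitting on whitespace for the word estimate are assumptions. Whitespace splitting badly undercounts languages written without spaces, such as Chinese or Japanese, where a character count is a better proxy.

```python
def summarize(path: str, transcript: str, source: str, preview_lines: int = 5) -> dict:
    """Collect the facts Step 4 asks to report.

    `source` is expected to be "embedded subtitles" or "API transcription".
    """
    lines = [ln for ln in transcript.splitlines() if ln.strip()]
    return {
        "path": path,
        "source": source,
        "lines": len(lines),
        # Whitespace split: approximate for English, undercounts CJK text
        "approx_words": len(transcript.split()),
        "preview": lines[:preview_lines],
    }
```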
