transcribe-video
Transcribe Video
Extract transcript text from a local video file. The skill checks for embedded subtitles first (faster and more accurate), and only falls back to API-based speech recognition if none are found.
Step 1: Identify the video file
Confirm the video file path with the user. Supported formats: mp4, mkv, mov, avi, webm, and any format ffmpeg can handle.
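As a rough illustration of this step, the path the user gives can be checked before any ffmpeg call. This is a minimal sketch, not part of the skill: the helper name `check_video_path` and the `KNOWN_EXTENSIONS` set are hypothetical, and an unlisted extension is only flagged, not rejected, because ffmpeg may still handle it.

```python
from pathlib import Path

# Extensions named in the step above; other containers may still work with ffmpeg.
KNOWN_EXTENSIONS = {".mp4", ".mkv", ".mov", ".avi", ".webm"}


def check_video_path(raw_path: str) -> tuple[Path, bool]:
    """Return the resolved path and whether its extension is one of the listed formats.

    Raises FileNotFoundError if the path is not an existing file.
    """
    path = Path(raw_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No such video file: {path}")
    return path, path.suffix.lower() in KNOWN_EXTENSIONS
```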
Step 2: Check for embedded subtitles
bash
ffprobe -v quiet -select_streams s -show_entries stream=index,codec_name:stream_tags=language,title -of json "<video_path>"
- If subtitle streams exist → go to Step 3a (extract embedded subtitles)
- If no subtitle streams → go to Step 3b (API transcription)
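The branch above depends on whether the `streams` array in ffprobe's JSON output is empty. The following sketch shows one way to make that decision in Python; the function names are hypothetical, and it assumes `ffprobe` is on `PATH`.

```python
import json
import subprocess


def parse_subtitle_streams(ffprobe_json: str) -> list[dict]:
    """Return the subtitle stream entries from ffprobe's JSON output (empty list if none)."""
    data = json.loads(ffprobe_json or "{}")
    return data.get("streams", [])


def probe_subtitle_streams(video_path: str) -> list[dict]:
    """Run the ffprobe command from Step 2 and return its subtitle streams."""
    cmd = [
        "ffprobe", "-v", "quiet", "-select_streams", "s",
        "-show_entries", "stream=index,codec_name:stream_tags=language,title",
        "-of", "json", video_path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_subtitle_streams(result.stdout)


def next_step(streams: list[dict]) -> str:
    """Map the probe result onto the branch described above."""
    return "3a" if streams else "3b"
```

Keeping the parsing separate from the subprocess call makes the decision logic easy to check against saved ffprobe output.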
Step 3a: Extract embedded subtitles
If multiple subtitle tracks exist, prefer the one matching the video's primary language or ask the user which track to use.
bash
# Extract as SRT (stream index 0 for the first subtitle track; adjust if needed)
ffmpeg -i "<video_path>" -map 0:s:0 -c:s srt "<output_path>.srt" -y
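When several tracks exist, note that the `N` in `-map 0:s:N` counts subtitle streams only, so it is the track's position in the ffprobe subtitle list and not the absolute `index` field that ffprobe reports. A minimal sketch of the selection rule follows. The helper `pick_subtitle_track` is hypothetical, and it assumes the language tags are whatever the file carries (often three-letter codes such as `eng`).

```python
from typing import Optional


def pick_subtitle_track(streams: list[dict], preferred_language: str) -> Optional[int]:
    """Return the subtitle-relative index N for `-map 0:s:N`, or None if the user should be asked."""
    if len(streams) == 1:
        return 0
    matches = [
        position
        for position, stream in enumerate(streams)
        if stream.get("tags", {}).get("language", "").lower() == preferred_language.lower()
    ]
    # Exactly one match is unambiguous; zero or several means asking the user.
    return matches[0] if len(matches) == 1 else None
```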
After extraction, convert SRT to clean text:
- Remove sequence numbers
- Remove timestamp lines (lines matching `\d{2}:\d{2}:\d{2}`)
- Remove HTML-like tags (`<i>`, `</i>`, etc.)
- Join remaining non-empty lines
Save the clean transcript to `<video_name>.txt` next to the video file. Done; skip Step 3b.
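The cleanup rules above can be sketched in a few lines of Python. This is an illustrative version rather than the skill's own code: the function name `srt_to_text` is hypothetical, and joining the remaining lines with newlines (instead of spaces) is an assumption.

```python
import re

TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2}")  # pattern from the step above
TAG = re.compile(r"</?[^>]+>")                # HTML-like tags such as <i> or </i>


def srt_to_text(srt: str) -> str:
    """Strip sequence numbers, timestamp lines, and tags from SRT content."""
    kept = []
    for raw in srt.splitlines():
        line = raw.replace("\ufeff", "").strip()  # drop a byte-order mark if present
        if not line:
            continue
        # Sequence numbers are bare integers. Caveat: a dialogue line that
        # consists only of digits is dropped as well.
        if line.isdigit():
            continue
        if TIMESTAMP.match(line):
            continue
        line = TAG.sub("", line).strip()
        if line:
            kept.append(line)
    return "\n".join(kept)
```

For example, a cue whose text is `<i>Hello</i> there` contributes the single line `Hello there` to the result.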
Step 3b: API-based transcription
Use the bundled transcription script. It reads credentials from `~/.transcribe_video.env`.
Prerequisites check
- Verify the env file exists:
bash
test -f ~/.transcribe_video.env && echo "OK" || echo "MISSING"
- If MISSING, tell the user to create `~/.transcribe_video.env` with:
OPENAI_API_KEY=your-key-here
# Optional Base URL:
# OPENAI_API_BASE=https://<base-url>/v1/
# Optional Model Name:
# TRANSCRIBE_MODEL=gpt-4o-transcribe
Wait for the user to confirm before proceeding.
- Verify dependencies:
bash
python3 -c "from openai import OpenAI; from dotenv import load_dotenv; print('OK')" 2>&1
If missing: `pip install openai python-dotenv`
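The same checks can be combined into one Python pre-flight function. This is a sketch under stated assumptions: `parse_env` is a deliberately simple stand-in for python-dotenv's parser (it does not handle inline comments or `export` prefixes), and both function names are hypothetical.

```python
from importlib.util import find_spec
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comment lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_prerequisites(env_path: Path) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready to run."""
    problems = []
    if not env_path.is_file():
        problems.append(f"{env_path} is missing")
    elif not parse_env(env_path.read_text()).get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # Import names: the `python-dotenv` package is imported as `dotenv`.
    for module in ("openai", "dotenv"):
        if find_spec(module) is None:
            problems.append(f"Python module '{module}' is not installed")
    return problems
```

A call such as `check_prerequisites(Path.home() / ".transcribe_video.env")` returns an empty list only when the file, the key, and both packages are in place.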
Run transcription
bash
python3 <skill_directory>/scripts/transcribe.py "<video_path>"
The script extracts audio (WAV, 16 kHz mono), sends it to the API, and saves the transcript to `<video_name>.txt` next to the video file.
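The bundled script itself is not reproduced here, so the following is only a guess at its general shape, built from the description above. All function names are hypothetical. The default model name, the 16-bit PCM codec, and reading `<video_name>.txt` as the video's name with its extension replaced are likewise assumptions.

```python
import os
import subprocess
import tempfile
from pathlib import Path


def audio_extraction_command(video_path: Path, wav_path: Path) -> list[str]:
    """Build an ffmpeg command that writes 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y", "-i", str(video_path),
        "-vn",                # drop the video stream
        "-ac", "1",           # mono
        "-ar", "16000",       # 16 kHz sample rate
        "-c:a", "pcm_s16le",  # uncompressed 16-bit PCM
        str(wav_path),
    ]


def transcript_path_for(video_path: Path) -> Path:
    """Assumed reading of `<video_name>.txt`: same folder, extension replaced."""
    return video_path.with_suffix(".txt")


def transcribe(video_path: Path) -> Path:
    # Imported here so the helpers above work without the third-party packages.
    from dotenv import load_dotenv
    from openai import OpenAI

    load_dotenv(Path.home() / ".transcribe_video.env")
    client = OpenAI(
        api_key=os.environ["OPENAI_API_KEY"],
        base_url=os.environ.get("OPENAI_API_BASE"),  # None selects the default endpoint
    )
    with tempfile.TemporaryDirectory() as tmp:
        wav_path = Path(tmp) / "audio.wav"
        subprocess.run(audio_extraction_command(video_path, wav_path), check=True)
        with wav_path.open("rb") as audio:
            result = client.audio.transcriptions.create(
                model=os.environ.get("TRANSCRIBE_MODEL", "gpt-4o-transcribe"),
                file=audio,
            )
    out_path = transcript_path_for(video_path)
    out_path.write_text(result.text, encoding="utf-8")
    return out_path
```

Writing the WAV into a temporary directory avoids overwriting any existing audio file next to the video.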
Step 4: Report results
Tell the user:
- Where the transcript file was saved
- How many lines / approximate word count
- Whether it came from embedded subtitles or API transcription
- Display the first few lines as a preview
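The report items above can be assembled from the saved transcript, as in this small sketch. The function name `summarize_transcript` and the three-line preview are hypothetical choices. The word count splits on whitespace, so it is only approximate and undercounts languages written without spaces, such as Chinese.

```python
def summarize_transcript(text: str, saved_to: str, source: str, preview_lines: int = 3) -> str:
    """Build the user-facing report: location, size, origin, and a short preview."""
    lines = [line for line in text.splitlines() if line.strip()]
    word_count = sum(len(line.split()) for line in lines)
    preview = "\n".join(lines[:preview_lines])
    return (
        f"Transcript saved to: {saved_to}\n"
        f"Lines: {len(lines)}, approx. words: {word_count}\n"
        f"Source: {source}\n"
        f"Preview:\n{preview}"
    )
```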