video-understand
Understand video content locally using ffmpeg for frame extraction and Whisper for transcription. Fully offline, no API keys required.
Prerequisites
- `ffmpeg` + `ffprobe` (required): `brew install ffmpeg`
- `openai-whisper` (optional, for transcription): `pip install openai-whisper`
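
Before running the script, it can help to confirm that the tools listed above are actually reachable. The following is a minimal sketch, not part of the skill itself: `check_prerequisites` is a hypothetical helper, and it assumes the `openai-whisper` package is importable under the module name `whisper`.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Return a dict mapping each dependency to whether it appears to be installed."""
    return {
        # shutil.which searches PATH the same way the shell would.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "ffprobe": shutil.which("ffprobe") is not None,
        # Assumption: openai-whisper exposes an importable module named "whisper".
        "whisper": importlib.util.find_spec("whisper") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A missing `whisper` entry is not fatal, since transcription is optional and can be skipped with `--no-transcribe`.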
Commands
```bash
# Scene detection + transcribe (default)
python3 skills/video-understand/scripts/understand_video.py video.mp4

# Keyframe extraction
python3 skills/video-understand/scripts/understand_video.py video.mp4 -m keyframe

# Regular interval extraction
python3 skills/video-understand/scripts/understand_video.py video.mp4 -m interval

# Limit frames extracted
python3 skills/video-understand/scripts/understand_video.py video.mp4 --max-frames 10

# Use a larger Whisper model
python3 skills/video-understand/scripts/understand_video.py video.mp4 --whisper-model small

# Frames only, skip transcription
python3 skills/video-understand/scripts/understand_video.py video.mp4 --no-transcribe

# Quiet mode (JSON only, no progress)
python3 skills/video-understand/scripts/understand_video.py video.mp4 -q

# Output to file
python3 skills/video-understand/scripts/understand_video.py video.mp4 -o result.json
```
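
The same commands can be driven from Python when the result is needed programmatically. This is a rough sketch under a few assumptions: the script path matches the one used in the commands above, `-q` leaves only JSON on stdout as described, and `build_command` and `understand` are hypothetical helper names rather than anything shipped with the skill.

```python
import json
import subprocess

# Path as used in the command examples; adjust if the skill lives elsewhere.
SCRIPT = "skills/video-understand/scripts/understand_video.py"


def build_command(video, mode=None, max_frames=None, transcribe=True):
    """Assemble the argument list, using only flags documented in this README."""
    cmd = ["python3", SCRIPT, video, "-q"]  # -q: JSON only, no progress output
    if mode is not None:
        cmd += ["-m", mode]
    if max_frames is not None:
        cmd += ["--max-frames", str(max_frames)]
    if not transcribe:
        cmd.append("--no-transcribe")
    return cmd


def understand(video, **options):
    """Run the script and return its parsed JSON result (raises on a non-zero exit)."""
    completed = subprocess.run(
        build_command(video, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)
```

For example, `understand("video.mp4", mode="interval", max_frames=10)` would run interval extraction capped at 10 frames and return the result as a dictionary.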
CLI Options
| Flag | Description |
|---|---|
| `<video>` | Input video file (positional, required) |
| `-m` | Extraction mode: `scene` (default), `keyframe`, `interval` |
| `--max-frames` | Maximum frames to keep (default: 20) |
| `--whisper-model` | Whisper model size: tiny, base, small, medium, large (default: base) |
| `--no-transcribe` | Skip audio transcription, extract frames only |
| `-o` | Write result JSON to file instead of stdout |
| `-q` | Suppress progress messages, output only JSON |
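
To make the defaults in the table concrete, here is a sketch of how an equivalent parser could be declared with `argparse`. It is an illustration only and not the script's actual parser: the positional name `video` and the attribute names (`mode`, `output`, `quiet`) are assumptions, and only flags that appear in this README are included.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Extract frames and a transcript from a video.")
    parser.add_argument("video", help="input video file")
    parser.add_argument("-m", dest="mode", default="scene",
                        choices=("scene", "keyframe", "interval"),
                        help="extraction mode")
    parser.add_argument("--max-frames", type=int, default=20,
                        help="maximum frames to keep")
    parser.add_argument("--whisper-model", default="base",
                        choices=("tiny", "base", "small", "medium", "large"),
                        help="Whisper model size")
    parser.add_argument("--no-transcribe", action="store_true",
                        help="skip audio transcription")
    parser.add_argument("-o", dest="output", default=None,
                        help="write result JSON to this file instead of stdout")
    parser.add_argument("-q", dest="quiet", action="store_true",
                        help="suppress progress messages")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```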
Extraction Modes
| Mode | How it works | Best for |
|---|---|---|
| `scene` | Detects scene changes via ffmpeg | Most videos, varied content |
| `keyframe` | Extracts I-frames (codec keyframes) | Encoded video with natural keyframe placement |
| `interval` | Evenly spaced frames based on duration and max-frames | Fixed sampling, predictable output |

If `scene` mode detects no scene changes, it automatically falls back to `interval` mode.
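
The modes and the fallback rule can be sketched as follows. This is an illustration of the idea rather than the script's own code: the ffmpeg `select` expressions are standard filter syntax, but the `0.3` scene threshold, the exact spacing formula, the truncation to `max_frames`, and the helper names are all assumptions.

```python
# Typical ffmpeg select filters for the first two modes (threshold is an assumed value).
SELECT_FILTERS = {
    "scene": "select='gt(scene,0.3)'",      # frames whose scene-change score exceeds 0.3
    "keyframe": "select='eq(pict_type,I)'",  # I-frames only
}


def interval_timestamps(duration, max_frames):
    """Evenly spaced timestamps (seconds) across the video, starting at 0."""
    if duration <= 0 or max_frames <= 0:
        return []
    step = duration / max_frames
    return [round(i * step, 3) for i in range(max_frames)]


def pick_timestamps(scene_changes, duration, max_frames):
    """Return (mode_used, timestamps), falling back to interval sampling
    when scene detection found nothing."""
    if scene_changes:
        return "scene", sorted(scene_changes)[:max_frames]
    return "interval", interval_timestamps(duration, max_frames)


if __name__ == "__main__":
    # A static screen recording with no detected cuts falls back to interval mode.
    print(pick_timestamps([], duration=10, max_frames=4))
```

With no detected scene changes, a 10-second video and `max_frames=4` yield the timestamps 0.0, 2.5, 5.0, and 7.5 under this spacing scheme.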
Output
The script outputs JSON to stdout (or to a file with `-o`). See `references/output-format.md` for the full schema.

```json
{
  "video": "video.mp4",
  "duration": 18.076,
  "resolution": {"width": 1224, "height": 1080},
  "mode": "scene",
  "frames": [
    {"path": "/abs/path/frame_0001.jpg", "timestamp": 0.0, "timestamp_formatted": "00:00"}
  ],
  "frame_count": 12,
  "transcript": [
    {"start": 0.0, "end": 2.5, "text": "Hello and welcome..."}
  ],
  "text": "Full transcript...",
  "note": "Use the Read tool to view frame images for visual understanding."
}
```

Use the Read tool on frame image paths to visually inspect extracted frames.
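
Because frames and transcript segments both carry timestamps, they can be lined up after the fact. The sketch below shows one way to do that using only the field names from the example output; `frames_with_speech` is a hypothetical helper, and treating `transcript` as possibly absent (for instance after `--no-transcribe`) is an assumption.

```python
def frames_with_speech(result):
    """For each frame, list the transcript texts whose time span covers the frame."""
    segments = result.get("transcript", [])  # assumed to be missing or empty without transcription
    paired = []
    for frame in result.get("frames", []):
        t = frame["timestamp"]
        spoken = [seg["text"] for seg in segments if seg["start"] <= t <= seg["end"]]
        paired.append({
            "path": frame["path"],
            "time": frame["timestamp_formatted"],
            "spoken": spoken,
        })
    return paired


if __name__ == "__main__":
    # Trimmed copy of the example output shown in this README.
    sample = {
        "frames": [
            {"path": "/abs/path/frame_0001.jpg", "timestamp": 0.0, "timestamp_formatted": "00:00"}
        ],
        "transcript": [
            {"start": 0.0, "end": 2.5, "text": "Hello and welcome..."}
        ],
    }
    for item in frames_with_speech(sample):
        print(item["time"], item["path"], item["spoken"])
```

When the result was written with `-o result.json`, load it first with `json.load` and pass the resulting dictionary to the helper.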
References
- `references/output-format.md` -- Full JSON output schema documentation