video-understand

Understand video content locally, using ffmpeg for frame extraction and Whisper for transcription. Runs fully offline with no API keys required (Whisper downloads its model weights once, on first use).

Prerequisites

  • ffmpeg + ffprobe (required): brew install ffmpeg
  • openai-whisper (optional, for transcription): pip install openai-whisper
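
Before running the script, a quick preflight check can confirm that the dependencies are present. The sketch below is a hypothetical helper and is not part of the skill. It uses `shutil.which` to look up `ffmpeg` and `ffprobe` on `PATH`, and it checks whether the `whisper` package installed by openai-whisper is importable.

```python
import importlib.util
import shutil


def check_prerequisites() -> dict:
    """Report which of the documented dependencies are available (hypothetical helper)."""
    return {
        # Both binaries are required for frame extraction and probing.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "ffprobe": shutil.which("ffprobe") is not None,
        # openai-whisper installs a top-level package named "whisper".
        "whisper": importlib.util.find_spec("whisper") is not None,
    }


if __name__ == "__main__":
    status = check_prerequisites()
    for name, ok in status.items():
        print(f"{name:8} {'ok' if ok else 'missing'}")
    if not status["whisper"]:
        print("whisper not installed: transcription unavailable, use --no-transcribe")
```

If `whisper` is missing, the frames-only mode (`--no-transcribe`) is the option that still applies.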

Commands

```bash
# Scene detection + transcribe (default)
python3 skills/video-understand/scripts/understand_video.py video.mp4

# Keyframe extraction
python3 skills/video-understand/scripts/understand_video.py video.mp4 -m keyframe

# Regular interval extraction
python3 skills/video-understand/scripts/understand_video.py video.mp4 -m interval

# Limit frames extracted
python3 skills/video-understand/scripts/understand_video.py video.mp4 --max-frames 10

# Use a larger Whisper model
python3 skills/video-understand/scripts/understand_video.py video.mp4 --whisper-model small

# Frames only, skip transcription
python3 skills/video-understand/scripts/understand_video.py video.mp4 --no-transcribe

# Quiet mode (JSON only, no progress)
python3 skills/video-understand/scripts/understand_video.py video.mp4 -q

# Output to file
python3 skills/video-understand/scripts/understand_video.py video.mp4 -o result.json
```
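
To call the skill from other Python code, a thin wrapper can assemble the documented flags and parse the JSON from stdout. This is a sketch that rests on some assumptions:

- `build_command` and `understand` are hypothetical helper names.
- The script path is the one used in the examples above.
- The wrapper always passes `-q` so that stdout carries only JSON.

```python
import json
import subprocess
from typing import List, Optional

# Path used in the command examples; adjust if the skill lives elsewhere.
SCRIPT = "skills/video-understand/scripts/understand_video.py"


def build_command(video: str, mode: str = "scene", max_frames: int = 20,
                  whisper_model: str = "base", transcribe: bool = True,
                  output: Optional[str] = None) -> List[str]:
    """Build the argument list from the documented flags (hypothetical helper)."""
    cmd = ["python3", SCRIPT, video, "-m", mode,
           "--max-frames", str(max_frames),
           "--whisper-model", whisper_model,
           "-q"]  # quiet: JSON only, no progress messages
    if not transcribe:
        cmd.append("--no-transcribe")
    if output is not None:
        cmd += ["-o", output]
    return cmd


def understand(video: str, **kwargs) -> dict:
    """Run the script and parse the JSON it prints to stdout (hypothetical helper)."""
    if kwargs.get("output") is not None:
        # With -o the JSON goes to a file, so there would be nothing to parse here.
        raise ValueError("understand() reads stdout; do not pass output")
    result = subprocess.run(build_command(video, **kwargs),
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

For example, `understand("video.mp4", mode="keyframe", transcribe=False)` runs the keyframe mode without transcription and returns the parsed result as a dict.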

CLI Options

| Flag | Description |
|---|---|
| `video` | Input video file (positional, required) |
| `-m, --mode` | Extraction mode: `scene` (default), `keyframe`, `interval` |
| `--max-frames` | Maximum frames to keep (default: 20) |
| `--whisper-model` | Whisper model size: `tiny`, `base`, `small`, `medium`, `large` (default: `base`) |
| `--no-transcribe` | Skip audio transcription; extract frames only |
| `-o, --output` | Write the result JSON to a file instead of stdout |
| `-q, --quiet` | Suppress progress messages; output only JSON |
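
For reference, the documented interface corresponds roughly to the `argparse` definition below. This is a hypothetical reconstruction from the table, not the script's actual source. `--whisper-model` is left unrestricted here because the real script may accept model names beyond the five listed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse mirror of the documented CLI (not the script's source)."""
    parser = argparse.ArgumentParser(
        prog="understand_video.py",
        description="Extract frames and optionally transcribe audio from a video.",
    )
    parser.add_argument("video", help="Input video file")
    parser.add_argument("-m", "--mode", choices=["scene", "keyframe", "interval"],
                        default="scene", help="Extraction mode")
    parser.add_argument("--max-frames", type=int, default=20,
                        help="Maximum frames to keep")
    # Documented sizes: tiny, base, small, medium, large. Not enforced here.
    parser.add_argument("--whisper-model", default="base",
                        help="Whisper model size")
    parser.add_argument("--no-transcribe", action="store_true",
                        help="Skip audio transcription, extract frames only")
    parser.add_argument("-o", "--output",
                        help="Write result JSON to file instead of stdout")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="Suppress progress messages, output only JSON")
    return parser
```

With this sketch, `build_parser().parse_args(["video.mp4", "-m", "interval"])` yields `mode="interval"` and leaves every other option at its documented default.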

Extraction Modes

| Mode | How it works | Best for |
|---|---|---|
| `scene` | Uses the ffmpeg filter `select='gt(scene,0.3)'` to keep frames whose scene-change score (0 to 1) exceeds 0.3 | Most videos, varied content |
| `keyframe` | Extracts I-frames (codec keyframes) | Videos whose encoder placed keyframes at natural boundaries such as scene cuts |
| `interval` | Extracts evenly spaced frames based on duration and `--max-frames` | Fixed sampling, predictable output |

If `scene` mode detects no scene changes, it automatically falls back to `interval` mode.
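
The three modes map onto fairly standard ffmpeg invocations. The sketch below shows one plausible way to build them. The following are assumptions, not the script's confirmed implementation:

- the helper names;
- the `frame_%04d.jpg` output pattern;
- the specific flags: `-skip_frame nokey` for keyframes, and `-fps_mode vfr`, which needs ffmpeg 5.1 or newer.

`downsample` shows one way to honor `--max-frames` when scene detection returns more frames than requested.

```python
import os
from typing import List, Sequence

SCENE_THRESHOLD = 0.3  # threshold stated in the mode table


def interval_timestamps(duration: float, max_frames: int) -> List[float]:
    """Evenly spaced timestamps from 0 up to (not including) the end."""
    if duration <= 0 or max_frames <= 0:
        return []
    return [round(duration * i / max_frames, 3) for i in range(max_frames)]


def downsample(items: Sequence, max_frames: int) -> list:
    """Keep at most max_frames items, spread evenly and including first and last."""
    n = len(items)
    if max_frames <= 0:
        return []
    if n <= max_frames:
        return list(items)
    if max_frames == 1:
        return [items[0]]
    return [items[i * (n - 1) // (max_frames - 1)] for i in range(max_frames)]


def ffmpeg_commands(mode: str, video: str, out_dir: str,
                    duration: float = 0.0, max_frames: int = 20) -> List[List[str]]:
    """Hypothetical ffmpeg argument lists for each extraction mode."""
    pattern = os.path.join(out_dir, "frame_%04d.jpg")
    if mode == "scene":
        # Quotes keep the comma from being read as a filter separator.
        return [["ffmpeg", "-i", video,
                 "-vf", f"select='gt(scene,{SCENE_THRESHOLD})'",
                 "-fps_mode", "vfr", pattern]]
    if mode == "keyframe":
        # Decoder-level option: only decode keyframes (I-frames).
        return [["ffmpeg", "-skip_frame", "nokey", "-i", video,
                 "-fps_mode", "vfr", pattern]]
    if mode == "interval":
        # One seek-and-grab command per timestamp.
        return [["ffmpeg", "-ss", f"{t:.3f}", "-i", video, "-frames:v", "1",
                 os.path.join(out_dir, f"frame_{i:04d}.jpg")]
                for i, t in enumerate(interval_timestamps(duration, max_frames), start=1)]
    raise ValueError(f"unknown mode: {mode}")
```

Under this sketch, the documented fallback amounts to this: run the `scene` command, and if it writes no frames, get the duration with ffprobe and run the `interval` commands instead.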

Output

The script writes JSON to stdout (or to a file with `-o`). See `references/output-format.md` for the full schema. An abbreviated example:

```json
{
  "video": "video.mp4",
  "duration": 18.076,
  "resolution": {"width": 1224, "height": 1080},
  "mode": "scene",
  "frames": [
    {"path": "/abs/path/frame_0001.jpg", "timestamp": 0.0, "timestamp_formatted": "00:00"}
  ],
  "frame_count": 12,
  "transcript": [
    {"start": 0.0, "end": 2.5, "text": "Hello and welcome..."}
  ],
  "text": "Full transcript...",
  "note": "Use the Read tool to view frame images for visual understanding."
}
```

Use the Read tool on the frame image paths to inspect the extracted frames visually.
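
The `frames` and `transcript` entries in the schema can be assembled from plain values and from the result of openai-whisper's `transcribe()`. That result is a dict with a `text` string and a list of `segments`, each carrying `start`, `end`, and `text`. The helpers below are a hypothetical sketch. The `MM:SS` formatting is inferred from the `"00:00"` example, and how the script renders times past one hour is an assumption.

```python
from typing import Any, Dict


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS (assumption: minutes keep counting past 59)."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def frame_entry(path: str, timestamp: float) -> Dict[str, Any]:
    """One element of the "frames" list, shaped like the example above."""
    return {"path": path,
            "timestamp": round(timestamp, 3),
            "timestamp_formatted": format_timestamp(timestamp)}


def transcript_fields(whisper_result: Dict[str, Any]) -> Dict[str, Any]:
    """Map a whisper transcribe() result onto the "transcript" and "text" fields."""
    segments = [
        {"start": round(seg["start"], 3),
         "end": round(seg["end"], 3),
         "text": seg["text"].strip()}  # Whisper segment text usually has a leading space
        for seg in whisper_result.get("segments", [])
    ]
    return {"transcript": segments, "text": whisper_result.get("text", "").strip()}


# Typical source of whisper_result (requires openai-whisper):
#   import whisper
#   whisper_result = whisper.load_model("base").transcribe("video.mp4")
```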

References

  • references/output-format.md: full JSON output schema documentation