video-understand
Video Understanding
Multi-provider video understanding with automatic fallback and model selection.
Quick Start
```bash
# Check available providers
python3 scripts/check_providers.py

# Process a video (auto-selects best provider)
python3 scripts/process_video.py "https://youtube.com/watch?v=..."
python3 scripts/process_video.py /path/to/video.mp4

# Custom prompt
python3 scripts/process_video.py video.mp4 -p "List all products shown with timestamps"

# Use specific provider/model
python3 scripts/process_video.py video.mp4 --provider openrouter -m google/gemini-3-pro-preview

# List available models
python3 scripts/process_video.py --list-models
```

Provider Hierarchy
Automatically selects the best available provider:
| Priority | Provider | Capability | Env Var | Default Model |
|---|---|---|---|---|
| 1 | Gemini | Full video | | gemini-3-flash-preview |
| 2 | Vertex AI | Full video | | gemini-3-flash-preview |
| 3 | OpenRouter | Full video | | google/gemini-3-flash-preview |
| 4 | FFMPEG | Frames + ASR | None (requires ffmpeg + whisper) | scene |
| 5 | OpenAI | ASR only | | whisper-1 |
| 6 | AssemblyAI | ASR + analysis | | best |
| 7 | Deepgram | ASR | | nova-2 |
| 8 | Groq | ASR (fast) | | whisper-large-v3-turbo |
| 9 | Local Whisper | ASR (offline) | None | base |
Full video = visual + audio analysis. Frames + ASR = extracted screenshots + audio transcription (free, offline). ASR = audio transcription only.
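
The priority rule in the table can be sketched in a few lines of Python. This is an illustration only, not the project's actual selection code: the provider identifiers, the `select_provider` function, the `keyless` parameter, and every environment variable name in `EXAMPLE_ENV_VARS` are hypothetical placeholders. The real variable names are documented in setup-guide.md.

```python
from typing import Collection, Mapping, Optional

# Priority order copied from the table above (identifiers are illustrative).
PRIORITY = [
    "gemini", "vertex", "openrouter", "ffmpeg", "openai",
    "assemblyai", "deepgram", "groq", "local-whisper",
]

# HYPOTHETICAL variable names, used only to make the sketch runnable.
EXAMPLE_ENV_VARS = {
    "gemini": "EXAMPLE_GEMINI_KEY",
    "openrouter": "EXAMPLE_OPENROUTER_KEY",
    "openai": "EXAMPLE_OPENAI_KEY",
}


def select_provider(
    env: Mapping[str, str],
    env_vars: Mapping[str, str] = EXAMPLE_ENV_VARS,
    keyless: Collection[str] = (),
) -> Optional[str]:
    """Return the highest-priority usable provider, or None.

    `keyless` lists providers that need no API key and that the caller
    has already confirmed are installed (for example ffmpeg).
    """
    for name in PRIORITY:
        var = env_vars.get(name)
        if name in keyless or (var is not None and env.get(var)):
            return name
    return None
```

Calling `select_provider(os.environ, keyless={"local-whisper"})` would then return the first provider whose key is set, falling back to the offline option when no key is present.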
CLI Options
```
python3 scripts/process_video.py [OPTIONS] SOURCE

Arguments:
  SOURCE                YouTube URL, video URL, or local file path

Options:
  -p, --prompt TEXT     Custom prompt for video understanding
  --provider NAME       Force specific provider
  -m, --model NAME      Force specific model
  --asr-only            Force ASR-only mode (skip visual analysis)
  -o, --output FILE     Write JSON to file instead of stdout
  -q, --quiet           Suppress progress messages
  --list-models         Show available models per provider
  --list-providers      Show available providers as JSON
```
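
For readers who want to mirror this interface in their own wrapper, the options map naturally onto `argparse`. The sketch below is an assumption about how such a parser could be declared, not a copy of `process_video.py`; the `build_parser` name is hypothetical, and SOURCE is made optional here only because the Quick Start shows `--list-models` being called without it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags (illustrative, not the real script)."""
    parser = argparse.ArgumentParser(prog="process_video.py")
    # Optional so that --list-models / --list-providers work without a source.
    parser.add_argument("source", nargs="?", metavar="SOURCE",
                        help="YouTube URL, video URL, or local file path")
    parser.add_argument("-p", "--prompt", metavar="TEXT",
                        help="Custom prompt for video understanding")
    parser.add_argument("--provider", metavar="NAME",
                        help="Force specific provider")
    parser.add_argument("-m", "--model", metavar="NAME",
                        help="Force specific model")
    parser.add_argument("--asr-only", action="store_true",
                        help="Force ASR-only mode (skip visual analysis)")
    parser.add_argument("-o", "--output", metavar="FILE",
                        help="Write JSON to file instead of stdout")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="Suppress progress messages")
    parser.add_argument("--list-models", action="store_true",
                        help="Show available models per provider")
    parser.add_argument("--list-providers", action="store_true",
                        help="Show available providers as JSON")
    return parser
```

`build_parser().parse_args(["video.mp4", "-p", "Summarize", "-q"])` yields a namespace with `source`, `prompt`, and `quiet` populated and the remaining flags at their defaults.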
Model Selection
Each provider supports multiple models. Use `--list-models` to see options:

```bash
python3 scripts/process_video.py --list-models
```

OpenRouter models:
- `google/gemini-3-flash-preview` (default) - Fast, free tier
- `google/gemini-3-pro-preview` - Higher quality

Gemini models:
- `gemini-3-flash-preview` (default) - Latest, fast
- `gemini-3-pro-preview` - Highest quality
- `gemini-2.5-flash` - Stable production fallback

Local Whisper models:
- `tiny`, `base` (default), `small`, `medium`, `large`, `large-v3`

FFMPEG modes (frame extraction strategy):
- `scene` (default) - Extract frames when scene changes (smart, efficient)
- `keyframe` - Extract I-frames only (fastest)
- `interval` - Extract frames at regular intervals (predictable)
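
To make the three frame extraction strategies concrete, here is one way each could be expressed as an ffmpeg invocation. This is a sketch under assumptions, not necessarily what the FFMPEG provider runs: the `frame_extraction_command` function, the 0.3 scene threshold, and the 10-second interval are all illustrative choices. The `select` and `fps` filters are standard ffmpeg filters; newer ffmpeg builds prefer `-fps_mode vfr` over `-vsync vfr`.

```python
from typing import List


def frame_extraction_command(
    video: str,
    out_pattern: str,
    mode: str = "scene",
    scene_threshold: float = 0.3,   # assumed value, tune per video
    interval_seconds: int = 10,     # assumed value
) -> List[str]:
    """Build (but do not run) an ffmpeg command for the given mode."""
    if mode == "scene":
        # Keep frames whose scene-change score exceeds the threshold.
        vf = f"select='gt(scene,{scene_threshold})'"
    elif mode == "keyframe":
        # Keep intra-coded (I) frames only.
        vf = "select='eq(pict_type,I)'"
    elif mode == "interval":
        # Emit one frame every `interval_seconds` seconds.
        vf = f"fps=1/{interval_seconds}"
    else:
        raise ValueError(f"unknown mode: {mode}")

    cmd = ["ffmpeg", "-i", video, "-vf", vf]
    if mode != "interval":
        # Variable frame rate output so dropped frames are not duplicated.
        cmd += ["-vsync", "vfr"]
    cmd.append(out_pattern)
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building the command as a list avoids shell quoting problems with unusual file names.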
Quick Reference
| Task | Reference |
|---|---|
| Setup & API keys | setup-guide.md |
| Use Gemini for video | gemini.md |
| Use OpenRouter | openrouter.md |
| FFMPEG frames (free) | ffmpeg-frames.md |
| ASR providers | asr-providers.md |
| Output JSON schema | output-format.md |
| Video sources & downloading | video-sources.md |
Verify Setup
```bash
python3 scripts/setup.py  # Check dependencies and API keys
```

Output Format
All providers return consistent JSON:
```json
{
  "source": {
    "type": "youtube|url|local",
    "path": "...",
    "duration_seconds": 120.5,
    "size_mb": 15.2
  },
  "provider": "openrouter",
  "model": "google/gemini-3-flash-preview",
  "capability": "full_video",
  "response": "...",
  "transcript": [{"start": 0.0, "end": 2.5, "text": "..."}],
  "text": "Full transcript..."
}
```
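
Because every provider returns the same shape, downstream code can stay provider-agnostic. The sketch below shows one way to consume a result that was written with `-o`; the helper names `summarize` and `format_transcript` are hypothetical, and the code assumes only the fields shown in the example above. It treats `transcript` as possibly absent, which is an assumption rather than documented behavior.

```python
import json
from typing import Any, Dict, List


def summarize(result: Dict[str, Any]) -> str:
    """One-line description of which provider handled which source."""
    src = result["source"]
    return (
        f"{result['provider']}/{result['model']} "
        f"({result['capability']}) on {src['type']} source, "
        f"{src['duration_seconds']:.1f}s"
    )


def format_transcript(result: Dict[str, Any]) -> List[str]:
    """Render transcript segments as '[start-end] text' lines."""
    lines = []
    for seg in result.get("transcript") or []:
        lines.append(f"[{seg['start']:.1f}s-{seg['end']:.1f}s] {seg['text']}")
    return lines


def load_result(path: str) -> Dict[str, Any]:
    """Read a JSON result file produced with the -o option."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

A typical use is `result = load_result("out.json")` followed by `print("\n".join(format_transcript(result)))`.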
Features
- Automatic provider selection based on available API keys
- Model selection per provider with sensible defaults
- Robust path handling for macOS special characters and unicode
- Progress output (use `-q` for quiet mode)
- File size warnings for API limits
- Auto-conversion of video formats when needed
- YouTube URL support (direct or via download)
Requirements
For full video understanding:
```bash
pip install google-generativeai  # Gemini
pip install openai               # OpenRouter
```

For ASR fallback:

```bash
brew install yt-dlp ffmpeg       # Video tools
pip install openai               # OpenAI Whisper
pip install groq                 # Groq Whisper
pip install assemblyai           # AssemblyAI
pip install deepgram-sdk         # Deepgram
pip install openai-whisper       # Local Whisper
```