video-understand

Video Understanding

Multi-provider video understanding with automatic fallback and model selection.

Quick Start

Check available providers

python3 scripts/check_providers.py

Process a video (auto-selects best provider)

python3 scripts/process_video.py "https://youtube.com/watch?v=..."
python3 scripts/process_video.py /path/to/video.mp4

Custom prompt

python3 scripts/process_video.py video.mp4 -p "List all products shown with timestamps"

Use specific provider/model

python3 scripts/process_video.py video.mp4 --provider openrouter -m google/gemini-3-pro-preview

List available models

python3 scripts/process_video.py --list-models

Provider Hierarchy

Automatically selects the best available provider:
| Priority | Provider | Capability | Env Var | Default Model |
|---|---|---|---|---|
| 1 | Gemini | Full video | GEMINI_API_KEY | gemini-3-flash-preview |
| 2 | Vertex AI | Full video | GOOGLE_APPLICATION_CREDENTIALS | gemini-3-flash-preview |
| 3 | OpenRouter | Full video | OPENROUTER_API_KEY | google/gemini-3-flash-preview |
| 4 | FFMPEG | Frames + ASR | None (requires ffmpeg + whisper) | scene |
| 5 | OpenAI | ASR only | OPENAI_API_KEY | whisper-1 |
| 6 | AssemblyAI | ASR + analysis | ASSEMBLYAI_API_KEY | best |
| 7 | Deepgram | ASR | DEEPGRAM_API_KEY | nova-2 |
| 8 | Groq | ASR (fast) | GROQ_API_KEY | whisper-large-v3-turbo |
| 9 | Local Whisper | ASR (offline) | None | base |

Full video = visual + audio analysis. Frames + ASR = extracted screenshots + audio transcription (free, offline). ASR = audio transcription only.
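
The priority order above can be sketched as a simple first-match scan over the environment. This is a minimal illustration, not the actual logic in scripts/process_video.py: the function name `select_provider`, the provider identifiers other than `openrouter`, and the capability labels other than `full_video` are assumptions made for the example.

```python
import os
import shutil

# Priority order and environment variables copied from the table above.
# An env var of None means the provider needs no API key.
# Identifiers such as "vertex" and "local-whisper" are hypothetical.
PROVIDERS = [
    ("gemini", "full_video", "GEMINI_API_KEY"),
    ("vertex", "full_video", "GOOGLE_APPLICATION_CREDENTIALS"),
    ("openrouter", "full_video", "OPENROUTER_API_KEY"),
    ("ffmpeg", "frames_asr", None),
    ("openai", "asr", "OPENAI_API_KEY"),
    ("assemblyai", "asr", "ASSEMBLYAI_API_KEY"),
    ("deepgram", "asr", "DEEPGRAM_API_KEY"),
    ("groq", "asr", "GROQ_API_KEY"),
    ("local-whisper", "asr", None),
]


def select_provider(env, has_ffmpeg=False, has_whisper=False):
    """Return the first provider, in priority order, that looks usable."""
    for name, _capability, env_var in PROVIDERS:
        if env_var is not None:
            available = bool(env.get(env_var))
        elif name == "ffmpeg":
            # Frames + ASR needs both tools installed locally.
            available = has_ffmpeg and has_whisper
        else:
            available = has_whisper
        if available:
            return name
    return None


if __name__ == "__main__":
    chosen = select_provider(
        os.environ,
        has_ffmpeg=shutil.which("ffmpeg") is not None,
        has_whisper=shutil.which("whisper") is not None,
    )
    print(chosen)
```

Passing the environment in as a plain mapping keeps the selection easy to test without touching real API keys.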

CLI Options

python3 scripts/process_video.py [OPTIONS] SOURCE

Arguments:
  SOURCE              YouTube URL, video URL, or local file path

Options:
  -p, --prompt TEXT   Custom prompt for video understanding
  --provider NAME     Force specific provider
  -m, --model NAME    Force specific model
  --asr-only          Force ASR-only mode (skip visual analysis)
  -o, --output FILE   Write JSON to file instead of stdout
  -q, --quiet         Suppress progress messages
  --list-models       Show available models per provider
  --list-providers    Show available providers as JSON
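
For readers who want to mirror this interface in their own tooling, the options above map naturally onto `argparse`. The sketch below is an assumption about how such a parser could look, not the parser shipped in scripts/process_video.py. `build_parser` is a hypothetical name, and SOURCE is treated as optional only because `--list-models` is shown without it.

```python
import argparse


def build_parser():
    """Build a parser that accepts the options listed above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="process_video.py")
    # SOURCE is optional here so that --list-models can run without it.
    parser.add_argument("source", metavar="SOURCE", nargs="?",
                        help="YouTube URL, video URL, or local file path")
    parser.add_argument("-p", "--prompt", metavar="TEXT",
                        help="Custom prompt for video understanding")
    parser.add_argument("--provider", metavar="NAME",
                        help="Force specific provider")
    parser.add_argument("-m", "--model", metavar="NAME",
                        help="Force specific model")
    parser.add_argument("--asr-only", action="store_true",
                        help="Force ASR-only mode (skip visual analysis)")
    parser.add_argument("-o", "--output", metavar="FILE",
                        help="Write JSON to file instead of stdout")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="Suppress progress messages")
    parser.add_argument("--list-models", action="store_true",
                        help="Show available models per provider")
    parser.add_argument("--list-providers", action="store_true",
                        help="Show available providers as JSON")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```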

Model Selection

Each provider supports multiple models. Use --list-models to see options:
bash
python3 scripts/process_video.py --list-models
OpenRouter models:
  • google/gemini-3-flash-preview (default) - Fast, free tier
  • google/gemini-3-pro-preview - Higher quality
Gemini models:
  • gemini-3-flash-preview (default) - Latest, fast
  • gemini-3-pro-preview - Highest quality
  • gemini-2.5-flash - Stable production fallback
Local Whisper models:
  • tiny, base (default), small, medium, large, large-v3
FFMPEG modes (frame extraction strategy):
  • scene (default) - Extract frames when scene changes (smart, efficient)
  • keyframe - Extract I-frames only (fastest)
  • interval - Extract frames at regular intervals (predictable)
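
To make the three FFMPEG modes concrete, here is one way each could translate into an ffmpeg filter. This is a sketch under stated assumptions, not the command the tool actually runs: `build_ffmpeg_command`, the 0.3 scene threshold, the 10-second interval, and the output file pattern are all chosen for illustration. The `select` and `fps` filters themselves are standard ffmpeg filters.

```python
def build_ffmpeg_command(video_path, out_dir, mode="scene",
                         scene_threshold=0.3, interval_seconds=10):
    """Return an ffmpeg argument list for one frame extraction strategy."""
    if mode == "scene":
        # Keep frames whose scene-change score exceeds the threshold.
        video_filter = f"select='gt(scene,{scene_threshold})'"
    elif mode == "keyframe":
        # Keep only intra-coded frames (I-frames).
        video_filter = "select='eq(pict_type,I)'"
    elif mode == "interval":
        # Emit one frame every interval_seconds seconds.
        video_filter = f"fps=1/{interval_seconds}"
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [
        "ffmpeg", "-i", str(video_path),
        "-vf", video_filter,
        # Variable frame rate output so dropped frames are not duplicated.
        # Newer ffmpeg builds prefer "-fps_mode vfr" over "-vsync vfr".
        "-vsync", "vfr",
        f"{out_dir}/frame_%04d.jpg",
    ]


if __name__ == "__main__":
    print(" ".join(build_ffmpeg_command("video.mp4", "frames", mode="interval")))
```

Returning a list rather than a shell string means the result can be handed to `subprocess.run` without shell quoting, which matters for paths containing spaces or special characters.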

Quick Reference

| Task | Reference |
|---|---|
| Setup & API keys | setup-guide.md |
| Use Gemini for video | gemini.md |
| Use OpenRouter | openrouter.md |
| FFMPEG frames (free) | ffmpeg-frames.md |
| ASR providers | asr-providers.md |
| Output JSON schema | output-format.md |
| Video sources & downloading | video-sources.md |

Verify Setup

bash
python3 scripts/setup.py  # Check dependencies and API keys

Output Format

All providers return consistent JSON:
json
{
  "source": {
    "type": "youtube|url|local",
    "path": "...",
    "duration_seconds": 120.5,
    "size_mb": 15.2
  },
  "provider": "openrouter",
  "model": "google/gemini-3-flash-preview",
  "capability": "full_video",
  "response": "...",
  "transcript": [{"start": 0.0, "end": 2.5, "text": "..."}],
  "text": "Full transcript..."
}
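
Because the JSON shape is the same across providers, downstream code can stay provider-agnostic. The sketch below reads a result shaped like the example above and prints a short summary with timestamped transcript lines. The helper names `format_timestamp` and `summarize` are hypothetical, and the sample values are placeholders rather than real output.

```python
import json


def format_timestamp(seconds):
    """Render a second count as MM:SS, dropping the fractional part."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def summarize(result):
    """Return a short text summary of one result object."""
    source = result["source"]
    lines = [
        f'{result["provider"]} / {result["model"]} ({result["capability"]})',
        f'source: {source["type"]}, {source["duration_seconds"]}s',
    ]
    # The transcript may be missing or empty; treat both as no segments.
    for segment in result.get("transcript") or []:
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f'[{start} - {end}] {segment["text"]}')
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder data in the documented shape, not real tool output.
    sample = json.loads("""
    {
      "source": {"type": "local", "path": "video.mp4",
                 "duration_seconds": 120.5, "size_mb": 15.2},
      "provider": "openrouter",
      "model": "google/gemini-3-flash-preview",
      "capability": "full_video",
      "response": "...",
      "transcript": [{"start": 0.0, "end": 2.5, "text": "Hello"}],
      "text": "Hello"
    }
    """)
    print(summarize(sample))
```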

Features

  • Automatic provider selection based on available API keys
  • Model selection per provider with sensible defaults
  • Robust path handling for macOS special characters and unicode
  • Progress output (use -q for quiet mode)
  • File size warnings for API limits
  • Auto-conversion of video formats when needed
  • YouTube URL support (direct or via download)

Requirements

For full video understanding:
bash
pip install google-generativeai  # Gemini
pip install openai               # OpenRouter
For ASR fallback:
bash
brew install yt-dlp ffmpeg       # Video tools
pip install openai               # OpenAI Whisper
pip install groq                 # Groq Whisper
pip install assemblyai           # AssemblyAI
pip install deepgram-sdk         # Deepgram
pip install openai-whisper       # Local Whisper