video-analyzer

Video Analyzer

Analyze video content with vision/video large models. Both local video files and online videos are supported.

Use Cases

  • The user asks to analyze, understand, or describe a video
  • The user provides a video file path or URL and wants to know what the video contains
  • The user wants to ask questions about a video

Configuration

Environment Variables

Set the API key environment variable that corresponds to the model you use:

VolcEngine (Doubao)

export ARK_API_KEY="your-api-key"

OpenAI

export OPENAI_API_KEY="your-api-key"

Model Configuration

Edit scripts/models.json to add or modify model configurations. Each model requires:
  • base_url: API endpoint
  • api_key_env: name of the environment variable that holds the API key
  • model: model ID
  • api_type: either responses or chat_completions
  • supports_video: whether the model accepts native video input
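The required fields above can be checked mechanically before a model is used. The following is a minimal sketch, not the project's actual loader: the top-level layout (a `models` mapping next to `default_model`), the example URL, the placeholder model ID, and the helper names `validate_model` and `resolve_api_key` are all assumptions made for illustration.

```python
import json
import os

REQUIRED_FIELDS = ("base_url", "api_key_env", "model", "api_type", "supports_video")
VALID_API_TYPES = ("responses", "chat_completions")


def validate_model(name, entry):
    """Return a list of problems found in one model entry (empty if none)."""
    problems = [f"{name}: missing field '{field}'"
                for field in REQUIRED_FIELDS if field not in entry]
    api_type = entry.get("api_type")
    if api_type is not None and api_type not in VALID_API_TYPES:
        problems.append(f"{name}: api_type must be one of {VALID_API_TYPES}")
    return problems


def resolve_api_key(entry, environ=os.environ):
    """Look up the API key in the environment variable named by api_key_env."""
    return environ.get(entry["api_key_env"])


# Hypothetical file content; the real models.json layout may differ.
EXAMPLE = json.loads("""
{
  "default_model": "doubao-vision",
  "models": {
    "doubao-vision": {
      "base_url": "https://example.com/api/v3",
      "api_key_env": "ARK_API_KEY",
      "model": "your-model-id",
      "api_type": "responses",
      "supports_video": true
    }
  }
}
""")

if __name__ == "__main__":
    for name, entry in EXAMPLE["models"].items():
        print(name, validate_model(name, entry) or "ok")
```

Keeping only the variable name in the file (api_key_env) and reading the key at run time means the key itself never has to be written into models.json.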

Workflow

  1. Confirm the video source: get the video path or URL provided by the user.
  2. Confirm the analysis requirements: clarify what the user wants to know (for example, a content summary, answers to questions, or a scene description). If $ARGUMENTS is not empty, use it as the analysis prompt.
  3. Select the model: use default_model from models.json by default; the user can also specify a model.
  4. Run the analysis: run the script from the scripts/ directory:
    uv run analyze.py --video <video path or URL> --prompt "<analysis prompt>"
    Optional parameters:
    • --model <name>: model to use (a key in models.json)
    • --frames <number>: number of frames to extract (default: 10)
    • --max-size <pixels>: maximum side length of each frame (default: 720)
  5. Present the results: show the user the analysis returned by the model.
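To make the --frames and --max-size options concrete, here is a minimal sketch of one common way to implement them: pick evenly spaced frame indices, then scale each frame so its longer side fits the limit. This is an assumption about the approach, not a description of what analyze.py does; the function names `sample_frame_indices` and `fit_within` are hypothetical.

```python
def sample_frame_indices(total_frames, count=10):
    """Pick up to `count` evenly spaced frame indices from a video."""
    if total_frames <= 0 or count <= 0:
        return []
    count = min(count, total_frames)
    # Use the midpoint of each of `count` equal segments (integer arithmetic).
    return [(2 * i + 1) * total_frames // (2 * count) for i in range(count)]


def fit_within(width, height, max_size=720):
    """Scale (width, height) down so the longer side is at most max_size."""
    longest = max(width, height)
    if longest <= max_size:
        return width, height  # already small enough, never upscale
    scale = max_size / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(sample_frame_indices(100, 10))
    print(fit_within(1920, 1080))
```

Sampling segment midpoints avoids always sending the very first and very last frames, which are often black or title cards; other strategies, such as scene-change detection, are equally plausible.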

CLI Reference

Local video

uv run analyze.py --video /path/to/video.mp4 --prompt "Describe the video content"

Direct online video URL

uv run analyze.py --video https://example.com/video.mp4 --prompt "Analyze the video"

Video website URL (YouTube, Bilibili, etc.)

uv run analyze.py --video "https://www.youtube.com/watch?v=xxxxx" --prompt "Summarize the video"

Specify model and number of frames

uv run analyze.py --video video.mp4 --model doubao-vision --frames 20 --prompt "Analyze"
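The flags used in these commands can be modeled with a standard argparse parser. This is a sketch under stated assumptions rather than the script's real source: the flag names and the defaults of 10 and 720 come from this document, while making --prompt required, leaving --model unset by default, and the help texts are guesses.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented analyze.py flags."""
    parser = argparse.ArgumentParser(prog="analyze.py")
    parser.add_argument("--video", required=True,
                        help="local file path or video URL")
    # Assumption: the prompt is mandatory; the real script may supply a default.
    parser.add_argument("--prompt", required=True,
                        help="analysis prompt sent to the model")
    parser.add_argument("--model", default=None,
                        help="key in models.json (default_model when omitted)")
    parser.add_argument("--frames", type=int, default=10,
                        help="number of frames to extract")
    parser.add_argument("--max-size", type=int, default=720,
                        help="maximum side length of each frame in pixels")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--video", "video.mp4", "--model", "doubao-vision",
         "--frames", "20", "--prompt", "Analyze"]
    )
    print(args.video, args.model, args.frames, args.max_size)
```

Note that argparse exposes `--max-size` as the attribute `max_size`.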

Notes

  • Downloading from video website URLs relies on yt-dlp, which is installed automatically as a Python dependency
  • In frame extraction mode, more frames yield a more detailed analysis but also raise API call costs
  • Downloading large video files can take a long time; wait for the download to finish
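The three kinds of input shown in the CLI reference (local file, direct video URL, video website URL) need different handling, with only the last one going through yt-dlp. How analyze.py tells them apart is not stated here; the sketch below shows one simple heuristic based on the URL scheme and file extension. The function name `classify_source` and the extension list are assumptions.

```python
import os
from urllib.parse import urlparse

# Assumed set of extensions that indicate a directly downloadable video file.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm", ".avi"}


def classify_source(video):
    """Classify a --video value as 'local', 'direct_url', or 'site_url'."""
    parsed = urlparse(video)
    if parsed.scheme not in ("http", "https"):
        return "local"  # plain paths, including Windows drive letters
    extension = os.path.splitext(parsed.path)[1].lower()
    # A URL whose path ends in a video extension can be fetched directly;
    # anything else is treated as a page that needs a site downloader.
    return "direct_url" if extension in VIDEO_EXTENSIONS else "site_url"


if __name__ == "__main__":
    for source in ("/path/to/video.mp4",
                   "https://example.com/video.mp4",
                   "https://www.youtube.com/watch?v=xxxxx"):
        print(source, "->", classify_source(source))
```

An extension check is cheap but imperfect: a direct link without a file extension would be routed to the site downloader, so a real implementation might also inspect the response's content type.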