video-analyzer


Analyze video content using visual/video large models. This tool is triggered when the user uses phrases like "analyze video", "video understanding", or "look at this video".


NPX Install

npx skill4agent add zrong/skills video-analyzer

SKILL.md Content (translated from Chinese)


Video Analyzer

Analyze video content using visual/video large models, supporting local video files and online videos.

Use Cases

  • Users request to analyze, understand, or describe a video
  • Users provide a video file path or URL and want to know the video content
  • Users need to ask questions about the video

Configuration

Environment Variables

Set the corresponding API Key environment variables based on the model used:
```bash
# VolcEngine (Doubao)
export ARK_API_KEY="your-api-key"

# OpenAI
export OPENAI_API_KEY="your-api-key"
```

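As a sketch of how the script might use these variables (an assumption, not the actual `analyze.py` code): each model's configuration names the environment variable to read, and the key is resolved at call time with a clear error if it is missing.

```python
import os

def resolve_api_key(api_key_env: str) -> str:
    """Read the API key from the environment variable named in models.json.

    Hypothetical helper illustrating the api_key_env indirection.
    """
    key = os.environ.get(api_key_env)
    if not key:
        raise RuntimeError(f"environment variable {api_key_env} is not set")
    return key
```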
Model Configuration

Edit scripts/models.json to add or modify model configurations. Each model entry requires:
  • base_url — API endpoint
  • api_key_env — name of the environment variable holding the API key
  • model — model ID
  • api_type — responses or chat_completions
  • supports_video — whether native video input is supported
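Putting those fields together, an entry in models.json might look like the following. This is an illustrative sketch: the endpoint is the public VolcEngine Ark base URL, but the exact file layout, the `default_model` key placement, and the model ID are assumptions, not taken from the skill's actual scripts/models.json.

```json
{
  "default_model": "doubao-vision",
  "models": {
    "doubao-vision": {
      "base_url": "https://ark.cn-beijing.volces.com/api/v3",
      "api_key_env": "ARK_API_KEY",
      "model": "doubao-1.5-vision-pro",
      "api_type": "chat_completions",
      "supports_video": true
    }
  }
}
```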

Workflow

  1. Confirm the video source: obtain the video path or URL provided by the user.
  2. Confirm the analysis requirements: clarify what the user wants to know (e.g. a content summary, question answering, or a scene description). If $ARGUMENTS is not empty, use it as the analysis prompt.
  3. Select a model: by default, use the default_model from models.json; the user may also specify a model.
  4. Execute the analysis: run the script from the scripts/ directory:
     ```bash
     uv run analyze.py --video <video path or URL> --prompt "<analysis prompt>"
     ```
     Optional parameters:
     • --model <name> — model to use (a key in models.json)
     • --frames <number> — number of frames to extract (default: 10)
     • --max-size <pixels> — maximum side length of extracted frames (default: 720)
  5. Display the results: present the model's analysis to the user.
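The --frames option above implies selecting a fixed number of frames spread across the whole video. A minimal sketch of such evenly spaced sampling (a hypothetical helper, not the skill's actual code):

```python
def sample_frame_indices(total_frames: int, num_frames: int = 10) -> list[int]:
    """Pick num_frames evenly spaced frame indices across a video.

    Illustrative sketch of what a --frames option might do; clamps to the
    available frame count so short videos still work.
    """
    if total_frames <= 0:
        return []
    n = min(num_frames, total_frames)
    step = total_frames / n
    return [min(int(i * step), total_frames - 1) for i in range(n)]
```

For a 100-frame video with the default of 10 frames, this picks one frame every 10 frames, starting at frame 0.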

CLI Reference

```bash
# Local video
uv run analyze.py --video /path/to/video.mp4 --prompt "Describe the video content"

# Direct online video URL
uv run analyze.py --video https://example.com/video.mp4 --prompt "Analyze the video"

# Video website URL (YouTube, Bilibili, etc.)
uv run analyze.py --video https://www.youtube.com/watch?v=xxxxx --prompt "Summarize the video"

# Specify the model and number of frames
uv run analyze.py --video video.mp4 --model doubao-vision --frames 20 --prompt "Analyze"
```
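The --max-size option caps the longer side of extracted frames before they are sent to the model. Aspect-ratio-preserving scaling for that cap can be sketched as follows (a hypothetical helper, not the skill's actual code):

```python
def fit_max_side(width: int, height: int, max_size: int = 720) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_size.

    Illustrative sketch of a --max-size style limit; preserves aspect
    ratio and never upscales small frames.
    """
    longest = max(width, height)
    if longest <= max_size:
        return width, height
    scale = max_size / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 1920x1080 frame with the default max size of 720 is scaled down to 720x405.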

Notes

  • Downloading videos from website URLs depends on yt-dlp, which is installed automatically as a Python dependency.
  • In frame-extraction mode, more frames yield a more detailed analysis but increase API call costs.
  • Downloading large video files can take a long time.
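Since --video accepts both local paths and URLs, the script has to decide which it was given before deciding whether to invoke yt-dlp. A rough heuristic (an assumption about the implementation, not the skill's actual code):

```python
from urllib.parse import urlparse

def is_remote(video: str) -> bool:
    """Rough check whether --video is a URL rather than a local path.

    Illustrative heuristic: treat http/https schemes as remote. Edge cases
    (e.g. Windows drive letters parsing as a scheme) would need extra care.
    """
    return urlparse(video).scheme in ("http", "https")
```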