volcengine-video-understanding

Volcengine Video Understanding

Uses the ByteDance Volcano Ark video understanding API (models such as doubao-seed-2-0-pro-260215) to understand and analyze videos in depth.
Recommended method: upload via the Files API, then analyze via the Responses API.
  • Supports video files up to 512MB
  • Automatic video preprocessing (FPS sampling)
  • Uploaded files can be reused (stored for 7 days)

Features

  • Video Upload: Upload local videos via the Files API (recommended, max 512MB)
  • Content Understanding: Analyze scenes, people, actions, and emotions in a video
  • Video Q&A: Answer user questions based on video content
  • Video Description: Automatically generate video descriptions and summaries

Prerequisites

The ARK_API_KEY environment variable must be set.

Configuration Method (Recommended)

  1. Copy the configuration template:
bash
cp .canghe-skills/.env.example .canghe-skills/.env
  2. Edit the .canghe-skills/.env file and fill in your API key:
ARK_API_KEY=your-actual-api-key-here

Or Use an Environment Variable

bash
export ARK_API_KEY="your-api-key"

Loading Priority

  1. System environment variables (process.env)
  2. .canghe-skills/.env in the current directory
  3. ~/.canghe-skills/.env in the user's home directory
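
The lookup order above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual loader: the function names resolve_api_key and parse_env_file are hypothetical, os.environ stands in for process.env, and the .env parser only handles plain KEY=VALUE lines.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(environ: Optional[Mapping[str, str]] = None,
                    cwd: Optional[Path] = None,
                    home: Optional[Path] = None) -> Optional[str]:
    """Return ARK_API_KEY from the first source that defines it."""
    environ = os.environ if environ is None else environ
    cwd = Path.cwd() if cwd is None else cwd
    home = Path.home() if home is None else home

    # 1. System environment variable wins.
    if environ.get("ARK_API_KEY"):
        return environ["ARK_API_KEY"]

    # 2. Project-level file, then 3. user-level file.
    for env_file in (cwd / ".canghe-skills" / ".env",
                     home / ".canghe-skills" / ".env"):
        key = parse_env_file(env_file).get("ARK_API_KEY")
        if key:
            return key
    return None
```

Calling resolve_api_key() with no arguments uses the real environment, the current directory, and the home directory; the parameters exist only to make the order easy to test.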

Usage

1. Basic Video Analysis (Files API method, recommended)

bash
cd ~/.openclaw/workspace/skills/volcengine-video-understanding
python3 scripts/video_understand.py /path/to/video.mp4 "Describe the content of this video"

2. Video Q&A

bash
python3 scripts/video_understand.py /path/to/video.mp4 "Which people appear in the video?"

3. Sentiment Analysis

bash
python3 scripts/video_understand.py /path/to/video.mp4 "Analyze how the emotions of the people in the video change"

4. Specify Model and Frame Rate

bash
python3 scripts/video_understand.py /path/to/video.mp4 "Summarize the key points of the video" \
  --model doubao-seed-2-0-pro-260215 \
  --fps 2

5. Save Results to a File

bash
python3 scripts/video_understand.py /path/to/video.mp4 "Describe the video" --output result.json

Parameter Description

Parameter     Default Value                 Description
video_path    Required                      Video file path
instruction   Required                      Analysis instruction/question
--model       doubao-seed-2-0-pro-260215    Model ID
--fps         1                             Video sampling frame rate (preprocessing)
--output      -                             Result output file path
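
For readers who want to see how such a command line maps onto code, here is a minimal argparse sketch that mirrors the table above. It is an assumption about how scripts/video_understand.py might declare its arguments, not a copy of the script; build_parser is a hypothetical name, and treating --fps as a float is inferred from the fractional values recommended elsewhere in this document.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the two positional arguments and three options from the table."""
    parser = argparse.ArgumentParser(
        description="Analyze a video with a Volcano Ark video understanding model"
    )
    parser.add_argument("video_path", help="Video file path")
    parser.add_argument("instruction", help="Analysis instruction/question")
    parser.add_argument("--model", default="doubao-seed-2-0-pro-260215",
                        help="Model ID")
    # Float so that fractional rates such as 0.5 are accepted.
    parser.add_argument("--fps", type=float, default=1.0,
                        help="Video sampling frame rate (preprocessing)")
    parser.add_argument("--output", default=None,
                        help="Result output file path")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```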

Supported Models

  • doubao-seed-2-0-pro-260215 (default)
  • doubao-seed-2-0-lite-250728
  • doubao-seed-1-6-251015
  • Other Seed series video understanding models

Analysis Examples

Example 1: Video Content Description

bash
python3 scripts/video_understand.py ~/Desktop/video.mp4 "Describe the content of this video in detail, including scenes, people, and actions"

Example 2: Video Summary

bash
python3 scripts/video_understand.py ~/Desktop/video.mp4 "Summarize the key points of this video in 3 sentences"

Example 3: Action Recognition

bash
python3 scripts/video_understand.py ~/Desktop/video.mp4 "What are the people in the video doing? Describe their actions in chronological order"

Example 4: Scene Analysis

bash
python3 scripts/video_understand.py ~/Desktop/video.mp4 "Analyze the scene changes and environmental characteristics in the video"

Technical Details

Call Process

  1. Upload the video: Upload the local video file via the Files API, specifying the FPS preprocessing configuration
  2. Wait for processing: Wait until video preprocessing completes (status changes to processed)
  3. Create the task: Call the Responses API to run video understanding
  4. Get the result: Return the analysis result
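
Step 2 of this flow is a polling loop. The sketch below shows one way to write it, under stated assumptions: get_status is a hypothetical callable that returns the file's current status string (how the status is fetched is not specified here), and the timeout and interval defaults are illustrative. Only the target status "processed" comes from the description above.

```python
import time
from typing import Callable


def wait_until_processed(get_status: Callable[[], str],
                         timeout_s: float = 300.0,
                         interval_s: float = 2.0,
                         sleep: Callable[[float], None] = time.sleep,
                         clock: Callable[[], float] = time.monotonic) -> None:
    """Poll get_status() until it returns "processed" or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if get_status() == "processed":
            return
        if clock() >= deadline:
            raise TimeoutError(
                f"video preprocessing did not finish within {timeout_s} seconds"
            )
        # Any other status is treated as "still working"; wait and retry.
        sleep(interval_s)
```

The sleep and clock parameters are injectable so the loop can be exercised without real waiting. A production version would probably also stop early on an explicit failure status, if the API reports one.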

API Format

Files API Upload:
bash
curl https://ark.cn-beijing.volces.com/api/v3/files \
  -H "Authorization: Bearer $ARK_API_KEY" \
  -F 'purpose=user_data' \
  -F 'file=@video.mp4' \
  -F 'preprocess_configs[video][fps]=1'
Responses API Analysis:
json
{
  "model": "doubao-seed-2-0-pro-260215",
  "input": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_video",
          "file_id": "file-xxxx"
        },
        {
          "type": "input_text",
          "text": "User instruction"
        }
      ]
    }
  ]
}
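
The request body above can be assembled programmatically. The helper below is a small sketch that reproduces exactly that JSON shape; build_responses_payload is a hypothetical name, and the file ID must come from the Files API upload response.

```python
import json
from typing import Any, Dict


def build_responses_payload(file_id: str,
                            instruction: str,
                            model: str = "doubao-seed-2-0-pro-260215") -> Dict[str, Any]:
    """Build a Responses API body with one video part and one text part."""
    return {
        "model": model,
        "input": [
            {
                "role": "user",
                "content": [
                    {"type": "input_video", "file_id": file_id},
                    {"type": "input_text", "text": instruction},
                ],
            }
        ],
    }


if __name__ == "__main__":
    payload = build_responses_payload("file-xxxx", "Describe the video")
    # ensure_ascii=False keeps non-ASCII instructions readable in the output.
    print(json.dumps(payload, ensure_ascii=False, indent=2))
```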

FPS Setting Recommendations

FPS        Applicable Scenarios
0.3-0.5    Slow-paced videos, static scenes, saving tokens
1          General video analysis (default)
2-3        Fast action, detailed analysis
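
To get a feel for the trade-off, the number of sampled frames grows linearly with the FPS setting: roughly duration in seconds times FPS. The helper below is only a back-of-the-envelope estimate with a hypothetical name; the actual sampling behavior and the token cost per frame depend on the service and are not specified here.

```python
def estimate_sampled_frames(duration_s: float, fps: float) -> int:
    """Rough frame count for a video of duration_s seconds sampled at fps."""
    if duration_s <= 0 or fps <= 0:
        raise ValueError("duration_s and fps must be positive")
    # At least one frame is assumed to be sampled from any non-empty video.
    return max(1, round(duration_s * fps))


if __name__ == "__main__":
    # A two-minute video at three of the recommended settings.
    for fps in (0.5, 1, 2):
        print(fps, estimate_sampled_frames(120, fps))
```

For a two-minute video this gives 60, 120, and 240 frames respectively, which is why lowering the FPS is the first lever for long or slow-paced videos.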

Limitations

  • Video Format: MP4 (recommended), MOV, AVI
  • File Size: Max 512MB (Files API method)
  • Storage Time: Uploaded files are stored for 7 days by default
  • Processing Time: Usually 10-60 seconds, depending on video length and complexity
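
The format and size limits can be checked locally before uploading, which avoids a failed upload. The following is a minimal sketch: check_video is a hypothetical helper, the format check looks only at the file extension, and 512MB is interpreted as 512 × 1024 × 1024 bytes, which is an assumption.

```python
from pathlib import Path
from typing import List, Union

MAX_BYTES = 512 * 1024 * 1024  # assumes "512MB" means 512 MiB
ALLOWED_SUFFIXES = {".mp4", ".mov", ".avi"}


def check_video(path: Union[str, Path], max_bytes: int = MAX_BYTES) -> List[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    p = Path(path)
    if not p.is_file():
        return [f"file does not exist: {p}"]

    problems: List[str] = []
    if p.suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported format: {p.suffix or '(no extension)'}")
    size = p.stat().st_size
    if size > max_bytes:
        problems.append(f"file too large: {size} bytes (limit {max_bytes})")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue at once.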

Python API Usage

python
from scripts.video_understand import analyze_video

result = analyze_video(
    file_path="/path/to/video.mp4",
    instruction="Describe video content",
    model="doubao-seed-2-0-pro-260215",
    fps=1
)

# Extract the answer text from the response output
text = ""
for item in result.get("output", []):
    if item.get("type") == "message":
        for content in item.get("content", []):
            if content.get("type") == "output_text":
                text = content.get("text", "")
                break
print(text)

Error Handling

Common errors and solutions:

Error                 Cause                                    Solution
API key error         Key not set or incorrect                 Check the ARK_API_KEY environment variable
File does not exist   Wrong path                               Check the file path
Upload failed         File too large or format not supported   Check the file size (<512MB) and format
Processing timeout    Video too long or too complex            Shorten the video or reduce the FPS

Reference Documents