video-generation


Video Generation Skill


Generate videos using AI (Google Veo 3.1, OpenAI Sora).
Capabilities:
  • 🎬 Text-to-Video: Create videos from text descriptions
  • 🖼️ Image-to-Video: Animate images as the first frame
  • 🔊 Audio Generation: Dialogue, sound effects, ambient sounds (Veo 3+)
  • 🎭 Reference Images: Guide video content with up to 3 reference images (Veo 3.1)

Prerequisites


Default: Vertex AI (10 requests/minute) ⭐


Vertex AI is the default backend, with rate limits roughly 1,400x higher than the AI Studio fallback (10 requests/minute versus 10 requests/day):
bash
# 1. Set your project
export GOOGLE_CLOUD_PROJECT=your-project-id

# 2. Authenticate (opens browser)
gcloud auth application-default login

# 3. Enable the API (one-time)
gcloud services enable aiplatform.googleapis.com

Add to your `.env` file:
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=us-central1
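Before running the scripts, it can help to confirm that both variables are visible to the process. The following is a minimal sketch of such a check; the helper name `missing_vertex_env` is hypothetical and is not part of the skill's scripts.

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION")


def missing_vertex_env(env=None):
    """Return the required variables that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vertex_env()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Vertex AI environment looks complete.")
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to exercise without touching the real environment.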

Fallback: AI Studio (10 requests/day)


Only use this if you don't have a GCP project. The API Comparison table lists `GOOGLE_API_KEY` as the key for the Veo models.

For Sora (OpenAI)


  • OPENAI_API_KEY
    - For OpenAI Sora

Available Models


Google Veo Models


| Model | Description | Best For |
|-------|-------------|----------|
| veo-3.1 | Highest quality (default) | Professional videos, dialogue, reference images |
| veo-3.1-fast | Faster processing | Quick iterations, batch generation |

Both models include:
  • 720p/1080p resolution
  • 4, 6, or 8 second duration
  • Native audio (dialogue, SFX, ambient)
  • Image-to-video (animate images)
  • Reference images (up to 3)
  • Video extension
  • Batch/parallel generation

OpenAI Sora


  • Best for: Creative videos, cinematic quality, complex motion
  • Resolutions: 480p, 720p, 1080p
  • Durations: 5s, 10s, 15s, 20s
  • Features: Text-to-video, image-to-video
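The duration and resolution lists for the Veo models and Sora can be checked before a request is sent. The sketch below is only an illustration: the `MODEL_LIMITS` layout and the `validate_request` helper are assumptions, and the values are copied from the lists above.

```python
# Limits copied from the model lists above; the dictionary layout is an assumption.
MODEL_LIMITS = {
    "veo-3.1": {"durations": {4, 6, 8}, "resolutions": {"720p", "1080p"}},
    "veo-3.1-fast": {"durations": {4, 6, 8}, "resolutions": {"720p", "1080p"}},
    "sora": {"durations": {5, 10, 15, 20}, "resolutions": {"480p", "720p", "1080p"}},
}


def validate_request(model, duration, resolution):
    """Return a list of problems; an empty list means the request fits the listed limits."""
    limits = MODEL_LIMITS.get(model)
    if limits is None:
        return [f"unknown model: {model}"]
    problems = []
    if duration not in limits["durations"]:
        problems.append(f"{model} does not list a {duration}s duration")
    if resolution not in limits["resolutions"]:
        problems.append(f"{model} does not list {resolution}")
    return problems
```

Returning a list of problems rather than raising on the first one lets the caller report every mismatch in a single message.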

Workflow


Step 1: Gather Requirements (REQUIRED)


⚠️ Use interactive questioning — ask ONE question at a time.

Question Flow


⚠️ Use the AskUserQuestion tool for each question below.
Do not just print questions in your response; use the tool to create interactive prompts with the options shown.
Q1: Image
"I'll generate that video for you! First — do you have an image to animate?
  • Yes (provide path — I'll use it as the first frame)
  • No, generate from scratch"
Wait for response.
Q2: Audio
"What audio preference?
  • With audio (default) — Veo 3.1 generates dialogue, SFX, ambient
  • Silent video — no audio"
Wait for response.
Q3: Model
"Which model would you like?
  • veo-3.1: Latest, highest quality with audio (default)
  • veo-3.1-fast: Faster processing with audio
  • veo-3 / veo-3-fast: Previous generation with audio
  • sora: OpenAI, up to 20 seconds, no audio"
Wait for response.
Q4: Duration
"What duration?
  • 4 seconds
  • 6 seconds
  • 8 seconds (default)"
Wait for response.
Q5: Format
"What aspect ratio and resolution?
  • 16:9 landscape, 720p
  • 16:9 landscape, 1080p
  • 9:16 portrait, 720p
  • 9:16 portrait, 1080p
  • Or specify"
Wait for response.

Quick Reference


| Question | Determines |
|----------|------------|
| Image | Image-to-video vs text-to-video |
| Audio | With/without audio generation |
| Model | Quality and speed tradeoff |
| Duration | Clip length |
| Format | Aspect ratio and resolution |
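One way to carry the five answers into the later steps is a small record type. This is a sketch only; the `VideoRequest` class and its field names are hypothetical, and the defaults mirror the ones named in the questions and in the batch options.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoRequest:
    """Answers to Q1-Q5 (hypothetical container, not part of the skill's scripts)."""
    image: Optional[str] = None   # Q1: path to a first-frame image, or None
    audio: bool = True            # Q2: with audio is the default
    model: str = "veo-3.1"        # Q3
    duration: int = 8             # Q4: seconds
    aspect_ratio: str = "16:9"    # Q5
    resolution: str = "720p"      # Q5

    @property
    def mode(self) -> str:
        """Image-to-video when an image was supplied, otherwise text-to-video."""
        return "image-to-video" if self.image else "text-to-video"
```

Filling the record one field at a time fits the rule of asking a single question and waiting for the answer before moving on.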


Step 2: Craft the Prompt


Transform the user request into an effective video prompt:
  1. Describe the scene: Set the visual context
  2. Specify action: What moves, changes, happens
  3. Include camera work: "slow pan", "tracking shot", "dolly shot"
  4. Add audio cues (Veo 3+): Use quotes for dialogue, describe sounds
  5. Set the mood: Lighting, atmosphere, time of day
Example with dialogue (Veo 3.1):
  • User: "a person discovering treasure"
  • Enhanced: "Close-up of a treasure hunter's face as torchlight flickers. He murmurs 'This must be it...' while brushing dust off an ancient chest. Sound of creaking hinges as he opens it, revealing golden light on his awestruck face. Cinematic, dramatic shadows."
Example without dialogue:
  • User: "a dog running on a beach"
  • Enhanced: "Cinematic slow-motion shot of a golden retriever running joyfully along a beach at sunset, waves lapping, warm golden hour lighting, shallow depth of field"
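The five ingredients listed in this step can be assembled mechanically before any manual polishing. The helper below is a rough sketch under that assumption; `build_prompt` and its parameter names are illustrative and do not come from the skill's scripts.

```python
def build_prompt(scene, action, camera=None, audio=None, mood=None):
    """Join the prompt ingredients into one string, skipping any that are missing."""
    parts = [camera, scene, action, audio, mood]
    # Drop empty pieces and trailing periods so the joined text has one period per piece.
    cleaned = [p.strip().rstrip(".") for p in parts if p and p.strip()]
    if not cleaned:
        return ""
    return ". ".join(cleaned) + "."


if __name__ == "__main__":
    print(build_prompt(
        scene="A golden retriever on a beach at sunset",
        action="It runs joyfully along the waterline",
        camera="Cinematic slow-motion shot",
        mood="Warm golden hour lighting",
    ))
```

The result is a starting point; the enhanced examples above read more naturally because the pieces were woven together by hand.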

Step 3: Select the Model


Default: veo-3.1 (highest quality, with audio)
| Use Case | Recommended Model | Reason |
|----------|-------------------|--------|
| Best quality | veo-3.1 (default) | Highest quality, audio |
| Quick iteration | veo-3.1-fast | Faster processing |
| Batch generation | veo-3.1-fast | Speed matters for multiple clips |
| Longer videos (>8s) | sora | Supports up to 20s |
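The recommendations above reduce to a short decision rule. A minimal sketch, assuming a hypothetical `select_model` helper whose arguments stand in for the use cases:

```python
def select_model(duration_s=8, quick_iteration=False, batch=False):
    """Pick a model name following the recommendations above (hypothetical helper)."""
    if duration_s > 8:
        return "sora"            # only Sora is listed for clips longer than 8 seconds
    if quick_iteration or batch:
        return "veo-3.1-fast"    # speed matters more than peak quality
    return "veo-3.1"             # default: highest quality, with audio
```

Duration is checked first because a clip longer than 8 seconds rules out both Veo models regardless of the other preferences.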

Step 4: Generate the Video


Execute the appropriate script from ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/:
For Google Veo 3.1 (default, with audio):
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --prompt "your enhanced prompt with 'dialogue in quotes'" \
  --model "veo-3.1" \
  --duration 8 \
  --aspect-ratio "16:9" \
  --resolution "720p"
For Google Veo 3.1 with image input:
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --prompt "The cat slowly opens its eyes and yawns" \
  --image "/path/to/cat.jpg" \
  --model "veo-3.1" \
  --duration 8
For faster generation:
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --prompt "your prompt" \
  --model "veo-3.1-fast"
For OpenAI Sora (longer videos):
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/sora.py \
  --prompt "your enhanced prompt" \
  --duration 20 \
  --resolution "1080p"
List available models:
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py --list-models
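When the command is assembled programmatically, building an argument list avoids shell-quoting mistakes in prompts that contain quotes. The sketch below uses only the flags shown in the examples above; the `build_veo_command` helper and the fallback to the current directory when `CLAUDE_PLUGIN_ROOT` is unset are assumptions.

```python
import os
import shlex


def build_veo_command(prompt, model="veo-3.1", duration=8,
                      aspect_ratio="16:9", resolution="720p", image=None):
    """Return the argv list for veo.py using only flags shown in the examples above."""
    # Falling back to "." is an assumption for running outside the plugin environment.
    root = os.environ.get("CLAUDE_PLUGIN_ROOT", ".")
    script = os.path.join(root, "skills", "video-generation", "scripts", "veo.py")
    argv = ["python3", script,
            "--prompt", prompt,
            "--model", model,
            "--duration", str(duration),
            "--aspect-ratio", aspect_ratio,
            "--resolution", resolution]
    if image:
        argv += ["--image", image]
    return argv


if __name__ == "__main__":
    # Print a shell-safe command line instead of running it.
    print(shlex.join(build_veo_command("A cat yawns", image="/path/to/cat.jpg")))
```

The list can be passed directly to `subprocess.run`, which sidesteps quoting entirely.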

Video Extension (For Long-Form Continuity)


The --extend flag creates TRUE visual continuity by continuing from where a previous Veo video ended. This is the best approach for long-form videos.
Basic extension:
bash
# First, generate initial clip
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --prompt "A person walks through a forest at sunrise" \
  --duration 8

# Extend it with new content (adds ~7 seconds)
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --extend veo_veo-3.1_20260104_120000.mp4 \
  --prompt "Continue walking, discover a hidden stream"
Multiple extensions (for longer videos):
bash
# Extend 5 times (adds ~35 seconds of continuation)
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --extend initial_clip.mp4 \
  --prompt "Keep exploring the forest, encounter wildlife" \
  --extend-times 5
Extension vs Stitching:

| Approach | Result | Use Case |
|----------|--------|----------|
| **Extension** | True continuity, same characters/scene | Long continuous shots |
| **Stitching** | Separate clips with transitions | Scene changes, montages |

Extension Limits:
  • Input video must be Veo-generated (max 141 seconds)
  • Each extension adds ~7 seconds
  • Maximum 20 extensions total (~2.5 minutes)
  • Output resolution is 720p
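The extension limits follow from simple arithmetic, which can be sketched as below. The constants come from the limits stated above; the helper names `extensions_needed` and `total_length` are hypothetical, and the 7-second figure is approximate.

```python
import math

INITIAL_CLIP_S = 8          # longest single Veo clip listed in this document
SECONDS_PER_EXTENSION = 7   # approximate, per the extension limits
MAX_EXTENSIONS = 20


def extensions_needed(target_s, initial_s=INITIAL_CLIP_S):
    """Return how many extensions reach target_s, or raise if that exceeds the limit."""
    if target_s <= initial_s:
        return 0
    count = math.ceil((target_s - initial_s) / SECONDS_PER_EXTENSION)
    if count > MAX_EXTENSIONS:
        raise ValueError(f"{target_s}s needs {count} extensions; the limit is {MAX_EXTENSIONS}")
    return count


def total_length(extensions, initial_s=INITIAL_CLIP_S):
    """Approximate length in seconds after the given number of extensions."""
    return initial_s + extensions * SECONDS_PER_EXTENSION
```

With an 8-second initial clip, 19 extensions give about 141 seconds, which matches the stated input cap, and the 20th brings the total to about 148 seconds, in line with the roughly 2.5 minute ceiling.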

Batch Generation (Parallel)


Generate multiple videos simultaneously for faster multi-scene workflows. Instead of waiting 15+ minutes for 5 sequential videos, generate them all in parallel (~3 minutes total).

Create a scenes.json file:


json
[
  {"prompt": "Scene 1: Cinematic hero shot of wireless earbuds on dark surface", "duration": 6, "output": "scene1_hero.mp4"},
  {"prompt": "Scene 2: Sound waves visualization, person enjoying music", "duration": 8, "output": "scene2_sound.mp4"},
  {"prompt": "Scene 3: Close-up of earbud in ear, person exercising", "duration": 8, "output": "scene3_comfort.mp4"},
  {"prompt": "Scene 4: Lifestyle montage, various settings", "duration": 8, "output": "scene4_lifestyle.mp4"},
  {"prompt": "Scene 5: Product with logo on clean background", "duration": 4, "output": "scene5_cta.mp4"}
]

Generate all scenes in parallel:


bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --batch scenes.json

With custom worker count:


bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --batch scenes.json \
  --max-workers 3

Batch config options per video:


| Option | Description | Default |
|--------|-------------|---------|
| prompt | Video description (required) | - |
| model | veo-3.1, veo-3.1-fast, etc. | veo-3.1 |
| duration | 4, 6, or 8 seconds | 8 |
| aspect_ratio | "16:9" or "9:16" | "16:9" |
| resolution | "720p" or "1080p" | "720p" |
| image | Path to image for image-to-video | - |
| negative_prompt | What to avoid | - |
| output | Custom output filename | auto-generated |
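Checking each entry of scenes.json against the options above before starting a batch catches typos early. The following is a sketch, not the script's actual validation: `normalize_scene` is a hypothetical helper, and the defaults are copied from the options listed above.

```python
# Defaults copied from the batch options; "prompt" is the only required key.
SCENE_DEFAULTS = {
    "model": "veo-3.1",
    "duration": 8,
    "aspect_ratio": "16:9",
    "resolution": "720p",
}
ALLOWED_KEYS = set(SCENE_DEFAULTS) | {"prompt", "image", "negative_prompt", "output"}


def normalize_scene(scene):
    """Return a copy of one batch entry with defaults filled in, or raise ValueError."""
    if not scene.get("prompt"):
        raise ValueError("each scene needs a non-empty 'prompt'")
    unknown = set(scene) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    # Values given in the scene override the defaults.
    return {**SCENE_DEFAULTS, **scene}
```

Rejecting unknown keys is a design choice in this sketch; it flags misspellings such as `aspect-ratio` that would otherwise be silently ignored.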

Speed comparison:


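| Scenes | Sequential | Parallel (5 workers) | Speedup |
|--------|------------|----------------------|---------|
| 3 | ~9 min | ~3 min | 3x |
| 5 | ~15 min | ~3 min | 5x |
| 10 | ~30 min | ~6 min | 5x |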
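The figures above are consistent with each video taking about 3 minutes and the workers processing scenes in waves. A small sketch of that estimate, where the 3-minute constant is an assumption inferred from the sequential timings and `estimate_minutes` is a hypothetical helper:

```python
import math

MINUTES_PER_VIDEO = 3  # assumption inferred from ~9 min for 3 sequential videos


def estimate_minutes(scenes, workers=5):
    """Return (sequential, parallel) wall-clock estimates in minutes."""
    sequential = scenes * MINUTES_PER_VIDEO
    # Each wave of up to `workers` scenes takes about one video's worth of time.
    parallel = math.ceil(scenes / workers) * MINUTES_PER_VIDEO
    return sequential, parallel
```

The same formula shows why the speedup for 3 scenes is 3x: only three of the five workers have anything to do.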

Step 5: Deliver the Result


  1. Provide the generated video file/URL
  2. Share the enhanced prompt used
  3. Mention generation settings (duration, resolution)
  4. Offer to:
    • Generate variations
    • Try different style/duration
    • Use a different API
    • Extend the video

Error Handling


Missing API key: Inform the user which key is needed (for example, OPENAI_API_KEY for Sora).
Content policy violation: Rephrase the prompt appropriately.
Generation failed: Retry with simplified prompt or different API.
Quota exceeded: Suggest waiting or trying the other provider.
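These four rules can be expressed as a lookup from error kind to next step. The sketch below is illustrative only: the `VideoError` enum, the `RECOVERY` mapping, and both helpers are hypothetical, and the scripts may report errors in a different form.

```python
from enum import Enum


class VideoError(Enum):
    MISSING_API_KEY = "missing_api_key"
    CONTENT_POLICY = "content_policy"
    GENERATION_FAILED = "generation_failed"
    QUOTA_EXCEEDED = "quota_exceeded"


# Each entry restates one of the four rules above.
RECOVERY = {
    VideoError.MISSING_API_KEY: "Tell the user which key is needed.",
    VideoError.CONTENT_POLICY: "Rephrase the prompt appropriately.",
    VideoError.GENERATION_FAILED: "Retry with a simplified prompt or a different API.",
    VideoError.QUOTA_EXCEEDED: "Suggest waiting or trying the other provider.",
}


def other_provider(model):
    """Return a model from the other provider: Veo names map to Sora and vice versa."""
    return "sora" if model.startswith("veo") else "veo-3.1"


def recovery_for(error, model):
    """Return the suggested next step, naming the alternative model where one applies."""
    step = RECOVERY[error]
    if error in (VideoError.GENERATION_FAILED, VideoError.QUOTA_EXCEEDED):
        step += f" Alternative model: {other_provider(model)}."
    return step
```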

Prompt Engineering Tips


For Audio (Veo 3.1)


  • Dialogue: Use quotes for speech: `"Hello!" she said excitedly`
  • Sound effects: Describe explicitly: `tires screeching, engine roaring`
  • Ambient: Describe the soundscape: `birds chirping, distant traffic`
  • Example: `A man whispers "Did you hear that?" as footsteps echo in the dark hallway`

For Cinematic Quality


  • Include camera directions: "slow dolly", "tracking shot", "crane shot"
  • Specify lighting: "golden hour", "dramatic shadows", "soft diffused light"
  • Add film references: "Blade Runner style", "Wes Anderson aesthetic"

For Realistic Motion


  • Describe physics: "natural movement", "realistic physics"
  • Include environmental details: "wind in hair", "leaves rustling"
  • Specify speed: "slow motion", "real-time", "time-lapse"

For Image-to-Video


  • Describe what should change/move from the starting image
  • Be specific about the action: "the cat slowly opens its eyes"
  • Include environmental motion: "leaves blow past"

Negative Prompts


  • Describe what NOT to include: `--negative-prompt "cartoon, low quality, blurry"`
  • Don't use "no" or "don't"; just describe the unwanted elements
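User-supplied exclusions often arrive phrased as "no cartoon" or "without blur". A small sketch of normalizing them into the bare-element form recommended above; the `clean_negative_prompt` helper and its list of negation words are illustrative assumptions.

```python
import re

# Leading negations to strip; this word list is an illustrative assumption.
_NEGATION = re.compile(r"^(?:no|not|don't|do not|without|avoid)\s+", re.IGNORECASE)


def clean_negative_prompt(terms):
    """Strip leading negations and duplicates, returning a comma-separated string."""
    cleaned = []
    for term in terms:
        bare = _NEGATION.sub("", term.strip())
        if bare and bare not in cleaned:
            cleaned.append(bare)
    return ", ".join(cleaned)
```

The returned string can be passed as the value of `--negative-prompt`.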

API Comparison


| Feature | Veo 3.1 (Default) | Veo 3.1 Fast | Sora |
|---------|-------------------|--------------|------|
| Provider | Google | Google | OpenAI |
| API Key | GOOGLE_API_KEY | GOOGLE_API_KEY | OPENAI_API_KEY |
| Max duration | 8 seconds | 8 seconds | 20 seconds |
| Resolution | 720p, 1080p | 720p, 1080p | Up to 1080p |
| Aspect ratios | 16:9, 9:16 | 16:9, 9:16 | 16:9, 9:16, 1:1 |
| Audio (dialogue, SFX) | ✅ Yes | ✅ Yes | ❌ No |
| Image-to-video | ✅ Yes | ✅ Yes | ✅ Yes |
| Reference images | ✅ Up to 3 | ✅ Up to 3 | ❌ No |
| Video extension | ✅ Yes | ✅ Yes | ❌ No |
| Batch generation | ✅ Yes | ✅ Yes | ❌ No |
| Speed | Best quality | ~2x faster | Slower |
| Best for | Professional | Batch workflows | Longer videos |