video-generation


Video Generation Skill


Generate videos using AI (Google Veo 3.1, OpenAI Sora).

Capabilities:

🎬 Text-to-Video: Create videos from text descriptions
🖼️ Image-to-Video: Animate images as the first frame
🔊 Audio Generation: Dialogue, sound effects, ambient sounds (Veo 3+)
🎭 Reference Images: Guide video content with up to 3 reference images (Veo 3.1)


Prerequisites


Default: Vertex AI (10 requests/minute) ⭐


Vertex AI is the default backend. Its rate limit is about 1,440x higher than AI Studio's: 10 requests/minute is 14,400 requests/day, versus 10 requests/day.

```bash
# 1. Set your project
export GOOGLE_CLOUD_PROJECT=your-project-id

# 2. Authenticate (opens browser)
gcloud auth application-default login

# 3. Enable the API (one-time)
gcloud services enable aiplatform.googleapis.com
```

Add to your `.env` file:

```
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=us-central1
```

Fallback: AI Studio (10 requests/day)


Only use this if you don't have a GCP project:

- `GOOGLE_API_KEY`: get one from https://aistudio.google.com/apikey

For Sora (OpenAI)

- `OPENAI_API_KEY`: required for OpenAI Sora; get one from https://platform.openai.com/api-keys
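The credential precedence above works like this: Vertex AI first, AI Studio as the fallback, and `OPENAI_API_KEY` for Sora. The sketch below shows it as a hypothetical `pick_backend` helper that decides from environment variables alone. The backend labels are illustrative and are not taken from `veo.py` or `sora.py`. The helper also does not verify that `gcloud auth application-default login` has been run.

```python
import os
from typing import Mapping, Optional


def pick_backend(model: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return an illustrative backend label for `model` (hypothetical helper)."""
    env = os.environ if env is None else env
    if model.startswith("sora"):
        # Sora only needs an OpenAI key.
        if env.get("OPENAI_API_KEY"):
            return "openai"
        raise RuntimeError("Set OPENAI_API_KEY (https://platform.openai.com/api-keys)")
    # Veo: prefer Vertex AI (10 requests/minute) when a GCP project is configured.
    # Note: this does not check that Application Default Credentials exist.
    if env.get("GOOGLE_CLOUD_PROJECT"):
        return "vertex-ai"
    # Fallback: AI Studio (10 requests/day).
    if env.get("GOOGLE_API_KEY"):
        return "ai-studio"
    raise RuntimeError("Set GOOGLE_CLOUD_PROJECT (Vertex AI) or GOOGLE_API_KEY (AI Studio)")


# Both variables set: Vertex AI wins.
print(pick_backend("veo-3.1", {"GOOGLE_CLOUD_PROJECT": "your-project-id", "GOOGLE_API_KEY": "..."}))
# prints: vertex-ai
```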

Available Models


Google Veo Models


| Model | Description | Best For |
|-------|-------------|----------|
| `veo-3.1` | Highest quality (default) | Professional videos, dialogue, reference images |
| `veo-3.1-fast` | Faster processing | Quick iterations, batch generation |

Both models include:

720p/1080p resolution
4, 6, or 8 second duration
Native audio (dialogue, SFX, ambient)
Image-to-video (animate images)
Reference images (up to 3)
Video extension
Batch/parallel generation


OpenAI Sora

Best for: Creative videos, cinematic quality, complex motion
Resolutions: 480p, 720p, 1080p
Durations: 5s, 10s, 15s, 20s
Features: Text-to-video, image-to-video
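You can check the per-model limits above before sending a request. This is a minimal sketch using a hypothetical `check_settings` helper. It assumes `veo-3` and `veo-3-fast` accept the same durations and resolutions as the Veo 3.1 models, which this document does not state.

```python
# Options documented in the model sections above. Assumption: veo-3 and
# veo-3-fast accept the same values as the veo-3.1 models.
LIMITS = {
    "veo": {"durations": {4, 6, 8}, "resolutions": {"720p", "1080p"}},
    "sora": {"durations": {5, 10, 15, 20}, "resolutions": {"480p", "720p", "1080p"}},
}


def check_settings(model: str, duration: int, resolution: str) -> list:
    """Return a list of problems; an empty list means the combination looks valid."""
    limits = LIMITS["sora" if model == "sora" else "veo"]
    problems = []
    if duration not in limits["durations"]:
        problems.append(f"{model}: {duration}s is not one of {sorted(limits['durations'])}")
    if resolution not in limits["resolutions"]:
        problems.append(f"{model}: {resolution} is not one of {sorted(limits['resolutions'])}")
    return problems


print(check_settings("veo-3.1", 20, "1080p"))  # Veo tops out at 8 seconds
print(check_settings("sora", 20, "1080p"))     # []
```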


Workflow


Step 1: Gather Requirements (REQUIRED)


⚠️ Use interactive questioning — ask ONE question at a time.


Question Flow

⚠️ Use the `AskUserQuestion` tool for each question below. Do not just print the questions in your response; use the tool to create interactive prompts with the options shown.

Q1: Image

"I'll generate that video for you! First — do you have an image to animate?

Yes (provide path — I'll use it as the first frame)

No, generate from scratch"

Wait for response.

Q2: Audio

"What audio preference?

With audio (default) — Veo 3.1 generates dialogue, SFX, ambient

Silent video — no audio"

Wait for response.

Q3: Model

"Which model would you like?

`veo-3.1`: latest, highest quality, with audio (default)

`veo-3.1-fast`: faster processing, with audio

`veo-3` / `veo-3-fast`: previous generation, with audio

`sora`: OpenAI, up to 20 seconds, no audio"

Wait for response.

Q4: Duration

"What duration?

4 seconds

6 seconds

8 seconds (default)"

If `sora` was chosen in Q3, offer 5, 10, 15, or 20 seconds instead.

Wait for response.

Q5: Format

"What aspect ratio and resolution?

16:9 landscape, 720p

16:9 landscape, 1080p

9:16 portrait, 720p

9:16 portrait, 1080p

Or specify"

Wait for response.


Quick Reference


| Question | Determines |
|----------|------------|
| Image | Image-to-video vs text-to-video |
| Audio | With/without audio generation |
| Model | Quality and speed tradeoff |
| Duration | Clip length |
| Format | Aspect ratio and resolution |
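A free-text answer to Q5 still has to be turned into an aspect ratio and a resolution before the script is called. This is a minimal sketch, assuming the answer contains those values in the same form as the options above. `parse_format` is hypothetical. It falls back to 16:9 and 720p, the values used in the Step 4 examples.

```python
import re


def parse_format(answer: str) -> dict:
    """Extract aspect ratio and resolution from a Q5 answer (hypothetical helper)."""
    ratio = re.search(r"\b(16:9|9:16|1:1)\b", answer)
    res = re.search(r"\b(480p|720p|1080p)\b", answer, re.IGNORECASE)
    return {
        # Fall back to 16:9 / 720p when the answer does not mention a value.
        "aspect_ratio": ratio.group(1) if ratio else "16:9",
        "resolution": res.group(1).lower() if res else "720p",
    }


print(parse_format("9:16 portrait, 1080p"))
# prints: {'aspect_ratio': '9:16', 'resolution': '1080p'}
```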


Step 2: Craft the Prompt


Transform the user request into an effective video prompt:

Describe the scene: Set the visual context
Specify action: What moves, changes, happens
Include camera work: "slow pan", "tracking shot", "dolly shot"
Add audio cues (Veo 3+): Use quotes for dialogue, describe sounds
Set the mood: Lighting, atmosphere, time of day

Example with dialogue (Veo 3.1):

User: "a person discovering treasure"
Enhanced: "Close-up of a treasure hunter's face as torchlight flickers. He murmurs 'This must be it...' while brushing dust off an ancient chest. Sound of creaking hinges as he opens it, revealing golden light on his awestruck face. Cinematic, dramatic shadows."

Example without dialogue:

User: "a dog running on a beach"
Enhanced: "Cinematic slow-motion shot of a golden retriever running joyfully along a beach at sunset, waves lapping, warm golden hour lighting, shallow depth of field"
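The five elements above can be combined mechanically. This is a sketch under that framing, using a hypothetical `build_prompt` helper. The wording it produces is only a starting point, and it does not replace hand-editing a prompt like the examples above.

```python
def _sentence(text: str) -> str:
    """Capitalize a fragment and end it with exactly one period."""
    text = text.strip().rstrip(".")
    return text[:1].upper() + text[1:] + "."


def build_prompt(scene: str, action: str, camera: str = "", mood: str = "",
                 speaker: str = "", dialogue: str = "", sounds: tuple = ()) -> str:
    """Assemble a prompt from the Step 2 elements (hypothetical helper)."""
    parts = [f"{camera} of {scene}" if camera else scene, action]
    if dialogue:
        # Per the tips above, quoted text is read as dialogue (Veo 3+).
        parts.append(f'{speaker or "Someone"} says "{dialogue}"')
    if sounds:
        parts.append("Sound of " + ", ".join(sounds))
    if mood:
        parts.append(mood)
    return " ".join(_sentence(p) for p in parts if p.strip())


print(build_prompt(
    scene="a treasure hunter's face as torchlight flickers",
    action="he brushes dust off an ancient chest",
    camera="Close-up",
    speaker="He",
    dialogue="This must be it...",
    sounds=("creaking hinges",),
    mood="cinematic, dramatic shadows",
))
```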


Step 3: Select the Model


Default: veo-3.1 (highest quality, with audio)

| Use Case | Recommended Model | Reason |
|----------|-------------------|--------|
| Best quality | veo-3.1 (default) | Highest quality, audio |
| Quick iteration | veo-3.1-fast | Faster processing |
| Batch generation | veo-3.1-fast | Speed matters for multiple clips |
| Longer videos (>8s) | sora | Supports up to 20s |
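The table above reduces to a few rules. This sketch encodes them in a hypothetical `recommend_model` function. It is one reading of the table, not logic taken from the scripts.

```python
def recommend_model(duration: int = 8, batch: bool = False, quick: bool = False) -> str:
    """One reading of the Step 3 table (hypothetical helper)."""
    if duration > 8:
        # Veo clips max out at 8 s; Sora supports up to 20 s (but no audio).
        return "sora"
    if batch or quick:
        # Speed matters for quick iteration and multi-clip batches.
        return "veo-3.1-fast"
    return "veo-3.1"  # default: highest quality, with audio


print(recommend_model(duration=15))  # sora
```

Sora has no audio. For a clip longer than 8 seconds that needs dialogue or sound, this document's alternative is Veo video extension (`--extend`).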


Step 4: Generate the Video


Execute the appropriate script from `${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/`.

For Google Veo 3.1 (default, with audio):

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --prompt "your enhanced prompt with 'dialogue in quotes'" \
  --model "veo-3.1" \
  --duration 8 \
  --aspect-ratio "16:9" \
  --resolution "720p"
```

For Google Veo 3.1 with image input:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --prompt "The cat slowly opens its eyes and yawns" \
  --image "/path/to/cat.jpg" \
  --model "veo-3.1" \
  --duration 8
```

For faster generation:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --prompt "your prompt" \
  --model "veo-3.1-fast"
```

For OpenAI Sora (longer videos):

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/sora.py \
  --prompt "your enhanced prompt" \
  --duration 20 \
  --resolution "1080p"
```

List available models:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py --list-models
```
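If you drive the script from Python rather than a shell, pass an argument list. This avoids quoting problems with prompts that contain quoted dialogue. The sketch below builds such a list using only the flags shown above. `veo_command` is a hypothetical helper. It assumes `CLAUDE_PLUGIN_ROOT` is set, or that an explicit root is passed, and falls back to the current directory otherwise.

```python
import os
import shlex


def veo_command(prompt, model="veo-3.1", duration=8, aspect_ratio="16:9",
                resolution="720p", image=None, plugin_root=None):
    """Build an argv list for veo.py using only the flags shown above (hypothetical helper)."""
    root = plugin_root or os.environ.get("CLAUDE_PLUGIN_ROOT", ".")
    script = os.path.join(root, "skills", "video-generation", "scripts", "veo.py")
    argv = [
        "python3", script,
        "--prompt", prompt,  # one argument, so quotes inside need no escaping
        "--model", model,
        "--duration", str(duration),
        "--aspect-ratio", aspect_ratio,
        "--resolution", resolution,
    ]
    if image:
        argv += ["--image", image]  # image-to-video: this file becomes the first frame
    return argv


cmd = veo_command("He murmurs 'This must be it...'", plugin_root="/opt/plugin")
print(shlex.join(cmd))  # shell-safe rendering, handy for logging
# To actually run it: subprocess.run(cmd, check=True)
```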


Video Extension (For Long-Form Continuity)


The `--extend` flag creates true visual continuity by continuing from where a previous Veo video ended. This is the best approach for long-form videos.

Basic extension:

```bash
# First, generate the initial clip
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --prompt "A person walks through a forest at sunrise" \
  --duration 8

# Then extend it with new content (adds ~7 seconds);
# pass the file produced by the previous command to --extend
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --extend veo_veo-3.1_20260104_120000.mp4 \
  --prompt "Continue walking, discover a hidden stream"
```

**Multiple extensions (for longer videos):**

```bash
# Extend 5 times (adds ~35 seconds of continuation)
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --extend initial_clip.mp4 \
  --prompt "Keep exploring the forest, encounter wildlife" \
  --extend-times 5
```


**Extension vs Stitching:**

| Approach | Result | Use Case |
|----------|--------|----------|
| **Extension** | True continuity, same characters/scene | Long continuous shots |
| **Stitching** | Separate clips with transitions | Scene changes, montages |

**Extension Limits:**
- Input video must be Veo-generated (max 141 seconds)
- Each extension adds ~7 seconds
- Maximum 20 extensions total (~2.5 minutes)
- Output resolution is 720p
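The limits above are consistent with each other. Starting from an 8-second clip, 19 extensions reach 141 seconds, so one more extension is allowed, giving 148 seconds (about 2.5 minutes). This sketch plans the approximate lengths under those limits. `plan_extensions` is hypothetical, and the ~7-second step is the document's approximation, not an exact figure.

```python
EXTENSION_SECONDS = 7    # "~7 seconds" per extension (approximate)
MAX_EXTENSIONS = 20      # total extensions allowed
MAX_INPUT_SECONDS = 141  # longest clip that can still be extended


def plan_extensions(initial_seconds: int, times: int) -> list:
    """Approximate clip length after each extension, enforcing the documented limits."""
    if times > MAX_EXTENSIONS:
        raise ValueError(f"at most {MAX_EXTENSIONS} extensions are allowed")
    lengths = []
    current = initial_seconds
    for _ in range(times):
        # Each extension takes the current clip as input, so check its length first.
        if current > MAX_INPUT_SECONDS:
            raise ValueError(f"input clip is ~{current}s; the limit is {MAX_INPUT_SECONDS}s")
        current += EXTENSION_SECONDS
        lengths.append(current)
    return lengths


print(plan_extensions(8, 5))        # [15, 22, 29, 36, 43]
print(plan_extensions(8, 20)[-1])   # 148
```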


Batch Generation (Parallel)


Generate multiple videos simultaneously for faster multi-scene workflows. Instead of waiting 15+ minutes for 5 sequential videos, generate them all in parallel (~3 minutes total).


Create a `scenes.json` file:

```json
[
  {"prompt": "Scene 1: Cinematic hero shot of wireless earbuds on dark surface", "duration": 6, "output": "scene1_hero.mp4"},
  {"prompt": "Scene 2: Sound waves visualization, person enjoying music", "duration": 8, "output": "scene2_sound.mp4"},
  {"prompt": "Scene 3: Close-up of earbud in ear, person exercising", "duration": 8, "output": "scene3_comfort.mp4"},
  {"prompt": "Scene 4: Lifestyle montage, various settings", "duration": 8, "output": "scene4_lifestyle.mp4"},
  {"prompt": "Scene 5: Product with logo on clean background", "duration": 4, "output": "scene5_cta.mp4"}
]
```
Generate all scenes in parallel:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --batch scenes.json
```
With a custom worker count:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --batch scenes.json \
  --max-workers 3
```
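The speedup comes from running independent requests concurrently. Below is a minimal sketch of that pattern with Python's `concurrent.futures`, assuming each clip is an I/O-bound remote call. `fake_generate` is a hypothetical stand-in that sleeps instead of calling Veo. The default of 5 workers mirrors the speed table and is an assumption. This is not necessarily how `veo.py --batch` is implemented.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_generate(scene: dict) -> str:
    """Hypothetical stand-in for one Veo request; sleeps instead of calling the API."""
    time.sleep(0.2)
    return scene.get("output", "auto-generated.mp4")


def run_batch(scenes: list, max_workers: int = 5) -> list:
    """Run one job per scene on a thread pool and return outputs in input order."""
    # Threads fit here because each job mostly waits on a remote service.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order of `scenes`, even if jobs finish out of order.
        return list(pool.map(fake_generate, scenes))


scenes = [{"prompt": f"Scene {i}", "output": f"scene{i}.mp4"} for i in range(1, 6)]
start = time.perf_counter()
outputs = run_batch(scenes)
elapsed = time.perf_counter() - start
print(outputs, f"{elapsed:.1f}s")  # roughly 0.2s, versus roughly 1.0s sequentially
```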


Batch config options per video:

| Option | Description | Default |
|--------|-------------|---------|
| `prompt` | Video description (required) | - |
| `model` | `veo-3.1`, `veo-3.1-fast`, etc. | `veo-3.1` |
| `duration` | 4, 6, or 8 seconds | 8 |
| `aspect_ratio` | "16:9" or "9:16" | "16:9" |
| `resolution` | "720p" or "1080p" | "720p" |
| `image` | Path to image for image-to-video | - |
| `negative_prompt` | What to avoid | - |
| `output` | Custom output filename | auto-generated |
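A batch file can be checked against the options table before any request is sent. The sketch below fills in the documented defaults and rejects out-of-range values. `normalize_scenes` is hypothetical. Rejecting unknown keys is a deliberate strictness assumption: `veo.py` may simply ignore them.

```python
import json

DEFAULTS = {"model": "veo-3.1", "duration": 8, "aspect_ratio": "16:9", "resolution": "720p"}
ALLOWED = {
    "duration": {4, 6, 8},
    "aspect_ratio": {"16:9", "9:16"},
    "resolution": {"720p", "1080p"},
}
KNOWN_KEYS = set(DEFAULTS) | {"prompt", "image", "negative_prompt", "output"}


def normalize_scenes(text: str) -> list:
    """Validate scenes.json content and merge in the documented defaults (hypothetical helper)."""
    scenes = json.loads(text)
    if not isinstance(scenes, list):
        raise ValueError("scenes.json must contain a JSON array")
    result = []
    for i, scene in enumerate(scenes, start=1):
        if not scene.get("prompt"):
            raise ValueError(f"scene {i}: 'prompt' is required")
        unknown = set(scene) - KNOWN_KEYS
        if unknown:
            raise ValueError(f"scene {i}: unknown keys {sorted(unknown)}")
        merged = {**DEFAULTS, **scene}  # explicit values override defaults
        for key, allowed in ALLOWED.items():
            if merged[key] not in allowed:
                raise ValueError(f"scene {i}: {key}={merged[key]!r} is not allowed")
        result.append(merged)
    return result


print(normalize_scenes('[{"prompt": "Scene 1: hero shot", "duration": 6}]'))
```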


Speed comparison:

| Scenes | Sequential | Parallel (5 workers) | Speedup |
|--------|------------|----------------------|---------|
| 3 | ~9 min | ~3 min | 3x |
| 5 | ~15 min | ~3 min | 5x |
| 10 | ~30 min | ~6 min | 5x |
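The speed table follows from simple arithmetic. Assume each clip takes about 3 minutes (15 minutes for 5 sequential clips) and clips run in waves of at most 5. Under those assumptions, this sketch reproduces the table. `estimate_minutes` is hypothetical, and real timings will vary.

```python
import math


def estimate_minutes(scenes: int, workers: int = 5, minutes_per_clip: float = 3.0):
    """Rough wall-clock estimate: clips run in waves of `workers` at a time."""
    sequential = scenes * minutes_per_clip
    parallel = math.ceil(scenes / workers) * minutes_per_clip
    return sequential, parallel, sequential / parallel


for n in (3, 5, 10):
    seq, par, speedup = estimate_minutes(n)
    print(f"{n} scenes: ~{seq:g} min sequential, ~{par:g} min parallel, {speedup:g}x")
```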


Step 5: Deliver the Result


- Provide the generated video file or URL
- Share the enhanced prompt used
- Mention the generation settings (duration, resolution)
- Offer to:
  - Generate variations
  - Try a different style or duration
  - Use a different API
  - Extend the video


Error Handling


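Missing API key: Tell the user which key is needed:

OpenAI: https://platform.openai.com/api-keys
Google: https://aistudio.google.com/apikey (or, for the default Vertex AI backend, set `GOOGLE_CLOUD_PROJECT` and run `gcloud auth application-default login`)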

Content policy violation: Rephrase the prompt appropriately.

Generation failed: Retry with a simplified prompt or a different API.

Quota exceeded: Suggest waiting or trying the other provider.
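"Try the other provider" can be automated as an ordered fallback. This is a minimal sketch: `QuotaExceeded`, `veo_stub`, and `sora_stub` are hypothetical stand-ins, not errors or functions from the real scripts. Only quota errors trigger the fallback. Content-policy errors are left to propagate because their fix is to rephrase the prompt. Falling back from Veo to Sora also drops audio.

```python
class QuotaExceeded(Exception):
    """Hypothetical error type; the real scripts may report quota errors differently."""


def generate_with_fallback(prompt, providers):
    """Try each (name, generate_fn) pair in order, moving on only when quota runs out."""
    errors = []
    for name, generate in providers:
        try:
            return name, generate(prompt)
        except QuotaExceeded as exc:
            errors.append(f"{name}: {exc}")  # remember why this provider was skipped
    raise RuntimeError("every provider is out of quota: " + "; ".join(errors))


def veo_stub(prompt):
    raise QuotaExceeded("10 requests/day used")


def sora_stub(prompt):
    return "sora_clip.mp4"


result = generate_with_fallback("a dog running on a beach",
                                [("veo-3.1", veo_stub), ("sora", sora_stub)])
print(result)  # ('sora', 'sora_clip.mp4')
```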


Prompt Engineering Tips


For Audio (Veo 3.1)


Dialogue: Use quotes for speech:
```
"Hello!" she said excitedly
```
Sound effects: Describe explicitly:
```
tires screeching, engine roaring
```
Ambient: Describe the soundscape:
```
birds chirping, distant traffic
```

Example:

A man whispers "Did you hear that?" as footsteps echo in the dark hallway


For Cinematic Quality


Include camera directions: "slow dolly", "tracking shot", "crane shot"
Specify lighting: "golden hour", "dramatic shadows", "soft diffused light"
Add film references: "Blade Runner style", "Wes Anderson aesthetic"


For Realistic Motion


Describe physics: "natural movement", "realistic physics"
Include environmental details: "wind in hair", "leaves rustling"
Specify speed: "slow motion", "real-time", "time-lapse"


For Image-to-Video


Describe what should change/move from the starting image
Be specific about the action: "the cat slowly opens its eyes"
Include environmental motion: "leaves blow past"


Negative Prompts


Describe what NOT to include:

`--negative-prompt "cartoon, low quality, blurry"`

Don't use "no" or "don't"; just describe the unwanted elements.
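The rule above can be applied automatically when the user phrases exclusions as negations. This is a small sketch: `to_negative_prompt` is hypothetical, and it only strips a leading "no", "not", "don't", "do not", or "without" from each item.

```python
import re

# Leading negation words to drop; the alternation backtracks so "not " is also handled.
_NEGATION = re.compile(r"^(?:no|not|don't|do not|without)\s+", re.IGNORECASE)


def to_negative_prompt(items):
    """Strip leading negations so each item just names the unwanted element."""
    cleaned = (_NEGATION.sub("", item.strip()) for item in items)
    return ", ".join(c for c in cleaned if c)


print(to_negative_prompt(["no cartoons", "low quality", "not blurry"]))
# prints: cartoons, low quality, blurry
```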


API Comparison


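| Feature | Veo 3.1 (Default) | Veo 3.1 Fast | Sora |
|---------|-------------------|--------------|------|
| Provider | Google | Google | OpenAI |
| Credentials | `GOOGLE_CLOUD_PROJECT` (Vertex AI) or `GOOGLE_API_KEY` (AI Studio) | `GOOGLE_CLOUD_PROJECT` (Vertex AI) or `GOOGLE_API_KEY` (AI Studio) | `OPENAI_API_KEY` |
| Max clip duration | 8 seconds | 8 seconds | 20 seconds |
| Resolution | 720p, 1080p | 720p, 1080p | Up to 1080p |
| Aspect ratios | 16:9, 9:16 | 16:9, 9:16 | 16:9, 9:16, 1:1 |
| Audio (dialogue, SFX) | ✅ Yes | ✅ Yes | ❌ No |
| Image-to-video | ✅ Yes | ✅ Yes | ✅ Yes |
| Reference images | ✅ Up to 3 | ✅ Up to 3 | ❌ No |
| Video extension | ✅ Yes | ✅ Yes | ❌ No |
| Batch generation | ✅ Yes | ✅ Yes | ❌ No |
| Speed | Standard (highest quality) | ~2x faster | Slower |
| Best for | Professional videos | Batch workflows | Longer videos |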
