video-generation
Video Generation Skill
Generate videos using AI (Google Veo 3.1, OpenAI Sora).
Capabilities:
- 🎬 Text-to-Video: Create videos from text descriptions
- 🖼️ Image-to-Video: Animate images as the first frame
- 🔊 Audio Generation: Dialogue, sound effects, ambient sounds (Veo 3+)
- 🎭 Reference Images: Guide video content with up to 3 reference images (Veo 3.1)
Prerequisites
Default: Vertex AI (10 requests/minute) ⭐
Vertex AI is the default backend. At 10 requests/minute, its rate limit is roughly 1,400x higher than the AI Studio fallback (10 requests/day):
```bash
# 1. Set your project
export GOOGLE_CLOUD_PROJECT=your-project-id

# 2. Authenticate (opens browser)
gcloud auth application-default login

# 3. Enable the API (one-time)
gcloud services enable aiplatform.googleapis.com
```
Add to your `.env` file:
```
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=us-central1
```
Fallback: AI Studio (10 requests/day)
Only use if you don't have a GCP project:
- `GOOGLE_API_KEY`: get it from https://aistudio.google.com/apikey
For Sora (OpenAI)
- `OPENAI_API_KEY`: required for OpenAI Sora
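The credential precedence described above (Vertex AI first, AI Studio only as a fallback, a separate key for Sora) can be sketched as a small check. This is an illustrative helper, not code from the skill's scripts; the function names and the returned labels `vertex-ai` and `ai-studio` are assumptions.

```python
from typing import Mapping, Optional


def pick_google_backend(env: Mapping[str, str]) -> Optional[str]:
    """Return which Google backend the environment supports, if any.

    The labels "vertex-ai" and "ai-studio" are hypothetical names
    used only for this sketch.
    """
    # Vertex AI is preferred because of its higher rate limit.
    if env.get("GOOGLE_CLOUD_PROJECT"):
        return "vertex-ai"
    # AI Studio is the fallback when no GCP project is configured.
    if env.get("GOOGLE_API_KEY"):
        return "ai-studio"
    return None


def sora_ready(env: Mapping[str, str]) -> bool:
    """Sora needs its own OpenAI key, independent of the Google setup."""
    return bool(env.get("OPENAI_API_KEY"))
```

Passing `os.environ` as `env` checks the current process; passing a plain dict makes the precedence easy to test.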
Available Models
Google Veo Models
| Model | Description | Best For |
|---|---|---|
| `veo-3.1` | Highest quality (default) | Professional videos, dialogue, reference images |
| `veo-3.1-fast` | Faster processing | Quick iterations, batch generation |
Both models include:
- 720p/1080p resolution
- 4, 6, or 8 second duration
- Native audio (dialogue, SFX, ambient)
- Image-to-video (animate images)
- Reference images (up to 3)
- Video extension
- Batch/parallel generation
OpenAI Sora
- Best for: Creative videos, cinematic quality, complex motion
- Resolutions: 480p, 720p, 1080p
- Durations: 5s, 10s, 15s, 20s
- Features: Text-to-video, image-to-video
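The duration and resolution limits listed for both providers lend themselves to a check before any request is sent. The sketch below encodes only the values stated in this section; the function name and the provider labels `veo` and `sora` are assumptions for illustration.

```python
from typing import List

# Limits copied from the model descriptions in this section.
LIMITS = {
    "veo": {"durations": (4, 6, 8), "resolutions": ("720p", "1080p")},
    "sora": {"durations": (5, 10, 15, 20), "resolutions": ("480p", "720p", "1080p")},
}


def validate_settings(provider: str, duration: int, resolution: str) -> List[str]:
    """Return a list of human-readable problems (empty if the settings fit)."""
    if provider not in LIMITS:
        return [f"unknown provider: {provider}"]
    limits = LIMITS[provider]
    problems = []
    if duration not in limits["durations"]:
        problems.append(
            f"{provider} supports durations {limits['durations']}, got {duration}"
        )
    if resolution not in limits["resolutions"]:
        problems.append(
            f"{provider} supports resolutions {limits['resolutions']}, got {resolution}"
        )
    return problems
```

Returning a list rather than raising on the first problem lets the caller report every mismatch in one message.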
Workflow
Step 1: Gather Requirements (REQUIRED)
⚠️ Use interactive questioning: ask ONE question at a time.
Question Flow
⚠️ Use the `AskUserQuestion` tool for each question below. Do not just print questions in your response; use the tool to create interactive prompts with the options shown.
Q1: Image
"I'll generate that video for you! First — do you have an image to animate?
- Yes (provide path — I'll use it as the first frame)
- No, generate from scratch"
Wait for response.
Q2: Audio
"What audio preference?
- With audio (default) — Veo 3.1 generates dialogue, SFX, ambient
- Silent video — no audio"
Wait for response.
Q3: Model
"Which model would you like?
- `veo-3.1`: Latest, highest quality with audio (default)
- `veo-3.1-fast`: Faster processing with audio
- `veo-3` / `veo-3-fast`: Previous generation with audio
- `sora`: OpenAI, up to 20 seconds, no audio"
Wait for response.
Q4: Duration
"What duration?
- 4 seconds
- 6 seconds
- 8 seconds (default)"
Wait for response.
Q5: Format
"What aspect ratio and resolution?
- 16:9 landscape, 720p
- 16:9 landscape, 1080p
- 9:16 portrait, 720p
- 9:16 portrait, 1080p
- Or specify"
Wait for response.
Quick Reference
| Question | Determines |
|---|---|
| Image | Image-to-video vs text-to-video |
| Audio | With/without audio generation |
| Model | Quality and speed tradeoff |
| Duration | Clip length |
| Format | Aspect ratio and resolution |
Step 2: Craft the Prompt
Transform the user request into an effective video prompt:
- Describe the scene: Set the visual context
- Specify action: What moves, changes, happens
- Include camera work: "slow pan", "tracking shot", "dolly shot"
- Add audio cues (Veo 3+): Use quotes for dialogue, describe sounds
- Set the mood: Lighting, atmosphere, time of day
Example with dialogue (Veo 3.1):
- User: "a person discovering treasure"
- Enhanced: "Close-up of a treasure hunter's face as torchlight flickers. He murmurs 'This must be it...' while brushing dust off an ancient chest. Sound of creaking hinges as he opens it, revealing golden light on his awestruck face. Cinematic, dramatic shadows."
Example without dialogue:
- User: "a dog running on a beach"
- Enhanced: "Cinematic slow-motion shot of a golden retriever running joyfully along a beach at sunset, waves lapping, warm golden hour lighting, shallow depth of field"
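The five ingredients listed above (scene, action, camera work, audio cues, mood) can be combined mechanically once each has been written. The following is a minimal sketch of that assembly; `build_prompt` and its parameter names are hypothetical and are not part of the skill's scripts.

```python
def build_prompt(scene: str, action: str, camera: str = "",
                 audio: str = "", mood: str = "") -> str:
    """Join the non-empty prompt ingredients into one paragraph.

    Each ingredient becomes its own sentence. A part that already ends
    in punctuation or a closing quote (for example quoted dialogue) is
    left untouched.
    """
    sentences = []
    for part in (scene, action, camera, audio, mood):
        text = part.strip()
        if not text:
            continue  # optional ingredients may be omitted
        if text[-1] not in ".!?\"'":
            text += "."
        sentences.append(text)
    return " ".join(sentences)


prompt = build_prompt(
    scene="Close-up of a treasure hunter's face as torchlight flickers",
    action="He brushes dust off an ancient chest",
    audio="He murmurs 'This must be it...'",
    mood="Cinematic, dramatic shadows",
)
print(prompt)
```

Keeping the ingredients separate until the last step makes it easy to drop the audio cue for a silent video or swap the camera direction between variations.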
Step 3: Select the Model
Default: veo-3.1 (highest quality, with audio)
| Use Case | Recommended Model | Reason |
|---|---|---|
| Best quality | veo-3.1 (default) | Highest quality, audio |
| Quick iteration | veo-3.1-fast | Faster processing |
| Batch generation | veo-3.1-fast | Speed matters for multiple clips |
| Longer videos (>8s) | sora | Supports up to 20s |
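The recommendation table above reduces to a few ordered checks. This sketch mirrors the table rows only; the function and parameter names are assumptions, and it does not model the audio trade-off (Sora is listed without audio).

```python
def select_model(duration_s: int, quick_iteration: bool = False,
                 batch: bool = False) -> str:
    """Pick a model name following the recommendation table.

    Order matters: the duration rule is checked first because the
    Veo models top out at 8 seconds per clip.
    """
    if duration_s > 8:
        return "sora"            # supports up to 20 s
    if quick_iteration or batch:
        return "veo-3.1-fast"    # speed matters more than peak quality
    return "veo-3.1"             # default: highest quality, with audio
```

For example, `select_model(8)` yields the default `veo-3.1`, while `select_model(20)` falls through to `sora`.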
Step 4: Generate the Video
Execute the appropriate script from `${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/`:

For Google Veo 3.1 (default, with audio):
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --prompt "your enhanced prompt with 'dialogue in quotes'" \
  --model "veo-3.1" \
  --duration 8 \
  --aspect-ratio "16:9" \
  --resolution "720p"
```
For Google Veo 3.1 with image input:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --prompt "The cat slowly opens its eyes and yawns" \
  --image "/path/to/cat.jpg" \
  --model "veo-3.1" \
  --duration 8
```
For faster generation:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --prompt "your prompt" \
  --model "veo-3.1-fast"
```
For OpenAI Sora (longer videos):
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/sora.py \
  --prompt "your enhanced prompt" \
  --duration 20 \
  --resolution "1080p"
```
List available models:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py --list-models
```
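When the answers from Step 1 are held in variables, the Veo command can be assembled as an argument list instead of a shell string, which sidesteps quoting problems with dialogue inside the prompt. The sketch below uses only flags that appear in the commands above; `build_veo_command` and its parameters are hypothetical, and `plugin_root` stands in for the value of `CLAUDE_PLUGIN_ROOT`.

```python
from typing import List, Optional

SCRIPTS_DIR = "skills/video-generation/scripts"


def build_veo_command(prompt: str, plugin_root: str, model: str = "veo-3.1",
                      duration: int = 8, aspect_ratio: str = "16:9",
                      resolution: str = "720p",
                      image: Optional[str] = None) -> List[str]:
    """Return the argv list for a veo.py invocation."""
    script = f"{plugin_root.rstrip('/')}/{SCRIPTS_DIR}/veo.py"
    cmd = [
        "python3", script,
        "--prompt", prompt,
        "--model", model,
        "--duration", str(duration),
        "--aspect-ratio", aspect_ratio,
        "--resolution", resolution,
    ]
    if image:
        # Image-to-video: the image is used as the first frame.
        cmd += ["--image", image]
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; because no shell is involved, quotes inside the prompt need no escaping.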
Video Extension (For Long-Form Continuity)
The `--extend` flag creates TRUE visual continuity by continuing from where a previous Veo video ended. This is the best approach for long-form videos.

Basic extension:
```bash
# First, generate initial clip
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --prompt "A person walks through a forest at sunrise" \
  --duration 8

# Extend it with new content (adds ~7 seconds)
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --extend veo_veo-3.1_20260104_120000.mp4 \
  --prompt "Continue walking, discover a hidden stream"
```

**Multiple extensions (for longer videos):**
```bash
# Extend 5 times (adds ~35 seconds of continuation)
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --extend initial_clip.mp4 \
  --prompt "Keep exploring the forest, encounter wildlife" \
  --extend-times 5
```

**Extension vs Stitching:**
| Approach | Result | Use Case |
|----------|--------|----------|
| **Extension** | True continuity, same characters/scene | Long continuous shots |
| **Stitching** | Separate clips with transitions | Scene changes, montages |

**Extension Limits:**
- Input video must be Veo-generated (max 141 seconds)
- Each extension adds ~7 seconds
- Maximum 20 extensions total (~2.5 minutes)
- Output resolution is 720p
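The limits above are consistent with simple arithmetic: an 8-second clip extended 19 times is 8 + 19 × 7 = 141 seconds (the largest allowed input), and a 20th extension brings it to 148 seconds, about 2.5 minutes. The sketch below works through that calculation; the helper names are hypothetical, and since each extension adds only approximately 7 seconds, the results are estimates.

```python
import math

SECONDS_PER_EXTENSION = 7   # approximate, per the limits above
MAX_EXTENSIONS = 20


def total_length(initial_seconds: int, extensions: int) -> int:
    """Estimated length after a number of extensions."""
    if not 0 <= extensions <= MAX_EXTENSIONS:
        raise ValueError(f"extensions must be between 0 and {MAX_EXTENSIONS}")
    return initial_seconds + extensions * SECONDS_PER_EXTENSION


def extensions_needed(target_seconds: int, initial_seconds: int = 8) -> int:
    """Smallest number of extensions that reaches the target length."""
    remaining = max(0, target_seconds - initial_seconds)
    needed = math.ceil(remaining / SECONDS_PER_EXTENSION)
    if needed > MAX_EXTENSIONS:
        raise ValueError("target exceeds what 20 extensions can reach")
    return needed
```

For a one-minute video starting from an 8-second clip, `extensions_needed(60)` gives 8, so the matching flag would be `--extend-times 8`.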
Batch Generation (Parallel)
Generate multiple videos simultaneously for faster multi-scene workflows. Instead of waiting 15+ minutes for 5 sequential videos, generate them all in parallel (~3 minutes total).

Create a scenes.json file:
```json
[
  {"prompt": "Scene 1: Cinematic hero shot of wireless earbuds on dark surface", "duration": 6, "output": "scene1_hero.mp4"},
  {"prompt": "Scene 2: Sound waves visualization, person enjoying music", "duration": 8, "output": "scene2_sound.mp4"},
  {"prompt": "Scene 3: Close-up of earbud in ear, person exercising", "duration": 8, "output": "scene3_comfort.mp4"},
  {"prompt": "Scene 4: Lifestyle montage, various settings", "duration": 8, "output": "scene4_lifestyle.mp4"},
  {"prompt": "Scene 5: Product with logo on clean background", "duration": 4, "output": "scene5_cta.mp4"}
]
```
Generate all scenes in parallel:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --batch scenes.json
```
With custom worker count:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --batch scenes.json \
  --max-workers 3
```
Batch config options per video:
| Option | Description | Default |
|---|---|---|
| `prompt` | Video description (required) | - |
| | veo-3.1, veo-3.1-fast, etc. | veo-3.1 |
| `duration` | 4, 6, or 8 seconds | 8 |
| | "16:9" or "9:16" | "16:9" |
| | "720p" or "1080p" | "720p" |
| | Path to image for image-to-video | - |
| | What to avoid | - |
| `output` | Custom output filename | auto-generated |
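Because a batch run launches several generations at once, it can be worth checking the config file before starting. The sketch below validates only the keys that appear in the scenes.json example (`prompt`, `duration`, `output`); the function name is hypothetical and the checks are an assumption about what a reasonable pre-flight would cover, not a description of what `veo.py` itself enforces.

```python
import json
from typing import List

ALLOWED_DURATIONS = (4, 6, 8)


def validate_scenes(raw: str) -> List[str]:
    """Return a list of problems found in a scenes.json document."""
    scenes = json.loads(raw)
    if not isinstance(scenes, list):
        return ["top-level JSON value must be a list of scene objects"]
    problems = []
    for index, scene in enumerate(scenes, start=1):
        if not isinstance(scene, dict):
            problems.append(f"scene {index}: expected an object")
            continue
        prompt = scene.get("prompt")
        if not isinstance(prompt, str) or not prompt.strip():
            problems.append(f"scene {index}: 'prompt' is required")
        # Duration is optional and defaults to 8 seconds.
        duration = scene.get("duration", 8)
        if duration not in ALLOWED_DURATIONS:
            problems.append(f"scene {index}: 'duration' must be 4, 6, or 8")
    return problems
```

An empty result means every scene has a prompt and an accepted duration; anything else can be shown to the user before any quota is spent.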
Speed comparison:
| Scenes | Sequential | Parallel (5 workers) | Speedup |
|---|---|---|---|
| 3 | ~9 min | ~3 min | 3x |
| 5 | ~15 min | ~3 min | 5x |
| 10 | ~30 min | ~6 min | 5x |
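The figures in the table follow from assuming each video takes about 3 minutes and that workers process scenes in waves. The sketch below reproduces that estimate; the 3-minute figure is inferred from the table (9 minutes for 3 sequential scenes), and the function name is hypothetical.

```python
import math
from typing import Tuple


def estimate_minutes(scenes: int, workers: int = 5,
                     minutes_per_video: float = 3.0) -> Tuple[float, float]:
    """Return (sequential, parallel) time estimates in minutes."""
    sequential = scenes * minutes_per_video
    # Parallel runs finish in waves of up to `workers` videos each.
    waves = math.ceil(scenes / workers)
    parallel = waves * minutes_per_video
    return sequential, parallel


for count in (3, 5, 10):
    sequential, parallel = estimate_minutes(count)
    print(f"{count} scenes: ~{sequential:.0f} min sequential, "
          f"~{parallel:.0f} min parallel, {sequential / parallel:.0f}x speedup")
```

The speedup is capped by the worker count, which is why 10 scenes with 5 workers still show 5x rather than 10x.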
Step 5: Deliver the Result
- Provide the generated video file/URL
- Share the enhanced prompt used
- Mention generation settings (duration, resolution)
- Offer to:
  - Generate variations
  - Try different style/duration
  - Use a different API
  - Extend the video
Error Handling
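Missing API key: Inform the user which key is needed: `GOOGLE_CLOUD_PROJECT` (Vertex AI) or `GOOGLE_API_KEY` (AI Studio) for Veo, `OPENAI_API_KEY` for Sora.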
Content policy violation: Rephrase the prompt appropriately.
Generation failed: Retry with simplified prompt or different API.
Quota exceeded: Suggest waiting or trying the other provider.
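The four cases above amount to a lookup from failure type to next action. The sketch below expresses that mapping; the `Failure` enum, the provider labels `veo` and `sora`, and the wording of the returned suggestions are all assumptions made for illustration.

```python
from enum import Enum


class Failure(Enum):
    MISSING_KEY = "missing_key"
    CONTENT_POLICY = "content_policy"
    GENERATION_FAILED = "generation_failed"
    QUOTA_EXCEEDED = "quota_exceeded"


OTHER_PROVIDER = {"veo": "sora", "sora": "veo"}


def next_step(failure: Failure, provider: str) -> str:
    """Suggest what to do after a failed generation attempt."""
    other = OTHER_PROVIDER[provider]
    if failure is Failure.MISSING_KEY:
        if provider == "sora":
            return "ask the user to set OPENAI_API_KEY"
        return ("ask the user to set GOOGLE_CLOUD_PROJECT (Vertex AI) "
                "or GOOGLE_API_KEY (AI Studio)")
    if failure is Failure.CONTENT_POLICY:
        return "rephrase the prompt and try again"
    if failure is Failure.GENERATION_FAILED:
        return f"retry with a simplified prompt, or switch to {other}"
    return f"wait for the quota to reset, or switch to {other}"
```

Keeping the mapping in one function makes the fallback between providers explicit rather than scattered across call sites.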
Prompt Engineering Tips
For Audio (Veo 3.1)
- Dialogue: Use quotes for speech: `"Hello!" she said excitedly`
- Sound effects: Describe explicitly: `tires screeching, engine roaring`
- Ambient: Describe the soundscape: `birds chirping, distant traffic`
- Example: `A man whispers "Did you hear that?" as footsteps echo in the dark hallway`
For Cinematic Quality
- Include camera directions: "slow dolly", "tracking shot", "crane shot"
- Specify lighting: "golden hour", "dramatic shadows", "soft diffused light"
- Add film references: "Blade Runner style", "Wes Anderson aesthetic"
For Realistic Motion
- Describe physics: "natural movement", "realistic physics"
- Include environmental details: "wind in hair", "leaves rustling"
- Specify speed: "slow motion", "real-time", "time-lapse"
For Image-to-Video
- Describe what should change/move from the starting image
- Be specific about the action: "the cat slowly opens its eyes"
- Include environmental motion: "leaves blow past"
Negative Prompts
- Describe what NOT to include: `--negative-prompt "cartoon, low quality, blurry"`
- Don't use "no" or "don't"; just describe the unwanted elements
API Comparison
| Feature | Veo 3.1 (Default) | Veo 3.1 Fast | Sora |
|---|---|---|---|
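| Provider | Google | Google | OpenAI |
| API key / credentials | `GOOGLE_CLOUD_PROJECT` (Vertex AI) or `GOOGLE_API_KEY` (AI Studio) | Same as Veo 3.1 | `OPENAI_API_KEY` |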
| Max duration | 8 seconds | 8 seconds | 20 seconds |
| Resolution | 720p, 1080p | 720p, 1080p | Up to 1080p |
| Aspect ratios | 16:9, 9:16 | 16:9, 9:16 | 16:9, 9:16, 1:1 |
| Audio (dialogue, SFX) | ✅ Yes | ✅ Yes | ❌ No |
| Image-to-video | ✅ Yes | ✅ Yes | ✅ Yes |
| Reference images | ✅ Up to 3 | ✅ Up to 3 | ❌ No |
| Video extension | ✅ Yes | ✅ Yes | ❌ No |
| Batch generation | ✅ Yes | ✅ Yes | ❌ No |
| Speed | Best quality | ~2x faster | Slower |
| Best for | Professional | Batch workflows | Longer videos |