video-producer-agent

Video Producer

Video Producer Skill

Create complete videos with voiceover, music, and visuals.
This is an orchestrator skill that combines:
  • Script/storyboard generation (Claude)
  • Voiceover synthesis (Gemini TTS)
  • Background music (Lyria)
  • Video clip generation (Veo 3.1) or image animation
  • Final assembly (FFmpeg via media-utils)

Workflow

Step 1: Gather Requirements (REQUIRED)

⚠️ DO NOT skip this step. DO NOT run init_project.py until you have ALL answers.
Use interactive questioning — ask ONE question at a time, wait for the response, then ask the next. This creates a collaborative spec-driven process.

Question Flow

⚠️ Use the `AskUserQuestion` tool for each question below.
Do not just print questions in your response; use the tool to create interactive prompts with the options shown.
Q1: Subject
"I'll create that video! First — what's it about?
(e.g., product launch, brand story, tutorial, explainer — or describe your own)"
Wait for response.
Q2: Duration
"How long should the video be?
  • 15 seconds (quick hook)
  • 30 seconds (standard ad)
  • 60 seconds (explainer)
  • 2+ minutes (detailed)
  • Or specify your own duration"
Wait for response.
Q3: Style
"What visual style?
  • Premium/luxury
  • Fun/playful
  • Corporate/professional
  • Dramatic/cinematic
  • Minimal/clean
  • Or describe your own style"
Wait for response.
Q4: Assets
"Do you have existing images or video clips to use?
  • No, generate everything
  • Yes, I have images (provide paths)
  • Yes, I have video clips (provide paths)"
Wait for response.
Q5: Audio Strategy
"How should we handle audio?
  • Custom — I generate voiceover + background music
  • Veo native — Use Veo's built-in dialogue/SFX/ambient
  • Silent — No audio, add later"
Wait for response.
Q6: Voice (if custom audio)
"What voice tone for the voiceover?
  • Professional
  • Friendly/warm
  • Energetic
  • Calm/soothing
  • Dramatic
  • Or describe your own tone"
Wait for response.
Q7: Music (if custom audio)
"What music vibe?
  • Modern electronic
  • Cinematic/epic
  • Upbeat pop
  • Ambient/chill
  • Corporate
  • Or describe your own style"
Wait for response.
Q8: Format
"What aspect ratio?
  • 16:9 (YouTube, web)
  • 9:16 (TikTok, Reels, Shorts)
  • 1:1 (Instagram feed)"
Wait for response.
Q9: Resolution
"What resolution?
  • 720p (faster generation)
  • 1080p (standard HD)"
Wait for response.
Q10: Model
"Which Veo model?
  • veo-3.1: latest, highest quality (default)
  • veo-3.1-fast: faster generation, slightly lower quality
  • veo-3: previous generation
  • veo-3-fast: previous generation, faster"
Wait for response.

Quick Reference

| Question | Determines |
|----------|------------|
| Subject | Scene content and prompts |
| Duration | Scene count (Veo clips must be 4, 6, or 8 seconds) |
| Style | Visual prompts and music selection |
| Assets | Generate new assets vs. use existing ones |
| Audio | Audio strategy: `custom`, `veo_audio`, or `silent` |
| Voice | TTS voice selection |
| Music | Lyria prompt |
| Format | Aspect ratio for Veo |
| Resolution | Output quality: 720p or 1080p |
| Model | `veo-3.1`, `veo-3.1-fast`, `veo-3`, or `veo-3-fast` |

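Because each Veo clip must be 4, 6, or 8 seconds, the Duration answer becomes a scene count through simple arithmetic. The helper below is a hypothetical sketch, not part of init_project.py: it rounds an odd target up to the next even number (any even total of at least 4 seconds can be built from 4, 6, and 8 second clips), uses the fewest clips possible, and shortens clips 2 seconds at a time, starting from the last one, until the total matches.

```python
import math

VEO_CLIP_LENGTHS = (4, 6, 8)  # allowed Veo clip durations in seconds


def plan_scene_durations(target_seconds: int) -> list[int]:
    """Hypothetical helper: split a target duration into Veo-sized clips."""
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    # Odd totals cannot be built from 4/6/8 s clips, so round up to even.
    total = max(4, target_seconds + target_seconds % 2)
    count = math.ceil(total / 8)  # fewest clips that can cover the total
    clips = [8] * count
    excess = sum(clips) - total  # always even and at most 6
    i = count - 1
    while excess > 0:
        clips[i] -= 2  # shorten clips from the end, cycling backwards
        excess -= 2
        i = i - 1 if i > 0 else count - 1
    return clips


if __name__ == "__main__":
    for target in (15, 30, 60):
        plan = plan_scene_durations(target)
        print(f"{target}s -> {len(plan)} scenes: {plan}")
```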

Step 2: Initialize Project

Once you have the user's answers, initialize the project with their preferences:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-producer-agent/scripts/init_project.py \
  --name "Product Launch Video" \
  --duration 30 \
  --aspect-ratio 16:9 \
  --audio-strategy custom \
  --scenes 5
```

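The flags in that command come straight from the interview answers. As a hypothetical sketch (the function name and answer labels are illustrative, and only flags from the init_project.py options table in the Scripts section are used), the answer-to-flag mapping might look like this. Resolution and model have no documented init_project.py flag, so the sketch leaves them out. Passing the returned list to `subprocess.run(cmd, check=True)` avoids shell-quoting problems with names that contain spaces.

```python
import shlex

# Q5 answer labels mapped to the --audio-strategy values from the options table.
AUDIO_STRATEGY_FLAGS = {
    "custom": "custom",
    "veo native": "veo_audio",
    "silent": "silent",
}


def build_init_command(name: str, duration: int, aspect_ratio: str,
                       audio_answer: str, scenes: int,
                       script: str = "init_project.py") -> list[str]:
    """Hypothetical helper: turn interview answers into init_project.py arguments."""
    strategy = AUDIO_STRATEGY_FLAGS[audio_answer.strip().lower()]
    return [
        "python3", script,
        "--name", name,
        "--duration", str(duration),
        "--aspect-ratio", aspect_ratio,
        "--audio-strategy", strategy,
        "--scenes", str(scenes),
    ]


if __name__ == "__main__":
    cmd = build_init_command("Product Launch Video", 30, "16:9", "Custom", 5)
    print(shlex.join(cmd))  # the equivalent shell command, safely quoted
```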
Step 3: Configure project.json

Edit `project.json` with the scene prompts, voiceover text, and music style based on the user's answers.

Step 4: Assemble the Video

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-producer-agent/scripts/assemble.py \
  --project ~/my_video_project/
```

Project Structure

When you initialize a project, this folder structure is created:
```
my_project/
├── project.json          # Configuration: scenes, voiceover, music, settings
├── storyboard.md         # Planning document for the video
├── scenes/               # Generated video clips from Veo
│   ├── scene1_intro.mp4
│   ├── scene2_features.mp4
│   └── scene3_cta.mp4
├── audio/                # Audio assets
│   ├── voiceover.wav     # Generated voiceover
│   ├── background_music.wav  # Generated music
│   └── final_mix.mp3     # Mixed audio track
├── work/                 # Intermediate files (auto-generated)
│   ├── silent_scene1.mp4
│   ├── video_concatenated.mp4
│   └── ...
└── output/               # Final deliverables
    └── product_launch_video_final.mp4
```

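The layout above is simple enough to reproduce by hand. The sketch below is an illustrative assumption, not init_project.py's actual code: it derives the folder name from the project name (lowercased, with runs of other characters collapsed to underscores, which is consistent with the "Wireless Earbuds Promo" project living at `~/wireless_earbuds_promo/` later in this guide), creates the four subfolders, and writes a minimal placeholder `project.json` and an empty `storyboard.md` only if they do not already exist.

```python
import json
import re
import tempfile
from pathlib import Path

SUBDIRS = ("scenes", "audio", "work", "output")


def slugify(name: str) -> str:
    """Lowercase the name and collapse runs of other characters into '_'."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def create_project_skeleton(name: str, parent: str = ".") -> Path:
    """Hypothetical helper: create the folder layout shown above."""
    root = Path(parent).expanduser() / slugify(name)
    for sub in SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    config = root / "project.json"
    if not config.exists():  # never overwrite an already edited config
        config.write_text(json.dumps({"name": name, "scenes": []}, indent=2) + "\n")
    (root / "storyboard.md").touch(exist_ok=True)  # empty planning document
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = create_project_skeleton("Wireless Earbuds Promo", tmp)
        print(sorted(p.name for p in root.iterdir()))
```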

Scripts

init_project.py

Initialize a new video project with folder structure and templates.

```bash
# Basic project
python3 init_project.py --name "My Video" --duration 30

# Create in specific directory
python3 init_project.py --name "Demo Video" --output ~/Videos/

# Vertical video for social
python3 init_project.py --name "Instagram Reel" --aspect-ratio 9:16 --duration 15

# Use Veo's native audio (no custom voiceover/music)
python3 init_project.py --name "Cinematic Scene" --audio-strategy veo_audio

# More scenes
python3 init_project.py --name "Long Video" --duration 60 --scenes 5
```

**Options:**

| Option | Default | Description |
|--------|---------|-------------|
| `--name` | required | Project name |
| `--output` | current directory | Parent directory |
| `--duration` | 30 | Target duration in seconds |
| `--aspect-ratio` | 16:9 | 16:9, 9:16, 1:1, 4:3 |
| `--audio-strategy` | custom | custom, veo_audio, silent |
| `--scenes` | 3 | Number of scene placeholders |

assemble.py

Orchestrate the full video assembly pipeline.

```bash
# Full pipeline (generate everything + assemble)
python3 assemble.py --project ~/my_project/

# Skip generation (use existing scene/audio files)
python3 assemble.py --project ~/my_project/ --skip-generation

# Dry run (show what would be done)
python3 assemble.py --project ~/my_project/ --dry-run
```

**Pipeline steps:**
1. Generate video scenes (Veo 3.1)
2. Strip audio from scenes (if using custom audio)
3. Generate voiceover (Gemini TTS)
4. Generate background music (Lyria)
5. Mix voiceover + music
6. Concatenate video clips
7. Merge audio with video
8. Output final video

---

project.json Configuration

```json
{
  "name": "Product Launch Video",
  "duration_target": 30,
  "aspect_ratio": "16:9",
  "resolution": "720p",
  "audio_strategy": "custom",

  "scenes": [
    {
      "id": 1,
      "name": "scene1_hero",
      "prompt": "Cinematic slow zoom on premium product, dramatic lighting, high-end commercial style",
      "duration": 6,
      "notes": "Music only, no voiceover"
    },
    {
      "id": 2,
      "name": "scene2_features",
      "prompt": "Product features demonstration, sleek animations, modern tech aesthetic",
      "duration": 8,
      "notes": "Voiceover starts here"
    },
    {
      "id": 3,
      "name": "scene3_cta",
      "prompt": "Product with logo on clean background, call to action moment",
      "duration": 6,
      "notes": "Music swells, voiceover ends"
    }
  ],

  "voiceover": {
    "enabled": true,
    "text": "Introducing the future of audio. Crystal clear sound. All-day comfort. Experience the difference.",
    "voice": "Charon",
    "style": "Professional, confident, premium brand voice"
  },

  "music": {
    "enabled": true,
    "prompt": "modern electronic, premium, sleek, product showcase, subtle bass",
    "duration": 35,
    "bpm": 100,
    "brightness": 0.6
  },

  "assembly": {
    "transition": "fade",
    "transition_duration": 0.5,
    "music_volume": 0.3,
    "fade_in": 1.0,
    "fade_out": 2.0
  }
}
```

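The `assembly` keys are not documented field by field here. One plausible reading, shown as a hypothetical sketch rather than what `audio_mix.py` actually does, is that `music_volume` scales the music bed under the voiceover and `fade_in`/`fade_out` are fade lengths in seconds. The function below expresses that reading as an FFmpeg `-filter_complex` graph: it trims the music to the final video length (the sample's 35-second music track is longer than its 30-second target), lowers its volume, fades it in and out, and blends it with the voiceover using `amix`. Be aware that `amix` also rescales its inputs by default, so overall levels may need adjusting afterwards.

```python
import shlex


def build_mix_command(voice: str, music: str, output: str, video_duration: float,
                      music_volume: float = 0.3, fade_in: float = 1.0,
                      fade_out: float = 2.0) -> list[str]:
    """Hypothetical helper: mix voiceover and music per the assembly settings."""
    fade_out_start = max(0.0, video_duration - fade_out)
    music_chain = (
        f"[1:a]atrim=end={video_duration},"      # cut the music at the video length
        f"volume={music_volume},"                # duck the music under the voice
        f"afade=t=in:st=0:d={fade_in},"          # fade in from silence
        f"afade=t=out:st={fade_out_start}:d={fade_out}[bed]"  # fade out at the end
    )
    # duration=longest keeps the music bed playing after the voiceover finishes.
    mix = "[0:a][bed]amix=inputs=2:duration=longest[mixed]"
    return [
        "ffmpeg", "-y",
        "-i", voice,  # input 0: voiceover
        "-i", music,  # input 1: background music
        "-filter_complex", f"{music_chain};{mix}",
        "-map", "[mixed]",
        output,
    ]


if __name__ == "__main__":
    cmd = build_mix_command("audio/voiceover.wav", "audio/background_music.wav",
                            "audio/final_mix.mp3", video_duration=30.0)
    print(shlex.join(cmd))
```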

Audio Strategies

| Strategy | Description | Use When |
|----------|-------------|----------|
| `custom` | Strip Veo audio, add custom voiceover + music | Most videos |
| `veo_audio` | Keep Veo's generated audio (dialogue, SFX) | Cinematic scenes, dialogue |
| `silent` | Strip audio, output silent video | Adding audio later |

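Read together with the assemble.py pipeline steps listed earlier, this table implies which steps each strategy needs. The mapping below is an inference from this guide, not taken from assemble.py's source: `veo_audio` keeps Veo's soundtrack, so nothing is stripped, generated, mixed, or merged; `silent` strips audio and adds nothing; and `--skip-generation` drops the three generation steps (scenes, voiceover, music).

```python
# (step, strategies that run it), in the assemble.py order listed earlier.
PIPELINE = [
    ("generate_scenes", {"custom", "veo_audio", "silent"}),
    ("strip_audio", {"custom", "silent"}),
    ("generate_voiceover", {"custom"}),
    ("generate_music", {"custom"}),
    ("mix_audio", {"custom"}),
    ("concatenate_clips", {"custom", "veo_audio", "silent"}),
    ("merge_audio_video", {"custom"}),
    ("write_output", {"custom", "veo_audio", "silent"}),
]
GENERATION_STEPS = {"generate_scenes", "generate_voiceover", "generate_music"}


def pipeline_steps(strategy: str, skip_generation: bool = False) -> list[str]:
    """Hypothetical helper: list the steps a given audio strategy would run."""
    if strategy not in {"custom", "veo_audio", "silent"}:
        raise ValueError(f"unknown audio strategy: {strategy!r}")
    steps = [name for name, strategies in PIPELINE if strategy in strategies]
    if skip_generation:  # reuse existing scene/audio files
        steps = [s for s in steps if s not in GENERATION_STEPS]
    return steps


if __name__ == "__main__":
    for strategy in ("custom", "veo_audio", "silent"):
        print(strategy, "->", ", ".join(pipeline_steps(strategy)))
```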

Workflow: Creating a Video

Step 1: Initialize Project

```bash
python3 init_project.py --name "Wireless Earbuds Promo" --duration 30
```

Step 2: Plan the Storyboard

Edit `storyboard.md` to plan your video structure:

```markdown
## Scene 1: Hero Reveal (0-5s)
- Visual: Earbuds emerging from shadow, premium lighting
- Audio: Music only (dramatic intro)

## Scene 2: Sound Quality (5-12s)
- Visual: Sound waves, person enjoying music
- Audio: Voiceover: "Crystal clear sound. Immersive bass."

## Scene 3: Comfort (12-20s)
- Visual: Close-up of fit, person running
- Audio: Voiceover: "All-day comfort. Secure fit."

## Scene 4: CTA (20-30s)
- Visual: Product + logo on clean background
- Audio: Voiceover: "Experience the difference." + music swell
```

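The storyboard timings above do not all fit Veo's 4, 6, or 8 second clip lengths: Scene 1 runs 5 seconds, Scene 2 runs 7, and Scene 4 runs 10. A hypothetical helper like the one below can turn storyboard headings into `scenes` entries for project.json, snapping each planned length to the nearest allowed clip length (ties go to the longer clip) and recording the original timing in `notes`. The heading pattern it expects is an assumption based on this example, and a scene longer than 8 seconds may be better split into two clips than shortened.

```python
import json
import re

VEO_CLIP_LENGTHS = (4, 6, 8)
# Matches headings such as "## Scene 1: Hero Reveal (0-5s)".
HEADING = re.compile(r"^#*\s*Scene (\d+): (.+?) \((\d+)-(\d+)s\)$")


def storyboard_to_scenes(markdown: str) -> list[dict]:
    """Hypothetical helper: build project.json scene stubs from storyboard headings."""
    scenes = []
    for line in markdown.splitlines():
        match = HEADING.match(line.strip())
        if not match:
            continue  # skip Visual/Audio bullet lines and anything else
        scene_id, title = int(match[1]), match[2]
        start, end = int(match[3]), int(match[4])
        planned = end - start
        # Nearest allowed Veo length; on a tie prefer the longer clip.
        duration = min(VEO_CLIP_LENGTHS, key=lambda d: (abs(d - planned), -d))
        slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
        scenes.append({
            "id": scene_id,
            "name": f"scene{scene_id}_{slug}",
            "prompt": "",  # fill in from the storyboard's Visual line
            "duration": duration,
            "notes": f"storyboard {start}-{end}s ({planned}s planned)",
        })
    return scenes


if __name__ == "__main__":
    sample = "\n".join([
        "## Scene 1: Hero Reveal (0-5s)",
        "## Scene 2: Sound Quality (5-12s)",
        "## Scene 3: Comfort (12-20s)",
        "## Scene 4: CTA (20-30s)",
    ])
    print(json.dumps(storyboard_to_scenes(sample), indent=2))
```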
Step 3: Configure project.json

Fill in the scene prompts, voiceover text, and music style based on your storyboard.

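Before running assemble.py, it can help to check the filled-in project.json against the constraints stated in this guide. The checker below is a hypothetical sketch, not a documented tool: it verifies the audio strategy and aspect ratio against the init_project.py options, that every scene is 4, 6, or 8 seconds, that 1080p projects use only 8-second clips (per the Limitations section), that a voiceover is enabled only with the `custom` strategy, and that scene durations add up to `duration_target`. On the sample configuration shown earlier it reports one mismatch: its three scenes total 20 seconds against a 30-second target.

```python
import json
import sys

VEO_CLIP_LENGTHS = {4, 6, 8}
AUDIO_STRATEGIES = {"custom", "veo_audio", "silent"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3"}  # from the init_project.py options


def validate_project(config: dict) -> list[str]:
    """Hypothetical checker: return human-readable problems (empty list if none)."""
    problems = []
    strategy = config.get("audio_strategy")
    if strategy not in AUDIO_STRATEGIES:
        problems.append(f"audio_strategy {strategy!r} is not one of {sorted(AUDIO_STRATEGIES)}")
    if config.get("aspect_ratio") not in ASPECT_RATIOS:
        problems.append(f"aspect_ratio {config.get('aspect_ratio')!r} is not supported")

    scenes = config.get("scenes", [])
    for scene in scenes:
        name = scene.get("name", f"scene {scene.get('id')}")
        duration = scene.get("duration")
        if duration not in VEO_CLIP_LENGTHS:
            problems.append(f"{name}: duration {duration}s is not 4, 6, or 8")
        elif config.get("resolution") == "1080p" and duration != 8:
            problems.append(f"{name}: 1080p requires 8s clips")

    if config.get("voiceover", {}).get("enabled") and strategy != "custom":
        problems.append("voiceover is enabled but audio_strategy is not 'custom'")

    total = sum(s["duration"] for s in scenes if isinstance(s.get("duration"), int))
    target = config.get("duration_target")
    if target is not None and total != target:
        problems.append(f"scene durations sum to {total}s but duration_target is {target}s")
    return problems


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as f:
            config = json.load(f)
    else:  # small inline example mirroring the sample project.json
        config = {"duration_target": 30, "aspect_ratio": "16:9", "resolution": "720p",
                  "audio_strategy": "custom",
                  "scenes": [{"name": "scene1_hero", "duration": 6},
                             {"name": "scene2_features", "duration": 8},
                             {"name": "scene3_cta", "duration": 6}]}
    for problem in validate_project(config):
        print("warning:", problem)
```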
Step 4: Assemble

```bash
python3 assemble.py --project ~/wireless_earbuds_promo/
```

Step 5: Review and Iterate

Check the final video in `output/`. If adjustments are needed:

```bash
# Re-run with existing scenes (just re-mix audio)
python3 assemble.py --project ~/wireless_earbuds_promo/ --skip-generation
```

---

What You Can Create

| Type | Example |
|------|---------|
| Product video | 30s hero video showcasing a product |
| Explainer video | How-to or feature explanation |
| Promo/ad video | Marketing advertisement |
| Demo video | Product demonstration |
| Training video | Internal training content |
| Testimonial | Customer quote style video |
| Brand video | Company/brand story |

Prerequisites

  • `GOOGLE_API_KEY` environment variable, used by Veo (video), Gemini TTS (voice), and Lyria (music)
  • FFmpeg installed (on macOS: `brew install ffmpeg`)

Video Styles & Music Pairings

| Style | Music Prompt | Voice |
|-------|--------------|-------|
| Premium/Luxury | "elegant, minimal, ambient, sophisticated" | Charon (informative) |
| Tech/Modern | "electronic, futuristic, clean, innovative" | Kore (firm) |
| Fun/Playful | "upbeat, cheerful, acoustic, positive" | Puck (upbeat) |
| Corporate | "professional, inspiring, orchestral lite" | Orus (firm) |
| Lifestyle | "chill, aspirational, indie, warm" | Aoede (breezy) |
| Dramatic/Cinematic | "epic, orchestral, emotional, building" | Gacrux (mature) |

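These pairings can be applied mechanically when filling in project.json. In this hypothetical sketch the style keys, music prompts, and voice names are copied from the table above; the function only sets `music.prompt` and `voiceover.voice` on a copy of the config and leaves every other field unchanged.

```python
import copy

# Style -> (Lyria music prompt, Gemini TTS voice), copied from the table above.
STYLE_PRESETS = {
    "premium/luxury": ("elegant, minimal, ambient, sophisticated", "Charon"),
    "tech/modern": ("electronic, futuristic, clean, innovative", "Kore"),
    "fun/playful": ("upbeat, cheerful, acoustic, positive", "Puck"),
    "corporate": ("professional, inspiring, orchestral lite", "Orus"),
    "lifestyle": ("chill, aspirational, indie, warm", "Aoede"),
    "dramatic/cinematic": ("epic, orchestral, emotional, building", "Gacrux"),
}


def apply_style(config: dict, style: str) -> dict:
    """Hypothetical helper: return a copy of config with the style's music and voice."""
    try:
        music_prompt, voice = STYLE_PRESETS[style.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown style {style!r}; choose from {sorted(STYLE_PRESETS)}") from None
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    updated.setdefault("music", {})["prompt"] = music_prompt
    updated.setdefault("voiceover", {})["voice"] = voice
    return updated


if __name__ == "__main__":
    base = {"voiceover": {"enabled": True, "voice": "Charon"}, "music": {"enabled": True}}
    print(apply_style(base, "Tech/Modern"))
```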

Common Video Structures

Product Video (30s)

```
0-5s:   Hero shot (music only)
5-15s:  Features (voiceover + music)
15-25s: Lifestyle/use case (voiceover + music)
25-30s: Logo + CTA (music fade)
```

Explainer Video (60s)

```
0-5s:   Hook/problem statement
5-20s:  Solution introduction
20-45s: How it works (3 steps)
45-55s: Benefits summary
55-60s: CTA
```

Testimonial Video (45s)

```
0-5s:   Intro/name card
5-35s:  Testimonial quote (multiple scenes)
35-45s: Product shot + logo
```

Manual Workflow (Without Scripts)

If you prefer to run each step manually:
Generate scenes:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --batch scenes.json
```

Generate voiceover:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text "Your voiceover text..." \
  --voice Charon \
  --style "Professional, warm"
```

Generate music:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "modern electronic, premium" \
  --duration 35
```

Strip audio from clips:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/video_strip_audio.py \
  -i scene*.mp4
```

Concatenate videos:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/video_concat.py \
  -i silent_*.mp4 --transition fade -o video.mp4
```

Mix audio:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_mix.py \
  --voice voiceover.wav --music music.wav -o audio.mp3
```

Merge audio + video:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/video_audio_merge.py \
  --video video.mp4 --audio audio.mp3 -o final.mp4
```

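To chain the manual steps above from one script, a hypothetical Python driver could hold the same commands as argument lists. The script paths and flags are copied verbatim from the commands above, and `CLAUDE_PLUGIN_ROOT` is read from the environment. The driver defaults to a dry run that prints each command; when run for real it expands the `scene*.mp4` and `silent_*.mp4` patterns just before each step, as the shell would (so `silent_*.mp4` only matches once the strip step has created those files), and stops at the first failing step.

```python
import glob
import os
import shlex
import subprocess


def manual_pipeline(root: str) -> list[list[str]]:
    """The manual steps above as argument lists (glob patterns left unexpanded)."""
    s = f"{root}/skills"
    return [
        ["python3", f"{s}/video-generation/scripts/veo.py", "--batch", "scenes.json"],
        ["python3", f"{s}/voice-generation/scripts/gemini_tts.py",
         "--text", "Your voiceover text...", "--voice", "Charon",
         "--style", "Professional, warm"],
        ["python3", f"{s}/music-generation/scripts/lyria.py",
         "--prompt", "modern electronic, premium", "--duration", "35"],
        ["python3", f"{s}/media-utils/scripts/video_strip_audio.py", "-i", "scene*.mp4"],
        ["python3", f"{s}/media-utils/scripts/video_concat.py",
         "-i", "silent_*.mp4", "--transition", "fade", "-o", "video.mp4"],
        ["python3", f"{s}/media-utils/scripts/audio_mix.py",
         "--voice", "voiceover.wav", "--music", "music.wav", "-o", "audio.mp3"],
        ["python3", f"{s}/media-utils/scripts/video_audio_merge.py",
         "--video", "video.mp4", "--audio", "audio.mp3", "-o", "final.mp4"],
    ]


def expand_globs(argv: list[str]) -> list[str]:
    """Expand '*' patterns like the shell does, failing loudly if nothing matches."""
    expanded = []
    for arg in argv:
        if "*" in arg:
            matches = sorted(glob.glob(arg))  # lexicographic: scene10 sorts before scene2
            if not matches:
                raise FileNotFoundError(f"no files match {arg!r}")
            expanded.extend(matches)
        else:
            expanded.append(arg)
    return expanded


def run(steps: list[list[str]], dry_run: bool = True) -> None:
    for argv in steps:
        if dry_run:
            # Leave glob patterns unquoted so a pasted line behaves like the original.
            print(" ".join(a if "*" in a else shlex.quote(a) for a in argv))
        else:
            subprocess.run(expand_globs(argv), check=True)  # stop on first failure


if __name__ == "__main__":
    run(manual_pipeline(os.environ.get("CLAUDE_PLUGIN_ROOT", ".")), dry_run=True)
```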

Input Files You Can Provide

| File Type | How It's Used |
|-----------|---------------|
| Product images | Animated with Veo, used as the first frame |
| Logo (PNG) | Overlaid on the final scene |
| Existing voiceover | Place in `audio/voiceover.wav` |
| Brand music | Place in `audio/background_music.wav` |
| Video clips | Place in the `scenes/` folder |
| Script/copy | Used as the voiceover text |

Limitations

  • Veo video duration: max 8 seconds per clip (concatenate clips for longer videos)
  • Veo 3.1 includes audio: every clip comes with generated audio (strip it if using custom audio)
  • Processing time: video generation takes 1-3 minutes per clip
  • Resolution: 720p or 1080p; 1080p is available only for 8-second clips

Error Handling

| Error | Solution |
|-------|----------|
| "GOOGLE_API_KEY not set" | Set up the API key per the README |
| "FFmpeg not found" | Install FFmpeg (on macOS: `brew install ffmpeg`) |
| "project.json not found" | Run init_project.py first |
| Video generation timeout | Retry, or use a shorter duration |
| Audio/video sync issues | Adjust scene durations |

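For the "Audio/video sync issues" row, it helps to measure before adjusting durations. `ffprobe`, which ships with FFmpeg, can report per-stream durations as JSON; the sketch below compares the video and audio stream lengths of a finished file. The helper names, the `sync_check.py` filename, and the 0.1-second tolerance are illustrative assumptions, and some containers do not report a per-stream duration, in which case `sync_gap` returns `None`.

```python
import json
import subprocess
import sys
from typing import Optional


def probe_streams(path: str) -> str:
    """Run ffprobe and return its JSON description of each stream."""
    result = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "stream=codec_type,duration",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def stream_durations(ffprobe_json: str) -> dict[str, float]:
    """Longest reported duration per stream type, e.g. {'video': 30.0, 'audio': 29.5}."""
    durations: dict[str, float] = {}
    for stream in json.loads(ffprobe_json).get("streams", []):
        kind, value = stream.get("codec_type"), stream.get("duration")
        if kind and value not in (None, "N/A"):
            durations[kind] = max(durations.get(kind, 0.0), float(value))
    return durations


def sync_gap(durations: dict[str, float]) -> Optional[float]:
    """Seconds between the video and audio lengths, or None if either is missing."""
    if "video" not in durations or "audio" not in durations:
        return None
    return abs(durations["video"] - durations["audio"])


if __name__ == "__main__":
    if len(sys.argv) == 2:
        gap = sync_gap(stream_durations(probe_streams(sys.argv[1])))
        if gap is None:
            print("could not read both audio and video durations")
        elif gap > 0.1:
            print(f"audio and video differ by {gap:.2f}s; adjust scene durations")
        else:
            print("audio and video lengths match")
    else:
        print("usage: python3 sync_check.py output/final.mp4")
```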

Example Prompts

Simple:
"Create a 30-second product video for my new coffee maker"
Detailed:
"Create a 45-second product video for our new wireless earbuds. Premium, luxury feel. I have product photos attached. Professional male voiceover. Modern electronic music. 16:9 for YouTube."
With project:
"Initialize a video project called 'App Demo' with 5 scenes, 60 seconds total, vertical format for TikTok"