# Video Generation

Generate videos using `generate_media` with `mode="video"`. The system auto-selects the best backend based on available API keys.
## Quick Start

```python
# Simple text-to-video (auto-selects backend)
generate_media(prompt="A robot walking through a city", mode="video")

# Specify backend and duration
generate_media(prompt="Ocean waves crashing on rocks", mode="video",
               backend_type="google", duration=8)

# With aspect ratio
generate_media(prompt="A timelapse of clouds", mode="video",
               backend_type="grok", aspect_ratio="16:9", duration=10)
```
## Backend Comparison

| Backend | Default Model | Duration Range | Default Duration | Resolutions | API Key |
|---|---|---|---|---|---|
| Grok (priority 1) | | 1-15s | 5s | 480p, 720p | |
| Google Veo (priority 2) | | 4-8s | 8s | 720p, 1080p, 4K | |
| OpenAI Sora (priority 3) | | 4, 8, or 12s (discrete) | 4s | Standard | |
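The auto-selection behavior can be sketched as a priority walk over the backends above. This is an illustrative sketch, not the actual implementation; the priority order comes from the comparison table, while the function name and the shape of `available_keys` are assumptions.

```python
# Priority order from the backend comparison table (1 = highest).
PRIORITY = ["grok", "google", "openai"]

def select_backend(available_keys):
    """Return the highest-priority backend whose API key is configured.

    `available_keys` is a set of backend names with usable keys
    (hypothetical interface for illustration).
    """
    for backend in PRIORITY:
        if backend in available_keys:
            return backend
    raise RuntimeError("no video backend API key configured")
```

For example, with both Google and OpenAI keys configured but no Grok key, Google Veo would be chosen.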
## Key Parameters

| Parameter | Description | Example |
|---|---|---|
| `prompt` | Text description of the video | |
| `backend_type` | Force a specific backend | |
| | Override default model | |
| `duration` | Video length in seconds | |
| `aspect_ratio` | Video aspect ratio | |
| | Resolution (Grok: 480p/720p; Veo: 720p/1080p/4k) | |
| `input_images` | Source image for image-to-video | |
| | Style/content guide images (Veo, up to 3) | |
| | What to exclude (Veo) | |
## Duration Handling

Each backend has different duration constraints. `generate_media` automatically clamps the requested duration:

- Grok: Continuous range 1-15s (clamped to bounds)
- Google Veo: Continuous range 4-8s (clamped to bounds), defaults to 16:9 aspect ratio
- OpenAI Sora: Discrete values only (4, 8, or 12s) - snaps to nearest valid value

A warning is logged if duration is adjusted.
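The clamp-or-snap rules above can be sketched as follows. The constraint values come from this document; the function itself is illustrative, not the library's actual code.

```python
def clamp_duration(backend: str, requested: float) -> float:
    """Normalize a requested duration to what the backend accepts (sketch)."""
    if backend == "grok":
        # Continuous 1-15s: clamp to bounds
        return min(max(requested, 1), 15)
    if backend == "google":
        # Continuous 4-8s: clamp to bounds
        return min(max(requested, 4), 8)
    if backend == "openai":
        # Discrete 4/8/12s: snap to the nearest valid value
        return min((4, 8, 12), key=lambda v: abs(v - requested))
    raise ValueError(f"unknown backend: {backend}")
```

So a 20s request to Grok yields 15s, and a 7s request to Sora snaps to 8s (the cases where a warning would be logged).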
## Image-to-Video

All three video backends support starting a video from an existing image via `input_images`:

```python
generate_media(
    prompt="Animate this scene with gentle movement",
    mode="video",
    input_images=["scene.jpg"],
    duration=5
)
```

The first image in `input_images` is used; additional images are ignored.
## Generation Time

Video generation is significantly slower than image generation. All backends use polling:

- Grok: SDK handles polling internally (up to 10 min timeout)
- Google Veo: Custom polling every 20s (up to 10 min)
- OpenAI Sora: Custom polling every 2s
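The custom polling loops amount to something like the sketch below. The 20s/600s numbers match the Veo figures above; `check_status` is a hypothetical callable standing in for the backend's job-status request.

```python
import time

def poll_until_done(check_status, interval=20, timeout=600):
    """Poll a status callable until it reports a terminal state (sketch).

    check_status() is assumed to return "done", "failed", or "pending".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = check_status()
        if status in ("done", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError("video generation did not finish in time")
```

Plan for these delays: a single clip can legitimately take several minutes, so kick off generation early and do other work while polling runs.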
## Veo 3.1: Native Audio

Veo 3.1 generates audio (dialogue, SFX, ambient) automatically from prompt content. No extra parameter needed — just describe the sounds:

- Dialogue: use quotation marks in the prompt (`"Hello," she said.`)
- Sound effects: describe the sounds (`tires screeching, engine roaring`)
- Ambient: describe the atmosphere (`eerie hum resonates through the hallway`)
## Veo 3.1: Extension Constraints

When extending videos via `continue_from` with a `veo_vid_*` ID:

- Resolution is forced to 720p (API requirement for extensions)
- Only 16:9 and 9:16 aspect ratios are supported
- Each extension adds up to 7 seconds (API limit: 20 extensions, ~141s total)
- Generated videos are retained for 2 days before expiry
## Producing Longer Videos

Current APIs cap at 15 seconds per clip (Grok), with most backends at 4-8s. There is no way to generate a continuous 30+ second video in one call. The proven approach:

- Plan a shot list — break your video into 6-8s segments with specific camera language per shot
- Generate clips in parallel — launch all segments concurrently using `background=True`
- Composite in Remotion (see below) — layer programmatic animation on top of generated footage
- Bridge with audio — a unified narration or music track smooths over visual cuts between clips

For visual continuity, use the same style anchor in every prompt (e.g., "BBC Earth documentary cinematography") and maintain consistent lighting/color descriptions.

Full production guide with examples, transition types, and duration strategy: see references/production.md
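The parallel-generation step can be sketched with a thread pool. Here `fake_generate` is a placeholder for `generate_media(prompt=..., mode="video", duration=8, background=True)`; the shot texts and pool wrapper are illustrative, and in practice `background=True` already makes the tool non-blocking.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_generate(prompt: str) -> str:
    # Placeholder for generate_media(prompt=prompt, mode="video",
    # duration=8, background=True)
    return f"clip for: {prompt}"

# Shot list: 6-8s segments sharing one style anchor for continuity
shots = [
    "Aerial coastline at dawn, slow push-in, BBC Earth documentary cinematography",
    "Dolly through a kelp forest, shafts of light, BBC Earth documentary cinematography",
    "Close-up of waves on rocks at golden hour, BBC Earth documentary cinematography",
]

with ThreadPoolExecutor() as pool:
    clips = list(pool.map(fake_generate, shots))
```

Note the repeated style anchor in every prompt; that, plus consistent lighting language, is what keeps independently generated clips looking like one production.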
## Hybrid Workflow: AI Footage + Remotion Animation

The best results come from combining AI-generated footage with Remotion's programmatic animation — not choosing one or the other.

AI video generation produces photorealistic, cinematic footage that pure programmatic rendering cannot match. Remotion produces precise typography, motion graphics, overlays, and transitions that AI generation cannot reliably control. Use both together.
### The Rule: Generate First, Composite Second

- Generate AI clips for cinematic/photorealistic shots (environments, product demos, atmospheric footage)
- Use those clips as visual foundations in Remotion — import them as `<Video>` or `<OffthreadVideo>` background layers
- Composite programmatic elements on top — typography, motion graphics, logos, data overlays, transitions, captions
- Fill gaps with pure Remotion animation — title cards, intro sequences, motion-graphics-only segments where AI footage isn't needed
### Do NOT Discard Generated Clips

Every AI-generated clip costs real money and time. Do not abandon generated footage and fall back to purely programmatic rendering. This is a common failure mode: agents generate clips, notice minor artifacts (e.g., repeated patterns, slight distortion), then pivot entirely to OpenCV/PIL/moviepy rendering, wasting the entire generation budget.

Instead:

| Situation | Wrong Approach | Right Approach |
|---|---|---|
| Minor artifacts in generated clip | Discard clip, render from scratch with OpenCV | Use clip as background, mask artifacts with overlays/motion graphics |
| Generated clip doesn't match vision exactly | Regenerate or abandon | Composite typography/effects on top to guide the viewer's attention |
| Need precise text/logo placement | Skip AI generation, use pure programmatic rendering | Generate atmospheric footage, overlay text in Remotion |
| Some shots need AI footage, others don't | Use one approach for everything | Mix: AI-backed shots + pure Remotion animation shots |
### Cost Awareness

Each `generate_media(mode="video")` call is expensive. Plan before generating:

- Decide which shots need AI footage before generating anything — not every shot needs it
- Generate only what you'll use — don't speculatively generate 8 clips hoping some work out
- Review and use what you generate — analyze each clip with `read_media`, then plan your Remotion composition around actual footage
- One good clip composited well beats five unused clips — invest in composition quality over generation quantity
## Post-Production: Always Use Remotion

Remotion is the default post-production tool for any video that needs editing beyond simple concatenation. This includes captions, titles, transitions, overlays, and motion graphics — essentially any video intended to look professional. Do not use raw ffmpeg `drawtext` or manual filter chains for these tasks; the results look amateur compared to what Remotion produces.

When you have video clips to assemble, load the Remotion skill and use it. This is not optional for professional output.
### Loading the Remotion Skill

Load the skill to get detailed rules and code examples:

- Local path (if installed via quickstart): `.agent/skills/remotion/SKILL.md`
- Remote repo (if not installed): https://github.com/remotion-dev/skills
### What Remotion Gives You

| Capability | Remotion | Raw ffmpeg |
|---|---|---|
| Styled animated captions | CSS-styled, word-level highlighting, animations | |
| Title cards / lower thirds | React components, any font/layout | Manual positioning, limited fonts |
| Scene transitions | Timing curves, spring animations, custom effects | Basic xfade (fade, wipe) |
| Motion graphics | Full React/CSS/Three.js/Lottie ecosystem | Not possible |
| Light leak / overlay effects | Built-in | Complex filter chains |
| Text animations | Typography effects, per-character animation | Not feasible |
| AI footage + overlays | Import clips as `<Video>` / `<OffthreadVideo>` | Not feasible at quality |
### When ffmpeg Alone Is Sufficient

Only use ffmpeg without Remotion for:

- Concatenating clips with no captions, titles, or transitions (just hard cuts)
- Audio mixing / ducking (ffmpeg or Pydub)
- Color grading via LUT files (`lut3d` filter)
- Quick format conversion or rescaling
### Workflow

- Generate AI clips with `generate_media` (parallel, background mode) — for shots that need cinematic/photorealistic quality
- Review clips with `read_media` — assess what you have, plan composition around actual footage
- Generate audio (narration, music) with `generate_media(mode="audio")`
- Load the Remotion skill and set up a Remotion project
- Composite in Remotion: import AI clips as `<Video>` background layers, overlay typography/motion graphics/captions, add pure-animation segments for title cards and transitions
- Render via Remotion's headless renderer
### Key Remotion Rule Files to Load

When working on a specific task, load the relevant rule files from the Remotion skill:

- Captions/subtitles: `rules/subtitles.md`, `rules/display-captions.md`, `rules/transcribe-captions.md`
- Transitions: `rules/transitions.md`
- Text animations: `rules/text-animations.md`
- Light leaks: `rules/light-leaks.md`
- Audio: `rules/audio.md`, `rules/audio-visualization.md`
- Sequencing/timeline: `rules/sequencing.md`, `rules/trimming.md`
- 3D motion graphics: `rules/3d.md`
- Animations/timing: `rules/animations.md`, `rules/timing.md`
## Need More Control?

- Per-backend resolution, duration details, and quirks: see references/backends.md
- Video continuation, remix, and image-to-video: see references/editing.md
- Multi-shot production, transitions, and cinematic workflow: see references/production.md