comfyui-video-pipeline
ComfyUI Video Pipeline
Orchestrates video generation across three engines, selecting the best one based on requirements and available resources.
Engine Selection
VIDEO REQUEST
|
|-- Need film-level quality?
| |-- Yes + 24GB+ VRAM → Wan 2.2 MoE 14B
| |-- Yes + 8GB VRAM → Wan 2.2 1.3B
|
|-- Need long video (>10 seconds)?
| |-- Yes → FramePack (60 seconds on 6GB)
|
|-- Need fast iteration?
| |-- Yes → AnimateDiff Lightning (4-8 steps)
|
|-- Need camera/motion control?
| |-- Yes → AnimateDiff V3 + Motion LoRAs
|
|-- Need first+last frame control?
| |-- Yes → Wan 2.2 MoE (exclusive feature)
|
|-- Default → Wan 2.2 (best general quality)
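The tree above can be read as a top-to-bottom priority list. The following is a minimal Python sketch of that reading, assuming the branches are checked in the order shown and the first match wins (the tree itself does not state a priority). `VideoRequest` and `select_engine` are hypothetical names for illustration, not part of any ComfyUI API.

```python
from dataclasses import dataclass


@dataclass
class VideoRequest:
    """Hypothetical request description; field names are illustrative."""
    film_quality: bool = False
    duration_s: float = 5.0
    fast_iteration: bool = False
    camera_control: bool = False
    first_last_frame: bool = False
    vram_gb: float = 24.0


def select_engine(req: VideoRequest) -> str:
    # Assumption: branches are checked top to bottom and the first match wins.
    if req.film_quality:
        if req.vram_gb >= 24:
            return "Wan 2.2 MoE 14B"
        if req.vram_gb >= 8:
            return "Wan 2.2 1.3B"
        # Assumption: below 8GB, fall through to the remaining checks.
    if req.duration_s > 10:
        return "FramePack"
    if req.fast_iteration:
        return "AnimateDiff Lightning"
    if req.camera_control:
        return "AnimateDiff V3 + Motion LoRAs"
    if req.first_last_frame:
        return "Wan 2.2 MoE"
    return "Wan 2.2"
```

For example, `select_engine(VideoRequest(duration_s=30, vram_gb=6))` picks FramePack, because the request is longer than 10 seconds and does not ask for film-level quality.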
Pipeline 1: Wan 2.2 MoE (Highest Quality)
Image-to-Video
Prerequisites:
- `wan2.1_i2v_720p_14b_bf16.safetensors` in `models/diffusion_models/`
- `umt5_xxl_fp8_e4m3fn_scaled.safetensors` in `models/clip/`
- `open_clip_vit_h_14.safetensors` in `models/clip_vision/`
- `wan_2.1_vae.safetensors` in `models/vae/`
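A quick way to confirm these files are in place before loading a workflow is to check the paths directly. This is a small sketch using only the standard library. The ComfyUI root path is an assumption you would replace with your own install location, and `missing_models` is a hypothetical helper name.

```python
from pathlib import Path

# File names and subfolders copied from the prerequisites list.
REQUIRED_MODELS = {
    "wan2.1_i2v_720p_14b_bf16.safetensors": "models/diffusion_models",
    "umt5_xxl_fp8_e4m3fn_scaled.safetensors": "models/clip",
    "open_clip_vit_h_14.safetensors": "models/clip_vision",
    "wan_2.1_vae.safetensors": "models/vae",
}


def missing_models(comfy_root):
    """Return the relative paths of required files not found under comfy_root."""
    root = Path(comfy_root)
    return [
        f"{folder}/{name}"
        for name, folder in REQUIRED_MODELS.items()
        if not (root / folder / name).is_file()
    ]


if __name__ == "__main__":
    # "ComfyUI" is a placeholder for the actual install directory.
    for path in missing_models("ComfyUI"):
        print("missing:", path)
```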
Settings:
| Parameter | Value | Notes |
|---|---|---|
| Resolution | 1280x720 (landscape) or 720x1280 (portrait) | Native training resolution |
| Frames | 81 (~5 seconds at 16fps) | Must be a multiple of 4, plus 1 (4n + 1) |
| Steps | 30-50 | Higher = better quality |
| CFG | 5-7 | |
| Sampler | uni_pc | Recommended for Wan |
| Scheduler | normal | |
Frame count guide:
| Duration | Frames (16fps) |
|---|---|
| 1 second | 17 |
| 3 seconds | 49 |
| 5 seconds | 81 |
| 10 seconds | 161 |
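Every row in this table follows frames = 16 × seconds + 1, which always satisfies the 4n + 1 rule. For durations that are not whole seconds (for example, when matching an audio clip), one option is to round up to the next valid count. The sketch below assumes rounding up is acceptable, and `wan_frame_count` is a hypothetical helper name.

```python
import math


def wan_frame_count(seconds, fps=16):
    """Smallest frame count of the form 4n + 1 that covers `seconds` at `fps`."""
    n = math.ceil(seconds * fps / 4)
    return 4 * n + 1


# Reproduces the table rows for whole-second durations.
for s in (1, 3, 5, 10):
    print(s, "seconds ->", wan_frame_count(s), "frames")
```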
VRAM optimization:
- FP8 quantization: roughly halves model-weight VRAM compared with BF16, with minimal quality loss
- SageAttention: faster attention computation
- Reduce the frame count if you run out of memory (OOM)
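The "halves VRAM" figure for FP8 follows from bytes per parameter: BF16 stores 2 bytes per weight, FP8 stores 1. The estimate below is a rough sketch covering weights only. Activations, the text encoder, and the VAE are not counted, and the 14B parameter count is taken from the model name, not measured.

```python
# Bytes used to store one parameter in each format.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_memory_gb(num_params, dtype):
    """Approximate weight storage in decimal gigabytes (weights only)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


print(weight_memory_gb(14e9, "bf16"))  # 28.0
print(weight_memory_gb(14e9, "fp8"))   # 14.0
```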
Text-to-Video
Same as I2V, but uses `wan2.1_t2v_14b_bf16.safetensors` and `EmptySD3LatentImage` instead of image conditioning.
First+Last Frame Control (Wan 2.2 Exclusive)
Wan 2.2 MoE allows specifying both the first and last frame, enabling precise video planning:
- Generate two hero images of the same character, keeping its appearance consistent
- Use the first as the start frame and the second as the end frame
- Wan interpolates the motion between them
Pipeline 2: FramePack (Long Videos, Low VRAM)
Key Innovation
VRAM usage does not grow with video length: FramePack generates 60-second videos at 30fps on just 6GB of VRAM.
How it works:
- Dynamic context compression: 1536 tokens for key frames, 192 tokens for transition frames
- Bidirectional memory with reverse generation prevents drift
- Frame-by-frame generation with context window
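To see why the compression matters, take the two per-frame budgets above at face value: a transition frame costs one eighth of a key frame (192 vs. 1536). The arithmetic sketch below compares a compressed context against an uncompressed one. How FramePack actually decides which frames get which budget is not covered here, and `context_tokens` is a hypothetical helper.

```python
KEY_FRAME_TOKENS = 1536        # per-frame budget for key frames (from the list above)
TRANSITION_FRAME_TOKENS = 192  # per-frame budget for transition frames


def context_tokens(key_frames, transition_frames):
    """Total context size for a given mix of key and transition frames."""
    return (key_frames * KEY_FRAME_TOKENS
            + transition_frames * TRANSITION_FRAME_TOKENS)


compressed = context_tokens(key_frames=1, transition_frames=8)
uncompressed = context_tokens(key_frames=9, transition_frames=0)
print(compressed, uncompressed)  # 3072 13824
```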
Settings
| Parameter | Value | Notes |
|---|---|---|
| Resolution | 640x384 to 1280x720 | Depends on VRAM |
| Duration | Up to 60 seconds | VRAM-invariant |
| Quality | High (comparable to Wan) | Uses same base models |
When to Use
- Videos longer than 10 seconds
- Systems with limited VRAM (an RTX 5090 does not need this)
- When VRAM is needed for parallel operations
- Batch video generation
Pipeline 3: AnimateDiff V3 (Fast, Controllable)
Strengths
- Motion LoRAs for camera control (pan, zoom, tilt, roll)
- Effect LoRAs (shatter, smoke, explosion, liquid)
- Sliding context window for infinite length
- Very fast with Lightning model (4-8 steps)
Settings
| Parameter | Value (Standard) | Value (Lightning) |
|---|---|---|
| Motion Module | | |
| Steps | 20-25 | 4-8 |
| CFG | 7-8 | 1.5-2.0 |
| Sampler | euler_ancestral | lcm |
| Resolution | 512x512 | 512x512 |
| Context Length | 16 | 16 |
| Context Overlap | 4 | 4 |
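With a context length of 16 and an overlap of 4, each window advances by 12 frames. The sketch below shows one simple way such windows could be laid out over a longer clip. It is a simplified uniform scheme for illustration only: the AnimateDiff nodes in ComfyUI may schedule windows differently, and `context_windows` is a hypothetical helper.

```python
def context_windows(num_frames, length=16, overlap=4):
    """Return (start, end) frame ranges covering num_frames with overlapping windows."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must be >= 0 and smaller than length")
    if num_frames <= length:
        return [(0, num_frames)]
    stride = length - overlap
    windows = []
    start = 0
    while start + length < num_frames:
        windows.append((start, start + length))
        start += stride
    # Align the final window to the end so every frame is covered.
    windows.append((num_frames - length, num_frames))
    return windows


print(context_windows(40))  # [(0, 16), (12, 28), (24, 40)]
```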
Camera Motion LoRAs
| LoRA | Motion |
|---|---|
| v2_lora_ZoomIn | Camera zooms in |
| v2_lora_ZoomOut | Camera zooms out |
| v2_lora_PanLeft | Camera pans left |
| v2_lora_PanRight | Camera pans right |
| v2_lora_TiltUp | Camera tilts up |
| v2_lora_TiltDown | Camera tilts down |
| v2_lora_RollingClockwise | Camera rolls clockwise |
Post-Processing Pipeline
After any video generation:
1. Frame Interpolation (RIFE)
Doubles or quadruples frame count for smoother motion:
Input (16fps) → RIFE 2x → Output (32fps)
Input (16fps) → RIFE 4x → Output (64fps)
Use the `rife47` or `rife49` model.
2. Face Enhancement (if character video)
Apply FaceDetailer to each frame:
- denoise: 0.3-0.4 (lower than for still images, to preserve temporal consistency)
- guide_size: 384 (speed optimization for video)
- detection_model: face_yolov8m.pt
3. Deflicker (if needed)
Reduces temporal inconsistencies between frames.
4. Color Correction
Maintain consistent color grading across frames.
5. Video Combine
Final output via VHS Video Combine:
frame_rate: 16 (native) or 24/30 (after interpolation)
format: "video/h264-mp4"
crf: 19 (high quality) to 23 (smaller file)
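The `frame_rate` value sets playback speed: a clip lasts frame count divided by frame rate. The sketch below checks that relationship, which is useful when deciding what rate to write after interpolation. The interpolated frame count is passed in by the caller, because the exact number of frames an interpolation node emits is not specified here. Both function names are hypothetical.

```python
def interpolated_fps(native_fps, multiplier):
    """Frame rate that preserves the original duration after Nx interpolation."""
    return native_fps * multiplier


def playback_seconds(frame_count, frame_rate):
    """Duration of a clip when frame_count frames are played at frame_rate."""
    return frame_count / frame_rate


print(interpolated_fps(16, 2))    # 32
print(playback_seconds(81, 16))   # 5.0625
```

Writing interpolated frames at a lower rate than `interpolated_fps` returns stretches the clip, so the motion plays back more slowly than it was generated.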
Talking Head Pipeline
Complete pipeline for character dialogue:
1. Generate audio → comfyui-voice-pipeline
2. Generate base video → This skill (Wan I2V or AnimateDiff)
- Prompt: "{character}, talking naturally, slight head movement"
- Duration: match audio length
3. Apply lip-sync → Wav2Lip or LatentSync
4. Enhance faces → FaceDetailer + CodeFormer
5. Final output → video-assembly
Quality Checklist
Before marking video as complete:
- Character identity consistent across frames
- No flickering or temporal artifacts
- Motion looks natural (not jerky or frozen)
- Face enhancement applied if character video
- Frame rate is smooth (24+ fps for delivery)
- Audio synced (if talking head)
- Resolution matches delivery target
Reference
- `references/workflows.md` - Workflow templates for Wan and AnimateDiff
- `references/models.md` - Video model download links
- `references/research-2025.md` - Latest video generation advances
- `state/inventory.json` - Available video models