comfyui-video-pipeline


ComfyUI Video Pipeline


Orchestrates video generation across three engines, selecting the best one based on requirements and available resources.

Engine Selection


VIDEO REQUEST
    |
    |-- Need film-level quality?
    |   |-- Yes + 24GB+ VRAM → Wan 2.2 MoE 14B
    |   |-- Yes + 8GB VRAM → Wan 2.2 1.3B
    |
    |-- Need long video (>10 seconds)?
    |   |-- Yes → FramePack (60 seconds on 6GB)
    |
    |-- Need fast iteration?
    |   |-- Yes → AnimateDiff Lightning (4-8 steps)
    |
    |-- Need camera/motion control?
    |   |-- Yes → AnimateDiff V3 + Motion LoRAs
    |
    |-- Need first+last frame control?
    |   |-- Yes → Wan 2.2 MoE (exclusive feature)
    |
    |-- Default → Wan 2.2 (best general quality)
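The decision tree above can be sketched as a small selection function. The `Request` fields and the `select_engine` name are illustrative, not a real API:

```python
# Sketch of the engine-selection tree; field names are assumptions.
from dataclasses import dataclass

@dataclass
class Request:
    film_quality: bool = False
    long_video: bool = False       # longer than 10 seconds
    fast_iteration: bool = False
    camera_control: bool = False
    first_last_frame: bool = False
    vram_gb: int = 8

def select_engine(req: Request) -> str:
    """Walk the decision tree top to bottom; first match wins."""
    if req.film_quality:
        return "Wan 2.2 MoE 14B" if req.vram_gb >= 24 else "Wan 2.2 1.3B"
    if req.long_video:
        return "FramePack"
    if req.fast_iteration:
        return "AnimateDiff Lightning"
    if req.camera_control:
        return "AnimateDiff V3 + Motion LoRAs"
    if req.first_last_frame:
        return "Wan 2.2 MoE"
    return "Wan 2.2"  # default: best general quality
```

First match wins, mirroring the top-to-bottom order of the tree.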

Pipeline 1: Wan 2.2 MoE (Highest Quality)


Image-to-Video


Prerequisites:
  • wan2.1_i2v_720p_14b_bf16.safetensors in models/diffusion_models/
  • umt5_xxl_fp8_e4m3fn_scaled.safetensors in models/clip/
  • open_clip_vit_h_14.safetensors in models/clip_vision/
  • wan_2.1_vae.safetensors in models/vae/
Settings:

| Parameter  | Value                                       | Notes                      |
|------------|---------------------------------------------|----------------------------|
| Resolution | 1280x720 (landscape) or 720x1280 (portrait) | Native training resolution |
| Frames     | 81 (~5 seconds at 16fps)                    | Multiples of 4 + 1         |
| Steps      | 30-50                                       | Higher = better quality    |
| CFG        | 5-7                                         |                            |
| Sampler    | uni_pc                                      | Recommended for Wan        |
| Scheduler  | normal                                      |                            |
Frame count guide:

| Duration   | Frames (16fps) |
|------------|----------------|
| 1 second   | 17             |
| 3 seconds  | 49             |
| 5 seconds  | 81             |
| 10 seconds | 161            |
VRAM optimization:
  • FP8 quantization: halves VRAM with minimal quality loss
  • SageAttention: faster attention computation
  • Reduce frames if OOM
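The "multiples of 4 + 1" rule and the frame-count table can be captured in a small helper. This is a sketch; rounding to the nearest valid count is an assumption:

```python
# Compute the nearest valid Wan frame count (4k + 1) for a duration.
def wan_frame_count(seconds: float, fps: int = 16) -> int:
    raw = seconds * fps           # raw frame count at the target fps
    k = round(raw / 4)            # number of 4-frame groups
    return 4 * k + 1              # enforce the "multiples of 4 + 1" rule
```

For example, 5 seconds at 16fps gives 81 frames, matching the table above.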

Text-to-Video

文生视频

Same as I2V, but uses wan2.1_t2v_14b_bf16.safetensors and EmptySD3LatentImage instead of image conditioning.

First+Last Frame Control (Wan 2.2 Exclusive)


Wan 2.2 MoE allows specifying both the first and last frame, enabling precise video planning:
  1. Generate two hero images with consistent character
  2. Use first as start frame, second as end frame
  3. Wan interpolates the motion between them

Pipeline 2: FramePack (Long Videos, Low VRAM)


Key Innovation


VRAM usage is invariant to video length: FramePack generates 60-second videos at 30fps on just 6GB of VRAM.
How it works:
  • Dynamic context compression: 1536 tokens for key frames, 192 for transitions
  • Bidirectional memory with reverse generation prevents drift
  • Frame-by-frame generation within a context window
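The context-compression idea can be illustrated with a toy budget function: recent key frames keep the full 1536-token budget while older transition frames are squeezed to 192. The `keep_recent` cutoff is an assumption for illustration, not FramePack's actual compression scheme:

```python
# Toy illustration of two-tier context budgets (NOT the real FramePack
# algorithm): recent frames get the full budget, older ones a small one.
KEY_TOKENS, TRANSITION_TOKENS = 1536, 192

def context_budget(history_len: int, keep_recent: int = 2) -> list[int]:
    """Token budget per history frame, oldest first, newest last."""
    budgets = []
    for i in range(history_len):
        is_recent = i >= history_len - keep_recent
        budgets.append(KEY_TOKENS if is_recent else TRANSITION_TOKENS)
    return budgets
```

Because old frames cost a constant small budget, total context stays roughly flat as the video grows, which is why VRAM does not scale with length.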

Settings


| Parameter  | Value                    | Notes                 |
|------------|--------------------------|-----------------------|
| Resolution | 640x384 to 1280x720      | Depends on VRAM       |
| Duration   | Up to 60 seconds         | VRAM-invariant        |
| Quality    | High (comparable to Wan) | Uses same base models |

When to Use


  • Videos longer than 10 seconds
  • Limited VRAM systems (but RTX 5090 doesn't need this)
  • When VRAM is needed for parallel operations
  • Batch video generation

Pipeline 3: AnimateDiff V3 (Fast, Controllable)


Strengths


  • Motion LoRAs for camera control (pan, zoom, tilt, roll)
  • Effect LoRAs (shatter, smoke, explosion, liquid)
  • Sliding context window for infinite length
  • Very fast with Lightning model (4-8 steps)

Settings


| Parameter       | Value (Standard) | Value (Lightning)                       |
|-----------------|------------------|-----------------------------------------|
| Motion Module   | v3_sd15_mm.ckpt  | animatediff_lightning_4step.safetensors |
| Steps           | 20-25            | 4-8                                     |
| CFG             | 7-8              | 1.5-2.0                                 |
| Sampler         | euler_ancestral  | lcm                                     |
| Resolution      | 512x512          | 512x512                                 |
| Context Length  | 16               | 16                                      |
| Context Overlap | 4                | 4                                       |
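The context length and overlap settings imply a sliding-window schedule like the following sketch; the windowing math is an assumption based on the table, not AnimateDiff's exact scheduler:

```python
# Sketch of sliding context windows: fixed-length windows that overlap
# so adjacent windows can be blended for arbitrarily long videos.
def sliding_windows(total_frames: int, length: int = 16, overlap: int = 4):
    """Return (start, end) frame ranges covering total_frames."""
    stride = length - overlap
    windows, start = [], 0
    while True:
        end = min(start + length, total_frames)
        windows.append((start, end))
        if end == total_frames:
            break
        start += stride
    return windows
```

Each window shares `overlap` frames with its neighbor, which is what keeps motion continuous across window boundaries.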

Camera Motion LoRAs


| LoRA                     | Motion                 |
|--------------------------|------------------------|
| v2_lora_ZoomIn           | Camera zooms in        |
| v2_lora_ZoomOut          | Camera zooms out       |
| v2_lora_PanLeft          | Camera pans left       |
| v2_lora_PanRight         | Camera pans right      |
| v2_lora_TiltUp           | Camera tilts up        |
| v2_lora_TiltDown         | Camera tilts down      |
| v2_lora_RollingClockwise | Camera rolls clockwise |

Post-Processing Pipeline


After any video generation:

1. Frame Interpolation (RIFE)


Doubles or quadruples frame count for smoother motion:
Input (16fps) → RIFE 2x → Output (32fps)
Input (16fps) → RIFE 4x → Output (64fps)
Use the rife47 or rife49 model.
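The interpolation arithmetic is simple: a factor-k pass inserts new frames between each consecutive pair, so n frames become (n - 1) * k + 1 and the frame rate scales by k:

```python
# Frame-count and fps arithmetic for a RIFE interpolation pass.
def rife_output(frames: int, fps: int, factor: int = 2) -> tuple[int, int]:
    """Return (output frame count, output fps) after interpolation."""
    out_frames = (frames - 1) * factor + 1  # k-1 new frames per gap
    return out_frames, fps * factor
```

An 81-frame Wan clip at 16fps becomes 161 frames at 32fps after a 2x pass.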

2. Face Enhancement (if character video)


Apply FaceDetailer to each frame:
  • denoise: 0.3-0.4 (lower than image - preserves temporal consistency)
  • guide_size: 384 (speed optimization for video)
  • detection_model: face_yolov8m.pt
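The per-frame settings above can be collected into a single config. The key names here are illustrative, not the FaceDetailer node's actual input names:

```python
# Hypothetical FaceDetailer config for video frames; key names are
# illustrative, values come from the settings list above.
FACE_DETAILER_VIDEO = {
    "denoise": 0.35,                        # 0.3-0.4: lower than stills to preserve temporal consistency
    "guide_size": 384,                      # smaller guide size for per-frame speed
    "detection_model": "face_yolov8m.pt",   # face detector checkpoint
}
```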

3. Deflicker (if needed)


Reduces temporal inconsistencies between frames.

4. Color Correction


Maintain consistent color grading across frames.

5. Video Combine


Final output via VHS Video Combine:
frame_rate: 16 (native) or 24/30 (after interpolation)
format: "video/h264-mp4"
crf: 19 (high quality) to 23 (smaller file)
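For reference, a roughly equivalent ffmpeg invocation can be assembled from the same settings. The frame filename pattern and output name are illustrative:

```python
# Build an ffmpeg argument list matching the VHS settings above:
# h264 mp4 at the chosen frame rate and CRF.
def ffmpeg_args(frame_rate: int = 16, crf: int = 19,
                pattern: str = "frame_%05d.png", out: str = "out.mp4"):
    return [
        "ffmpeg",
        "-framerate", str(frame_rate),   # input frame rate
        "-i", pattern,                   # numbered frame images
        "-c:v", "libx264",               # h264 encoder
        "-crf", str(crf),                # 19 (high quality) to 23 (smaller file)
        "-pix_fmt", "yuv420p",           # broad player compatibility
        out,
    ]
```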

Talking Head Pipeline


Complete pipeline for character dialogue:
1. Generate audio → comfyui-voice-pipeline
2. Generate base video → This skill (Wan I2V or AnimateDiff)
   - Prompt: "{character}, talking naturally, slight head movement"
   - Duration: match audio length
3. Apply lip-sync → Wav2Lip or LatentSync
4. Enhance faces → FaceDetailer + CodeFormer
5. Final output → video-assembly

Quality Checklist


Before marking video as complete:
  • Character identity consistent across frames
  • No flickering or temporal artifacts
  • Motion looks natural (not jerky or frozen)
  • Face enhancement applied if character video
  • Frame rate is smooth (24+ fps for delivery)
  • Audio synced (if talking head)
  • Resolution matches delivery target

Reference


  • references/workflows.md - Workflow templates for Wan and AnimateDiff
  • references/models.md - Video model download links
  • references/research-2025.md - Latest video generation advances
  • state/inventory.json - Available video models