comfyui-video-pipeline

ComfyUI Video Pipeline

Orchestrates video generation across three engines, selecting the best one based on requirements and available resources.

Engine Selection

VIDEO REQUEST
    |
    |-- Need film-level quality?
    |   |-- Yes + 24GB+ VRAM → Wan 2.2 MoE 14B
    |   |-- Yes + 8GB VRAM → Wan 2.2 1.3B
    |
    |-- Need long video (>10 seconds)?
    |   |-- Yes → FramePack (60 seconds on 6GB)
    |
    |-- Need fast iteration?
    |   |-- Yes → AnimateDiff Lightning (4-8 steps)
    |
    |-- Need camera/motion control?
    |   |-- Yes → AnimateDiff V3 + Motion LoRAs
    |
    |-- Need first+last frame control?
    |   |-- Yes → Wan 2.2 MoE (exclusive feature)
    |
    |-- Default → Wan 2.2 (best general quality)
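The tree above can be read as a first-match rule list. A minimal sketch, assuming the branches are checked top to bottom and that "8GB VRAM" means "at least 8GB"; the `VideoRequest` class and `select_engine` function are hypothetical names used for illustration and are not part of any ComfyUI API:

```python
from dataclasses import dataclass


@dataclass
class VideoRequest:
    """Hypothetical request flags mirroring the questions in the tree."""
    film_quality: bool = False
    duration_s: float = 5.0
    fast_iteration: bool = False
    motion_control: bool = False
    first_last_frame: bool = False


def select_engine(req: VideoRequest, vram_gb: float) -> str:
    """Walk the decision tree top to bottom and return the first match."""
    if req.film_quality:
        if vram_gb >= 24:
            return "Wan 2.2 MoE 14B"
        if vram_gb >= 8:
            return "Wan 2.2 1.3B"
        # Assumption: the tree gives no answer below 8GB, so fall through.
    if req.duration_s > 10:
        return "FramePack"
    if req.fast_iteration:
        return "AnimateDiff Lightning"
    if req.motion_control:
        return "AnimateDiff V3 + Motion LoRAs"
    if req.first_last_frame:
        return "Wan 2.2 MoE"
    return "Wan 2.2"
```

For example, `select_engine(VideoRequest(duration_s=45), vram_gb=6)` picks FramePack, while a request with no special flags falls through to the default.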

Pipeline 1: Wan 2.2 MoE (Highest Quality)

Image-to-Video

Prerequisites:

`wan2.1_i2v_720p_14b_bf16.safetensors` in `models/diffusion_models/`
`umt5_xxl_fp8_e4m3fn_scaled.safetensors` in `models/clip/`
`open_clip_vit_h_14.safetensors` in `models/clip_vision/`
`wan_2.1_vae.safetensors` in `models/vae/`

Settings:

Parameter	Value	Notes
Resolution	1280x720 (landscape) or 720x1280 (portrait)	Native training resolution
Frames	81 (~5 seconds at 16fps)	Multiples of 4 + 1
Steps	30-50	Higher = better quality
CFG	5-7
Sampler	uni_pc	Recommended for Wan
Scheduler	normal

Frame count guide:

Duration	Frames (16fps)
1 second	17
3 seconds	49
5 seconds	81
10 seconds	161
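The table follows from the "multiples of 4 + 1" rule: at 16fps, each whole second adds 16 frames, plus one initial frame. A small sketch of that arithmetic; the function name `wan_frame_count` is an assumption for illustration:

```python
def wan_frame_count(seconds: int, fps: int = 16) -> int:
    """Return the frame count for a whole number of seconds.

    Uses frames = seconds * fps + 1 and checks the
    "multiple of 4, plus 1" rule from the settings table.
    """
    if seconds <= 0 or fps <= 0:
        raise ValueError("seconds and fps must be positive")
    frames = seconds * fps + 1
    if (frames - 1) % 4 != 0:
        raise ValueError(f"{frames} frames is not a multiple of 4 plus 1")
    return frames


for duration in (1, 3, 5, 10):
    print(duration, "s ->", wan_frame_count(duration), "frames")
```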

VRAM optimization:

FP8 quantization: roughly halves weight memory compared with BF16, with minimal quality loss
SageAttention: faster attention computation
Reduce frames if OOM

Text-to-Video

Same as I2V, but uses `wan2.1_t2v_14b_bf16.safetensors` and `EmptySD3LatentImage` instead of image conditioning.

First+Last Frame Control (Wan 2.2 Exclusive)

Wan 2.2 MoE allows specifying both the first and last frame, enabling precise video planning:

Generate two hero images of the same character, keeping its appearance consistent
Use the first as the start frame and the second as the end frame
Wan interpolates the motion between them

Pipeline 2: FramePack (Long Videos, Low VRAM)

Key Innovation

VRAM usage is invariant to video length - generates 60-second videos at 30fps on just 6GB VRAM.

How it works:

Dynamic context compression: 1536 tokens for key frames, 192 for transitions
Bidirectional memory with reverse generation prevents drift
Frame-by-frame generation with context window

Settings

Parameter	Value	Notes
Resolution	640x384 to 1280x720	Depends on VRAM
Duration	Up to 60 seconds	VRAM-invariant
Quality	High (comparable to Wan)	Uses same base models

When to Use

Videos longer than 10 seconds
Systems with limited VRAM (an RTX 5090 does not need this)
When VRAM is needed for parallel operations
Batch video generation

Pipeline 3: AnimateDiff V3 (Fast, Controllable)

Strengths

Motion LoRAs for camera control (pan, zoom, tilt, roll)
Effect LoRAs (shatter, smoke, explosion, liquid)
Sliding context window for infinite length
Very fast with Lightning model (4-8 steps)

Settings

Parameter	Value (Standard)	Value (Lightning)
Motion Module	`v3_sd15_mm.ckpt`	`animatediff_lightning_4step.safetensors`
Steps	20-25	4-8
CFG	7-8	1.5-2.0
Sampler	euler_ancestral	lcm
Resolution	512x512	512x512
Context Length	16	16
Context Overlap	4	4
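To illustrate how Context Length and Context Overlap interact, here is a simplified sketch of a uniform sliding window over frame indices. It is an assumption-based illustration only: the function name `sliding_windows` is hypothetical, and the real AnimateDiff context schedulers in ComfyUI are more elaborate than this.

```python
def sliding_windows(total_frames: int, context_length: int = 16,
                    overlap: int = 4) -> list[tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive.

    Consecutive windows share `overlap` frames. The last window is
    shifted back so it ends exactly at the final frame.
    """
    if total_frames < 1:
        raise ValueError("total_frames must be at least 1")
    if not 0 <= overlap < context_length:
        raise ValueError("overlap must be in [0, context_length)")
    if total_frames <= context_length:
        return [(0, total_frames)]

    stride = context_length - overlap  # 12 with the table's values
    windows = []
    start = 0
    while True:
        end = start + context_length
        if end >= total_frames:
            windows.append((total_frames - context_length, total_frames))
            break
        windows.append((start, end))
        start += stride
    return windows


print(sliding_windows(40))
```

Because each window has a fixed size, the motion module only ever sees 16 frames at a time, which is what allows the clip length to grow without a matching growth in per-step memory.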

Camera Motion LoRAs

LoRA	Motion
v2_lora_ZoomIn	Camera zooms in
v2_lora_ZoomOut	Camera zooms out
v2_lora_PanLeft	Camera pans left
v2_lora_PanRight	Camera pans right
v2_lora_TiltUp	Camera tilts up
v2_lora_TiltDown	Camera tilts down
v2_lora_RollingClockwise	Camera rolls clockwise

Post-Processing Pipeline

After any video generation:

1. Frame Interpolation (RIFE)

Doubles or quadruples frame count for smoother motion:

Input (16fps) → RIFE 2x → Output (32fps)
Input (16fps) → RIFE 4x → Output (64fps)

Use the `rife47` or `rife49` model.

2. Face Enhancement (if character video)

Apply FaceDetailer to each frame:

denoise: 0.3-0.4 (lower than for still images, to preserve temporal consistency)
guide_size: 384 (speed optimization for video)
detection_model: face_yolov8m.pt

3. Deflicker (if needed)

Reduces temporal inconsistencies between frames.

4. Color Correction

Maintain consistent color grading across frames.

5. Video Combine

Final output via VHS Video Combine:

frame_rate: 16 (native) or 24/30 (after interpolation)
format: "video/h264-mp4"
crf: 19 (high quality) to 23 (smaller file)
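When picking `frame_rate` after RIFE interpolation, it may help to check the resulting playback speed: 16fps interpolated 2x yields 32fps of frames, so writing them at 24 or 30 plays slightly slower than real time. The sketch below shows that arithmetic, assuming the combine step writes every frame at the given rate without dropping or resampling; both function names are hypothetical helpers.

```python
def interpolated_fps(source_fps: float, multiplier: int) -> float:
    """Frame rate that keeps real-time speed after RIFE interpolation."""
    return source_fps * multiplier


def playback_speed(source_fps: float, multiplier: int,
                   frame_rate: float) -> float:
    """Speed relative to real time when frames are written at frame_rate.

    1.0 means real time; values below 1.0 mean slow motion.
    """
    return frame_rate / interpolated_fps(source_fps, multiplier)


print(interpolated_fps(16, 2))    # real-time rate after RIFE 2x
print(playback_speed(16, 2, 30))  # speed when written at 30fps
print(playback_speed(16, 2, 24))  # speed when written at 24fps
```

If exact real-time playback matters (for example, when the video must stay in sync with an audio track), writing at the interpolated rate avoids the speed change.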

Talking Head Pipeline

Complete pipeline for character dialogue:

1. Generate audio → comfyui-voice-pipeline
2. Generate base video → This skill (Wan I2V or AnimateDiff)
   - Prompt: "{character}, talking naturally, slight head movement"
   - Duration: match audio length
3. Apply lip-sync → Wav2Lip or LatentSync
4. Enhance faces → FaceDetailer + CodeFormer
5. Final output → video-assembly
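For step 2, "match audio length" has to respect the Wan frame rule (a multiple of 4, plus 1). One way to do this, sketched under the assumption that the base video runs at 16fps and should be at least as long as the audio, is to round the frame count up to the next valid value. The helper name `frames_for_audio` is hypothetical:

```python
import math


def frames_for_audio(audio_seconds: float, fps: int = 16) -> int:
    """Smallest valid frame count (4k + 1) covering the audio duration."""
    if audio_seconds <= 0 or fps <= 0:
        raise ValueError("audio_seconds and fps must be positive")
    needed = math.ceil(audio_seconds * fps)  # frames needed to cover the audio
    blocks = -(-(needed - 1) // 4)           # ceiling division by 4
    return 4 * blocks + 1


print(frames_for_audio(5.0))
print(frames_for_audio(2.3))
```

Rounding up rather than down means the video never ends before the audio does; any surplus frames can be trimmed during final assembly.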
