
AlayaRenderer — Generative World Renderer

Skill by ara.so — Daily 2026 Skills collection.
AlayaRenderer is a two-stage framework for high-quality video rendering:
  1. Inverse Renderer (RGB → G-buffers): Extracts albedo, normal, depth, roughness, and metallic maps from RGB video using a fine-tuned Cosmos-Transfer1-DiffusionRenderer 7B model.
  2. Game Editing (G-buffers + Text → Stylized RGB): Synthesizes photorealistic, stylized RGB video from G-buffer inputs using a fine-tuned Wan2.1 1.3B model via DiffSynth-Studio.
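
The two stages compose into a single data-flow contract: stage 1 produces a fixed set of five G-buffer channels that stage 2 consumes alongside a text prompt. A minimal Python sketch of that contract, with `run_inverse_renderer` and `run_game_editing` as hypothetical stand-ins for the real model calls:

```python
# Hypothetical stand-ins for the two model stages; real inference lives in
# inverse_renderer/ and game_editing/.
GBUFFER_CHANNELS = ("albedo", "normal", "depth", "roughness", "metallic")

def run_inverse_renderer(rgb_video: str) -> dict:
    """Stage 1 (stub): RGB video -> one output stream per G-buffer channel."""
    return {ch: f"{rgb_video}:{ch}" for ch in GBUFFER_CHANNELS}

def run_game_editing(gbuffers: dict, prompt: str) -> str:
    """Stage 2 (stub): G-buffers + text prompt -> stylized RGB video."""
    missing = [ch for ch in GBUFFER_CHANNELS if ch not in gbuffers]
    if missing:
        raise ValueError(f"missing G-buffer channels: {missing}")
    return f"stylized video ({prompt})"

gbuffers = run_inverse_renderer("gameplay.mp4")
stylized = run_game_editing(gbuffers, "snowy winter scene")
```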


Installation

Clone the Repository

```bash
git clone --recurse-submodules https://github.com/ShandaAI/AlayaRenderer.git
cd AlayaRenderer
```

Important: use `--recurse-submodules` — DiffSynth-Studio is a git submodule required for Game Editing.

Two Separate Conda Environments (Recommended)

The two models have conflicting dependencies. Use separate environments:

Environment 1: Inverse Renderer

```bash
conda create -n inverse_renderer python=3.10 -y
conda activate inverse_renderer
cd inverse_renderer
```

Follow inverse_renderer/ instructions for Cosmos-Transfer1 setup

Environment 2: Game Editing

```bash
conda create -n game_editing python=3.10 -y
conda activate game_editing
cd game_editing
```

Follow DiffSynth-Studio setup instructions


---

Model Weights

| Model | Base Model | Size | HuggingFace Link |
|---|---|---|---|
| Inverse Renderer | Cosmos-Transfer1-DiffusionRenderer 7B | ~7B params | `Brian9999/world_inverse_renderer` |
| Game Editing | Wan2.1 1.3B | ~1.3B params | `Brian9999/stylerenderer` |

Download and Place Weights

Inverse Renderer — replace the base checkpoint

```bash
huggingface-cli download Brian9999/world_inverse_renderer \
    --local-dir inverse_renderer/checkpoints/Diffusion_Renderer_Inverse_Cosmos_7B
```

Game Editing — place in game_editing models directory

```bash
mkdir -p game_editing/models/train/Wan2.1-T2V-1.3B_gbuffer
huggingface-cli download Brian9999/stylerenderer \
    --local-dir game_editing/models/train/Wan2.1-T2V-1.3B_gbuffer
```

---

Inverse Renderer Usage

The inverse renderer decomposes an RGB video into 5 G-buffer channels: albedo, normal, depth, roughness, metallic.

Setup

```bash
cd inverse_renderer
```

Follow Cosmos-Transfer1-DiffusionRenderer environment setup

Ensure checkpoint is at:

```
inverse_renderer/checkpoints/Diffusion_Renderer_Inverse_Cosmos_7B/
```


Inference

Refer to the `inverse_renderer/` subdirectory for the full inference script. The general pattern follows Cosmos-Transfer1-DiffusionRenderer conventions:

```python
# inverse_renderer/run_inverse.py (typical pattern)
import torch
from pathlib import Path

# Input: path to RGB video
input_video = "path/to/rgb_video.mp4"
output_dir = "outputs/gbuffers/"
```

The model outputs 5 synchronized channels:

- albedo (diffuse color)
- normal (surface orientation)
- depth (scene geometry)
- roughness (surface roughness)
- metallic (metallic property)
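
Because downstream conditioning consumes the channels together, the five frame sequences must stay frame-aligned. A small sketch (dummy data, not the repository's API) that groups per-channel sequences into per-frame tuples and flags any drift:

```python
CHANNELS = ("albedo", "normal", "depth", "roughness", "metallic")

def synchronize(gbuffers: dict) -> list:
    """Zip the five per-channel frame sequences into per-frame tuples,
    failing loudly if any channel has a different length."""
    lengths = {ch: len(gbuffers[ch]) for ch in CHANNELS}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"channel lengths differ: {lengths}")
    return list(zip(*(gbuffers[ch] for ch in CHANNELS)))

# Dummy stand-in frames; real inputs are per-frame image buffers.
dummy = {ch: [f"{ch}_{i}" for i in range(3)] for ch in CHANNELS}
frames = synchronize(dummy)
```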


---


Game Editing Usage

Quick Start — CLI Inference

```bash
cd game_editing

CUDA_VISIBLE_DEVICES=0 python \
    examples/wanvideo/model_inference/inference_gbuffer_caption.py \
    --checkpoint models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors \
    --gpu 0 \
    --style snowy_winter \
    --prompt "the scene is set in a frozen, snow-covered environment under cold, pale winter light with falling snowflakes, creating a silent and ethereal winter wonderland atmosphere." \
    --gbuffer_dir test_dataset \
    --save_dir outputs/ \
    --num_frames 81 \
    --height 480 \
    --width 832
```
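
To render the same G-buffers under several presets, the invocation can be built programmatically. In this sketch the script and checkpoint paths mirror the quick-start command above, the prompts are placeholders, and the commands are printed rather than executed:

```python
import shlex

# Paths and flags mirror the quick-start command above.
SCRIPT = "examples/wanvideo/model_inference/inference_gbuffer_caption.py"
CHECKPOINT = "models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors"

def build_cmd(style, prompt, gbuffer_dir="test_dataset", save_dir="outputs",
              num_frames=81, height=480, width=832, gpu=0):
    """Assemble the CLI argument list for one style preset."""
    return ["python", SCRIPT,
            "--checkpoint", CHECKPOINT,
            "--gpu", str(gpu),
            "--style", style,
            "--prompt", prompt,
            "--gbuffer_dir", gbuffer_dir,
            "--save_dir", f"{save_dir}/{style}",
            "--num_frames", str(num_frames),
            "--height", str(height),
            "--width", str(width)]

for style in ("snowy_winter", "rainy", "night", "sunset"):
    print(shlex.join(build_cmd(style, f"placeholder prompt for {style}")))
```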

CLI Parameters

| Parameter | Description | Example |
|---|---|---|
| `--checkpoint` | Path to fine-tuned `.safetensors` weights | `models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors` |
| `--gpu` | GPU device index | `0` |
| `--style` | Named style preset | `snowy_winter`, `rainy`, `night`, `sunset` |
| `--prompt` | Text description of target lighting/atmosphere | See examples below |
| `--gbuffer_dir` | Directory containing G-buffer input frames/video | `test_dataset` |
| `--save_dir` | Output directory for rendered video | `outputs/` |
| `--num_frames` | Number of frames to generate (must be `8n+1`) | `81` |
| `--height` | Output height in pixels | `480` |
| `--width` | Output width in pixels | `832` |

G-buffer Directory Structure

```
test_dataset/
├── albedo/
│   ├── frame_0000.png
│   ├── frame_0001.png
│   └── ...
├── normal/
│   ├── frame_0000.png
│   └── ...
├── depth/
│   ├── frame_0000.png
│   └── ...
├── roughness/
│   ├── frame_0000.png
│   └── ...
└── metallic/
    ├── frame_0000.png
    └── ...
```
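
Before running inference it is worth confirming the layout above is complete and frame-aligned. A small validator (an illustration, not part of the repository) that checks each channel directory and compares per-channel frame counts:

```python
from pathlib import Path

CHANNELS = ("albedo", "normal", "depth", "roughness", "metallic")

def check_gbuffer_dir(root):
    """Return per-channel frame counts for a test_dataset-style directory,
    raising if a channel is missing or the channels are out of sync."""
    root = Path(root)
    counts = {}
    for ch in CHANNELS:
        ch_dir = root / ch
        if not ch_dir.is_dir():
            raise FileNotFoundError(f"missing channel directory: {ch_dir}")
        counts[ch] = len(list(ch_dir.glob("frame_*.png")))
    if len(set(counts.values())) != 1:
        raise ValueError(f"frame counts differ across channels: {counts}")
    return counts
```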

Style Prompt Examples


Cyberpunk night scene

```bash
--style night \
--prompt "neon-lit urban environment at night with rain-slicked streets reflecting colorful neon signs, creating a cyberpunk noir atmosphere"
```

Golden hour / sunset

```bash
--style sunset \
--prompt "warm golden hour lighting with long shadows and a glowing amber sky, soft cinematic atmosphere"
```

Rainy urban

```bash
--style rainy \
--prompt "overcast rainy day with wet surfaces, soft diffuse lighting, and atmospheric fog creating a moody cinematic look"
```

Fantasy / stylized

```bash
--style fantasy \
--prompt "magical forest environment with bioluminescent plants, ethereal blue-green lighting, and mystical particle effects"
```

Foggy morning

```bash
--style foggy \
--prompt "early morning dense fog with soft diffused light creating a mysterious and quiet atmosphere"
```

Multi-GPU Inference


Run on specific GPU

```bash
CUDA_VISIBLE_DEVICES=1 python \
    examples/wanvideo/model_inference/inference_gbuffer_caption.py \
    --checkpoint models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors \
    --gpu 1 \
    --style rainy \
    --prompt "heavy rainfall with dark storm clouds and dramatic lightning in the distance" \
    --gbuffer_dir my_gbuffers \
    --save_dir outputs/rainy_scene \
    --num_frames 81 --height 480 --width 832
```

---

Full Pipeline: RGB Video → Stylized Output


Step 1: Extract G-buffers from RGB video (Inverse Renderer env)

```bash
conda activate inverse_renderer
cd inverse_renderer
python run_inverse.py \
    --input path/to/gameplay_video.mp4 \
    --output_dir ../game_editing/test_dataset/
```

Step 2: Apply game editing style (Game Editing env)

```bash
conda activate game_editing
cd ../game_editing
CUDA_VISIBLE_DEVICES=0 python \
    examples/wanvideo/model_inference/inference_gbuffer_caption.py \
    --checkpoint models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors \
    --gpu 0 \
    --style snowy_winter \
    --prompt "frozen tundra with blizzard conditions, pale blue-white lighting and drifting snow" \
    --gbuffer_dir test_dataset \
    --save_dir outputs/final_render \
    --num_frames 81 --height 480 --width 832
```

---

Dataset Overview

The AlayaRenderer dataset (release pending) features:

- 4M+ frames at 720p / 30 FPS
- 6 synchronized channels: RGB + albedo, normal, depth, metallic, roughness
- 40 hours from Cyberpunk 2077 and Black Myth: Wukong
- Average clip length: 8 minutes, up to 53 minutes continuous
- Weather variants: sunny, rainy, foggy, night, sunset
- Motion blur variant via sub-frame interpolation


Architecture Summary

```
RGB Video Input
┌─────────────────────────────────────┐
│  Inverse Renderer                   │
│  (Cosmos-Transfer1 7B fine-tuned)   │
│  RGB → [albedo, normal, depth,      │
│          roughness, metallic]       │
└─────────────────┬───────────────────┘
                  │  G-buffers
┌─────────────────────────────────────┐
│  Game Editing                       │
│  (Wan2.1 1.3B fine-tuned)           │
│  G-buffers + Text Prompt            │
│  → Stylized RGB Video               │
└─────────────────────────────────────┘
```


Troubleshooting

Submodule not found / DiffSynth-Studio missing


If you cloned without `--recurse-submodules`:

```bash
git submodule update --init --recursive
```

CUDA Out of Memory

- Reduce `--num_frames` (try `41` instead of `81`)
- Reduce resolution: `--height 320 --width 576`
- Pin the process to a free GPU with `CUDA_VISIBLE_DEVICES=0` and make sure no other processes are using it

`num_frames` must follow the `8n+1` pattern


Valid values: `9, 17, 25, 33, 41, 49, 57, 65, 73, 81`

Valid

```bash
--num_frames 81   # 8*10 + 1 ✓
--num_frames 41   # 8*5 + 1 ✓
```

Invalid

```bash
--num_frames 80   # ✗
--num_frames 60   # ✗
```
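
The constraint can be checked, and a bad value rounded down to the nearest valid one, with a couple of lines; the floor of `9` is an assumption taken from the valid-value list above:

```python
def is_valid_num_frames(n: int) -> bool:
    """True when n has the form 8k + 1 for k >= 1 (9, 17, ..., 81, ...)."""
    return n >= 9 and (n - 1) % 8 == 0

def nearest_valid_num_frames(n: int) -> int:
    """Round n down to the closest valid frame count (never below 9)."""
    return max(9, ((n - 1) // 8) * 8 + 1)
```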

Checkpoint not found


Verify checkpoint placement

```bash
ls game_editing/models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors
ls inverse_renderer/checkpoints/Diffusion_Renderer_Inverse_Cosmos_7B/
```
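
The same check can be scripted; the paths below mirror the "Download and Place Weights" steps, and `repo_root` is assumed to be the clone root:

```python
from pathlib import Path

# Expected weight locations from the "Download and Place Weights" steps.
EXPECTED_WEIGHTS = [
    "game_editing/models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors",
    "inverse_renderer/checkpoints/Diffusion_Renderer_Inverse_Cosmos_7B",
]

def missing_weights(repo_root="."):
    """Return the expected weight paths that are absent under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_WEIGHTS if not (root / p).exists()]
```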

Version conflicts between models

Always use the two separate conda environments (`inverse_renderer` and `game_editing`). Do not install both models' dependencies in one environment.


Citation

```bibtex
@article{huang2026generativeworldrenderer,
    title={Generative World Renderer},
    author={Zheng-Hui Huang and Zhixiang Wang and Jiaming Tan and Ruihan Yu and Yidan Zhang and Bo Zheng and Yu-Lun Liu and Yung-Yu Chuang and Kaipeng Zhang},
    journal={arXiv preprint arXiv:2604.02329},
    year={2026}
}
```