# WildWorld Dataset Skill
## What WildWorld Is

WildWorld is a large-scale action-conditioned world modeling dataset automatically collected from a photorealistic AAA action role-playing game (ARPG). It is designed for training and evaluating dynamic world models — generative models that predict future game states given past observations and player actions.
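Such a world model is rolled out autoregressively: each predicted frame is appended to the context used for the next prediction. A minimal, framework-agnostic sketch of that loop, where the `step(past_frames, past_actions)` interface is illustrative and not an API from the paper:

```python
from typing import Callable, List

def rollout(step: Callable, frames: List, actions: List, horizon: int) -> List:
    """Autoregressively roll a world model forward.

    `step(past_frames, past_actions)` returns the next predicted frame;
    this interface is a sketch, not an API defined by WildWorld.
    """
    frames = list(frames)
    for _ in range(horizon):
        nxt = step(frames, actions[:len(frames)])
        frames.append(nxt)  # predicted frame becomes context
    return frames

# Toy "world": next observation = previous observation + last action
trajectory = rollout(lambda f, a: f[-1] + a[-1], frames=[0], actions=[1, 2, 3], horizon=3)
print(trajectory)  # [0, 1, 3, 6]
```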
## Key Statistics

| Property | Value |
|---|---|
| Total frames | 108M+ |
| Actions | 450+ semantically meaningful |
| Monster species | 29 |
| Player characters | 4 |
| Weapon types | 4 |
| Distinct stages | 5 |
| Max clip length | 30+ minutes continuous |
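For a rough sense of scale, 108M frames correspond to on the order of a thousand hours of gameplay, assuming a 30 FPS capture rate (the capture rate is an assumption; it is not stated in the table above):

```python
# Back-of-envelope duration check; fps=30 is ASSUMED, not confirmed
total_frames = 108_000_000
fps = 30
hours = total_frames / fps / 3600
print(f"~{hours:.0f} hours of footage")  # ~1000 hours
```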
## Per-Frame Annotations

Every frame includes:
- Character skeletons — joint positions for player and monsters
- Actions & states — HP, animation state, stamina, etc.
- Camera poses — position, rotation, field of view
- Depth maps — monocular depth for each frame
- Hierarchical captions — action-level and sample-level natural language descriptions
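Taken together, one frame's annotations might be organized as a record like the following. This is a sketch only: every field name here is a guess based on the annotation list above, since the schema has not been published:

```python
import json

# Hypothetical per-frame annotation record; all field names are
# illustrative guesses, NOT the released schema.
frame_annotation = {
    "skeleton": {
        "player": {"head": [0.0, 1.7, 0.0], "root": [0.0, 1.0, 0.0]},
        "monsters": [{"species_id": 3, "root": [2.1, 1.0, -0.5]}],
    },
    "state": {"hp": 0.85, "stamina": 0.40, "animation": "dodge"},
    "camera": {"position": [0.0, 2.0, -3.0], "rotation": [10.0, 0.0, 0.0], "fov": 60.0},
    "captions": {"action_level": "The player dodges to the left."},
}

# Such records serialize naturally to one JSON file per frame
print(sorted(frame_annotation))  # ['camera', 'captions', 'skeleton', 'state']
```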
## Project Status

⚠️ As of March 2026, the dataset and WildBench benchmark have not yet been released. Monitor the repository for updates.
---

## Repository Setup

```bash
# Clone the repository
git clone https://github.com/ShandaAI/WildWorld.git
cd WildWorld

# Install dependencies (when benchmark code is released)
pip install -r requirements.txt
```
---

## Expected Dataset Structure

Based on the paper and framework description, the dataset is expected to follow this structure:
```
WildWorld/
├── data/
│   ├── sequences/
│   │   ├── stage_01/
│   │   │   ├── clip_000001/
│   │   │   │   ├── frames/    # RGB frames (e.g., PNG)
│   │   │   │   ├── depth/     # Depth maps
│   │   │   │   ├── skeleton/  # Per-frame skeleton JSON
│   │   │   │   ├── states/    # HP, animation, stamina JSON
│   │   │   │   ├── camera/    # Camera pose JSON
│   │   │   │   └── actions/   # Action label files
│   │   │   └── clip_000002/
│   │   └── stage_02/
│   └── captions/
│       ├── action_level/      # Per-action descriptions
│       └── sample_level/      # Clip-level descriptions
├── benchmark/
│   └── wildbench/             # WildBench evaluation code
├── assets/
│   └── framework-arxiv.png
├── LICENSE
└── README.md
```
## Working with the Dataset (Anticipated API)

### Loading Frame Annotations

```python
import json
from pathlib import Path

import numpy as np
from PIL import Image


class WildWorldClip:
    """Helper class to load a WildWorld clip and its annotations."""

    def __init__(self, clip_dir: str):
        self.clip_dir = Path(clip_dir)
        self.frames_dir = self.clip_dir / "frames"
        self.depth_dir = self.clip_dir / "depth"
        self.skeleton_dir = self.clip_dir / "skeleton"
        self.states_dir = self.clip_dir / "states"
        self.camera_dir = self.clip_dir / "camera"
        self.actions_dir = self.clip_dir / "actions"

    def get_frame(self, frame_id: int) -> Image.Image:
        frame_path = self.frames_dir / f"{frame_id:06d}.png"
        return Image.open(frame_path)

    def get_depth(self, frame_id: int) -> np.ndarray:
        depth_path = self.depth_dir / f"{frame_id:06d}.npy"
        return np.load(depth_path)

    def get_skeleton(self, frame_id: int) -> dict:
        skeleton_path = self.skeleton_dir / f"{frame_id:06d}.json"
        with open(skeleton_path) as f:
            return json.load(f)

    def get_state(self, frame_id: int) -> dict:
        """Returns HP, animation state, stamina, etc."""
        state_path = self.states_dir / f"{frame_id:06d}.json"
        with open(state_path) as f:
            return json.load(f)

    def get_camera(self, frame_id: int) -> dict:
        """Returns camera position, rotation, and FOV."""
        camera_path = self.camera_dir / f"{frame_id:06d}.json"
        with open(camera_path) as f:
            return json.load(f)

    def get_action(self, frame_id: int) -> dict:
        action_path = self.actions_dir / f"{frame_id:06d}.json"
        with open(action_path) as f:
            return json.load(f)

    def iter_frames(self, start: int = 0, end: int | None = None):
        """Iterate over all frames in the clip."""
        frame_files = sorted(self.frames_dir.glob("*.png"))
        for frame_path in frame_files[start:end]:
            frame_id = int(frame_path.stem)
            yield {
                "frame_id": frame_id,
                "frame": self.get_frame(frame_id),
                "depth": self.get_depth(frame_id),
                "skeleton": self.get_skeleton(frame_id),
                "state": self.get_state(frame_id),
                "camera": self.get_camera(frame_id),
                "action": self.get_action(frame_id),
            }
```
Usage:

```python
clip = WildWorldClip("data/sequences/stage_01/clip_000001")

for sample in clip.iter_frames(start=0, end=100):
    frame_id = sample["frame_id"]
    state = sample["state"]
    action = sample["action"]
    print(f"Frame {frame_id}: HP={state.get('hp')}, Action={action.get('name')}")
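Per-frame states make it easy to derive event signals. For example, frames where the player takes damage can be found by comparing consecutive HP values (the `hp` key is an assumed field name, matching the state dicts above):

```python
def damage_frames(states: list) -> list:
    """Indices i where HP drops between state i-1 and state i.

    `states` is a list of state dicts as returned by get_state();
    the `hp` key is an assumed field name.
    """
    return [
        i for i in range(1, len(states))
        if states[i].get("hp", 1.0) < states[i - 1].get("hp", 1.0)
    ]

states = [{"hp": 1.0}, {"hp": 0.8}, {"hp": 0.8}, {"hp": 0.5}]
print(damage_frames(states))  # [1, 3]
```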
### PyTorch Dataset

```python
import json
from pathlib import Path

import torch
import torchvision.transforms as T
from PIL import Image
from torch.utils.data import Dataset, DataLoader


class WildWorldDataset(Dataset):
    """
    PyTorch Dataset for WildWorld action-conditioned world modeling.
    Returns sequences of (frames, actions, states) for next-frame prediction.
    """

    def __init__(
        self,
        root_dir: str,
        sequence_length: int = 16,
        image_size: tuple = (256, 256),
        stage: str | None = None,
        split: str = "train",
    ):
        self.root_dir = Path(root_dir)
        self.sequence_length = sequence_length
        self.image_size = image_size
        self.transform = T.Compose([
            T.Resize(image_size),
            T.ToTensor(),
            T.Normalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225]),
        ])
        # Discover all clips
        self.clips = self._discover_clips(stage, split)
        self.samples = self._build_sample_index()

    def _discover_clips(self, stage, split):
        clips = []
        stage_dirs = (
            [self.root_dir / "data" / "sequences" / stage]
            if stage
            else sorted((self.root_dir / "data" / "sequences").iterdir())
        )
        for stage_dir in stage_dirs:
            if stage_dir.is_dir():
                for clip_dir in sorted(stage_dir.iterdir()):
                    if clip_dir.is_dir():
                        clips.append(clip_dir)
        # Simple train/val split
        split_idx = int(len(clips) * 0.9)
        return clips[:split_idx] if split == "train" else clips[split_idx:]

    def _build_sample_index(self):
        """Build index of (clip_dir, start_frame) pairs with ~50% window overlap."""
        samples = []
        stride = max(1, self.sequence_length // 2)
        for clip_dir in self.clips:
            n_frames = len(list((clip_dir / "frames").glob("*.png")))
            # +1 so a clip with exactly sequence_length frames yields one sample
            for start in range(0, n_frames - self.sequence_length + 1, stride):
                samples.append((clip_dir, start))
        return samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        clip_dir, start = self.samples[idx]
        frames_dir = clip_dir / "frames"
        frame_files = sorted(frames_dir.glob("*.png"))[start:start + self.sequence_length]

        frames, actions, states = [], [], []
        for frame_path in frame_files:
            frame_id = int(frame_path.stem)
            # Load RGB frame
            img = Image.open(frame_path).convert("RGB")
            frames.append(self.transform(img))
            # Load action
            action_path = clip_dir / "actions" / f"{frame_id:06d}.json"
            with open(action_path) as f:
                action_data = json.load(f)
            actions.append(action_data.get("action_id", 0))
            # Load state
            state_path = clip_dir / "states" / f"{frame_id:06d}.json"
            with open(state_path) as f:
                state_data = json.load(f)
            states.append([
                state_data.get("hp", 1.0),
                state_data.get("stamina", 1.0),
                state_data.get("animation_id", 0),
            ])

        return {
            "frames": torch.stack(frames),                        # (T, C, H, W)
            "actions": torch.tensor(actions, dtype=torch.long),   # (T,)
            "states": torch.tensor(states, dtype=torch.float32),  # (T, S)
        }
```
Usage:

```python
dataset = WildWorldDataset(
    root_dir="/path/to/WildWorld",
    sequence_length=16,
    image_size=(256, 256),
    split="train",
)
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=4)

for batch in loader:
    frames = batch["frames"]    # (B, T, C, H, W)
    actions = batch["actions"]  # (B, T)
    states = batch["states"]    # (B, T, S)
    print(f"Frames: {frames.shape}, Actions: {actions.shape}")
    break
```
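Batches in this shape plug directly into a teacher-forced next-frame objective. A minimal training-step sketch, where the `model(context_frames, context_actions)` call signature is an assumption (any architecture exposing that interface would work):

```python
import torch
import torch.nn.functional as F

def training_step(model, batch, optimizer):
    """One teacher-forced next-frame prediction step.

    Assumes `model(context_frames, context_actions)` returns a predicted
    frame of shape (B, C, H, W); this interface is illustrative, not an
    API defined by WildWorld.
    """
    frames = batch["frames"]    # (B, T, C, H, W)
    actions = batch["actions"]  # (B, T)
    context_frames, target_frame = frames[:, :-1], frames[:, -1]
    pred = model(context_frames, actions[:, :-1])
    loss = F.mse_loss(pred, target_frame)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```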
### Filtering by Action Type

```python
import json
from pathlib import Path

# Action categories in WildWorld
ACTION_CATEGORIES = {
    "movement": ["walk", "run", "sprint", "dodge", "jump"],
    "attack": ["light_attack", "heavy_attack", "combo_finisher"],
    "skill": ["skill_cast_1", "skill_cast_2", "skill_cast_3", "skill_cast_4"],
    "defense": ["block", "parry", "guard"],
    "idle": ["idle", "idle_combat"],
}

def filter_clips_by_action(dataset_root: str, action_category: str) -> list:
    """Find all frame indices that contain a specific action category."""
    root = Path(dataset_root)
    results = []
    target_actions = ACTION_CATEGORIES.get(action_category, [])
    for clip_dir in root.glob("data/sequences/**"):
        # Skip anything that isn't a clip directory with action labels
        if not (clip_dir / "actions").is_dir():
            continue
        for action_file in sorted((clip_dir / "actions").glob("*.json")):
            with open(action_file) as f:
                data = json.load(f)
            if data.get("action_name") in target_actions:
                results.append({
                    "clip": str(clip_dir),
                    "frame_id": int(action_file.stem),
                    "action": data.get("action_name"),
                })
    return results

# Find all skill cast frames
skill_frames = filter_clips_by_action("/path/to/WildWorld", "skill")
print(f"Found {len(skill_frames)} skill cast frames")
```
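The filter output is a flat list of dicts, so standard tools can summarize it; for example, a per-action histogram to check label balance:

```python
from collections import Counter

def action_histogram(results: list) -> Counter:
    """Count occurrences of each action in filter_clips_by_action output."""
    return Counter(r["action"] for r in results)

hist = action_histogram([
    {"clip": "c1", "frame_id": 1, "action": "skill_cast_1"},
    {"clip": "c1", "frame_id": 2, "action": "skill_cast_1"},
    {"clip": "c2", "frame_id": 9, "action": "skill_cast_2"},
])
print(hist.most_common(1))  # [('skill_cast_1', 2)]
```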
---

## WildBench Evaluation

```python
from pathlib import Path

import numpy as np
import torch

# WildBench evaluates world models on next-frame prediction quality.
# Expected metrics: FVD, PSNR, SSIM, action accuracy

class WildBenchEvaluator:
    """Evaluator for world model predictions on WildBench."""

    def __init__(self, benchmark_dir: str):
        self.benchmark_dir = Path(benchmark_dir)
        self.metrics = {}

    def evaluate(self, model, dataloader):
        from torchmetrics.image import StructuralSimilarityIndexMeasure, PeakSignalNoiseRatio

        ssim = StructuralSimilarityIndexMeasure()
        psnr = PeakSignalNoiseRatio()
        all_psnr, all_ssim = [], []

        for batch in dataloader:
            frames = batch["frames"]    # (B, T, C, H, W)
            actions = batch["actions"]  # (B, T)
            states = batch["states"]    # (B, T, S)

            # Use first T-1 frames to predict the T-th frame
            context_frames = frames[:, :-1]
            context_actions = actions[:, :-1]
            target_frame = frames[:, -1]

            with torch.no_grad():
                predicted_frame = model(context_frames, context_actions, states[:, :-1])

            all_psnr.append(psnr(predicted_frame, target_frame).item())
            all_ssim.append(ssim(predicted_frame, target_frame).item())

        return {
            "PSNR": np.mean(all_psnr),
            "SSIM": np.mean(all_ssim),
        }
```
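For intuition about what the pixel metrics measure, PSNR can also be computed directly from its definition; this is the standard formula and is independent of WildBench:

```python
import numpy as np

def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio for images scaled to [0, max_val]."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10 * np.log10(max_val ** 2 / mse))

a = np.zeros((4, 4))
b = np.full((4, 4), 0.1)  # uniform error of 0.1 -> MSE = 0.01
print(round(psnr(a, b), 1))  # 20.0
```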
---

## Citation

```bibtex
@misc{li2026wildworldlargescaledatasetdynamic,
      title={WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG},
      author={Zhen Li and Zian Meng and Shuwei Shi and Wenshuo Peng and Yuwei Wu and Bo Zheng and Chuanhao Li and Kaipeng Zhang},
      year={2026},
      eprint={2603.23497},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.23497},
}
```

## Resources

- Project Page: https://shandaai.github.io/wildworld-project/
- arXiv Paper: https://arxiv.org/abs/2603.23497
- YouTube Demo: https://www.youtube.com/watch?v=9vcSg553r2g
- GitHub: https://github.com/ShandaAI/WildWorld
## Troubleshooting

| Issue | Solution |
|---|---|
| Dataset not yet available | Monitor the repo; dataset release is pending as of March 2026 |
| Frame loading OOM | Reduce `sequence_length` or `image_size` in the dataset |
| Missing annotation files | Check that all subdirs (frames, depth, skeleton, states, camera, actions) are fully downloaded |
| Slow DataLoader | Increase `num_workers` in the `DataLoader` |
| Benchmark code not found | The `benchmark/wildbench/` code has not been released yet |