wildworld-dataset


WildWorld Dataset Skill


Skill by ara.so — Daily 2026 Skills collection.

What WildWorld Is


WildWorld is a large-scale action-conditioned world modeling dataset automatically collected from a photorealistic AAA action role-playing game (ARPG). It is designed for training and evaluating dynamic world models — generative models that predict future game states given past observations and player actions.
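Concretely, a dynamic world model approximates p(o_{t+1} | o_{1:t}, a_{1:t}): the next observation given past observations and the actions taken at each step. As a minimal illustration of that interface (plain Python, with a trivial copy-last-frame baseline — this is a sketch, not the paper's method):

```python
from typing import Protocol, Sequence, List

Frame = List[List[float]]  # placeholder for an image (H x W grayscale)

class WorldModel(Protocol):
    """Action-conditioned world model: predict the next observation
    from past observations and per-step actions."""
    def predict(self, frames: Sequence[Frame], actions: Sequence[int]) -> Frame: ...

class CopyLastFrame:
    """Trivial baseline: predict that nothing changes."""
    def predict(self, frames: Sequence[Frame], actions: Sequence[int]) -> Frame:
        return frames[-1]

model = CopyLastFrame()
history = [[[0.0, 0.1], [0.2, 0.3]], [[0.1, 0.2], [0.3, 0.4]]]  # two tiny 2x2 "frames"
pred = model.predict(history, actions=[3, 7])
print(pred)  # the last observed frame
```

Any real model trained on WildWorld would replace the baseline with a learned predictor, but the signature stays the same: observations plus actions in, next observation out.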

Key Statistics


| Property | Value |
|---|---|
| Total frames | 108M+ |
| Actions | 450+ semantically meaningful |
| Monster species | 29 |
| Player characters | 4 |
| Weapon types | 4 |
| Distinct stages | 5 |
| Max clip length | 30+ minutes continuous |

Per-Frame Annotations


Every frame includes:
  • Character skeletons — joint positions for player and monsters
  • Actions & states — HP, animation state, stamina, etc.
  • Camera poses — position, rotation, field of view
  • Depth maps — monocular depth for each frame
  • Hierarchical captions — action-level and sample-level natural language descriptions

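The exact annotation schema has not been published. As a purely hypothetical sketch, a single frame's annotations might serialize to JSON records like the following (all field names here are assumptions, not the released schema):

```python
import json

# Hypothetical per-frame annotation records (field names are illustrative only).
skeleton = {
    "player": {"joints": {"head": [0.1, 1.7, 0.0], "hand_r": [0.4, 1.2, 0.1]}},
    "monsters": [{"id": 3, "joints": {"head": [2.0, 1.9, -0.5]}}],
}
state = {"hp": 0.85, "stamina": 0.60, "animation_id": 12}
camera = {"position": [0.0, 1.6, -3.0], "rotation": [0.0, 15.0, 0.0], "fov": 60.0}

# Round-trip through JSON, as the one-file-per-frame layout suggests.
record = json.dumps({"skeleton": skeleton, "state": state, "camera": camera})
print(json.loads(record)["state"]["hp"])  # 0.85
```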

Project Status


⚠️ As of March 2026, the dataset and WildBench benchmark have not yet been released. Monitor the repository for updates.

Watch the repository for dataset release



---


Repository Setup



Clone the repository


Install dependencies (when benchmark code is released):

```bash
pip install -r requirements.txt
```

---

Expected Dataset Structure


Based on the paper and framework description, the dataset is expected to follow this structure:

```
WildWorld/
├── data/
│   ├── sequences/
│   │   ├── stage_01/
│   │   │   ├── clip_000001/
│   │   │   │   ├── frames/          # RGB frames (e.g., PNG)
│   │   │   │   ├── depth/           # Depth maps
│   │   │   │   ├── skeleton/        # Per-frame skeleton JSON
│   │   │   │   ├── states/          # HP, animation, stamina JSON
│   │   │   │   ├── camera/          # Camera pose JSON
│   │   │   │   └── actions/         # Action label files
│   │   │   └── clip_000002/
│   │   └── stage_02/
│   └── captions/
│       ├── action_level/            # Per-action descriptions
│       └── sample_level/            # Clip-level descriptions
├── benchmark/
│   └── wildbench/                   # WildBench evaluation code
├── assets/
│   └── framework-arxiv.png
├── LICENSE
└── README.md
```

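Until the release, a small helper can sanity-check that a downloaded clip follows the anticipated layout (subdirectory names are taken from the tree above; this is a sketch, not official tooling):

```python
from pathlib import Path
import tempfile

EXPECTED_SUBDIRS = ["frames", "depth", "skeleton", "states", "camera", "actions"]

def missing_subdirs(clip_dir: str) -> list:
    """Return the expected annotation subdirectories absent from a clip."""
    clip = Path(clip_dir)
    return [d for d in EXPECTED_SUBDIRS if not (clip / d).is_dir()]

# Demo on a throwaway directory with one subdir deliberately missing.
with tempfile.TemporaryDirectory() as tmp:
    for d in EXPECTED_SUBDIRS[:-1]:        # create all but "actions"
        (Path(tmp) / d).mkdir()
    print(missing_subdirs(tmp))  # ['actions']
```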

Working with the Dataset (Anticipated API)


Loading Frame Annotations


```python
import json
from pathlib import Path
from typing import Optional

from PIL import Image
import numpy as np

class WildWorldClip:
    """Helper class to load a WildWorld clip and its annotations."""

    def __init__(self, clip_dir: str):
        self.clip_dir = Path(clip_dir)
        self.frames_dir = self.clip_dir / "frames"
        self.depth_dir = self.clip_dir / "depth"
        self.skeleton_dir = self.clip_dir / "skeleton"
        self.states_dir = self.clip_dir / "states"
        self.camera_dir = self.clip_dir / "camera"
        self.actions_dir = self.clip_dir / "actions"

    def get_frame(self, frame_id: int) -> Image.Image:
        frame_path = self.frames_dir / f"{frame_id:06d}.png"
        return Image.open(frame_path)

    def get_depth(self, frame_id: int) -> np.ndarray:
        depth_path = self.depth_dir / f"{frame_id:06d}.npy"
        return np.load(depth_path)

    def get_skeleton(self, frame_id: int) -> dict:
        skeleton_path = self.skeleton_dir / f"{frame_id:06d}.json"
        with open(skeleton_path) as f:
            return json.load(f)

    def get_state(self, frame_id: int) -> dict:
        """Returns HP, animation state, stamina, etc."""
        state_path = self.states_dir / f"{frame_id:06d}.json"
        with open(state_path) as f:
            return json.load(f)

    def get_camera(self, frame_id: int) -> dict:
        """Returns camera position, rotation, and FOV."""
        camera_path = self.camera_dir / f"{frame_id:06d}.json"
        with open(camera_path) as f:
            return json.load(f)

    def get_action(self, frame_id: int) -> dict:
        action_path = self.actions_dir / f"{frame_id:06d}.json"
        with open(action_path) as f:
            return json.load(f)

    def iter_frames(self, start: int = 0, end: Optional[int] = None):
        """Iterate over frames in the clip, yielding all annotations per frame."""
        frame_files = sorted(self.frames_dir.glob("*.png"))
        for frame_path in frame_files[start:end]:
            frame_id = int(frame_path.stem)
            yield {
                "frame_id": frame_id,
                "frame": self.get_frame(frame_id),
                "depth": self.get_depth(frame_id),
                "skeleton": self.get_skeleton(frame_id),
                "state": self.get_state(frame_id),
                "camera": self.get_camera(frame_id),
                "action": self.get_action(frame_id),
            }
```

Usage

```python
clip = WildWorldClip("data/sequences/stage_01/clip_000001")
for sample in clip.iter_frames(start=0, end=100):
    frame_id = sample["frame_id"]
    state = sample["state"]
    action = sample["action"]
    print(f"Frame {frame_id}: HP={state.get('hp')}, Action={action.get('name')}")
```

PyTorch Dataset


```python
import json
from pathlib import Path
from typing import Optional

import torch
from torch.utils.data import Dataset, DataLoader
from PIL import Image
import torchvision.transforms as T

class WildWorldDataset(Dataset):
    """
    PyTorch Dataset for WildWorld action-conditioned world modeling.

    Returns sequences of (frames, actions, states) for next-frame prediction.
    """

    def __init__(
        self,
        root_dir: str,
        sequence_length: int = 16,
        image_size: tuple = (256, 256),
        stage: Optional[str] = None,
        split: str = "train",
    ):
        self.root_dir = Path(root_dir)
        self.sequence_length = sequence_length
        self.image_size = image_size

        self.transform = T.Compose([
            T.Resize(image_size),
            T.ToTensor(),
            T.Normalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225]),
        ])

        # Discover all clips
        self.clips = self._discover_clips(stage, split)
        self.samples = self._build_sample_index()

    def _discover_clips(self, stage, split):
        clips = []
        stage_dirs = (
            [self.root_dir / "data" / "sequences" / stage]
            if stage
            else sorted((self.root_dir / "data" / "sequences").iterdir())
        )
        for stage_dir in stage_dirs:
            if stage_dir.is_dir():
                for clip_dir in sorted(stage_dir.iterdir()):
                    if clip_dir.is_dir():
                        clips.append(clip_dir)
        # Simple train/val split
        split_idx = int(len(clips) * 0.9)
        return clips[:split_idx] if split == "train" else clips[split_idx:]

    def _build_sample_index(self):
        """Build index of (clip_dir, start_frame) pairs."""
        samples = []
        for clip_dir in self.clips:
            frames = sorted((clip_dir / "frames").glob("*.png"))
            n_frames = len(frames)
            for start in range(0, n_frames - self.sequence_length, self.sequence_length // 2):
                samples.append((clip_dir, start))
        return samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        clip_dir, start = self.samples[idx]
        frames_dir = clip_dir / "frames"
        frame_files = sorted(frames_dir.glob("*.png"))[start:start + self.sequence_length]

        frames, actions, states = [], [], []
        for frame_path in frame_files:
            frame_id = int(frame_path.stem)

            # Load RGB frame
            img = Image.open(frame_path).convert("RGB")
            frames.append(self.transform(img))

            # Load action
            action_path = clip_dir / "actions" / f"{frame_id:06d}.json"
            with open(action_path) as f:
                action_data = json.load(f)
            actions.append(action_data.get("action_id", 0))

            # Load state
            state_path = clip_dir / "states" / f"{frame_id:06d}.json"
            with open(state_path) as f:
                state_data = json.load(f)
            states.append([
                state_data.get("hp", 1.0),
                state_data.get("stamina", 1.0),
                state_data.get("animation_id", 0),
            ])

        return {
            "frames": torch.stack(frames),            # (T, C, H, W)
            "actions": torch.tensor(actions, dtype=torch.long),   # (T,)
            "states": torch.tensor(states, dtype=torch.float32),  # (T, S)
        }
```

Usage

```python
dataset = WildWorldDataset(
    root_dir="/path/to/WildWorld",
    sequence_length=16,
    image_size=(256, 256),
    split="train",
)
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=4)

for batch in loader:
    frames = batch["frames"]    # (B, T, C, H, W)
    actions = batch["actions"]  # (B, T)
    states = batch["states"]    # (B, T, S)
    print(f"Frames: {frames.shape}, Actions: {actions.shape}")
    break
```

Filtering by Action Type



Action categories in WildWorld

```python
import json
from pathlib import Path

ACTION_CATEGORIES = {
    "movement": ["walk", "run", "sprint", "dodge", "jump"],
    "attack": ["light_attack", "heavy_attack", "combo_finisher"],
    "skill": ["skill_cast_1", "skill_cast_2", "skill_cast_3", "skill_cast_4"],
    "defense": ["block", "parry", "guard"],
    "idle": ["idle", "idle_combat"],
}

def filter_clips_by_action(dataset_root: str, action_category: str) -> list:
    """Find all frame indices that contain a specific action category."""
    root = Path(dataset_root)
    results = []
    target_actions = ACTION_CATEGORIES.get(action_category, [])

    for clip_dir in root.glob("data/sequences/**"):
        if not clip_dir.is_dir():
            continue
        for action_file in sorted((clip_dir / "actions").glob("*.json")):
            with open(action_file) as f:
                data = json.load(f)
            if data.get("action_name") in target_actions:
                results.append({
                    "clip": str(clip_dir),
                    "frame_id": int(action_file.stem),
                    "action": data.get("action_name"),
                })
    return results
```

Find all skill cast frames

```python
skill_frames = filter_clips_by_action("/path/to/WildWorld", "skill")
print(f"Found {len(skill_frames)} skill cast frames")
```

---

WildBench Evaluation


WildBench evaluates world models on next-frame prediction quality.

Expected metrics: FVD, PSNR, SSIM, action accuracy
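For reference, PSNR is defined as 10·log10(MAX² / MSE). A minimal NumPy version, independent of any benchmark code:

```python
import numpy as np

def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images valued in [0, max_val]."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

target = np.zeros((4, 4))
pred = np.full((4, 4), 0.1)   # uniform error of 0.1 -> MSE = 0.01
print(psnr(pred, target))      # 20.0
```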

```python
from pathlib import Path

import numpy as np
import torch

class WildBenchEvaluator:
    """Evaluator for world model predictions on WildBench."""

    def __init__(self, benchmark_dir: str):
        self.benchmark_dir = Path(benchmark_dir)
        self.metrics = {}

    def evaluate(self, model, dataloader):
        from torchmetrics.image import StructuralSimilarityIndexMeasure, PeakSignalNoiseRatio

        ssim = StructuralSimilarityIndexMeasure()
        psnr = PeakSignalNoiseRatio()

        all_psnr, all_ssim = [], []

        for batch in dataloader:
            frames = batch["frames"]       # (B, T, C, H, W)
            actions = batch["actions"]     # (B, T)
            states = batch["states"]       # (B, T, S)

            # Use first T-1 frames to predict the T-th frame
            context_frames = frames[:, :-1]
            context_actions = actions[:, :-1]
            target_frame = frames[:, -1]

            with torch.no_grad():
                predicted_frame = model(context_frames, context_actions, states[:, :-1])

            all_psnr.append(psnr(predicted_frame, target_frame).item())
            all_ssim.append(ssim(predicted_frame, target_frame).item())

        return {
            "PSNR": np.mean(all_psnr),
            "SSIM": np.mean(all_ssim),
        }
```

---

Citation


```bibtex
@misc{li2026wildworldlargescaledatasetdynamic,
      title={WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG},
      author={Zhen Li and Zian Meng and Shuwei Shi and Wenshuo Peng and Yuwei Wu and Bo Zheng and Chuanhao Li and Kaipeng Zhang},
      year={2026},
      eprint={2603.23497},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.23497},
}
```



Troubleshooting


| Issue | Solution |
|---|---|
| Dataset not yet available | Monitor the repo; dataset release is pending as of March 2026 |
| Frame loading OOM | Reduce `sequence_length` or `image_size` in the Dataset |
| Missing annotation files | Check that all subdirs (frames, depth, skeleton, states, camera, actions) are fully downloaded |
| Slow DataLoader | Increase `num_workers`, use SSD storage, or preprocess to HDF5 |
| Benchmark code not found | The `benchmark/wildbench` directory will be released separately; watch the repo |
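The "preprocess to HDF5" suggestion above can be sketched with NumPy's `.npz` format as a stand-in (h5py would be used analogously for true HDF5); the clip layout below is illustrative, not a released format:

```python
import tempfile
from pathlib import Path

import numpy as np

def pack_clip(frames: np.ndarray, actions: np.ndarray, out_path: str) -> None:
    """Pack per-frame arrays into one compressed file to cut filesystem overhead."""
    np.savez_compressed(out_path, frames=frames, actions=actions)

with tempfile.TemporaryDirectory() as tmp:
    frames = np.random.rand(8, 64, 64, 3).astype(np.float32)   # (T, H, W, C)
    actions = np.arange(8, dtype=np.int64)                      # (T,)
    out = Path(tmp) / "clip_000001.npz"
    pack_clip(frames, actions, str(out))

    packed = np.load(out)
    print(packed["frames"].shape, packed["actions"].shape)  # (8, 64, 64, 3) (8,)
```

Reading one packed file per clip replaces thousands of small per-frame reads, which is usually the bottleneck behind a slow DataLoader.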