threejs-perf

Three.js Performance Optimization

Performance patterns for Three.js games, backed by measured before/after numbers on Three.js r183 (headless Chromium via Playwright, Apple M1 Pro, software WebGL).

Reference Files

`instancing-static.md`: InstancedMesh for large static repeated objects (19,600 → 1 draw call)
`instancing-moving.md`: flat state buffer + batched InstancedMesh writes for moving entities (8,000 entities)
`templates/`: baseline vs. optimized reference implementations for each pattern

When to Use This Skill

Scene has 100+ repeated objects sharing geometry/material
Draw calls exceed 500 and frame time is unstable
Thousands of moving entities need per-frame transform updates
Profile shows scene-graph traversal as a bottleneck
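
To check the draw-call criterion in a running app, `WebGLRenderer.info.render.calls` reports the draw calls issued by the most recent `render()` call (with the default `info.autoReset = true`). Below is a minimal sketch. The helper name is hypothetical, and the 500-call threshold comes from the checklist above.

```javascript
// Sketch: does the last rendered frame exceed the draw-call threshold?
// `info` is assumed to have the shape of WebGLRenderer.info.
const DRAW_CALL_THRESHOLD = 500; // value taken from the checklist above

function exceedsDrawCallBudget(info, threshold = DRAW_CALL_THRESHOLD) {
  // info.render.calls counts draw calls issued during the last render()
  return info.render.calls > threshold;
}

// Stand-in object; in a real app pass renderer.info right after
// renderer.render(scene, camera).
const sampleInfo = { render: { calls: 19365, triangles: 0 } };
console.log(exceedsDrawCallBudget(sampleInfo)); // true
```

A frame may use several render passes, for example with post-processing. In that case, setting `renderer.info.autoReset = false` and calling `renderer.info.reset()` once per frame makes the count cover the whole frame.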

When NOT to Use

Object count is low (<50 unique meshes) — simpler code wins
Every object needs unique materials/shaders that defeat batching
Objects use different geometries, so each InstancedMesh would hold only a few copies and instancing provides little batching benefit

Pattern 1: Instancing Large Static Object Sets

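Problem: Forests, debris, and decorations built as individual Meshes cost one draw call per visible object. Thousands of identical props therefore produce thousands of unnecessary draw calls.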

Solution: One `InstancedMesh` per shared geometry+material combination.

Evidence: ~19,365 → 2 draw calls. Render CPU p95: 28.5ms → 0.5ms (~57× faster). Build: 39.4ms → 3.9ms. See `instancing-static.md`.

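```js
import * as THREE from 'three';

// Assumes an existing `scene`. The 140 × 140 grid (19,600 props) and
// 2-unit spacing are illustrative.
const COUNT = 19600;
const SIDE = 140;
const geometry = new THREE.BoxGeometry(1, 1, 1);
const material = new THREE.MeshStandardMaterial();
const gridX = (i) => (i % SIDE) * 2;
const gridZ = (i) => Math.floor(i / SIDE) * 2;

// Anti-pattern: one Mesh per prop
for (let i = 0; i < COUNT; i++) {
  const mesh = new THREE.Mesh(geometry, material);
  mesh.position.set(gridX(i), 0, gridZ(i));
  scene.add(mesh); // one draw call per visible Mesh (up to 19,600)
}

// Correct: one InstancedMesh
const im = new THREE.InstancedMesh(geometry, material, COUNT);
const mat = new THREE.Matrix4();
for (let i = 0; i < COUNT; i++) {
  mat.makeTranslation(gridX(i), 0, gridZ(i));
  im.setMatrixAt(i, mat);
}
im.instanceMatrix.needsUpdate = true; // upload all 19,600 matrices once
scene.add(im); // 1 draw call
```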
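
The rule "one InstancedMesh per shared geometry+material combination" implies a grouping step when converting an existing scene. Below is a minimal sketch in plain JavaScript. It assumes each prop is described by hypothetical `geometryId` and `materialId` fields; in Three.js, `geometry.uuid` and `material.uuid` would be natural keys.

```javascript
// Sketch: bucket props by (geometry, material) so each bucket can become
// one InstancedMesh. The prop shape here is hypothetical.
function groupByGeometryAndMaterial(props) {
  const groups = new Map(); // Map keeps insertion order
  for (const prop of props) {
    const key = `${prop.geometryId}|${prop.materialId}`;
    let bucket = groups.get(key);
    if (!bucket) {
      bucket = { geometryId: prop.geometryId, materialId: prop.materialId, positions: [] };
      groups.set(key, bucket);
    }
    bucket.positions.push(prop.position);
  }
  return [...groups.values()];
}

const props = [
  { geometryId: 'tree', materialId: 'bark', position: [0, 0, 0] },
  { geometryId: 'tree', materialId: 'bark', position: [2, 0, 0] },
  { geometryId: 'rock', materialId: 'stone', position: [5, 0, 1] },
];
const groups = groupByGeometryAndMaterial(props);
console.log(groups.length); // 2 → two InstancedMesh objects
console.log(groups[0].positions.length); // 2
```

Each bucket would then become `new THREE.InstancedMesh(geometry, material, bucket.positions.length)`, filled with `setMatrixAt` as in the loop above.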

Pattern 2: Moving Entity Update Loops

Problem: Thousands of moving actors built as individual Meshes cause scene-graph churn and per-object transform propagation (matrix recomputation) every frame, on top of one draw call per Mesh.

Solution: A flat entity state buffer plus batched `InstancedMesh.setMatrixAt()` writes.

Evidence: 8,000 → 1 draw call. Render CPU p95: 9.9ms → 0.5ms (~20× faster). Update loop p95: 1.4ms → 0.3ms (~4.7× faster). See `instancing-moving.md`.

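```js
import * as THREE from 'three';

// Hypothetical wave-field helpers for illustration; swap in your own motion.
// Assumes `meshes`, `instancedMesh`, `count`, and `tick` come from the frame loop.
const computeX = (i, tick) => (i % 100) * 1.5;
const computeY = (i, tick) => Math.sin(tick * 0.05 + i * 0.1);
const computeZ = (i, tick) => Math.floor(i / 100) * 1.5;

// Anti-pattern: per-entity Mesh position writes (each Mesh's matrix is
// recomputed during scene-graph traversal, and each Mesh is its own draw call)
meshes.forEach((mesh, i) => {
  mesh.position.x = computeX(i, tick);
  mesh.position.y = computeY(i, tick);
  mesh.position.z = computeZ(i, tick);
});

// Correct: batched instance matrix writes into one buffer
const mat = new THREE.Matrix4();
for (let i = 0; i < count; i++) {
  mat.makeTranslation(computeX(i, tick), computeY(i, tick), computeZ(i, tick));
  instancedMesh.setMatrixAt(i, mat);
}
instancedMesh.instanceMatrix.needsUpdate = true; // re-upload the buffer once per frame
```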
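
The "flat entity state buffer" half of the pattern is not shown in the snippet above. Below is a minimal sketch, assuming translation-only entities. Positions live in one `Float32Array` (3 floats per entity). Translations are copied into a second array laid out like `instanceMatrix.array`: 16 floats per instance, column-major, with translation at offsets 12 to 14.

`setMatrixAt(i, m)` writes `m`'s 16 elements starting at offset `i * 16` of that array. Writing only the translation column is a swapped-in shortcut, valid only while every instance's rotation and scale stay identity. All names here are hypothetical.

```javascript
// Sketch of a flat entity state buffer feeding an instance-matrix buffer.
const STRIDE = 3; // x, y, z per entity

function createState(count) {
  return new Float32Array(count * STRIDE); // one contiguous allocation
}

// Laid out like InstancedMesh.instanceMatrix.array: 16 floats per instance,
// column-major, initialised here to identity.
function createMatrixBuffer(count) {
  const m = new Float32Array(count * 16);
  for (let i = 0; i < count; i++) {
    const o = i * 16;
    m[o] = m[o + 5] = m[o + 10] = m[o + 15] = 1;
  }
  return m;
}

// Illustrative wave-field update written straight into the flat buffer.
function updateState(state, count, tick) {
  for (let i = 0; i < count; i++) {
    const o = i * STRIDE;
    state[o] = i % 100;                             // x: grid column
    state[o + 1] = Math.sin(tick * 0.05 + i * 0.1); // y: wave height
    state[o + 2] = Math.floor(i / 100);             // z: grid row
  }
}

// Rotation/scale stay identity, so only the translation column
// (elements 12, 13, 14 in column-major order) is rewritten.
function writeTranslations(state, matrices, count) {
  for (let i = 0; i < count; i++) {
    const s = i * STRIDE;
    const o = i * 16;
    matrices[o + 12] = state[s];
    matrices[o + 13] = state[s + 1];
    matrices[o + 14] = state[s + 2];
  }
}

const count = 8000;
const state = createState(count);
const matrices = createMatrixBuffer(count);
updateState(state, count, 0);
writeTranslations(state, matrices, count);
console.log(matrices[101 * 16 + 12], matrices[101 * 16 + 14]); // 1 1
```

In a Three.js app, `matrices` would be `instancedMesh.instanceMatrix.array`, followed by `instancedMesh.instanceMatrix.needsUpdate = true` each frame. Whether skipping `setMatrixAt` helps in practice is not measured here; the numbers above come from the `setMatrixAt` version.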

Decision Tree

Is the object repeated 50+ times with same geometry+material?
├── YES → Is it static (no per-frame movement)?
│   ├── YES → Pattern 1: Static InstancedMesh (instancing-static.md)
│   └── NO  → Pattern 2: Moving InstancedMesh with batched writes (instancing-moving.md)
└── NO  → Standard Mesh is fine. Focus on material/geometry reuse.
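
The decision tree can also be written as a small function, for example for a tooling or lint step. The function name and input shape are hypothetical; the threshold and outcomes are the ones in the tree.

```javascript
// The decision tree above as a function.
function chooseStrategy({ repeatCount, sharesGeometryAndMaterial, isStatic }) {
  if (repeatCount < 50 || !sharesGeometryAndMaterial) {
    return 'standard-mesh'; // focus on material/geometry reuse instead
  }
  return isStatic
    ? 'static-instanced-mesh' // Pattern 1
    : 'moving-instanced-mesh'; // Pattern 2
}

console.log(chooseStrategy({ repeatCount: 19600, sharesGeometryAndMaterial: true, isStatic: true }));
// static-instanced-mesh
```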

Measured Results

Headless Chromium 147 via Playwright, Three.js r183, Apple M1 Pro, 30 warmup + 180 sample frames, median of 3 runs.

| Scenario | Metric | Baseline | Optimized | Improvement |
|---|---|---|---|---|
| Static World (19.6k cubes) | Draw calls | ~19,365 | 2 | ~9,682× |
| Static World (19.6k cubes) | Render CPU p95 | 28.5 ms | 0.5 ms | ~57× |
| Static World (19.6k cubes) | Build time | 39.4 ms | 3.9 ms | ~10× |
| Moving Entities (8k wave-field) | Draw calls | 8,000 | 1 | 8,000× |
| Moving Entities (8k wave-field) | Render CPU p95 | 9.9 ms | 0.5 ms | ~20× |
| Moving Entities (8k wave-field) | Update loop p95 | 1.4 ms | 0.3 ms | ~4.7× |
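
The methodology line above names the sampling scheme but not the percentile formula. Below is a minimal sketch of one way to reduce raw per-frame timings to the reported figures. It assumes a nearest-rank p95 per run and the median across runs; the actual harness may compute percentiles differently, and all names are hypothetical.

```javascript
// Nearest-rank percentile: smallest sample with at least p% of samples <= it.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

const WARMUP = 30;   // frames discarded before sampling
const SAMPLES = 180; // frames measured per run

// frameTimesPerRun: one array of per-frame CPU times (ms) per run,
// each holding WARMUP + SAMPLES entries.
function summarize(frameTimesPerRun) {
  const p95s = frameTimesPerRun.map((run) =>
    percentile(run.slice(WARMUP, WARMUP + SAMPLES), 95));
  return median(p95s);
}
```

Discarding the warmup frames keeps shader compilation and first-upload costs out of the p95. Taking the median of the per-run p95 values damps a single noisy run.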

Methodology notes

CPU-side metrics are the trustworthy signal. Draw calls, render CPU p95, update loop, and build time reliably show the win: roughly four orders of magnitude fewer draw calls and about 5× to 57× less CPU time.
FPS and frame-time p95 are unreliable in headless Chromium. Playwright's bundled Chromium uses SwiftShader (software WebGL), which bottlenecks on fragment shading of ~90 MB of visible geometry regardless of draw-call count. Both versions therefore hit the same shading limit, and their FPS barely differs. On hardware WebGL the FPS gap is expected to be substantially larger: the baseline would likely drop to single-digit FPS under real fill load, while the optimized version should hold vsync.
A benchmark passes if draw calls decreased and render CPU p95 did not regress.
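
The pass rule can be stated as a predicate. Below is a minimal sketch with hypothetical field names. The optional `toleranceMs` is an added assumption for absorbing timer noise; it defaults to 0, which matches the rule as written.

```javascript
// Pass rule: draw calls decreased AND render CPU p95 did not regress.
function benchmarkPasses(baseline, optimized, toleranceMs = 0) {
  const fewerDrawCalls = optimized.drawCalls < baseline.drawCalls;
  const noRenderRegression =
    optimized.renderCpuP95 <= baseline.renderCpuP95 + toleranceMs;
  return fewerDrawCalls && noRenderRegression;
}

console.log(benchmarkPasses(
  { drawCalls: 8000, renderCpuP95: 9.9 },
  { drawCalls: 1, renderCpuP95: 0.5 },
)); // true
```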
