openclaw-test-performance

OpenClaw Test Performance

Use evidence first. The goal is real `pnpm test`, plugin-suite, and plugin-inspector speed/RSS improvement with coverage intact, not runner tuning by guesswork.

Workflow

Read the relevant local `AGENTS.md` files before editing:
- `src/agents/AGENTS.md` for agent/import hotspots.
- `src/channels/AGENTS.md` and `src/plugins/AGENTS.md` for plugin/channel laziness.
- `src/gateway/AGENTS.md` for server lifecycle tests.
- `test/helpers/AGENTS.md` and `test/helpers/channels/AGENTS.md` for shared contract helpers.
- `src/infra/outbound/AGENTS.md` for outbound/media/action tests.

Establish a baseline before changing code:
- Prefer `pnpm test:perf:groups --full-suite --allow-failures --output <file>` for full-suite ranking.
- For bundled plugin breadth, run the smallest relevant `pnpm test:extensions:batch <plugin[,plugin...]>` or plugin-inspector command before jumping to the full extension sweep.
- For a scoped hotspot, use `/usr/bin/time -l pnpm test <file-or-files> --maxWorkers=1 --reporter=verbose`.
- For import-heavy suspicion, add `OPENCLAW_VITEST_IMPORT_DURATIONS=1 OPENCLAW_VITEST_PRINT_IMPORT_BREAKDOWN=1`.

Separate wall/runner noise from real file cost:
- Compare Vitest duration, test body timing, import breakdown, wall time, and max RSS.
- Re-run single files when grouped/full-suite numbers look stale or noisy.
- If a full-suite grouped run reports a lane failure but JSON says tests passed, capture that as harness/noise and verify the suspect file directly.
Pick the next attack by return and risk:
- High return: one file/test dominates seconds or RSS and has a clear root.
- High leverage: one plugin or SDK barrel causes every plugin-inspector or extension-batch run to load broad runtime.
- Lower risk: static descriptors, target parsing, routing, auth bypass, setup hints, registry fixtures, or test server lifecycle.
- Higher risk: real memory/runtime behavior, live providers, protocol contracts, or broad production refactors.
Fix the root cause, not the symptom:
- Move static metadata/parsing into narrow helpers or lightweight artifacts reused by full runtime and fast paths.
- Prefer dependency injection, loaded-plugin-only lookup, explicit fixtures, and pure helpers over broad mocks.
- Reuse suite-level servers/clients when a fresh handshake is irrelevant.
- Keep schedulers/background loops off unless the test proves scheduling.
- In plugin paths, move static metadata into manifest/lightweight artifacts and keep runtime plugin loads behind explicit execution boundaries.
Preserve coverage shape:
- Do not delete a slow integration proof unless the exact production composition is extracted into a named helper and tested.
- Keep one cheap integration smoke when cross-component wiring matters.
- State explicitly what incidental coverage was removed, if any.
Re-benchmark the same command after the change and compute seconds plus percent gain.
Update the running report when requested or when this thread is tracking one. Include before/after commands, artifacts, coverage notes, verification, and next attack order.
Commit with `scripts/committer "<message>" <paths...>` and push when the user asked for commits/pushes. Stage only files touched for this attack.
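
The "fix the root cause" guidance above (static metadata in lightweight artifacts, runtime loads behind an explicit boundary) can be sketched as follows. This is a minimal illustration, not OpenClaw's real plugin types: `PluginManifest`, `LazyPlugin`, `defineLazyPlugin`, and `findPluginForTarget` are hypothetical names invented for this example.

```typescript
// Hypothetical shapes for illustration only; not OpenClaw's actual plugin API.
interface PluginManifest {
  id: string;
  label: string;
  targetPrefixes: string[];
}

interface PluginRuntime {
  send(target: string, text: string): string;
}

interface LazyPlugin {
  manifest: PluginManifest; // static data, cheap to read
  loadRuntime(): PluginRuntime; // explicit execution boundary
}

let runtimeLoads = 0; // counts how often the expensive path actually runs

function runtimeLoadCount(): number {
  return runtimeLoads;
}

function defineLazyPlugin(
  manifest: PluginManifest,
  factory: () => PluginRuntime,
): LazyPlugin {
  let cached: PluginRuntime | undefined;
  return {
    manifest,
    loadRuntime() {
      // The factory runs at most once, and only when a caller asks for runtime.
      if (cached === undefined) {
        runtimeLoads += 1;
        cached = factory();
      }
      return cached;
    },
  };
}

// Pure helper: resolves ownership from manifests only and never loads runtime.
function findPluginForTarget(
  plugins: LazyPlugin[],
  target: string,
): LazyPlugin | undefined {
  return plugins.find((plugin) =>
    plugin.manifest.targetPrefixes.some((prefix) => target.startsWith(prefix)),
  );
}

const demoPlugin = defineLazyPlugin(
  { id: "demo", label: "Demo", targetPrefixes: ["demo:"] },
  () => ({ send: (target, text) => `${target} <- ${text}` }),
);
```

A test that only needs routing or target parsing can call `findPluginForTarget` and assert that `runtimeLoadCount()` stays at zero, while one cheap integration smoke still exercises `loadRuntime()`.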

Plugin-Suite Workflow

Use this section when perf work involves bundled plugins, plugin-inspector, SDK barrels, package-boundary tests, or extension suites.

Map the suite shape first:
- source tests: `pnpm test extensions/<id>` or `pnpm test:extensions:batch <id>`
- package boundaries: `pnpm run test:extensions:package-boundary:canary` and `pnpm run test:extensions:package-boundary:compile`
- all bundled source tests: `pnpm test:extensions`
- plugin import memory: `pnpm test:extensions:memory -- --json .artifacts/test-perf/extensions-memory.json`
- plugin-inspector/report work: keep report primitives in `plugin-inspector`; keep wrappers thin and collect peak RSS when the command supports it.

Start narrow, then widen:
- one plugin changed: run that plugin's tests and plugin-inspector slice.
- SDK/public barrel changed: add representative provider, channel, memory, and feature plugins.
- loader/runtime mirror changed: add package-boundary checks and build/package proof as needed.
- unknown shared plugin behavior: run `test:extensions:batch` groups before `pnpm test:extensions`.
Treat plugin-inspector failures as product signals:
- JSON must parse.
- warnings/errors must be classified, not hidden.
- runtime capture should be quiet and config-tolerant.
- command output should include wall time, exit code, and peak RSS when available.
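
As a rough illustration of the first two rules (JSON must parse; warnings/errors must be classified, not hidden), a wrapper could triage raw plugin-inspector output like this. The report shape here, a top-level `diagnostics` array of `{ level, message }` entries, is an assumption made for the sketch and is not taken from plugin-inspector itself.

```typescript
// Hypothetical diagnostic shape; the real plugin-inspector report may differ.
interface Diagnostic {
  level: "warning" | "error";
  message: string;
}

type InspectorVerdict =
  | { kind: "invalid-json"; reason: string }
  | { kind: "parsed"; warnings: string[]; errors: string[]; unclassified: number };

function isDiagnostic(value: unknown): value is Diagnostic {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    (record.level === "warning" || record.level === "error") &&
    typeof record.message === "string"
  );
}

function classifyInspectorOutput(stdout: string): InspectorVerdict {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout);
  } catch (err) {
    // Stray log lines mixed into stdout land here instead of being ignored.
    return {
      kind: "invalid-json",
      reason: err instanceof Error ? err.message : String(err),
    };
  }

  const raw =
    typeof parsed === "object" && parsed !== null
      ? (parsed as Record<string, unknown>).diagnostics
      : undefined;
  const entries: unknown[] = Array.isArray(raw) ? raw : [];

  const known = entries.filter(isDiagnostic);
  return {
    kind: "parsed",
    warnings: known.filter((d) => d.level === "warning").map((d) => d.message),
    errors: known.filter((d) => d.level === "error").map((d) => d.message),
    // Entries that fit neither bucket are counted so they cannot vanish silently.
    unclassified: entries.length - known.length,
  };
}
```

Returning a count of unclassified entries, rather than dropping them, keeps unexpected output visible in the report.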

For broad or package-heavy plugin proof, use Blacksmith Testbox by default on maintainer machines. Warm once and reuse the same box:

- `blacksmith testbox warmup ci-check-testbox.yml --ref main --idle-timeout 90`
- `blacksmith testbox run --id <ID> "OPENCLAW_TESTBOX=1 pnpm test:extensions:batch <ids>"`
- Stop the box when done.

If plugin performance is package-artifact sensitive, switch to `openclaw-pre-release-plugin-testing` and Package Acceptance rather than trusting source-only timing.

Metric Collection

Collect at least one stable metric before and after. Prefer the same machine and same command. For Testbox comparisons, use the same `tbx_...` id when possible.

| Metric | Use for | Preferred source |
| --- | --- | --- |
| wall time | user-visible suite cost | `/usr/bin/time -l`, test wrapper duration, Testbox run time |
| Vitest duration | test body/import cost | Vitest output per file/shard |
| import duration | broad barrel/runtime loads | `OPENCLAW_VITEST_IMPORT_DURATIONS=1` |
| max RSS | memory pressure and OOM risk | `/usr/bin/time -l`, `pnpm test:extensions:memory`, wrapper memory summaries |
| CPU/user/sys | CPU-bound vs wait-bound split | `/usr/bin/time -l` locally, Testbox job timing when local CPU is noisy |
| heap snapshots | real leak vs retained module graph | `openclaw-test-heap-leaks` workflow |
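
The wall, CPU, and max RSS rows can all be read from a single `/usr/bin/time -l` run. The sketch below parses that output, assuming the BSD/macOS layout (`N real N user N sys` followed by a `maximum resident set size` line) and assuming the RSS figure is in bytes, as on macOS; GNU `time` on Linux uses different flags and units. `parseBsdTimeOutput` and `bytesToMiB` are illustrative helper names, not existing repo utilities.

```typescript
interface TimeSample {
  realSeconds: number;
  userSeconds: number;
  sysSeconds: number;
  maxRssBytes: number; // assumption: bytes, as reported by macOS `time -l`
}

// Extracts timing and peak RSS from BSD-style `/usr/bin/time -l` stderr text.
// Returns undefined when either expected line is missing.
function parseBsdTimeOutput(stderr: string): TimeSample | undefined {
  const times =
    /(\d+(?:\.\d+)?)\s+real\s+(\d+(?:\.\d+)?)\s+user\s+(\d+(?:\.\d+)?)\s+sys/.exec(stderr);
  const rss = /(\d+)\s+maximum resident set size/.exec(stderr);
  if (!times || !rss) return undefined;
  return {
    realSeconds: Number(times[1]),
    userSeconds: Number(times[2]),
    sysSeconds: Number(times[3]),
    maxRssBytes: Number(rss[1]),
  };
}

function bytesToMiB(bytes: number): number {
  return bytes / (1024 * 1024);
}

// Example input shaped like the tail of a macOS `time -l` report.
const exampleTimeOutput = [
  "       12.34 real         9.87 user         1.23 sys",
  "           524288000  maximum resident set size",
].join("\n");
```

When `userSeconds + sysSeconds` is far below `realSeconds`, the run is mostly waiting rather than CPU-bound, which is the split the CPU/user/sys row is meant to expose.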

Local scoped command with CPU/RSS:

```bash
timeout 240 /usr/bin/time -l pnpm test <file> --maxWorkers=1 --reporter=verbose
```

Plugin import memory profile:

```bash
pnpm build
pnpm test:extensions:memory -- --top 20 --json .artifacts/test-perf/extensions-memory.json
```

Targeted plugin import memory:

```bash
pnpm test:extensions:memory -- --extension discord --extension telegram --skip-combined
```

Heap/RSS escalation:

```bash
OPENCLAW_TEST_MEMORY_TRACE=1 \
OPENCLAW_TEST_HEAPSNAPSHOT_INTERVAL_MS=60000 \
OPENCLAW_TEST_HEAPSNAPSHOT_DIR=.tmp/heapsnap \
OPENCLAW_TEST_WORKERS=2 \
OPENCLAW_TEST_MAX_OLD_SPACE_SIZE_MB=6144 \
pnpm test
```

Use `openclaw-test-heap-leaks` when RSS keeps growing across intervals, workers OOM, or the suspect command has app-object retention. Do not call RSS growth a leak until snapshots or retainers support it.

Common Root Causes

- Full bundled channel/plugin runtime loaded for static data.
- `getChannelPlugin()` fallback used when an already-loaded fixture or pure parser would suffice.
- Broad `api.ts`, `runtime-api.ts`, `test-api.ts`, or plugin-sdk barrels pulled into hot tests.
- SDK root aliases or package barrels pulling focused subpaths back into a broad plugin graph.
- Plugin-inspector loading runtime code just to render metadata, reports, or CI policy scores.
- Bundled plugin capture reusing real config/home state instead of synthetic, redacted, isolated state.
- Partial-real mocks using `importActual()` around broad modules.
- `vi.resetModules()` plus fresh imports in per-test loops.
- Test plugin registry seeded in `beforeAll` while runtime state resets in `afterEach`.
- Per-test gateway/server/client startup when state reset would suffice.
- Runtime/default model/auth selection paid by idle snapshots or fixtures.
- Plugin-owned media/action discovery triggered before checking whether args contain plugin-owned fields.
- Timings missing from `test/fixtures/test-timings.unit.json`, causing hotspot files to stay in shared workers.
- Parallel Vitest runs sharing `node_modules/.experimental-vitest-cache` without distinct `OPENCLAW_VITEST_FS_MODULE_CACHE_PATH` values.
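
One of the causes above, discovery triggered before checking whether args contain plugin-owned fields, has a simple shape worth sketching. The names below (`ActionArgs`, `discoverPluginActions`, `hasPluginOwnedFields`, `resolveActions`) are hypothetical stand-ins, and the discovery function is a stub that only counts calls; the real discovery path in the codebase will look different.

```typescript
type ActionArgs = Record<string, unknown>;

let discoveryCalls = 0; // tracks how often the expensive path runs

function discoveryCallCount(): number {
  return discoveryCalls;
}

// Stand-in for an expensive plugin-owned discovery step (hypothetical).
function discoverPluginActions(): string[] {
  discoveryCalls += 1;
  return ["demo.upload"];
}

// Cheap static check: does any arg key belong to the plugin-owned field set?
function hasPluginOwnedFields(
  args: ActionArgs,
  ownedFields: ReadonlySet<string>,
): boolean {
  return Object.keys(args).some((key) => ownedFields.has(key));
}

// Discovery only runs once the cheap check says a plugin could be involved.
function resolveActions(
  args: ActionArgs,
  ownedFields: ReadonlySet<string>,
): string[] {
  if (!hasPluginOwnedFields(args, ownedFields)) return [];
  return discoverPluginActions();
}
```

With this ordering, the common case of plain arguments never pays for discovery, and a test can assert that directly through the call counter instead of mocking a broad module.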

Benchmark Commands

Scoped file:

```bash
timeout 240 /usr/bin/time -l pnpm test <file> --maxWorkers=1 --reporter=verbose
```

Scoped file with import breakdown:

```bash
timeout 240 /usr/bin/time -l env \
  OPENCLAW_VITEST_IMPORT_DURATIONS=1 \
  OPENCLAW_VITEST_PRINT_IMPORT_BREAKDOWN=1 \
  pnpm test <file> --maxWorkers=1 --reporter=verbose
```

Grouped suite:

```bash
pnpm test:perf:groups --full-suite --allow-failures \
  --output .artifacts/test-perf/<name>.json
```

Extension batch:

```bash
pnpm test:extensions:batch <plugin[,plugin...]> -- --reporter=verbose
```

All extension tests:

```bash
pnpm test:extensions
```

Package-boundary plugin checks:

```bash
pnpm run test:extensions:package-boundary:canary
pnpm run test:extensions:package-boundary:compile
```

Reuse an existing Vitest JSON report:

```bash
pnpm test:perf:groups --report <vitest-json> \
  --output .artifacts/test-perf/<name>.json
```
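
For a quick sanity check of which files dominate a run, a Vitest JSON report can also be ranked by hand. The sketch below assumes the Jest-style layout in which each `testResults` entry carries `name`, `startTime`, and `endTime` in milliseconds; confirm that against the actual report before relying on it. `rankFilesByDuration` is an illustrative helper and not what `pnpm test:perf:groups` does internally.

```typescript
// Minimal subset of a per-file entry, assumed from the Jest-style JSON layout.
interface ReportFile {
  name: string;
  startTime: number; // epoch milliseconds
  endTime: number; // epoch milliseconds
}

interface FileCost {
  name: string;
  seconds: number;
  share: number; // fraction of the summed file time, 0..1
}

// Sorts files by duration, slowest first, and attaches each file's share of
// the total so a dominating file stands out.
function rankFilesByDuration(files: ReportFile[]): FileCost[] {
  const costs = files.map((file) => ({
    name: file.name,
    seconds: Math.max(0, file.endTime - file.startTime) / 1000,
  }));
  const total = costs.reduce((sum, cost) => sum + cost.seconds, 0);
  return costs
    .map((cost) => ({ ...cost, share: total > 0 ? cost.seconds / total : 0 }))
    .sort((a, b) => b.seconds - a.seconds);
}
```

Per-file durations overlap when workers run in parallel, so the share is a ranking aid rather than a wall-time estimate; re-run the top file alone before treating it as the hotspot.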

Verification

- Always run the targeted test surface that proves the change.
- For source changes, run `pnpm check:changed` before push; in maintainer Testbox mode run it in the warmed Testbox.
- For test-only changes, run `pnpm test:changed` or the exact edited tests.
- Run `pnpm build` when touching lazy-loading, bundled artifacts, package boundaries, dynamic imports, build output, or public surfaces.
- For plugin SDK/barrel/runtime changes, add `pnpm plugin-sdk:api:check` or `pnpm plugin-sdk:api:gen` when the API surface may drift.
- For plugin-suite perf fixes, verify at least one representative plugin batch plus the changed gate; use Package Acceptance if the bug only exists in a packed artifact.
- If deps are missing/stale, run `pnpm install` and retry the exact failed command once.

Use the report format:

```markdown
| Metric         | Before |  After |          Gain |
| -------------- | -----: | -----: | ------------: |
| File wall time |   `Xs` |   `Ys` |  `-Zs` (`P%`) |
| Max RSS        |  `XMB` |  `YMB` | `-ZMB` (`P%`) |
| CPU user/sys   | `X/Ys` | `A/Bs` |       explain |
```
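
The Gain column is plain arithmetic: the delta is `after - before`, and the percentage is that delta divided by `before`. A small helper keeps the sign convention consistent across rows; `computeGain` and `formatGain` are illustrative names, and putting an explicit sign on the percentage is a choice made for this sketch.

```typescript
interface Gain {
  delta: number; // after - before; negative means an improvement
  percent: number; // delta as a percentage of the baseline
}

function computeGain(before: number, after: number): Gain {
  if (!(before > 0)) {
    throw new RangeError("before must be a positive baseline");
  }
  const delta = after - before;
  return { delta, percent: (delta / before) * 100 };
}

// Renders a Gain cell such as "-10.5s (-25.0%)" for the report table.
function formatGain(before: number, after: number, unit: string): string {
  const { delta, percent } = computeGain(before, after);
  const sign = delta > 0 ? "+" : ""; // negative numbers already carry "-"
  return `${sign}${delta.toFixed(1)}${unit} (${sign}${percent.toFixed(1)}%)`;
}
```

For example, `formatGain(42, 31.5, "s")` yields `-10.5s (-25.0%)`, and a regression shows up with a leading `+` so it cannot be mistaken for a win.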

Handoff

Keep the final handoff concise:

- Root cause.
- Suite/plugin scope.
- Files changed.
- Before/after wall, Vitest/import, CPU, and RSS numbers where available.
- Leak classification if memory was involved: real leak, retained module graph, or inconclusive.
- Coverage retained.
- Verification commands.
- Testbox ID or workflow URL for remote proof.
- Commit hash and push status.
