openclaw-test-performance

OpenClaw Test Performance

Use evidence first. The goal is real `pnpm test`, plugin-suite, and plugin-inspector speed/RSS improvement with coverage intact, not runner tuning by guesswork.

Workflow

  1. Read the relevant local `AGENTS.md` files before editing:
    • `src/agents/AGENTS.md` for agent/import hotspots.
    • `src/channels/AGENTS.md` and `src/plugins/AGENTS.md` for plugin/channel laziness.
    • `src/gateway/AGENTS.md` for server lifecycle tests.
    • `test/helpers/AGENTS.md` and `test/helpers/channels/AGENTS.md` for shared contract helpers.
    • `src/infra/outbound/AGENTS.md` for outbound/media/action tests.
  2. Establish a baseline before changing code:
    • Prefer `pnpm test:perf:groups --full-suite --allow-failures --output <file>` for full-suite ranking.
    • For bundled plugin breadth, run the smallest relevant `pnpm test:extensions:batch <plugin[,plugin...]>` or plugin-inspector command before jumping to the full extension sweep.
    • For a scoped hotspot use: `/usr/bin/time -l pnpm test <file-or-files> --maxWorkers=1 --reporter=verbose`
    • For import-heavy suspicion add: `OPENCLAW_VITEST_IMPORT_DURATIONS=1 OPENCLAW_VITEST_PRINT_IMPORT_BREAKDOWN=1`.
  3. Separate wall/runner noise from real file cost:
    • Compare Vitest duration, test body timing, import breakdown, wall time, and max RSS.
    • Re-run single files when grouped/full-suite numbers look stale or noisy.
    • If a full-suite grouped run reports a lane failure but JSON says tests passed, capture that as harness/noise and verify the suspect file directly.
  4. Pick the next attack by return and risk:
    • High return: one file/test dominates seconds or RSS and has a clear root.
    • High leverage: one plugin or SDK barrel causes every plugin-inspector or extension-batch run to load broad runtime.
    • Lower risk: static descriptors, target parsing, routing, auth bypass, setup hints, registry fixtures, or test server lifecycle.
    • Higher risk: real memory/runtime behavior, live providers, protocol contracts, or broad production refactors.
  5. Fix the root cause, not the symptom:
    • Move static metadata/parsing into narrow helpers or lightweight artifacts reused by full runtime and fast paths.
    • Prefer dependency injection, loaded-plugin-only lookup, explicit fixtures, and pure helpers over broad mocks.
    • Reuse suite-level servers/clients when a fresh handshake is irrelevant.
    • Keep schedulers/background loops off unless the test proves scheduling.
    • In plugin paths, move static metadata into manifest/lightweight artifacts and keep runtime plugin loads behind explicit execution boundaries.
  6. Preserve coverage shape:
    • Do not delete a slow integration proof unless the exact production composition is extracted into a named helper and tested.
    • Keep one cheap integration smoke when cross-component wiring matters.
    • State explicitly what incidental coverage was removed, if any.
  7. Re-benchmark the same command after the change and compute seconds plus percent gain.
  8. Update the running report when requested or when this thread is tracking one. Include before/after commands, artifacts, coverage notes, verification, and next attack order.
  9. Commit with `scripts/committer "<message>" <paths...>` and push when the user asked for commits/pushes. Stage only files touched for this attack.
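Step 3 asks for single-file re-runs when numbers look noisy. One way to summarize several runs of the same command is the median wall time. The following is a minimal sketch: the helper name `median_of` and the sample values are illustrative assumptions, not part of the OpenClaw tooling.

```bash
# median_of: hypothetical helper; prints the median of its numeric arguments.
median_of() {
  printf '%s\n' "$@" | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR % 2) print v[(NR + 1) / 2]
      else print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}

# Illustrative wall times (seconds) from three runs of the same scoped command.
median_of 12.4 11.9 12.1   # prints 12.1
```

Comparing medians before and after a change is less sensitive to a single slow run than comparing one sample against another.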

Plugin-Suite Workflow

Use this section when perf work involves bundled plugins, plugin-inspector, SDK barrels, package-boundary tests, or extension suites.

  1. Map the suite shape first:
    • source tests: `pnpm test extensions/<id>` or `pnpm test:extensions:batch <id>`
    • package boundaries: `pnpm run test:extensions:package-boundary:canary` and `pnpm run test:extensions:package-boundary:compile`
    • all bundled source tests: `pnpm test:extensions`
    • plugin import memory: `pnpm test:extensions:memory -- --json .artifacts/test-perf/extensions-memory.json`
    • plugin-inspector/report work: keep report primitives in `plugin-inspector`; keep wrappers thin and collect peak RSS when the command supports it.
  2. Start narrow, then widen:
    • one plugin changed: run that plugin's tests and plugin-inspector slice.
    • SDK/public barrel changed: add representative provider, channel, memory, and feature plugins.
    • loader/runtime mirror changed: add package-boundary checks and build/package proof as needed.
    • unknown shared plugin behavior: run `test:extensions:batch` groups before `pnpm test:extensions`.
  3. Treat plugin-inspector failures as product signals:
    • JSON must parse.
    • warnings/errors must be classified, not hidden.
    • runtime capture should be quiet and config-tolerant.
    • command output should include wall time, exit code, and peak RSS when available.
  4. For broad or package-heavy plugin proof, use Blacksmith Testbox by default on maintainer machines. Warm once and reuse the same box:
    • `blacksmith testbox warmup ci-check-testbox.yml --ref main --idle-timeout 90`
    • `blacksmith testbox run --id <ID> "OPENCLAW_TESTBOX=1 pnpm test:extensions:batch <ids>"`
    • stop the box when done.
  5. If plugin performance is package-artifact sensitive, switch to `openclaw-pre-release-plugin-testing` and Package Acceptance rather than trusting source-only timing.
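Step 3 asks that command output include wall time and exit code. A thin wrapper along these lines is one possible shape. It is a sketch only: `timed_run` is a hypothetical name, and the summary goes to stderr on the assumption that stdout may carry JSON that must still parse.

```bash
# timed_run: hypothetical wrapper; runs a command, then reports wall seconds
# and exit code on stderr so the command's stdout stays untouched.
timed_run() {
  local start end status
  start=$(date +%s)
  "$@" && status=0 || status=$?
  end=$(date +%s)
  echo "wall_s=$((end - start)) exit=$status" >&2
  return "$status"
}

# Stand-in command; a real call would wrap a plugin-inspector or batch run.
timed_run true
```

The wrapper returns the wrapped command's exit code, so it can sit in front of an existing command without changing how failures propagate.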

Metric Collection

Collect at least one stable metric before and after. Prefer the same machine and same command. For Testbox comparisons, use the same `tbx_...` id when possible.

| Metric          | Use for                            | Preferred source                                                              |
| --------------- | ---------------------------------- | ----------------------------------------------------------------------------- |
| wall time       | user-visible suite cost            | `/usr/bin/time -l`, test wrapper duration, Testbox run time                   |
| Vitest duration | test body/import cost              | Vitest output per file/shard                                                  |
| import duration | broad barrel/runtime loads         | `OPENCLAW_VITEST_IMPORT_DURATIONS=1`                                          |
| max RSS         | memory pressure and OOM risk       | `/usr/bin/time -l`, `pnpm test:extensions:memory`, wrapper memory summaries   |
| CPU/user/sys    | CPU-bound vs wait-bound split      | `/usr/bin/time -l` locally, Testbox job timing when local CPU is noisy        |
| heap snapshots  | real leak vs retained module graph | `openclaw-test-heap-leaks` workflow                                           |
Local scoped command with CPU/RSS:

```bash
timeout 240 /usr/bin/time -l pnpm test <file> --maxWorkers=1 --reporter=verbose
```

Plugin import memory profile:

```bash
pnpm build
pnpm test:extensions:memory -- --top 20 --json .artifacts/test-perf/extensions-memory.json
```

Targeted plugin import memory:

```bash
pnpm test:extensions:memory -- --extension discord --extension telegram --skip-combined
```

Heap/RSS escalation:

```bash
OPENCLAW_TEST_MEMORY_TRACE=1 \
OPENCLAW_TEST_HEAPSNAPSHOT_INTERVAL_MS=60000 \
OPENCLAW_TEST_HEAPSNAPSHOT_DIR=.tmp/heapsnap \
OPENCLAW_TEST_WORKERS=2 \
OPENCLAW_TEST_MAX_OLD_SPACE_SIZE_MB=6144 \
pnpm test
```

Use `openclaw-test-heap-leaks` when RSS keeps growing across intervals, workers OOM, or the suspect command has app-object retention. Do not call RSS growth a leak until snapshots or retainers support it.
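To get max RSS into a report, the `/usr/bin/time -l` summary can be saved and parsed. The sketch below assumes the value is reported in bytes, as on recent macOS; other BSD variants may report kilobytes, and GNU `time` uses `-v` with a different label. The helper name `max_rss_mb` and the sample line are illustrative.

```bash
# max_rss_mb: hypothetical helper; reads saved `/usr/bin/time -l` output on
# stdin and prints the maximum resident set size in MB.
# Assumption: the reported value is in bytes.
max_rss_mb() {
  awk '/maximum resident set size/ { printf "%.1f\n", $1 / 1048576 }'
}

# Illustrative sample line in the shape `/usr/bin/time -l` prints.
printf '  524288000  maximum resident set size\n' | max_rss_mb   # prints 500.0
```

In practice the summary arrives on stderr, so a run would redirect it to a file first (for example `2> time.log`) and then feed that file to the helper.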

Common Root Causes

  • Full bundled channel/plugin runtime loaded for static data.
  • `getChannelPlugin()` fallback used when an already-loaded fixture or pure parser would suffice.
  • Broad `api.ts`, `runtime-api.ts`, `test-api.ts`, or plugin-sdk barrels pulled into hot tests.
  • SDK root aliases or package barrels pulling focused subpaths back into a broad plugin graph.
  • Plugin-inspector loading runtime code just to render metadata, reports, or CI policy scores.
  • Bundled plugin capture reusing real config/home state instead of synthetic, redacted, isolated state.
  • Partial-real mocks using `importActual()` around broad modules.
  • `vi.resetModules()` plus fresh imports in per-test loops.
  • Test plugin registry seeded in `beforeAll` while runtime state resets in `afterEach`.
  • Per-test gateway/server/client startup when state reset would suffice.
  • Runtime/default model/auth selection paid by idle snapshots or fixtures.
  • Plugin-owned media/action discovery triggered before checking whether args contain plugin-owned fields.
  • Timings missing from `test/fixtures/test-timings.unit.json`, causing hotspot files to stay in shared workers.
  • Parallel Vitest runs sharing `node_modules/.experimental-vitest-cache` without distinct `OPENCLAW_VITEST_FS_MODULE_CACHE_PATH` values.
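For the last item, one way to give each parallel run its own cache path is a small wrapper. This is a sketch under assumptions: `run_isolated` is a hypothetical name, and it assumes `OPENCLAW_VITEST_FS_MODULE_CACHE_PATH` accepts a directory path.

```bash
# run_isolated: hypothetical wrapper; runs a command with a fresh cache path.
# Assumption: OPENCLAW_VITEST_FS_MODULE_CACHE_PATH accepts a directory path.
run_isolated() {
  local cache_dir
  cache_dir="$(mktemp -d "${TMPDIR:-/tmp}/vitest-fs-cache.XXXXXX")" || return 1
  OPENCLAW_VITEST_FS_MODULE_CACHE_PATH="$cache_dir" "$@"
}

# Stand-in command that prints the path it received; each call gets its own.
run_isolated sh -c 'echo "$OPENCLAW_VITEST_FS_MODULE_CACHE_PATH"'
run_isolated sh -c 'echo "$OPENCLAW_VITEST_FS_MODULE_CACHE_PATH"'
```

Two concurrent runs would each be launched through the wrapper, for example `run_isolated pnpm test <file> --maxWorkers=1 &`. The temporary directories are not cleaned up here.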

Benchmark Commands

Scoped file:

```bash
timeout 240 /usr/bin/time -l pnpm test <file> --maxWorkers=1 --reporter=verbose
```

Scoped file with import breakdown:

```bash
timeout 240 /usr/bin/time -l env \
  OPENCLAW_VITEST_IMPORT_DURATIONS=1 \
  OPENCLAW_VITEST_PRINT_IMPORT_BREAKDOWN=1 \
  pnpm test <file> --maxWorkers=1 --reporter=verbose
```

Grouped suite:

```bash
pnpm test:perf:groups --full-suite --allow-failures \
  --output .artifacts/test-perf/<name>.json
```

Extension batch:

```bash
pnpm test:extensions:batch <plugin[,plugin...]> -- --reporter=verbose
```

All extension tests:

```bash
pnpm test:extensions
```

Package-boundary plugin checks:

```bash
pnpm run test:extensions:package-boundary:canary
pnpm run test:extensions:package-boundary:compile
```

Reuse an existing Vitest JSON report:

```bash
pnpm test:perf:groups --report <vitest-json> \
  --output .artifacts/test-perf/<name>.json
```

Verification

  • Always run the targeted test surface that proves the change.
  • For source changes, run `pnpm check:changed` before push; in maintainer Testbox mode run it in the warmed Testbox.
  • For test-only changes, run `pnpm test:changed` or the exact edited tests.
  • Run `pnpm build` when touching lazy-loading, bundled artifacts, package boundaries, dynamic imports, build output, or public surfaces.
  • For plugin SDK/barrel/runtime changes, add `pnpm plugin-sdk:api:check` or `pnpm plugin-sdk:api:gen` when the API surface may drift.
  • For plugin-suite perf fixes, verify at least one representative plugin batch plus the changed gate; use Package Acceptance if the bug only exists in a packed artifact.
  • If deps are missing/stale, run `pnpm install` and retry the exact failed command once.
  • Use the report format:

```markdown
| Metric         | Before |  After |          Gain |
| -------------- | -----: | -----: | ------------: |
| File wall time |   `Xs` |   `Ys` |  `-Zs` (`P%`) |
| Max RSS        |  `XMB` |  `YMB` | `-ZMB` (`P%`) |
| CPU user/sys   | `X/Ys` | `A/Bs` |       explain |
```
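The Gain column is the absolute delta plus the percent reduction relative to the before value. A minimal sketch that emits one row in the format above follows; the helper name `report_row` and the sample numbers are illustrative assumptions, not measured results.

```bash
# report_row: hypothetical helper; prints one report-table row.
# Args: metric label, before value, after value, unit suffix (e.g. s or MB).
report_row() {
  awk -v label="$1" -v before="$2" -v after="$3" -v unit="$4" 'BEGIN {
    delta = after - before                       # negative means improvement
    pct = (before > 0) ? (before - after) / before * 100 : 0
    printf "| %s | `%s%s` | `%s%s` | `%+.1f%s` (`%.1f%%`) |\n",
      label, before, unit, after, unit, delta, unit, pct
  }'
}

# Illustrative numbers only.
report_row "File wall time" 12.4 9.3 s
# prints: | File wall time | `12.4s` | `9.3s` | `-3.1s` (`25.0%`) |
```

A regression shows up as a positive delta and a negative percent, which keeps the sign convention visible in the report instead of hiding it.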

Handoff

Keep the final message concise:
  • Root cause.
  • Suite/plugin scope.
  • Files changed.
  • Before/after wall, Vitest/import, CPU, and RSS numbers where available.
  • Leak classification if memory was involved: real leak, retained module graph, or inconclusive.
  • Coverage retained.
  • Verification commands.
  • Testbox ID or workflow URL for remote proof.
  • Commit hash and push status.