openclaw-test-performance
OpenClaw Test Performance
Use evidence first. The goal is real `pnpm test`, plugin-suite, and
plugin-inspector speed/RSS improvement with coverage intact, not runner tuning by
guesswork.
Workflow
- Read the relevant local files before editing:
  - `AGENTS.md`
  - `src/agents/AGENTS.md` for agent/import hotspots.
  - `src/channels/AGENTS.md` and `src/plugins/AGENTS.md` for plugin/channel laziness.
  - `src/gateway/AGENTS.md` for server lifecycle tests.
  - `test/helpers/AGENTS.md` and `test/helpers/channels/AGENTS.md` for shared contract helpers.
  - `src/infra/outbound/AGENTS.md` for outbound/media/action tests.
- Establish a baseline before changing code:
  - Prefer `pnpm test:perf:groups --full-suite --allow-failures --output <file>` for full-suite ranking.
  - For bundled plugin breadth, run the smallest relevant `pnpm test:extensions:batch <plugin[,plugin...]>` or plugin-inspector command before jumping to the full extension sweep.
  - For a scoped hotspot use: `/usr/bin/time -l pnpm test <file-or-files> --maxWorkers=1 --reporter=verbose`
  - For import-heavy suspicion add: `OPENCLAW_VITEST_IMPORT_DURATIONS=1 OPENCLAW_VITEST_PRINT_IMPORT_BREAKDOWN=1`.
- Separate wall/runner noise from real file cost:
  - Compare Vitest duration, test body timing, import breakdown, wall time, and max RSS.
  - Re-run single files when grouped/full-suite numbers look stale or noisy.
  - If a full-suite grouped run reports a lane failure but JSON says tests passed, capture that as harness/noise and verify the suspect file directly.
- Pick the next attack by return and risk:
  - High return: one file/test dominates seconds or RSS and has a clear root.
  - High leverage: one plugin or SDK barrel causes every plugin-inspector or extension-batch run to load broad runtime.
  - Lower risk: static descriptors, target parsing, routing, auth bypass, setup hints, registry fixtures, or test server lifecycle.
  - Higher risk: real memory/runtime behavior, live providers, protocol contracts, or broad production refactors.
- Fix the root cause, not the symptom:
  - Move static metadata/parsing into narrow helpers or lightweight artifacts reused by full runtime and fast paths.
  - Prefer dependency injection, loaded-plugin-only lookup, explicit fixtures, and pure helpers over broad mocks.
  - Reuse suite-level servers/clients when a fresh handshake is irrelevant.
  - Keep schedulers/background loops off unless the test proves scheduling.
  - In plugin paths, move static metadata into manifest/lightweight artifacts and keep runtime plugin loads behind explicit execution boundaries.
- Preserve coverage shape:
  - Do not delete a slow integration proof unless the exact production composition is extracted into a named helper and tested.
  - Keep one cheap integration smoke when cross-component wiring matters.
  - State explicitly what incidental coverage was removed, if any.
- Re-benchmark the same command after the change and compute seconds plus percent gain.
- Update the running report when requested or when this thread is tracking one. Include before/after commands, artifacts, coverage notes, verification, and next attack order.
- Commit with `scripts/committer "<message>" <paths...>` and push when the user asked for commits/pushes. Stage only files touched for this attack.
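When grouped or full-suite numbers look noisy, one way to compare re-runs of a single file is to take the median of several wall-time samples instead of trusting one run. The sketch below only illustrates that idea: `median_seconds` is a hypothetical helper, not a project script, and the sample values are made up.

```bash
# Hypothetical helper: median of wall-time samples (seconds) from repeated runs.
# Usage: median_seconds <t1> <t2> ...
median_seconds() {
  # Refuse to report a median when no samples were given.
  [ "$#" -gt 0 ] || return 1
  printf '%s\n' "$@" | LC_ALL=C sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR % 2) printf "%.2f\n", v[(NR + 1) / 2]
      else        printf "%.2f\n", (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}

# Three made-up samples of the same scoped command.
median_seconds 3.40 2.95 3.10
```

Comparing medians from the same command before and after a change makes it less likely that a single slow run is mistaken for real file cost.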
Plugin-Suite Workflow
Use this section when perf work involves bundled plugins, plugin-inspector, SDK
barrels, package-boundary tests, or extension suites.
- Map the suite shape first:
  - source tests: `pnpm test extensions/<id>` or `pnpm test:extensions:batch <id>`
  - package boundaries: `pnpm run test:extensions:package-boundary:canary` and `pnpm run test:extensions:package-boundary:compile`
  - all bundled source tests: `pnpm test:extensions`
  - plugin import memory: `pnpm test:extensions:memory -- --json .artifacts/test-perf/extensions-memory.json`
  - plugin-inspector/report work: keep report primitives in `plugin-inspector`; keep wrappers thin and collect peak RSS when the command supports it.
- Start narrow, then widen:
  - one plugin changed: run that plugin's tests and plugin-inspector slice.
  - SDK/public barrel changed: add representative provider, channel, memory, and feature plugins.
  - loader/runtime mirror changed: add package-boundary checks and build/package proof as needed.
  - unknown shared plugin behavior: run `test:extensions:batch` groups before `pnpm test:extensions`.
- Treat plugin-inspector failures as product signals:
  - JSON must parse.
  - warnings/errors must be classified, not hidden.
  - runtime capture should be quiet and config-tolerant.
  - command output should include wall time, exit code, and peak RSS when available.
- For broad or package-heavy plugin proof, use Blacksmith Testbox by default on maintainer machines. Warm once and reuse the same box:
  - `blacksmith testbox warmup ci-check-testbox.yml --ref main --idle-timeout 90`
  - `blacksmith testbox run --id <ID> "OPENCLAW_TESTBOX=1 pnpm test:extensions:batch <ids>"`
  - stop the box when done.
- If plugin performance is package-artifact sensitive, switch to `openclaw-pre-release-plugin-testing` and Package Acceptance rather than trusting source-only timing.
Metric Collection
Collect at least one stable metric before and after. Prefer the same machine and
same command. For Testbox comparisons, use the same `tbx_...` id when possible.
| Metric | Use for | Preferred source |
|---|---|---|
| wall time | user-visible suite cost | |
| Vitest duration | test body/import cost | Vitest output per file/shard |
| import duration | broad barrel/runtime loads | |
| max RSS | memory pressure and OOM risk | |
| CPU/user/sys | CPU-bound vs wait-bound split | |
| heap snapshots | real leak vs retained module graph | |
Local scoped command with CPU/RSS:
```bash
timeout 240 /usr/bin/time -l pnpm test <file> --maxWorkers=1 --reporter=verbose
```
Plugin import memory profile:
```bash
pnpm build
pnpm test:extensions:memory -- --top 20 --json .artifacts/test-perf/extensions-memory.json
```
Targeted plugin import memory:
```bash
pnpm test:extensions:memory -- --extension discord --extension telegram --skip-combined
```
Heap/RSS escalation:
```bash
OPENCLAW_TEST_MEMORY_TRACE=1 \
OPENCLAW_TEST_HEAPSNAPSHOT_INTERVAL_MS=60000 \
OPENCLAW_TEST_HEAPSNAPSHOT_DIR=.tmp/heapsnap \
OPENCLAW_TEST_WORKERS=2 \
OPENCLAW_TEST_MAX_OLD_SPACE_SIZE_MB=6144 \
pnpm test
```
Use `openclaw-test-heap-leaks` when RSS keeps growing across intervals, workers
OOM, or the suspect command has app-object retention. Do not call RSS growth a
leak until snapshots or retainers support it.
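To record max RSS from the scoped command without reading the log by eye, the `/usr/bin/time -l` output can be filtered. The following is a minimal sketch, not a project script: `max_rss_mb` is a hypothetical helper, and it assumes the BSD/macOS output format with the value reported in bytes. Some platforms report kilobytes instead, so check the unit on your machine before trusting the number.

```bash
# Hypothetical helper: extract max RSS (in MB) from `/usr/bin/time -l` output.
# Assumes a line like "  104857600  maximum resident set size" with a byte value.
# Reads the named log file, or stdin when no file is given.
max_rss_mb() {
  awk '/maximum resident set size/ { printf "%.1f\n", $1 / 1048576 }' "$@"
}

# Made-up sample of the relevant lines; a real log would come from something like
#   /usr/bin/time -l pnpm test <file> --maxWorkers=1 --reporter=verbose 2> time.log
printf '%s\n' \
  '        2.51 real         1.20 user         0.30 sys' \
  '           104857600  maximum resident set size' | max_rss_mb
```

A single RSS figure from this helper is a before/after data point only; it does not by itself distinguish a leak from a retained module graph.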
Common Root Causes
- Full bundled channel/plugin runtime loaded for static data.
- `getChannelPlugin()` fallback used when an already-loaded fixture or pure parser would suffice.
- Broad `api.ts`, `runtime-api.ts`, `test-api.ts`, or plugin-sdk barrels pulled into hot tests.
- SDK root aliases or package barrels pulling focused subpaths back into a broad plugin graph.
- Plugin-inspector loading runtime code just to render metadata, reports, or CI policy scores.
- Bundled plugin capture reusing real config/home state instead of synthetic, redacted, isolated state.
- Partial-real mocks using `importActual()` around broad modules.
- `vi.resetModules()` plus fresh imports in per-test loops.
- Test plugin registry seeded in `beforeAll` while runtime state resets in `afterEach`.
- Per-test gateway/server/client startup when state reset would suffice.
- Runtime/default model/auth selection paid by idle snapshots or fixtures.
- Plugin-owned media/action discovery triggered before checking whether args contain plugin-owned fields.
- Timings missing from `test/fixtures/test-timings.unit.json`, causing hotspot files to stay in shared workers.
- Parallel Vitest runs sharing `node_modules/.experimental-vitest-cache` without distinct `OPENCLAW_VITEST_FS_MODULE_CACHE_PATH` values.
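For the last item, one possible way to keep parallel runs apart is to derive a distinct cache location per run. This is a sketch under assumptions: `vitest_cache_path` is a hypothetical helper, the directory layout is invented for illustration, and it assumes `OPENCLAW_VITEST_FS_MODULE_CACHE_PATH` accepts a directory path.

```bash
# Hypothetical helper: build a distinct cache directory path per parallel lane.
# The "openclaw-vitest-cache/<lane>" layout is an assumption, not a repo convention.
vitest_cache_path() (
  base="${TMPDIR:-/tmp}"
  base="${base%/}"   # drop a trailing slash so the joined path stays tidy
  printf '%s/openclaw-vitest-cache/%s\n' "$base" "$1"
)

# Intended use (not executed here): one distinct value per concurrent run.
#   OPENCLAW_VITEST_FS_MODULE_CACHE_PATH="$(vitest_cache_path gateway)" pnpm test <files> &
#   OPENCLAW_VITEST_FS_MODULE_CACHE_PATH="$(vitest_cache_path plugins)" pnpm test <files> &
#   wait
vitest_cache_path gateway
```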
Benchmark Commands
Scoped file:
```bash
timeout 240 /usr/bin/time -l pnpm test <file> --maxWorkers=1 --reporter=verbose
```
Scoped file with import breakdown:
```bash
timeout 240 /usr/bin/time -l env \
  OPENCLAW_VITEST_IMPORT_DURATIONS=1 \
  OPENCLAW_VITEST_PRINT_IMPORT_BREAKDOWN=1 \
  pnpm test <file> --maxWorkers=1 --reporter=verbose
```
Grouped suite:
```bash
pnpm test:perf:groups --full-suite --allow-failures \
  --output .artifacts/test-perf/<name>.json
```
Extension batch:
```bash
pnpm test:extensions:batch <plugin[,plugin...]> -- --reporter=verbose
```
All extension tests:
```bash
pnpm test:extensions
```
Package-boundary plugin checks:
```bash
pnpm run test:extensions:package-boundary:canary
pnpm run test:extensions:package-boundary:compile
```
Reuse an existing Vitest JSON report:
```bash
pnpm test:perf:groups --report <vitest-json> \
  --output .artifacts/test-perf/<name>.json
```
Verification
- Always run the targeted test surface that proves the change.
- For source changes, run `pnpm check:changed` before push; in maintainer Testbox mode run it in the warmed Testbox.
- For test-only changes, run `pnpm test:changed` or the exact edited tests.
- Run `pnpm build` when touching lazy-loading, bundled artifacts, package boundaries, dynamic imports, build output, or public surfaces.
- For plugin SDK/barrel/runtime changes, add `pnpm plugin-sdk:api:check` or `pnpm plugin-sdk:api:gen` when the API surface may drift.
- For plugin-suite perf fixes, verify at least one representative plugin batch plus the changed gate; use Package Acceptance if the bug only exists in a packed artifact.
- If deps are missing/stale, run `pnpm install` and retry the exact failed command once.
- Use the report format:
```markdown
| Metric         | Before |  After |          Gain |
| -------------- | -----: | -----: | ------------: |
| File wall time |   `Xs` |   `Ys` |  `-Zs` (`P%`) |
| Max RSS        |  `XMB` |  `YMB` | `-ZMB` (`P%`) |
| CPU user/sys   | `X/Ys` | `A/Bs` |       explain |
```
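The Gain column can be computed mechanically from the before and after numbers. A minimal sketch, assuming wall times in seconds; `perf_gain` is a hypothetical helper name, not a script in the repo:

```bash
# Hypothetical helper: format the Gain cell as "<delta>s (<percent>%)".
# Usage: perf_gain <before_seconds> <after_seconds>
perf_gain() {
  awk -v before="$1" -v after="$2" 'BEGIN {
    if (before <= 0) { print "before must be > 0" > "/dev/stderr"; exit 1 }
    delta = after - before                      # negative means faster
    pct   = (before - after) / before * 100     # positive means improvement
    printf "%+.2fs (%.1f%%)\n", delta, pct
  }'
}

# Made-up example: 12.50s before, 10.00s after.
perf_gain 12.50 10.00
```

A regression shows up as a positive delta with a negative percentage, which keeps the sign convention of the table readable at a glance.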
Handoff
Keep the final handoff concise:
- Root cause.
- Suite/plugin scope.
- Files changed.
- Before/after wall, Vitest/import, CPU, and RSS numbers where available.
- Leak classification if memory was involved: real leak, retained module graph, or inconclusive.
- Coverage retained.
- Verification commands.
- Testbox ID or workflow URL for remote proof.
- Commit hash and push status.