openclaw-testing

OpenClaw Testing

Use this skill when deciding what to test, debugging failures, rerunning CI, or validating a change without wasting hours.

Read First

  • `docs/reference/test.md` for local test commands.
  • `docs/ci.md` for CI scope, release checks, Docker chunks, and runner behavior.
  • Scoped `AGENTS.md` files before editing code under a subtree.

Default Rule

Prove the touched surface first. Do not reflexively run the whole suite.
  1. Inspect the diff and classify the touched surface:
    • source: `pnpm changed:lanes --json`, then `pnpm check:changed`
    • tests only: `pnpm test:changed`
    • one failing file: `pnpm test <path-or-filter> -- --reporter=verbose`
    • workflow-only: `git diff --check`, workflow syntax/lint (`actionlint` when available)
    • docs-only: `pnpm docs:list`, docs formatter/lint only if docs tooling changed or requested
  2. Reproduce narrowly before fixing.
  3. Fix root cause.
  4. Rerun the same narrow proof.
  5. Broaden only when the touched contract demands it.
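The classification in step 1 can be sketched as a small lookup. This is an illustrative helper, not something the repo ships: the function name `narrow_proof` and the surface labels are assumptions, and only the commands it prints come from the list above.

```shell
#!/bin/sh
# Hypothetical helper: print the narrow proof for a touched surface.
# The surface labels are illustrative; the printed commands are the ones
# listed in step 1. Nothing is executed, the commands are only echoed.
narrow_proof() {
  case "$1" in
    source)
      echo "pnpm changed:lanes --json"
      echo "pnpm check:changed"
      ;;
    tests-only)    echo "pnpm test:changed" ;;
    one-file)      echo "pnpm test $2 -- --reporter=verbose" ;;
    workflow-only) echo "git diff --check" ;;
    docs-only)     echo "pnpm docs:list" ;;
    *)             echo "unknown surface: $1" >&2; return 1 ;;
  esac
}
```

For example, `narrow_proof one-file src/foo.test.ts` prints the verbose single-file command for that path (the path is a placeholder).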

Guardrails

  • Do not kill unrelated processes or tests. If something is running elsewhere, treat it as owned by the user or another agent.
  • Do not run expensive local Docker, full release checks, full `pnpm test`, or full `pnpm check` unless the user asks or the change genuinely requires it.
  • Prefer GitHub Actions for release/Docker proof when the workflow already has the prepared image and secrets.
  • Use `scripts/committer "<msg>" <paths...>` when committing; stage only your files.
  • If deps are missing, run `pnpm install`, retry once, then report the first actionable error.
  • For Blacksmith Testbox proof, reuse only an id warmed and claimed in this operator session. `blacksmith testbox list` is diagnostics only; a listed id can have a local key and still carry stale rsync state from another lane. After warmup, run `pnpm testbox:claim --id <id>`, then prefer `pnpm testbox:run --id <id> -- "<command>"` for OpenClaw gates so stale org-visible ids fail fast before syncing. Claims older than 12 hours are stale unless `OPENCLAW_TESTBOX_CLAIM_TTL_MINUTES` is explicitly set for long work.
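The staleness rule for Testbox claims can be sketched as a simple comparison. This is a minimal sketch under assumptions: how the real `pnpm testbox:claim` tooling records and checks claim age is not shown in this document, and the function name `claim_is_stale` and its age-in-minutes argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical check: is a claim of the given age (in minutes) stale?
# Assumes the default TTL is 12 hours (720 minutes) and that
# OPENCLAW_TESTBOX_CLAIM_TTL_MINUTES, when set, replaces that default.
claim_is_stale() {
  age_minutes=$1
  ttl_minutes=${OPENCLAW_TESTBOX_CLAIM_TTL_MINUTES:-720}
  # "Older than" the TTL means strictly greater.
  [ "$age_minutes" -gt "$ttl_minutes" ]
}
```

With the variable unset, a 13-hour-old claim (780 minutes) counts as stale; with the TTL raised to 1440 minutes for long work, it does not.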

Local Test Shortcuts

bash
pnpm changed:lanes --json
pnpm check:changed       # changed typecheck/lint/guards; no Vitest
pnpm test:changed        # cheap smart changed Vitest targets
OPENCLAW_TEST_CHANGED_BROAD=1 pnpm test:changed
pnpm test <path-or-filter> -- --reporter=verbose
OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test <path-or-filter>
Use targeted file paths whenever possible. Avoid raw `vitest`; use the repo `pnpm test` wrapper so project routing, workers, and setup stay correct.

Command Semantics

  • `pnpm check` and `pnpm check:changed` do not run Vitest tests. They are for typecheck, lint, and guard proof.
  • `pnpm test` and `pnpm test:changed` run Vitest tests.
  • `pnpm test:changed` is intentionally cheap by default: direct test edits, sibling tests, explicit source mappings, and import-graph dependents.
  • `OPENCLAW_TEST_CHANGED_BROAD=1 pnpm test:changed` is the explicit broad fallback for harness/config/package edits that genuinely need it.
  • Do not run extension sweeps just because core changed. If a core edit is for a specific plugin bug, run that plugin's tests explicitly. If a public SDK or contract change needs consumer proof, choose the smallest representative plugin/contract tests first, then broaden only when the risk justifies it.
  • The test wrapper prints a short `[test] passed|failed|skipped ... in ...` line. Vitest's own duration is still the per-shard detail.

Routing Model

  • `pnpm changed:lanes --json` answers "which check lanes does this diff touch?" It is used by `pnpm check:changed` for typecheck/lint/guard selection.
  • `pnpm test:changed` answers "which Vitest targets are worth running now?" It uses the same changed path list, but applies a cheaper test-target resolver.
  • Direct test edits run themselves. Source edits prefer explicit mappings, then sibling `*.test.ts`, then import-graph dependents. Shared harness/config/root edits are skipped by default unless they have precise mapped tests.
  • Shared group-room delivery config and source-reply prompt edits have precise mapped tests: they run the core auto-reply regressions plus Discord and Slack delivery tests so cross-channel default changes fail before a PR push.
  • Public SDK or contract edits do not automatically run every plugin test. `check:changed` proves extension type contracts; the agent chooses the smallest plugin/contract Vitest proof that matches the actual risk.
  • Use `OPENCLAW_TEST_CHANGED_BROAD=1 pnpm test:changed` only when a harness, config, package, or unknown-root edit really needs the broad Vitest fallback.
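The sibling-test step of that resolver order can be illustrated with a naming sketch. This is not the repo's resolver: the function name `sibling_test` is hypothetical, it assumes the sibling test sits next to the source file with the same basename, and it ignores explicit mappings and import-graph dependents entirely.

```shell
#!/bin/sh
# Hypothetical sketch of the sibling naming convention only.
# Direct test edits map to themselves; a .ts source edit maps to a
# candidate sibling *.test.ts path. Other file types are not handled.
sibling_test() {
  case "$1" in
    *.test.ts) echo "$1" ;;               # direct test edits run themselves
    *.ts)      echo "${1%.ts}.test.ts" ;;  # candidate sibling test path
    *)         return 1 ;;                 # no sibling convention assumed
  esac
}
```

A candidate path produced this way would still need to exist on disk before being passed to `pnpm test <path-or-filter>`.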

CI Debugging

Start with current run state, not logs for everything:
bash
gh run list --branch main --limit 10
gh run view <run-id> --json status,conclusion,headSha,url,jobs
gh run view <run-id> --job <job-id> --log
  • Check exact SHA. Ignore newer unrelated `main` unless asked.
  • For cancelled same-branch runs, confirm whether a newer run superseded it.
  • Fetch full logs only for failed or relevant jobs.
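To fetch logs only for failed jobs, the run's job list can be filtered first. The sketch below is one way to do that, assuming the `gh` CLI's `--jq` option and the `conclusion`, `databaseId`, and `name` fields of the `jobs` JSON; the function name `failed_jobs` is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: list failed jobs of one run as "<job-id><TAB><name>".
# Relies on gh's built-in jq filtering; no logs are downloaded here.
failed_jobs() {
  gh run view "$1" --json jobs \
    --jq '.jobs[] | select(.conclusion == "failure") | "\(.databaseId)\t\(.name)"'
}
```

Each printed job id can then be passed to `gh run view <run-id> --job <job-id> --log`.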

GitHub Release Workflows

Use the smallest workflow that proves the current risk. The full umbrella is available, but it is usually the last step after narrower proof, not the first rerun after a focused patch.

Full Release Validation

`Full Release Validation` (`.github/workflows/full-release-validation.yml`) is the manual "everything before release" umbrella. It resolves a target ref, then dispatches:
  • manual `CI` for the full normal CI graph, with Android enabled via `include_android=true`
  • `Plugin Prerelease` for release-only plugin static checks, extension shards, the release-only `agentic-plugins` shard, and plugin product Docker lanes
  • `OpenClaw Release Checks` for install smoke, cross-OS release checks, live and E2E checks, Docker release-path suites, OpenWebUI, QA Lab, fast Matrix, and Telegram release lanes
  • optional post-publish Telegram E2E when a package spec is supplied
Run it only when validating an actual release candidate, after broad shared CI or release orchestration changes, or when explicitly asked:
bash
gh workflow run full-release-validation.yml \
  --repo openclaw/openclaw \
  --ref main \
  -f ref=<branch-or-sha> \
  -f provider=openai \
  -f mode=both \
  -f release_profile=stable
Run the workflow itself from the trusted current ref, normally `--ref main`; child workflows are dispatched from that same ref even when `ref` points at an older release branch or tag. Full Release Validation has no separate child workflow ref input; choose the trusted harness by choosing the workflow run ref. Use `release_profile=minimum|stable|full` to control live/provider breadth: `minimum` keeps the fastest OpenAI/core release-critical set, `stable` adds the stable provider/backend set, and `full` adds the broad advisory provider/media matrix. Do not make `full` faster by silently dropping suites; optimize setup, artifact reuse, and sharding instead. The parent verifier job appends a child overview plus slowest-job tables for child runs; rerun only that verifier after a child rerun turns green.
Standalone manual `CI` dispatches do not run the plugin prerelease suite, the extension batch sweep, or the release-only `agentic-plugins` Vitest shard. Those lanes are intentionally reserved for the separate `Plugin Prerelease` child so PRs, main pushes, and ad hoc broad CI checks do not spend Docker/package time or all-plugin runtime time on release-only product coverage.
If a full run is already active on a newer `origin/main`, prefer watching that run over dispatching a duplicate. Do not cancel release, release-check, or child workflow runs unless Peter explicitly asks for cancellation.
The child-dispatch jobs record the child run ids. The final `Verify full validation` job re-queries those child runs and is the canonical parent gate. If a child workflow failed but was later rerun successfully, rerun only the failed parent verifier job; do not dispatch a new full umbrella unless the release evidence is stale.
For bounded recovery after a focused fix, pass `-f rerun_group=<group>`. Supported umbrella groups are `all`, `ci`, `plugin-prerelease`, `release-checks`, `install-smoke`, `cross-os`, `live-e2e`, `package`, `qa`, `qa-parity`, `qa-live`, and `npm-telegram`. Use the narrowest group that covers the failed box. After a targeted release-check fix, do not restart the full umbrella by habit: dispatch the matching `rerun_group` and rerun only the parent verifier/evidence step after the child is green unless the release evidence is stale. For a single failed live/E2E shard, use `-f rerun_group=live-e2e -f live_suite_filter=<suite_id>` so the Blacksmith workflow only spends setup and queue time on that suite.
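A bounded rerun can be guarded by checking the group name before dispatching. The sketch below only prints the command; the function names are hypothetical, the group list is copied from the paragraph above, and whether the workflow needs further inputs (such as `provider`, `mode`, or `release_profile`) alongside `rerun_group` is an assumption left to the caller.

```shell
#!/bin/sh
# Supported umbrella groups, copied from the list above.
UMBRELLA_GROUPS="all ci plugin-prerelease release-checks install-smoke cross-os live-e2e package qa qa-parity qa-live npm-telegram"

# Hypothetical guard: succeed only for a supported umbrella group.
is_umbrella_group() {
  for group in $UMBRELLA_GROUPS; do
    [ "$group" = "$1" ] && return 0
  done
  return 1
}

# Hypothetical helper: print (do not run) a bounded rerun dispatch.
# $1 is the target ref, $2 the rerun group.
print_rerun() {
  is_umbrella_group "$2" || { echo "unsupported rerun_group: $2" >&2; return 1; }
  echo "gh workflow run full-release-validation.yml --repo openclaw/openclaw --ref main -f ref=$1 -f rerun_group=$2"
}
```

Printing instead of executing keeps the dispatch reviewable, which fits the advice not to restart the umbrella by habit.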

Release Evidence

After release-candidate validation or before a release decision, record the important run ids in the private `openclaw/releases-private` evidence ledger. Use the manual `OpenClaw Release Evidence` (`openclaw-release-evidence.yml`) workflow there. It writes durable summaries under `evidence/<release-id>/` and commits:
  • release-evidence.md
  • release-evidence.json
  • index.json
  • runs/<label>.json
Use one run per line:
text
full-release-validation openclaw/openclaw <run-id> blocking
package-acceptance openclaw/openclaw <run-id> blocking
release-checks openclaw/openclaw <run-id> blocking
Store summaries, run URLs, artifact metadata, timings, pass/fail state, and short release-manager notes there. Do not store raw logs, provider prompts/responses, channel transcripts, signing material, or secret-bearing config in git; raw logs stay in Actions artifacts.
When `Full Release Validation` completes and `OPENCLAW_RELEASES_PRIVATE_DISPATCH_TOKEN` is configured in the public repo, it requests the private `OpenClaw Release Evidence From Full Validation` workflow. That private workflow reads the parent full-validation run, extracts the child CI/release-checks/Telegram run ids from the parent logs, and opens the evidence PR automatically. If the token is absent or the run predates this wiring, trigger that private workflow manually with the full-validation run id.

Release Checks

`OpenClaw Release Checks` (`openclaw-release-checks.yml`) is the release child workflow. It is broader than normal CI but narrower than the umbrella because it does not dispatch the separate full normal CI child. It runs Package Acceptance with artifact-native delta lanes and `telegram_mode=mock-openai`, so the release package tarball also goes through offline plugin proof, bundled-channel compat, and Telegram package QA. The Docker release-path chunks cover the overlapping package/update/plugin lanes. Use it when release-path validation is needed without rerunning the entire umbrella.
bash
gh workflow run openclaw-release-checks.yml \
  --repo openclaw/openclaw \
  --ref main \
  -f ref=<branch-or-sha> \
  -f provider=openai \
  -f mode=both \
  -f release_profile=stable \
  -f rerun_group=all
Release-check rerun groups are `all`, `install-smoke`, `cross-os`, `live-e2e`, `package`, `qa`, `qa-parity`, and `qa-live`.
`OpenClaw Release Checks` uses the trusted workflow ref to resolve the selected ref once as `release-package-under-test` and passes that artifact into cross-OS release checks, release-path Docker live/E2E checks, and Package Acceptance. When `Full Release Validation` dispatches release checks, it passes the requested branch/tag plus an `expected_sha` so branch/tag refs resolve through the fast remote-ref path while the package and QA jobs still validate the exact SHA.
The full install-smoke child is split on purpose: one job prepares or reuses the target-SHA GHCR root Dockerfile smoke image, QR package install runs in its own job, root Dockerfile/gateway smokes pull the prepared image, and installer/Bun smokes pull the same image while building only their small installer images. If install-smoke gets slow again, first check whether the root image was reused or rebuilt before adding/removing coverage.
The full-profile native live media shards use the prebuilt `ghcr.io/openclaw/openclaw-live-media-runner:ubuntu-24.04` container so `ffmpeg`/`ffprobe` are already present. If those jobs suddenly spend minutes in dependency setup again, first check the `Live Media Runner Image` workflow and the `Verify preinstalled live media dependencies` step before assuming the media tests themselves slowed down.
The release Docker path intentionally shards the plugin/runtime tail. The workflow uses `plugins-runtime-plugins`, `plugins-runtime-services`, and `plugins-runtime-install-a` through `plugins-runtime-install-d`; aggregate aliases such as `plugins-runtime-core`, `plugins-runtime`, and `plugins-integrations` remain for manual reruns.
The release QA parity box is internally split into candidate and baseline lane jobs, followed by a report job that downloads both artifacts and runs `pnpm openclaw qa parity-report`. For parity failures, inspect the failed lane first; inspect the report job when both lane summaries exist but the comparison fails.

QA Lab Matrix Profiles

`pnpm openclaw qa matrix` defaults to `--profile all`. Do not assume the CLI default is the fast release path. Use explicit profiles:
  • `--profile fast`: release-critical Matrix transport contract; add `--fail-fast` only when the target CLI supports it
  • `--profile transport|media|e2ee-smoke|e2ee-deep|e2ee-cli`: sharded full Matrix proof
  • `OPENCLAW_QA_MATRIX_NO_REPLY_WINDOW_MS=3000`: CI-friendly no-reply quiet window when paired with fast or sharded gates
`QA-Lab - All Lanes` uses explicit fast Matrix on scheduled runs; manual dispatch keeps `matrix_profile=all` as the default and always shards that full Matrix selection. `OpenClaw Release Checks` uses explicit fast Matrix; run the all-lanes workflow when release investigation needs full Matrix media/E2EE inventory.
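One way to avoid relying on the CLI default is to require a profile name up front. The sketch below is illustrative only: the function name `matrix_command` is hypothetical, the accepted names are the profiles listed above, and it prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: refuse to build a Matrix QA command without an
# explicit, known profile. Profile names are copied from the list above.
matrix_command() {
  case "$1" in
    fast|transport|media|e2ee-smoke|e2ee-deep|e2ee-cli|all)
      echo "pnpm openclaw qa matrix --profile $1"
      ;;
    *)
      echo "explicit Matrix profile required, got: '$1'" >&2
      return 1
      ;;
  esac
}
```

Calling `matrix_command` with no argument fails instead of silently falling back to `all`.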

Reusable Live/E2E Checks

`OpenClaw Live And E2E Checks (Reusable)` (`openclaw-live-and-e2e-checks-reusable.yml`) is the preferred entry point for targeted live, Docker, model, and E2E proof. Inputs let you turn off unrelated lanes:
bash
gh workflow run openclaw-live-and-e2e-checks-reusable.yml \
  --repo openclaw/openclaw \
  --ref main \
  -f ref=<sha> \
  -f include_repo_e2e=false \
  -f include_release_path_suites=false \
  -f include_openwebui=false \
  -f include_live_suites=true \
  -f live_models_only=true \
  -f live_model_providers=fireworks
Useful knobs:
  • `docker_lanes='<lane[,lane]>'`: run selected Docker scheduler lanes against prepared artifacts instead of the release chunk matrix. Multiple selected lanes fan out as parallel targeted Docker jobs after one shared package/image preparation step.
  • `include_live_suites=false`: skip live/provider suites when testing Docker scheduler or release packaging only.
  • `live_models_only=true`: run only Docker live model coverage.
  • `live_model_providers=fireworks` (or comma/space separated providers): run one targeted Docker live model job instead of the full provider matrix.
  • blank `live_model_providers`: run the full live-model provider matrix.
Release-path Docker chunks are currently `core`, `package-update-openai`, `package-update-anthropic`, `package-update-core`, `plugins-runtime-plugins`, `plugins-runtime-services`, `plugins-runtime-install-a`, `plugins-runtime-install-b`, `plugins-runtime-install-c`, `plugins-runtime-install-d`, `bundled-channels-core`, `bundled-channels-update-a`, `bundled-channels-update-b`, and `bundled-channels-contracts`. The aggregate `bundled-channels`, `plugins-runtime-core`, `plugins-runtime`, and `plugins-integrations` chunks remain valid for manual one-shot reruns, but release checks use the split chunks.
When live suites are enabled, the workflow shards broad native `pnpm test:live` coverage through `scripts/test-live-shard.mjs` instead of one serial `live-all` job:
  • native-live-src-agents
  • native-live-src-gateway-core
  • native-live-src-gateway-profiles
    (release CI runs this with provider filters such as
    OPENCLAW_LIVE_GATEWAY_PROVIDERS=anthropic
    )
  • native-live-src-gateway-backends
  • native-live-test
  • native-live-extensions-a-k
  • native-live-extensions-l-n
  • native-live-extensions-openai
  • native-live-extensions-o-z
  • native-live-extensions-o-z-other
  • native-live-extensions-xai
  • native-live-extensions-media
  • native-live-extensions-media-audio
  • native-live-extensions-media-music
  • native-live-extensions-media-music-google
  • native-live-extensions-media-music-minimax
  • native-live-extensions-media-video
Use `node scripts/test-live-shard.mjs <shard> --list` to see the exact files before rerunning a failed native live shard. The aggregate `o-z` and `media` shards remain useful locally; release CI uses the smaller provider/media shards so one live-provider flake does not force a broad native live rerun.
For model-list or provider-selection fixes, use `live_models_only=true` plus the specific `live_model_providers` allowlist. Confirm logs show the expected `OPENCLAW_LIVE_PROVIDERS` and selected model ids before declaring proof.
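Because a blank `live_model_providers` runs the full provider matrix, a targeted dispatch is worth guarding against an empty allowlist. The sketch below prints the dispatch from the example earlier in this section with the ref and providers filled in; the function name `print_live_model_dispatch` is hypothetical and nothing is executed.

```shell
#!/bin/sh
# Hypothetical helper: print a targeted live-model dispatch.
# $1 is the target SHA, $2 the provider allowlist (for example "fireworks").
# An empty allowlist is rejected because it would select the full matrix.
print_live_model_dispatch() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: print_live_model_dispatch <sha> <providers>" >&2
    return 1
  fi
  echo "gh workflow run openclaw-live-and-e2e-checks-reusable.yml" \
    "--repo openclaw/openclaw --ref main -f ref=$1" \
    "-f include_repo_e2e=false -f include_release_path_suites=false" \
    "-f include_openwebui=false -f include_live_suites=true" \
    "-f live_models_only=true -f live_model_providers=$2"
}
```

A provider list containing spaces would need quoting when the printed command is actually run.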

Docker

Docker is expensive. First inspect the scheduler without running Docker:
bash
OPENCLAW_DOCKER_ALL_DRY_RUN=1 pnpm test:docker:all
OPENCLAW_DOCKER_ALL_DRY_RUN=1 OPENCLAW_DOCKER_ALL_LANES=install-e2e pnpm test:docker:all
OPENCLAW_DOCKER_ALL_LANES=install-e2e node scripts/test-docker-all.mjs --plan-json
Run one failed lane locally only when explicitly asked or when GitHub is not usable:
bash
OPENCLAW_DOCKER_ALL_LANES=<lane> \
OPENCLAW_DOCKER_ALL_BUILD=0 \
OPENCLAW_DOCKER_ALL_PREFLIGHT=0 \
OPENCLAW_SKIP_DOCKER_BUILD=1 \
OPENCLAW_DOCKER_E2E_BARE_IMAGE='<prepared-bare-image>' \
OPENCLAW_DOCKER_E2E_FUNCTIONAL_IMAGE='<prepared-functional-image>' \
pnpm test:docker:all
For release validation, prefer the reusable GitHub workflow input:
yaml
docker_lanes: install-e2e
Multiple lanes are allowed:
yaml
docker_lanes: install-e2e bundled-channel-update-acpx
That skips the release chunk matrix and runs one targeted Docker job against the prepared GHCR images and the selected package artifact. Rerun commands generated inside GitHub artifacts include `package_artifact_run_id`, `package_artifact_name`, `docker_e2e_bare_image`, and `docker_e2e_functional_image` when available, so failed lanes can reuse the exact tarball and prepared images from the failed run. When the fix changes package contents, omit those reuse inputs so the workflow packs a new tarball. Live-only targeted reruns skip the E2E images and build only the live-test image. Release-path normal mode fans out into smaller Docker chunk jobs:
  • core
  • package-update-openai
  • package-update-anthropic
  • package-update-core
  • plugins-runtime-plugins
  • plugins-runtime-services
  • plugins-runtime-install-a
  • plugins-runtime-install-b
  • plugins-runtime-install-c
  • plugins-runtime-install-d
  • bundled-channels
OpenWebUI is folded into `plugins-runtime-services` for full release-path coverage and keeps a standalone `openwebui` chunk only for OpenWebUI-only dispatches. The legacy `package-update`, `plugins-runtime-core`, `plugins-runtime`, and `plugins-integrations` chunks still work as aggregate aliases for manual reruns, but the release workflow uses the split chunks so provider installer checks, plugin runtime checks, bundled plugin install/uninstall shards, and bundled-channel checks can run on separate machines. The bundled-channel runtime-dependency coverage inside `bundled-channels` uses the split `bundled-channel-*` and `bundled-channel-update-*` lanes rather than the serial `bundled-channel-deps` lane, so failures produce cheap targeted reruns for the exact channel/update scenario. The bundled plugin install/uninstall sweep is also split into `bundled-plugin-install-uninstall-0` through `bundled-plugin-install-uninstall-7`; selecting the legacy `bundled-plugin-install-uninstall` lane expands to all eight shards.
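The legacy-lane expansion described above can be pictured as follows. This is a sketch of the naming only, not the scheduler's implementation: the function name `expand_install_uninstall` is hypothetical, and the real expansion happens inside the Docker scheduler.

```shell
#!/bin/sh
# Hypothetical illustration: expand the legacy install/uninstall lane
# into its eight shards (0 through 7); pass any other lane through as-is.
expand_install_uninstall() {
  if [ "$1" = "bundled-plugin-install-uninstall" ]; then
    i=0
    while [ "$i" -le 7 ]; do
      echo "bundled-plugin-install-uninstall-$i"
      i=$((i + 1))
    done
  else
    echo "$1"
  fi
}
```

Rerunning a single shard such as `bundled-plugin-install-uninstall-3` is therefore cheaper than selecting the legacy lane, which fans out to all eight.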

Package Acceptance

Use the manual `Package Acceptance` workflow when the question is "does this installable package work as a product?" rather than "does this source diff pass Vitest?"
In release validation, treat Package Acceptance as the package-candidate shard inside the larger release umbrella, not as a competing full-test path. Full Release Validation and private release gauntlets should call Package Acceptance for tarball resolution, Docker product/package proof, and optional Telegram QA against the same resolved `package-under-test` artifact; keep orchestration, secret policy, blocking/advisory status, and evidence rollup in the caller.
Good defaults:
```bash
gh workflow run package-acceptance.yml --ref main \
  -f source=npm \
  -f workflow_ref=main \
  -f package_spec=openclaw@beta \
  -f suite_profile=product \
  -f telegram_mode=mock-openai
```
Npm candidate selection:
  • Resolve the registry immediately before dispatch: `npm view openclaw dist-tags --json --prefer-online --cache /tmp/openclaw-npm-cache-verify-$$` and `npm view openclaw@beta version dist.tarball dist.integrity --json --prefer-online --cache /tmp/openclaw-npm-cache-verify-$$`.
  • If Peter asks for "latest beta", use `source=npm` with `package_spec=openclaw@beta`, then record the resolved version from `npm view` or the workflow summary.
  • For reruns, release proof, or comparing one known package, prefer the exact immutable spec: `package_spec=openclaw@YYYY.M.D-beta.N` or `package_spec=openclaw@YYYY.M.D`.
  • For stable package proof, use `package_spec=openclaw@latest` only when the question is explicitly the current stable dist-tag; otherwise pin the exact version.
  • `source=npm` only accepts registry specs for `openclaw@beta`, `openclaw@latest`, or exact OpenClaw release versions. Do not pass semver ranges, git refs, file paths, tarball URLs, or plugin package names there.
  • If the candidate is a tarball URL, use `source=url` with `package_sha256`. If it is an Actions tarball artifact, use `source=artifact`. If it is an unpublished source candidate, use `source=ref` with a trusted ref or SHA.
  • Package acceptance tests exactly the selected package candidate. Do not apply `openclaw update --channel beta` fallback semantics here; if `beta` is absent, stale, older than `latest`, or points at a broken tarball, report that tag state instead of silently testing `latest`.
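The `source=npm` rule above can be checked locally before dispatch. The following is a minimal sketch, assuming exact release versions look like the `YYYY.M.D` and `YYYY.M.D-beta.N` examples in the list; `is_allowed_npm_spec` is a hypothetical helper, and the workflow's own validation remains authoritative.

```bash
# Hypothetical pre-dispatch check, not part of the repo.
# The exact-version pattern is inferred from the YYYY.M.D and
# YYYY.M.D-beta.N examples; the real workflow may be stricter or looser.
is_allowed_npm_spec() (
  spec="$1"
  case "$spec" in
    # The two dist-tags that source=npm accepts.
    openclaw@beta | openclaw@latest) exit 0 ;;
  esac
  # Otherwise require an exact, immutable release version.
  printf '%s\n' "$spec" |
    grep -Eq '^openclaw@[0-9]{4}\.[0-9]{1,2}\.[0-9]{1,2}(-beta\.[0-9]+)?$'
)
```

The function returns success for the two dist-tags and for exact versions, and failure for semver ranges, git refs, file paths, tarball URLs, and other package names, so it can guard a dispatch script with `is_allowed_npm_spec "$spec" || exit 1`.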
Profiles:
  • `smoke`: quick confidence that the tarball installs, can onboard a channel, can run an agent turn, and basic gateway/config lanes work.
  • `package`: release-package contract. Adds installer/update, doctor install switching, bundled plugin runtime deps, plugin install/update, and package repair lanes. This is the default native replacement for most Parallels package/update coverage.
  • `product`: the `package` profile plus broader product surfaces: MCP channels, cron/subagent cleanup, OpenAI web search, and OpenWebUI.
  • `full`: split Docker release-path chunks with OpenWebUI.
  • `custom`: exact `docker_lanes` list for a focused rerun.
Candidate sources:
  • `source=npm`: `openclaw@beta`, `openclaw@latest`, or an exact release version.
  • `source=ref`: pack `package_ref` using the trusted `workflow_ref` harness. This intentionally separates old package commits from new workflow/test code.
  • `source=url`: HTTPS `.tgz` plus required `package_sha256`.
  • `source=artifact`: download one `.tgz` from `artifact_run_id`/`artifact_name`.
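The candidate sources above each come with their own inputs. The sketch below is one way to keep that mapping at hand and to sanity-check a `package_sha256` value before dispatch. Both helpers are hypothetical; the input names are taken from the list above, and the name of the URL input itself is not given there, so it is left out.

```bash
# Hypothetical helpers, not part of the repo.
# Prints the inputs the list above associates with each source.
inputs_for_source() (
  case "$1" in
    npm)      echo "package_spec" ;;
    ref)      echo "package_ref workflow_ref" ;;
    url)      echo "package_sha256" ;;  # plus the URL input, unnamed above
    artifact) echo "artifact_run_id artifact_name" ;;
    *)        echo "unknown source: $1" >&2; exit 1 ;;
  esac
)

# A SHA-256 digest written in hex is exactly 64 hexadecimal characters.
is_sha256_hex() (
  printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{64}$'
)
```

A dispatch script could call `is_sha256_hex "$package_sha256" || exit 1` for `source=url` so a truncated digest is caught before the workflow starts.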
Ref model:
  • `gh workflow run ... --ref <workflow-ref>` selects the workflow file revision GitHub executes.
  • `workflow_ref` is the trusted harness/script ref passed to reusable Docker E2E.
  • `package_ref` is the source ref to build when `source=ref`. It can be an older branch/tag/SHA as long as it is reachable from an OpenClaw branch or release tag.
Example: run the latest Package Acceptance harness against an older trusted commit:
```bash
gh workflow run package-acceptance.yml --ref main \
  -f workflow_ref=main \
  -f source=ref \
  -f package_ref=<branch-or-sha> \
  -f suite_profile=package \
  -f telegram_mode=mock-openai
```
Use `telegram_mode=mock-openai` or `telegram_mode=live-frontier` when the same resolved `package-under-test` tarball should also run through the Telegram QA workflow in the `qa-live-shared` environment. The standalone Telegram workflow still accepts a published npm spec for post-publish checks, but Package Acceptance passes the resolved artifact for `source=npm`, `ref`, `url`, and `artifact`. Use `telegram_mode=none` only when intentionally skipping Telegram credentialed package proof for a focused rerun.
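To make the profile and Telegram mode choices harder to mistype, a small wrapper can print the dispatch command for review instead of running it. This is a sketch only: `package_acceptance_command` is a hypothetical helper, it covers `source=npm` alone, and it assumes the profiles and modes named in this section are the complete accepted sets. A `custom` profile would also need a `docker_lanes` input, which this sketch does not handle.

```bash
# Hypothetical wrapper, not part of the repo.
# Prints the dispatch command; it never calls gh itself.
package_acceptance_command() (
  spec="$1"
  profile="$2"
  telegram="$3"
  # Profiles named in this section.
  case "$profile" in
    smoke | package | product | full | custom) ;;
    *) echo "unknown suite_profile: $profile" >&2; exit 2 ;;
  esac
  # Telegram modes named in this section.
  case "$telegram" in
    none | mock-openai | live-frontier) ;;
    *) echo "unknown telegram_mode: $telegram" >&2; exit 2 ;;
  esac
  printf 'gh workflow run package-acceptance.yml --ref main -f source=npm -f workflow_ref=main -f package_spec=%s -f suite_profile=%s -f telegram_mode=%s\n' \
    "$spec" "$profile" "$telegram"
)
```

Running `package_acceptance_command openclaw@beta product mock-openai` prints the same inputs as the "Good defaults" command, on one line.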
Docker E2E images never copy repo sources as the app under test: the bare image is a Node/Git runner, and the functional image installs the same prebuilt npm tarball that bare lanes mount. `scripts/package-openclaw-for-docker.mjs` is the single packer for local scripts and CI and validates the tarball inventory before Docker consumes it. `scripts/test-docker-all.mjs --plan-json` is the scheduler-owned CI plan for image kind, package, live image, lane, and credential needs. Docker lane definitions live in the single scenario catalog `scripts/lib/docker-e2e-scenarios.mjs`; planner logic lives in `scripts/lib/docker-e2e-plan.mjs`. `scripts/docker-e2e.mjs` converts plan and summary JSON into GitHub outputs and step summaries. Every scheduler run writes `.artifacts/docker-tests/**/summary.json` plus `failures.json`. Read those before rerunning. Lane entries include `command`, `rerunCommand`, status, timing, timeout state, image kind, and log file path. The summary also includes top-level phase timings for preflight, image build, package prep, lane pools, and cleanup. Use `pnpm test:docker:timings <summary.json>` to rank slow lanes and phases before deciding whether a broader rerun is justified.
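When several scheduler runs have left artifacts behind, it helps to locate the newest `summary.json` before reading anything. The sketch below does that with `find` and a modification-time comparison. Both helpers are hypothetical, and `failures_for_summary` assumes `failures.json` sits in the same directory as `summary.json`, which the text above suggests but does not state outright.

```bash
# Hypothetical helpers, not part of the repo.
# Prints the most recently modified summary.json under the given root
# (default: .artifacts/docker-tests). Fails if none is found.
latest_summary() (
  root="${1:-.artifacts/docker-tests}"
  find "$root" -type f -name summary.json 2>/dev/null | {
    newest=""
    while IFS= read -r f; do
      # Keep the file with the newest modification time seen so far.
      if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
        newest="$f"
      fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
  }
)

# Assumption: failures.json is written next to summary.json.
failures_for_summary() (
  printf '%s/failures.json\n' "$(dirname "$1")"
)
```

The result can be passed straight to the documented timing command, for example `pnpm test:docker:timings "$(latest_summary)"`.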

Cheap Docker Reruns

First derive the smallest rerun command from artifacts:
```bash
pnpm test:docker:rerun <github-run-id>
pnpm test:docker:rerun .artifacts/docker-tests/<run>/failures.json
```
The script downloads Docker E2E artifacts for a GitHub run, reads `summary.json`/`failures.json`, and prints a combined targeted workflow command plus per-lane commands. Prefer the combined targeted command when several lanes failed for the same patch:
```bash
gh workflow run openclaw-live-and-e2e-checks-reusable.yml \
  -f ref=<sha> \
  -f include_repo_e2e=false \
  -f include_release_path_suites=false \
  -f include_openwebui=false \
  -f docker_lanes='install-e2e bundled-channel-update-acpx' \
  -f include_live_suites=false \
  -f live_models_only=false
```
That path still runs the prepare job, so it creates a new tarball for `<sha>`. If the SHA-tagged GHCR bare/functional image already exists, CI skips rebuilding that image and only uploads the fresh package artifact before the targeted lane job. Do not rerun the full release path unless the failed lane list or touched surface really requires it.
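For cases where the failed lanes are already known, the combined command above can be assembled by hand. The sketch below only prints the command, using exactly the inputs shown in this section; `targeted_rerun_command` is a hypothetical helper, and `pnpm test:docker:rerun` remains the preferred way to derive the real command from artifacts.

```bash
# Hypothetical helper, not part of the repo.
# Usage: targeted_rerun_command <sha> <lane> [<lane>...]
# Prints a combined targeted dispatch command; it never calls gh itself.
targeted_rerun_command() (
  sha="${1:-}"
  if [ -z "$sha" ] || [ "$#" -lt 2 ]; then
    echo "usage: targeted_rerun_command <sha> <lane> [<lane>...]" >&2
    exit 2
  fi
  shift
  lanes="$*"  # remaining arguments joined by single spaces
  printf '%s\n' \
    "gh workflow run openclaw-live-and-e2e-checks-reusable.yml \\" \
    "  -f ref=$sha \\" \
    "  -f include_repo_e2e=false \\" \
    "  -f include_release_path_suites=false \\" \
    "  -f include_openwebui=false \\" \
    "  -f docker_lanes='$lanes' \\" \
    "  -f include_live_suites=false \\" \
    "  -f live_models_only=false"
)
```

Printing instead of executing keeps the dispatch a deliberate step: the output can be compared against the command generated from `failures.json` before anything is queued.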

Docker Expected Timings

Treat these as ballpark. Blacksmith queue time, GHCR pull speed, provider latency, npm cache state, and Docker daemon health can dominate.
Current local timing artifact (`.artifacts/docker-tests/lane-timings.json`) has these rough bands:
  • Tiny lanes, seconds to under 1 minute: `agents-delete-shared-workspace` ~3s, `plugin-update` ~7s, `config-reload` ~14s, `pi-bundle-mcp-tools` ~15s, `onboard` ~18s, `session-runtime-context` ~20s, `gateway-network` ~34s, `qr` ~44s.
  • Medium deterministic lanes, ~1-5 minutes: `npm-onboard-channel-agent` ~96s, `openai-image-auth` ~99s, bundled channel/update lanes usually ~90-300s when split, `openwebui` ~225s, `mcp-channels` ~274s.
  • Heavy deterministic lanes, ~6-10 minutes: `bundled-channel-root-owned` ~429s, `bundled-channel-setup-entry` ~420s, `bundled-channel-load-failure` ~383s, `cron-mcp-cleanup` ~567s.
  • Live provider lanes, often ~15-20 minutes: `live-gateway` ~958s, `live-models` ~1054s.
  • Installer/release lanes: `install-e2e` and package-update paths can vary widely with npm, provider, and package registry behavior. Budget tens of minutes; prefer GitHub targeted reruns over local repeats.
Default fallback lane timeout is 120 minutes. A timeout usually means debug the lane log/artifacts first, not “run the whole thing again.”
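The timings above are given in seconds while the bands are given in minutes, so a quick conversion helps when reading a lane's duration. The sketch below is illustrative only: both helpers are hypothetical, and the cut-offs at 60, 300, and 600 seconds are assumptions chosen to approximate the bands, which themselves are ballpark and leave a gap between 5 and 6 minutes.

```bash
# Hypothetical helpers, not part of the repo.
# Assumed cut-offs: under 60s tiny, up to 300s medium, up to 600s heavy,
# anything longer "long" (live provider and installer/release territory).
timing_band() (
  secs="$1"
  if [ "$secs" -lt 60 ]; then
    echo tiny
  elif [ "$secs" -le 300 ]; then
    echo medium
  elif [ "$secs" -le 600 ]; then
    echo heavy
  else
    echo long
  fi
)

# Converts whole seconds to a minutes-and-seconds string.
format_duration() (
  secs="$1"
  printf '%dm%02ds\n' "$((secs / 60))" "$((secs % 60))"
)
```

For example, `format_duration 567` prints `9m27s` for `cron-mcp-cleanup`, and `timing_band 567` prints `heavy`, which matches the ~6-10 minute band.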

Failure Workflow

  1. Identify exact failing job, SHA, lane, and artifact path.
  2. Read `failures.json`, `summary.json`, and the failed lane log tail.
  3. Use `pnpm test:docker:rerun <run-id|failures.json>` to generate targeted GitHub rerun commands.
  4. If the lane has `rerunCommand`, use that only as a local starting point.
  5. For Docker release failures, dispatch targeted `docker_lanes=<failed-lane>` on GitHub before considering local Docker.
  6. Patch narrowly, then rerun the failed file/lane only.
  7. Broaden to `pnpm check:changed` or CI only after the isolated proof passes.

When To Escalate

  • Public SDK/plugin contract changes: run changed gate plus relevant extension validation.
  • Build output, lazy imports, package boundaries, or published surfaces: include `pnpm build`.
  • Workflow edits: run `pnpm check:workflows`.
  • Release branch or tag validation: use release docs and GitHub workflows; avoid local Docker unless Peter explicitly asks.
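The escalation rules above amount to a small lookup from the kind of change to the extra proof it needs. The sketch below writes that lookup down. `escalation_step` and its category names are hypothetical, and it assumes "changed gate" refers to `pnpm check:changed`.

```bash
# Hypothetical lookup, not part of the repo. Category names are invented
# for this sketch; "changed gate" is assumed to mean pnpm check:changed.
escalation_step() (
  case "$1" in
    public-contract) echo "pnpm check:changed (plus relevant extension validation)" ;;
    build-surface)   echo "pnpm build" ;;
    workflow)        echo "pnpm check:workflows" ;;
    release)         echo "release docs and GitHub workflows (no local Docker unless asked)" ;;
    *)               echo "no escalation rule for: $1" >&2; exit 1 ;;
  esac
)
```

It prints a reminder rather than running anything, so the narrow proof still comes first and the broader step stays an explicit decision.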