qa-execution
Systematic Project QA
Required Inputs
- qa-output-path (optional): Directory where QA artifacts (issues, screenshots, verification reports) are stored. When provided, create the directory if it does not exist and use it for all QA outputs. When omitted, fall back to repository conventions or `/tmp/codex-qa-<slug>`.
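The fallback chain above can be sketched in shell. The variable names and the `myproject` slug are illustrative assumptions, not part of the skill's contract, and the repository-convention branch is elided for brevity:

```shell
# Hedged sketch of resolving the QA artifact directory.
QA_OUTPUT_PATH=""                      # what the user passed (empty = omitted)
SLUG="myproject"                       # illustrative slug derived from the repo name

if [ -n "$QA_OUTPUT_PATH" ]; then
  QA_ROOT="$QA_OUTPUT_PATH"            # explicit argument wins
else
  QA_ROOT="/tmp/codex-qa-$SLUG"        # fallback when no repo convention exists
fi

mkdir -p "$QA_ROOT/qa"                 # all QA outputs live under qa/
echo "$QA_ROOT/qa"
```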
Procedures
Step 1: Discover the Repository QA Contract
- Read root instructions, repository docs, and CI/build files before running commands.
- Execute `python3 scripts/discover-project-contract.py --root .` to surface candidate install, verify, build, test, lint, start commands, Web UI signals, and E2E signals.
- Read `references/project-signals.md` when command ownership is ambiguous or when multiple ecosystems are present.
- Read `references/e2e-coverage.md` to decide whether the repository already supports public-surface automated coverage and how strong that support is.
- Prefer repository-defined umbrella commands such as `make verify`, `just verify`, or CI entrypoints over language-default commands.
- Identify the changed surface and the regression-critical surface before choosing scenarios.
- Determine whether the project has a Web UI surface. Indicators include: a `start` or `dev` command that launches a web server, framework config files (`next.config.*`, `vite.config.*`, `nuxt.config.*`, `angular.json`, `svelte.config.*`), or HTML/template entry points. Record the dev server URL (default `http://localhost:3000` unless the project specifies otherwise).
- Record the E2E contract in working notes: support detected or not, harness name, canonical command, known spec locations, and blockers.
- Resolve the QA artifact directory. If the user provided a `qa-output-path` argument, use that path. Otherwise, use repository conventions. If neither exists, fall back to `/tmp/codex-qa-<slug>`. Create the `qa/` subdirectory under the resolved path if it does not exist. Store all issues, screenshots, and verification reports under `<qa-output-path>/qa/`.
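The Web UI indicator scan above could be scripted roughly as follows. The fixture under `/tmp/qa-demo` and the exact matching logic are assumptions for illustration; real detection should follow the repository's own signals and the discovery script:

```shell
# Hedged sketch: detect a likely Web UI surface from config files and
# dev/start scripts, per the indicators listed above.
detect_web_ui() {
  root="$1"
  for cfg in next.config vite.config nuxt.config svelte.config angular; do
    # matches next.config.js, vite.config.ts, angular.json, etc.
    if ls "$root"/$cfg.* >/dev/null 2>&1; then
      echo "web-ui: config file found ($cfg)"
      return 0
    fi
  done
  if [ -f "$root/package.json" ] && grep -Eq '"(dev|start)"[[:space:]]*:' "$root/package.json"; then
    echo "web-ui: dev/start script found"
    return 0
  fi
  echo "web-ui: not detected"
  return 1
}

# demo against a throwaway fixture with a "dev" script
mkdir -p /tmp/qa-demo
printf '{"scripts": {"dev": "vite"}}\n' > /tmp/qa-demo/package.json
detect_web_ui /tmp/qa-demo
```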
Step 2: Define the QA Scope
- Check whether `<qa-output-path>/qa/test-cases/` and `<qa-output-path>/qa/test-plans/` contain artifacts from a prior `qa-report` run. If they exist, read the test plans, test case IDs, and automation annotations to seed the execution matrix and prioritize P0/P1 test cases.
- Build a short execution matrix covering baseline verification, changed workflows, unchanged business-critical workflows, and automation follow-up.
- Read `references/checklist.md` and ensure every required category has a planned validation.
- Prefer public entry points such as CLI commands, HTTP endpoints, browser flows, worker jobs, and documented setup commands over internal test helpers.
- Classify each changed or regression-critical public flow as `existing-e2e`, `needs-e2e`, `manual-only`, or `blocked`.
- Require the `needs-e2e` classification when the repository already supports E2E and the flow is P0, P1, release-critical smoke coverage, or a reproduced public regression. Do not downgrade such flows to `manual-only` without a concrete reason.
- When a Web UI surface exists, read `references/web-ui-qa.md` and select 3-5 critical user flows to exercise through the browser. Prioritize flows that cover the changed surface and the most business-critical paths.
- Create the smallest realistic fixture or fake project needed to exercise the workflow when the repository does not already include one.
- Treat mocks as a local unit-test boundary only. Do not use mocks or stubs as final proof that a user flow works.
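A seeded execution matrix might look like the following working-notes file. Every row (TC IDs, scenarios, classifications) and the `execution-matrix.md` filename are invented examples of the shape, not a prescribed set:

```shell
# Illustrative execution matrix written into the QA working notes.
QA_DIR="/tmp/codex-qa-demo/qa"
mkdir -p "$QA_DIR"

cat > "$QA_DIR/execution-matrix.md" <<'EOF'
| ID    | Scenario                 | Category          | Priority | Classification |
|-------|--------------------------|-------------------|----------|----------------|
| TC-01 | umbrella verify gate     | baseline          | P0       | existing-e2e   |
| TC-02 | changed CLI export flow  | changed workflow  | P0       | needs-e2e      |
| TC-03 | checkout via browser     | business-critical | P1       | existing-e2e   |
| TC-04 | live payment webhook     | changed workflow  | P1       | blocked        |
EOF

grep -c "TC-" "$QA_DIR/execution-matrix.md"   # → 4 planned test cases
```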
Step 3: Establish the Baseline
- Install dependencies with the repository-preferred command before testing runtime flows.
- Run the canonical verification gate once before scenario testing to establish baseline health. Execute in fastest-first order: lint and type-check, then build, then unit tests, then integration tests.
- If the E2E command is separate from the umbrella gate, decide whether to run it in baseline now or after runtime prerequisites are ready, then record that plan explicitly.
- If the baseline fails, read the first failing output carefully and determine whether it is pre-existing or introduced by current work before moving on.
- When the project has a Web UI surface, start the dev server in the background using the discovered start command. Confirm readiness by waiting for the server to respond (e.g., `curl -sf -o /dev/null http://localhost:<port>` returns 0, or startup logs emit a ready signal).
- Start services in the closest supported production-like mode and confirm readiness through observable signals such as health checks, startup logs, or successful handshakes.
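The background-start-then-poll pattern can be sketched like this. The `python3 -m http.server` stand-in and port 8377 are assumptions used so the sketch is self-contained; substitute the project's real start command and URL:

```shell
# Start a stand-in dev server, poll until it responds, then shut it down.
PORT=8377
python3 -m http.server "$PORT" >/dev/null 2>&1 &
SERVER_PID=$!

READY=0
for _ in $(seq 1 30); do                 # poll for up to ~30 seconds
  if curl -sf -o /dev/null "http://localhost:$PORT"; then
    READY=1                              # server answered: ready to test
    break
  fi
  sleep 1
done

kill "$SERVER_PID" 2>/dev/null
echo "ready=$READY"
```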
Step 4: Execute CLI and API Flows
- Drive CLI and API workflows through the same interfaces a real operator or user would use.
- Capture the exact command, input, and observable result for each scenario.
- Validate changed features first, then validate at least one regression-critical flow outside the changed surface.
- Exercise live integrations when credentials and local prerequisites exist. When they do not, validate every reachable local boundary and record the blocked live step explicitly.
- Record whether each validated flow already has matching automated coverage or should move to `needs-e2e`.
- Re-run the scenario from a clean state when the first attempt leaves the environment ambiguous.
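Capturing the exact command, exit code, and observable result for one scenario can be as simple as appending to an evidence log. The `echo` scenario and the `cli-evidence.log` filename are placeholders for the real public entry point and the repository's own conventions:

```shell
# Minimal evidence capture for one CLI scenario.
QA_DIR="/tmp/codex-qa-demo/qa"
mkdir -p "$QA_DIR"

CMD='echo "hello from the CLI under test"'   # placeholder scenario command
OUT="$(eval "$CMD" 2>&1)"
STATUS=$?                                    # exit code of the scenario command

{
  echo "Command: $CMD"
  echo "Exit code: $STATUS"
  echo "Output: $OUT"
} >> "$QA_DIR/cli-evidence.log"

tail -n 3 "$QA_DIR/cli-evidence.log"
```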
Step 5: Execute Web UI Flows
Skip this step if the project has no Web UI surface.
- Read `references/web-ui-qa.md` for the full browser testing procedure and checklist.
- Use the `agent-browser` CLI (from the `agent-browser` companion skill) for all browser interactions. The core loop is: open, snapshot, interact, re-snapshot, verify. Valid commands are: `open`, `back`, `forward`, `reload`, `snapshot -i`, `click @ref`, `fill @ref "text"`, `select @ref "value"`, `press Key`, `check @ref`, `uncheck @ref`, `wait`, `get text @ref`, `get url`, `get title`, `screenshot`, `state save`, `state load`, `close`. Do not invent commands outside this set.
- For each critical user flow identified in Step 2:
  a. Navigate to the entry URL: `agent-browser open <url>`.
  b. Take an interactive snapshot: `agent-browser snapshot -i` to get element refs (`@e1`, `@e2`, etc.).
  c. Execute the planned interactions using refs: `agent-browser click @e1`, `agent-browser fill @e2 "text"`, etc.
  d. Re-snapshot after every navigation or significant DOM change. Refs become stale after page transitions.
  e. Verify the expected outcome by checking element text, page URL, or visible state via snapshot output.
  f. Capture screenshot evidence: `agent-browser screenshot <qa-output-path>/qa/screenshots/<flow-name>.png`.
- Test critical form flows: fill valid data and verify success, fill invalid data and verify error messages appear.
- When the changed surface includes responsive behavior, test at multiple viewports. Read the viewport testing section of `references/web-ui-qa.md` for session setup.
- Verify navigation flows: page transitions, back/forward, deep links, and 404 handling.
- Check error and loading states: trigger error conditions and verify the UI handles them gracefully.
- Map each browser flow to its automation classification. When a harness exists but no matching spec exists, keep the flow in `needs-e2e` until coverage is added or the blocker is documented.
- Close the browser session after all flows complete: `agent-browser close`.
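The per-flow loop can be scripted end to end using only the command set listed above. The login URL and the `@e1`/`@e2`/`@e3` refs are hypothetical (real refs come from the preceding snapshot), and the else branch mirrors the documented fallback when `agent-browser` is unavailable:

```shell
# Illustrative login-flow script; skips cleanly if agent-browser is absent.
QA_DIR="/tmp/codex-qa-demo/qa"
mkdir -p "$QA_DIR/screenshots"

if command -v agent-browser >/dev/null 2>&1; then
  agent-browser open "http://localhost:3000/login"
  agent-browser snapshot -i                        # discover refs (@e1, @e2, ...)
  agent-browser fill @e1 "qa-user@example.com"     # hypothetical email field
  agent-browser fill @e2 "placeholder-password"    # hypothetical password field
  agent-browser click @e3                          # hypothetical submit button
  agent-browser snapshot -i                        # re-snapshot: old refs are stale
  agent-browser get url                            # verify the post-login URL
  agent-browser screenshot "$QA_DIR/screenshots/login-flow.png"
  agent-browser close
  RESULT="executed"
else
  echo "blocked: agent-browser not installed" >> "$QA_DIR/blockers.log"
  RESULT="skipped"
fi
echo "flow: $RESULT"
```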
Step 6: Diagnose and Fix Regressions
- Reproduce each failure consistently before proposing a fix.
- Activate companion debugging and test-hygiene skills when available, especially root-cause debugging and anti-workaround guidance.
- Add or update the narrowest regression coverage that proves the bug when the repository supports automated coverage for that surface.
- When the repository already supports E2E and the failure affects a public browser, HTTP, or CLI flow, add or update E2E coverage instead of stopping at unit or integration proof.
- If the harness does not exist, keep manual proof and record the blocker rather than bootstrapping a new E2E framework during QA.
- Fix production code or real configuration at the source of the failure. Do not weaken tests to match broken behavior.
- Re-run the narrow reproduction, updated automated coverage, impacted scenario, and baseline gate after each fix.
- For Web UI regressions, reproduce the visual failure with `agent-browser`, capture before/after screenshots under `<qa-output-path>/qa/screenshots/`, and verify the fix through the same browser flow.
- Use `assets/issue-template.md` to write issue files under `<qa-output-path>/qa/issues/`. Create the subdirectory if it does not exist. Name each file using the `BUG-<num>.md` convention (e.g., `BUG-001.md`). Assign Severity (Critical/High/Medium/Low) and Priority (P0-P3) to every issue. When an issue was discovered while executing a test case from `qa-report`, include the TC-ID in the Related section and fill in the automation follow-up fields.
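One issue file following the `BUG-<num>.md` convention might look like this. The field layout mirrors the mandatory Severity/Priority/Related fields named above, but the concrete bug text and TC-ID are invented; `assets/issue-template.md` remains the authoritative template:

```shell
# Sketch of a single issue file with the required fields.
ISSUES_DIR="/tmp/codex-qa-demo/qa/issues"
mkdir -p "$ISSUES_DIR"

cat > "$ISSUES_DIR/BUG-001.md" <<'EOF'
# BUG-001: Export command writes an empty file

- Severity: High
- Priority: P1
- Related: TC-02
- Automation follow-up: needs-e2e

## Reproduction
1. Run the export command against the demo fixture.
2. Observe that the output file is created but contains 0 bytes.
EOF

grep "Priority" "$ISSUES_DIR/BUG-001.md"
```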
Step 7: Verify the Final State
- Re-run the full repository verification gate from scratch after the last code change.
- Re-run the most important CLI and API scenarios after the full gate passes.
- Re-run the narrow E2E specs that were added or updated and, when the repository supports E2E, re-run the canonical E2E command or the smallest repository-defined subset that covers the touched critical flows.
- When Web UI flows were tested, re-run the critical browser flows and capture final screenshot evidence.
- Summarize the evidence using `assets/verification-report-template.md` and write the report to `<qa-output-path>/qa/verification-report.md`. The report must include these mandatory fields: Claim, Command, Executed timestamp, Exit code, Output summary, Warnings, Errors, Verdict (PASS or FAIL), plus an Automated Coverage section with support detected, required flows, specs added or updated, commands executed, and manual-only or blocked items. When Web UI flows were tested, append a Browser Evidence section with: Dev server URL, Flows tested count, per-flow entry (name, entry URL, final URL, verdict, screenshot path), Viewports tested, Authentication method, and Blocked flows.
- Report blocked scenarios, missing credentials, or environment gaps with the exact command or prerequisite that stopped execution.
- Do not claim completion without fresh verification evidence from the current state of the repository.
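A report skeleton covering the mandatory fields could be generated like this. Every value (timestamp, commands, test counts, spec paths) is a fabricated placeholder showing the shape; real reports must carry fresh evidence from actual runs, per the rule above:

```shell
# Skeleton verification report with the mandatory fields filled by placeholders.
REPORT="/tmp/codex-qa-demo/qa/verification-report.md"
mkdir -p "$(dirname "$REPORT")"

cat > "$REPORT" <<'EOF'
# Verification Report

- Claim: full verification gate passes on the final state
- Command: make verify
- Executed: 2024-01-01T00:00:00Z
- Exit code: 0
- Output summary: 128 tests passed, 0 failed
- Warnings: none
- Errors: none
- Verdict: PASS

## Automated Coverage
- Support detected: yes (hypothetical Playwright harness)
- Required flows: TC-02, TC-03
- Specs added or updated: e2e/export.spec.ts
- Commands executed: npx playwright test e2e/export.spec.ts
- Manual-only or blocked: TC-04 (missing payment sandbox credentials)
EOF

grep "Verdict" "$REPORT"
```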
Error Handling
- If command discovery returns multiple plausible gates, prefer the broadest repository-defined command and explain the tie-breaker.
- If E2E support signals are weak or contradictory, prefer explicit config files and runnable commands before claiming that the repository supports E2E.
- If no canonical verify command exists, read `references/project-signals.md`, choose the broadest safe install, lint, test, and build commands for the detected ecosystem, and state that assumption explicitly.
- If a required live dependency is unavailable, validate every local boundary that does not require the missing dependency and report the blocked live validation separately.
- If a workflow requires data or services absent from the repository, create the smallest realistic fixture outside the main source tree unless the repository has its own fixture convention.
- If a failure appears unrelated to the requested change, prove that with a clean reproduction before excluding it from the QA scope.
- If the repository has an E2E harness but credentials, runtime services, or test data prevent execution, keep the affected flow classified as `blocked` and report the exact prerequisite that is missing.
- If the repository lacks an E2E harness, do not bootstrap a new framework during QA. Keep live manual evidence and document the automation gap as `manual-only` or `blocked`.
- If `agent-browser` is not installed or the dev server fails to start, skip Web UI flows, document the blocker in the verification report, and continue with CLI and API validation only.
- If a browser flow hangs or times out, close the session with `agent-browser close`, record the failure, and attempt the flow once more from a clean session before marking it as blocked.
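The retry-once-from-a-clean-session rule can be sketched as follows. Here `run_flow` is a stub that fails on its first invocation and succeeds on the second, standing in for a flaky browser flow; the session teardown is shown as a comment because `agent-browser` may not be installed:

```shell
# Sketch of "retry once from a clean session, then mark blocked".
STATE=/tmp/qa-demo-flow-state
rm -f "$STATE"

run_flow() {
  if [ ! -f "$STATE" ]; then
    touch "$STATE"        # first attempt: simulate a hang/timeout
    return 1
  fi
  return 0                # second attempt from a clean session succeeds
}

if run_flow; then
  VERDICT="pass"
else
  # agent-browser close   # would tear down the hung session here
  if run_flow; then
    VERDICT="pass-after-retry"
  else
    VERDICT="blocked"     # record the failure and the missing prerequisite
  fi
fi
echo "$VERDICT"
```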