qa-run
Arguments: $FEATURE (or "all" for entire suite)
Read CLAUDE.md before doing anything else.
Ensure the dev server is running before proceeding.
━━━ BROWSER AGENT (Playwright MCP — runs first) ━━━
Before writing ANY test file, explore the live application using Playwright MCP.
This is non-negotiable for web applications. You MUST see the real app first.
STEP A — NAVIGATE EVERY SCREEN:
For each screen related to $FEATURE:
1. browser_navigate to the screen URL
2. browser_snapshot — capture the accessibility tree
(This gives you the REAL selectors, roles, and accessible names.
Never guess selectors. Always get them from the live app.)
3. browser_take_screenshot — save to qa/browser-tests/$FEATURE/
4. Compare what you see against docs/SCREENS.md and wireframes/
5. Log any discrepancies immediately
STEP B — TEST EVERY INTERACTION:
On each screen:
1. browser_click every button — verify correct result
2. browser_type into every input — verify it accepts input
3. browser_select_option on every dropdown
4. browser_press_key Tab through the page — verify focus order
5. browser_press_key Enter on focused buttons — verify activation
6. For forms: submit with valid data, empty data, and invalid data
STEP C — TEST THE HAPPY PATH LIVE:
Read P0 acceptance criteria from docs/PRD.md.
Execute each Given/When/Then by actually doing it in the browser:
- browser_navigate to start
- browser_type / browser_click / browser_select_option to perform actions
- browser_verify_text_visible / browser_verify_element_visible for assertions
- browser_take_screenshot at each step
STEP D — RESPONSIVE CHECK:
For the 3 most important screens:
browser_resize width=1440 height=900 → browser_take_screenshot (desktop)
browser_resize width=768 height=1024 → browser_take_screenshot (tablet)
browser_resize width=375 height=812 → browser_take_screenshot (mobile)
Verify: no overflow, no cut-off content, touch targets ≥ 44px on mobile
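The touch-target rule can be checked mechanically from element bounding boxes captured during the snapshot. A minimal sketch, assuming a simplified Rect shape (the real accessibility-tree output has more fields, and the 44px floor follows the guideline above):

```typescript
// Touch-target audit: flag interactive elements whose rendered
// bounding box is smaller than 44x44 px on the mobile viewport.
// The Rect shape here is an assumption for illustration only.
interface Rect {
  name: string;   // accessible name of the element
  width: number;  // rendered width in CSS px
  height: number; // rendered height in CSS px
}

const MIN_TOUCH_TARGET = 44;

function undersizedTargets(rects: Rect[]): string[] {
  return rects
    .filter(r => r.width < MIN_TOUCH_TARGET || r.height < MIN_TOUCH_TARGET)
    .map(r => `${r.name}: ${r.width}x${r.height}px (needs >= 44x44)`);
}
```

Run this over the interactive elements found at the 375px viewport; any entry it returns is a discrepancy to log in Step A's sense.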
STEP E — HEALTH CHECK:
browser_console_messages — flag any JavaScript errors or warnings
browser_network_requests — flag any failed requests (4xx/5xx)
STEP F — GENERATE INITIAL TEST FILES:
Use browser_generate_playwright_test to create .spec.ts files from your session.
Save to: tests/e2e/$FEATURE-browser.spec.ts
These become the foundation that the Engineer Agent refines below.
Output: qa/browser-tests/$FEATURE/exploration.md
(Summary of what was found: working elements, broken elements, missing elements,
selectors discovered, accessibility tree findings)
━━━ ANALYST AGENT ━━━
Read the source code for $FEATURE.
Read qa/plans/ for any existing test coverage on this feature.
Read qa/browser-tests/$FEATURE/exploration.md for live browser findings.
Map every testable surface:
- Every user-facing interaction (clicks, inputs, form submissions, keyboard nav)
- Every API call this feature makes and its possible response shapes
- Every UI state: loading, empty, error, success, partial data
- Every data-testid attribute or accessible role present in the DOM
- Every validation rule (client-side and server-side)
- Every route or navigation this feature triggers
Output: qa/plans/$FEATURE.md
━━━ PLANNER AGENT ━━━
Read qa/plans/$FEATURE.md.
Assign priority and write a Given/When/Then for each:
P0 — "If this breaks, the product is unusable"
(auth flows, data saving, core feature paths)
P1 — "If this breaks, a significant feature is degraded"
(secondary flows, important edge cases)
P2 — "Edge case — good to have covered"
(unusual inputs, rare states, nice-to-have validation)
Also include for each screen:
- Empty state scenario (user has no data yet)
- Error state scenario (network fails, server returns 500)
- Mobile viewport scenario (at least for P0 items)
Output: qa/plans/$FEATURE-prioritized.md
━━━ ENGINEER AGENT ━━━
CRITICAL: The dev server must be running. Use Playwright MCP to
navigate the actual, running application before writing any test.
For each scenario in qa/plans/$FEATURE-prioritized.md:
- Navigate to the relevant route using Playwright MCP
- Confirm the element you intend to target is visible and accessible
- Note the exact accessible role, label, or testId
- Then write the Playwright test
Write all tests to: tests/e2e/$FEATURE.spec.ts
Playwright rules — these are absolute, no exceptions:
ALLOWED: getByRole('button', { name: 'Save' })
ALLOWED: getByLabel('Email address')
ALLOWED: getByText('No transactions yet')
ALLOWED: getByTestId('transaction-list')
FORBIDDEN: page.$('.save-btn')
FORBIDDEN: page.$('#submit')
FORBIDDEN: page.$x('//button[@class="primary"]')
FORBIDDEN: page.waitForTimeout(3000) — use expect().toBeVisible() instead
Every test must:
- Have a descriptive name explaining what it verifies
- Assert a specific, meaningful outcome (not just "doesn't crash")
- Use proper async/await throughout
- Clean up any data it creates (use beforeEach/afterEach hooks)
━━━ SENTINEL AGENT ━━━
Read tests/e2e/$FEATURE.spec.ts line by line.
BLOCK (stop QA loop, return to Engineer) if any of these exist:
- Any selector containing "." or "#" or "//"
- Any action missing an await keyword
- Any test block with zero assertions (expect() calls)
- Any page.waitForTimeout() greater than 2000ms
- Any test that only navigates and clicks with no assertion
WARN (flag but do not block) for:
- Test names that don't clearly describe the scenario
- Missing afterEach cleanup for data-creating tests
- Tests that could affect each other's state
Output: qa/audits/$FEATURE-audit.md
If blockers found: list exact line numbers. Return to Engineer.
If no blockers: proceed.
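The blocker rules above are mechanical enough to run as a static scan over the spec source before the line-by-line read. A rough sketch (regex-based, so it will miss edge cases — treat it as a first pass, not a replacement for reading the file):

```typescript
// Static scan for Sentinel blockers: raw CSS/XPath selectors,
// oversized waitForTimeout calls, and a spec with no assertions.
function auditSpec(source: string): string[] {
  const blockers: string[] = [];
  source.split('\n').forEach((line, i) => {
    const n = i + 1;
    // page.$(...) and page.$x(...) are the raw-selector escape hatches.
    if (/page\.\$x?\(/.test(line)) {
      blockers.push(`line ${n}: raw CSS/XPath selector (page.$ / page.$x)`);
    }
    const wait = line.match(/waitForTimeout\((\d+)\)/);
    if (wait && Number(wait[1]) > 2000) {
      blockers.push(`line ${n}: waitForTimeout(${wait[1]}) exceeds 2000ms`);
    }
  });
  // Crude whole-file check: a spec that never calls expect() is a blocker.
  if (!source.includes('expect(')) {
    blockers.push('file: no expect() assertions found');
  }
  return blockers;
}
```

Emitting the exact line numbers, as this does, is what the Engineer needs to act on the audit.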
━━━ EXECUTION ━━━
Run: npx playwright test tests/e2e/$FEATURE.spec.ts --reporter=json
Save full output to: qa/runs/$FEATURE-latest.json
━━━ HEALER AGENT (runs only if failures exist) ━━━
For each failed test:
- Read the full error message and attached screenshot
- Navigate to the failing page using Playwright MCP to inspect current state
- Make a determination:
BROKEN TEST (the test is wrong):
-> The page structure changed, selector no longer exists, or
the expected text changed (not a regression, just drift)
-> Fix: update the selector or assertion to match current reality
-> Re-run the specific test
-> If fixed: continue
CONFIRMED BUG (the application is wrong):
-> The feature is not behaving as the PRD acceptance criteria describe
-> Do NOT fix the test to hide the bug
-> Create: qa/bugs/$FEATURE-[timestamp].md with:
- Which test failed
- What the expected behaviour is (from PRD)
- What the actual behaviour is
- Screenshot path
- Steps to reproduce
-> STOP the QA loop
-> Report: "Bug confirmed in $FEATURE. QA loop stopped.
Run /build $FEATURE with this bug report to fix."
Maximum 3 fix attempts per test before treating as confirmed bug.
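The three-attempt cap is a small control loop. A sketch, assuming runTest and applyFix callbacks supplied by the Healer (both hypothetical names; shown synchronous for brevity, where the real loop would be async):

```typescript
type Verdict = 'fixed' | 'confirmed-bug';

// Heal a broken test: apply a fix, re-run, and give up after
// maxAttempts, at which point the failure is treated as a real bug.
function healTest(
  runTest: () => boolean,   // true = test now passes
  applyFix: () => void,     // update selector/assertion to match reality
  maxAttempts = 3,
): Verdict {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    applyFix();
    if (runTest()) return 'fixed';
  }
  return 'confirmed-bug';
}
```

The 'confirmed-bug' verdict is what triggers the qa/bugs/ report and stops the loop; 'fixed' lets the run continue.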
━━━ EXPANDER AGENT (runs only if all tests pass) ━━━
Review qa/plans/$FEATURE-prioritized.md and tests/e2e/$FEATURE.spec.ts.
Find gaps — scenarios not yet covered. Look specifically for:
- What happens when the user submits an empty form?
- What happens at maximum input length (e.g. a 10,000-character input)?
- What happens if the user navigates away mid-flow and returns?
- What happens if the user hits browser back/forward?
- What happens on a very slow connection? (use Playwright network throttling)
- What happens if the user is not authenticated and tries this feature?
- What happens with special characters or emoji in text inputs?
Add 3-5 new tests to tests/e2e/$FEATURE.spec.ts.
Append the new scenarios to qa/plans/$FEATURE-prioritized.md.
Run the full suite again: npx playwright test tests/e2e/$FEATURE.spec.ts
━━━ SNAPSHOT AGENT ━━━
For every page involved in $FEATURE, capture screenshots at three viewports:
- Desktop: 1440 x 900
- Tablet: 768 x 1024
- Mobile: 375 x 812
Save to: qa/visual-baselines/$FEATURE/[screen]-desktop.png
qa/visual-baselines/$FEATURE/[screen]-tablet.png
qa/visual-baselines/$FEATURE/[screen]-mobile.png
FIRST RUN BEHAVIOUR:
These screenshots ARE the baseline. Save them.
Document in qa/visual-baselines/$FEATURE/README.md:
- Date baseline was created
- What build/commit this represents
- Any known intentional visual quirks
SUBSEQUENT RUN BEHAVIOUR:
Run: npx playwright test --project=visual
Compare each screenshot against baseline.
If pixel difference > 2%: flag as visual regression.
Save diff images to: qa/visual-reports/$FEATURE-[date]-diff.png
A visual regression is treated the same as a test failure.
TO INTENTIONALLY UPDATE BASELINE:
Run: npx playwright test --project=visual --update-snapshots
Commit new baseline files.
Document what changed and why in qa/visual-baselines/$FEATURE/README.md.
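The 2% rule reduces to a ratio check over the diff output. A sketch, with the pixel counts assumed to come from the comparison step above:

```typescript
const VISUAL_THRESHOLD = 0.02; // 2% of pixels may differ before flagging

// Decide whether a screenshot comparison counts as a visual regression.
function isVisualRegression(diffPixels: number, totalPixels: number): boolean {
  if (totalPixels === 0) return false; // nothing to compare
  return diffPixels / totalPixels > VISUAL_THRESHOLD;
}
```

A true result here feeds the quality gate the same way a failed test does.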
━━━ QUALITY GATE ━━━
Calculate score:
P0 tests: All must pass. Any P0 failure = score 0 = STOP HERE.
P0 passing: 40 points
P1 passing: [passing / total] x 30 points
P2 passing: [passing / total] x 15 points
Visual match: All snapshots match baseline = 15 points
Any visual regression = 0 points for this category
TOTAL POSSIBLE: 100 points
Score < 85: FAIL
-> Write full report to qa/QUALITY_LOG.md
-> Output to user: which tests failed, which snapshots regressed,
what the likely causes are
-> "Run /build $FEATURE with this report to address failures."
Score >= 85: PASS
-> Append to qa/QUALITY_LOG.md: date, feature, score, test count
-> "QA passed for $FEATURE. Score: [X]/100. Proceed to CI."
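The gate arithmetic above can be made concrete. A sketch of the scoring, assuming simple pass/total counts per band (an empty band scoring full marks is an assumption the gate does not specify):

```typescript
interface Band { pass: number; total: number }

interface Results {
  p0: Band;
  p1: Band;
  p2: Band;
  visualClean: boolean; // true when every snapshot matches baseline
}

function qualityScore(r: Results): number {
  // Any P0 failure zeroes the whole score and stops the run.
  if (r.p0.pass < r.p0.total) return 0;
  const frac = (b: Band) => (b.total === 0 ? 1 : b.pass / b.total);
  const score =
    40 +                       // all P0 passing
    frac(r.p1) * 30 +          // P1 band, pro-rated
    frac(r.p2) * 15 +          // P2 band, pro-rated
    (r.visualClean ? 15 : 0);  // visual band is all-or-nothing
  return Math.round(score);
}
```

The pass/fail decision is then just qualityScore(results) >= 85.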