qa-run
Arguments: $FEATURE (or "all" for entire suite)
Read CLAUDE.md before doing anything else.
Ensure the dev server is running before proceeding.
━━━ BROWSER AGENT (Playwright MCP — runs first) ━━━
Before writing ANY test file, explore the live application using Playwright MCP.
This is non-negotiable for web applications. You MUST see the real app first.
STEP A — NAVIGATE EVERY SCREEN:
For each screen related to $FEATURE:
1. browser_navigate to the screen URL
2. browser_snapshot — capture the accessibility tree
(This gives you the REAL selectors, roles, and accessible names.
Never guess selectors. Always get them from the live app.)
3. browser_take_screenshot — save to qa/browser-tests/$FEATURE/
4. Compare what you see against docs/SCREENS.md and wireframes/
5. Log any discrepancies immediately
STEP B — TEST EVERY INTERACTION:
On each screen:
1. browser_click every button — verify correct result
2. browser_type into every input — verify it accepts input
3. browser_select_option on every dropdown
4. browser_press_key Tab through the page — verify focus order
5. browser_press_key Enter on focused buttons — verify activation
6. For forms: submit with valid data, empty data, and invalid data
STEP C — TEST THE HAPPY PATH LIVE:
Read P0 acceptance criteria from docs/PRD.md.
Execute each Given/When/Then by actually doing it in the browser:
- browser_navigate to start
- browser_type / browser_click / browser_select_option to perform actions
- browser_verify_text_visible / browser_verify_element_visible for assertions
- browser_take_screenshot at each step
STEP D — RESPONSIVE CHECK:
For the 3 most important screens:
browser_resize width=1440 height=900 → browser_take_screenshot (desktop)
browser_resize width=768 height=1024 → browser_take_screenshot (tablet)
browser_resize width=375 height=812 → browser_take_screenshot (mobile)
Verify: no overflow, no cut-off content, touch targets ≥ 44px on mobile
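The touch-target rule can be checked mechanically from element bounding boxes captured during the snapshot. A minimal sketch, assuming a simplified Rect shape (the real accessibility-tree output has more fields, and the 44px floor follows the guideline above):

```typescript
// Touch-target audit: flag interactive elements whose rendered
// bounding box is smaller than 44x44 px on the mobile viewport.
// The Rect shape here is an assumption for illustration only.
interface Rect {
  name: string;   // accessible name of the element
  width: number;  // rendered width in CSS px
  height: number; // rendered height in CSS px
}

const MIN_TOUCH_TARGET = 44;

function undersizedTargets(rects: Rect[]): string[] {
  return rects
    .filter(r => r.width < MIN_TOUCH_TARGET || r.height < MIN_TOUCH_TARGET)
    .map(r => `${r.name}: ${r.width}x${r.height}px (needs >= 44x44)`);
}
```

Run this over the interactive elements found at the 375px viewport; any entry it returns is a discrepancy to log in Step A's sense.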
STEP E — HEALTH CHECK:
browser_console_messages — flag any JavaScript errors or warnings
browser_network_requests — flag any failed requests (4xx/5xx)
STEP F — GENERATE INITIAL TEST FILES:
Use browser_generate_playwright_test to create .spec.ts files from your session.
Save to: tests/e2e/$FEATURE-browser.spec.ts
These become the foundation that the Engineer Agent refines below.
Output: qa/browser-tests/$FEATURE/exploration.md
(Summary of what was found: working elements, broken elements, missing elements,
selectors discovered, accessibility tree findings)
━━━ ANALYST AGENT ━━━
Read the source code for $FEATURE.
Read qa/plans/ for any existing test coverage on this feature.
Read qa/browser-tests/$FEATURE/exploration.md for live browser findings.
Map every testable surface:
- Every user-facing interaction (clicks, inputs, form submissions, keyboard nav)
- Every API call this feature makes and its possible response shapes
- Every UI state: loading, empty, error, success, partial data
- Every data-testid attribute or accessible role present in the DOM
- Every validation rule (client-side and server-side)
- Every route or navigation this feature triggers
Output: qa/plans/$FEATURE.md
━━━ PLANNER AGENT ━━━
Read qa/plans/$FEATURE.md.
Assign priority and write a Given/When/Then for each:
P0 — "If this breaks, the product is unusable"
(auth flows, data saving, core feature paths)
P1 — "If this breaks, a significant feature is degraded"
(secondary flows, important edge cases)
P2 — "Edge case — good to have covered"
(unusual inputs, rare states, nice-to-have validation)
Also include for each screen:
- Empty state scenario (user has no data yet)
- Error state scenario (network fails, server returns 500)
- Mobile viewport scenario (at least for P0 items)
Output: qa/plans/$FEATURE-prioritized.md
━━━ ENGINEER AGENT ━━━
CRITICAL: The dev server must be running. Use Playwright MCP to
navigate the actual, running application before writing any test.
For each scenario in qa/plans/$FEATURE-prioritized.md:
- Navigate to the relevant route using Playwright MCP
- Confirm the element you intend to target is visible and accessible
- Note the exact accessible role, label, or testId
- Then write the Playwright test
Write all tests to: tests/e2e/$FEATURE.spec.ts
Playwright rules — these are absolute, no exceptions:
ALLOWED: getByRole('button', { name: 'Save' })
ALLOWED: getByLabel('Email address')
ALLOWED: getByText('No transactions yet')
ALLOWED: getByTestId('transaction-list')
FORBIDDEN: page.$('.save-btn')
FORBIDDEN: page.$('#submit')
FORBIDDEN: page.$x('//button[@class="primary"]')
FORBIDDEN: page.waitForTimeout(3000) — use expect().toBeVisible() instead
Every test must:
- Have a descriptive name explaining what it verifies
- Assert a specific, meaningful outcome (not just "doesn't crash")
- Use proper async/await throughout
- Clean up any data it creates (use beforeEach/afterEach hooks)
━━━ SENTINEL AGENT ━━━
Read tests/e2e/$FEATURE.spec.ts line by line.
BLOCK (stop QA loop, return to Engineer) if any of these exist:
- Any selector containing "." or "#" or "//"
- Any action missing an await keyword
- Any test block with zero assertions (expect() calls)
- Any page.waitForTimeout() greater than 2000ms
- Any test that only navigates and clicks with no assertion
WARN (flag but do not block) for:
- Test names that don't clearly describe the scenario
- Missing afterEach cleanup for data-creating tests
- Tests that could affect each other's state
Output: qa/audits/$FEATURE-audit.md
If blockers found: list exact line numbers. Return to Engineer.
If no blockers: proceed.
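The blocker rules above are mechanical enough to run as a static scan over the spec source before the line-by-line read. A rough sketch (regex-based, so it will miss edge cases — treat it as a first pass, not a replacement for reading the file):

```typescript
// Static scan for Sentinel blockers: raw CSS/XPath selectors,
// oversized waitForTimeout calls, and a spec with no assertions.
function auditSpec(source: string): string[] {
  const blockers: string[] = [];
  source.split('\n').forEach((line, i) => {
    const n = i + 1;
    // page.$(...) and page.$x(...) are the raw-selector escape hatches.
    if (/page\.\$x?\(/.test(line)) {
      blockers.push(`line ${n}: raw CSS/XPath selector (page.$ / page.$x)`);
    }
    const wait = line.match(/waitForTimeout\((\d+)\)/);
    if (wait && Number(wait[1]) > 2000) {
      blockers.push(`line ${n}: waitForTimeout(${wait[1]}) exceeds 2000ms`);
    }
  });
  // Crude whole-file check: a spec that never calls expect() is a blocker.
  if (!source.includes('expect(')) {
    blockers.push('file: no expect() assertions found');
  }
  return blockers;
}
```

Emitting the exact line numbers, as this does, is what the Engineer needs to act on the audit.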
━━━ EXECUTION ━━━
Run: npx playwright test tests/e2e/$FEATURE.spec.ts --reporter=json
Save full output to: qa/runs/$FEATURE-latest.json
━━━ HEALER AGENT (runs only if failures exist) ━━━
For each failed test:
- Read the full error message and attached screenshot
- Navigate to the failing page using Playwright MCP to inspect current state
- Make a determination:
BROKEN TEST (the test is wrong):
-> The page structure changed, selector no longer exists, or
the expected text changed (not a regression, just drift)
-> Fix: update the selector or assertion to match current reality
-> Re-run the specific test
-> If fixed: continue
CONFIRMED BUG (the application is wrong):
-> The feature is not behaving as the PRD acceptance criteria describe
-> Do NOT fix the test to hide the bug
-> Create: qa/bugs/$FEATURE-[timestamp].md with:
- Which test failed
- What the expected behaviour is (from PRD)
- What the actual behaviour is
- Screenshot path
- Steps to reproduce
-> STOP the QA loop
-> Report: "Bug confirmed in $FEATURE. QA loop stopped.
Run /build $FEATURE with this bug report to fix."
Maximum 3 fix attempts per test before treating as confirmed bug.
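The three-attempt cap is a small control loop. A sketch, assuming runTest and applyFix callbacks supplied by the Healer (both hypothetical names; shown synchronous for brevity, where the real loop would be async):

```typescript
type Verdict = 'fixed' | 'confirmed-bug';

// Heal a broken test: apply a fix, re-run, and give up after
// maxAttempts, at which point the failure is treated as a real bug.
function healTest(
  runTest: () => boolean,   // true = test now passes
  applyFix: () => void,     // update selector/assertion to match reality
  maxAttempts = 3,
): Verdict {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    applyFix();
    if (runTest()) return 'fixed';
  }
  return 'confirmed-bug';
}
```

The 'confirmed-bug' verdict is what triggers the qa/bugs/ report and stops the loop; 'fixed' lets the run continue.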
━━━ EXPANDER AGENT (runs only if all tests pass) ━━━
Review qa/plans/$FEATURE-prioritized.md and tests/e2e/$FEATURE.spec.ts.
Find gaps — scenarios not yet covered. Look specifically for:
- What happens when the user submits an empty form?
- What happens at maximum input length (e.g. a 10,000-character input)?
- What happens if the user navigates away mid-flow and returns?
- What happens if the user hits browser back/forward?
- What happens on a very slow connection? (use Playwright network throttling)
- What happens if the user is not authenticated and tries this feature?
- What happens with special characters or emoji in text inputs?
Add 3-5 new tests to tests/e2e/$FEATURE.spec.ts.
Append the new scenarios to qa/plans/$FEATURE-prioritized.md.
Run the full suite again: npx playwright test tests/e2e/$FEATURE.spec.ts
━━━ SNAPSHOT AGENT ━━━
For every page involved in $FEATURE, capture screenshots at three viewports:
- Desktop: 1440 x 900
- Tablet: 768 x 1024
- Mobile: 375 x 812
Save to: qa/visual-baselines/$FEATURE/[screen]-desktop.png
qa/visual-baselines/$FEATURE/[screen]-tablet.png
qa/visual-baselines/$FEATURE/[screen]-mobile.png
FIRST RUN BEHAVIOUR:
These screenshots ARE the baseline. Save them.
Document in qa/visual-baselines/$FEATURE/README.md:
- Date baseline was created
- What build/commit this represents
- Any known intentional visual quirks
SUBSEQUENT RUN BEHAVIOUR:
Run: npx playwright test --project=visual
Compare each screenshot against baseline.
If pixel difference > 2%: flag as visual regression.
Save diff images to: qa/visual-reports/$FEATURE-[date]-diff.png
A visual regression is treated the same as a test failure.
TO INTENTIONALLY UPDATE BASELINE:
Run: npx playwright test --project=visual --update-snapshots
Commit new baseline files.
Document what changed and why in qa/visual-baselines/$FEATURE/README.md.
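The 2% rule reduces to a ratio check over the diff output. A sketch, with the pixel counts assumed to come from the comparison step above:

```typescript
const VISUAL_THRESHOLD = 0.02; // 2% of pixels may differ before flagging

// Decide whether a screenshot comparison counts as a visual regression.
function isVisualRegression(diffPixels: number, totalPixels: number): boolean {
  if (totalPixels === 0) return false; // nothing to compare
  return diffPixels / totalPixels > VISUAL_THRESHOLD;
}
```

A true result here feeds the quality gate the same way a failed test does.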
━━━ QUALITY GATE ━━━
Calculate score:
P0 tests: All must pass. Any P0 failure = score 0 = STOP HERE.
P0 passing: 40 points
P1 passing: [passing / total] x 30 points
P2 passing: [passing / total] x 15 points
Visual match: All snapshots match baseline = 15 points
Any visual regression = 0 points for this category
TOTAL POSSIBLE: 100 points
Score < 85: FAIL
-> Write full report to qa/QUALITY_LOG.md
-> Output to user: which tests failed, which snapshots regressed,
what the likely causes are
-> "Run /build $FEATURE with this report to address failures."
Score >= 85: PASS
-> Append to qa/QUALITY_LOG.md: date, feature, score, test count
-> "QA passed for $FEATURE. Score: [X]/100. Proceed to CI."
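The gate arithmetic above can be made concrete. A sketch of the scoring, assuming simple pass/total counts per band (an empty band scoring full marks is an assumption the gate does not specify):

```typescript
interface Band { pass: number; total: number }

interface Results {
  p0: Band;
  p1: Band;
  p2: Band;
  visualClean: boolean; // true when every snapshot matches baseline
}

function qualityScore(r: Results): number {
  // Any P0 failure zeroes the whole score and stops the run.
  if (r.p0.pass < r.p0.total) return 0;
  const frac = (b: Band) => (b.total === 0 ? 1 : b.pass / b.total);
  const score =
    40 +                       // all P0 passing
    frac(r.p1) * 30 +          // P1 band, pro-rated
    frac(r.p2) * 15 +          // P2 band, pro-rated
    (r.visualClean ? 15 : 0);  // visual band is all-or-nothing
  return Math.round(score);
}
```

The pass/fail decision is then just qualityScore(results) >= 85.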