# UI Test — Agentic UI Testing Skill
Test UI changes in a real browser. Your job is to try to break things, not confirm they work.
Three workflows:
- Diff-driven — analyze a git diff, test only what changed
- Exploratory — navigate the app, find bugs the developer didn't think about
- Parallel — fan out independent test groups across multiple Browserbase browsers
## How Testing Works
The main agent coordinates — it plans test strategy, delegates to sub-agents, and merges results. Sub-agents do the actual browser testing.
### Planning: multiple angles, then execute once
You MUST complete all three planning rounds yourself and output them before launching any sub-agents. Planning happens in your own response — it is NOT delegated to sub-agents. Do not skip ahead to execution.
Round 1 — Functional: What are the core user flows? What should work? Write out each test as: action → expected result.
Round 2 — Adversarial: Re-read Round 1. What did you miss? Think about: different user types/roles, error paths, empty states, race conditions, edge inputs (empty, huge, special chars, rapid clicks).
Round 3 — Coverage gaps: Re-read Rounds 1–2. What about: accessibility (axe-core, keyboard-only), mobile viewports, console errors, visual consistency with the rest of the app?
Deduplicate: Merge all three rounds into one numbered list of tests. Remove overlaps. Assign each test to a group (e.g. Group A, Group B).
Then execute once — launch one sub-agent per group. Each sub-agent receives its specific list of tests to run, nothing more. Sub-agents do not explore or plan — they execute assigned tests and report results.
Output the three rounds, the merged plan, and the group assignments in your response before calling any Agent tool.
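The dedupe step is mechanical. A minimal shell sketch, with placeholder test names standing in for the real rounds (collected one test per line):

```shell
# Merge the three rounds (one test per line), drop exact duplicates, number the survivors
merged=$(printf '%s\n' \
  "valid email submits successfully" \
  "empty form shows validation error" \
  "valid email submits successfully" \
  "axe-core audit on /signup" \
  | awk '!seen[$0]++' | nl -w1 -s'. ')
echo "$merged"
```

Each surviving numbered test then gets a group label (Group A, Group B, ...).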
### Principles for splitting work
- Sub-agents run assigned tests, not open exploration. The main agent hands each sub-agent a specific numbered list of tests. Sub-agents do not plan, explore, or decide what to test — they execute the list and stop.
- The bottleneck is the slowest agent — split work so no single agent has a disproportionate share. Many small agents > few large ones.
- Size the effort to the change — a single component fix doesn't need many agents or many steps. A full-page redesign does. Let the scope of the diff drive the plan.
- No early stopping on failures — find as many bugs as possible within the assigned tests.
### Giving sub-agents a step budget
The main agent MUST include an explicit browse step limit in every sub-agent prompt. Sub-agents do not self-limit — they will run until done unless told otherwise.
As a rough heuristic: ~25 steps for a few targeted checks, ~40 for a full page with functional + adversarial + a11y, ~75 for multiple pages or a broad category. Adjust based on what the assigned tests actually require — these are starting points, not rules.
Every sub-agent prompt must include:

```
You have a budget of N browse steps (each `browse` command = 1 step). Count your steps as you go. When you reach N, stop immediately and report:
- STEP_PASS/STEP_FAIL for every test you completed
- STEP_SKIP|<test-id>|budget reached for every test you didn't get to
Do not retry or continue after hitting the budget.

Run only these tests: [numbered list from the merged plan]
Do not explore beyond the assigned tests.
Do NOT generate an HTML report or write any files. Return only step markers and your findings as text.
```

The main agent should NOT run commands itself (except to verify the dev server is up). All testing happens in sub-agents.

When a sub-agent hits its budget, the main agent accepts the partial results as-is. Do not re-run or retry the sub-agent. Include SKIPPED tests in the final report so the developer knows what wasn't covered.
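The budget contract reduces to a counter checked before every step. A sketch of the loop a sub-agent follows, with a no-op standing in for the real `browse` call and hypothetical test ids:

```shell
out=$(
  budget=3          # N from the sub-agent prompt
  steps=0
  for t in load-page fill-form submit-form check-toast; do
    if [ "$steps" -ge "$budget" ]; then
      echo "STEP_SKIP|$t|budget reached"
      continue
    fi
    :               # a real sub-agent runs one `browse` command here
    steps=$((steps + 1))
    echo "STEP_PASS|$t|ok"
  done
)
echo "$out"
```

With a budget of 3 and 4 assigned tests, the last test is reported as skipped rather than retried.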
## Reporting
Every sub-agent reports back with:

```
Tests: 8 | Passed: 5 | Failed: 2 | Skipped: 1 | Pages visited: 2
```

The main agent merges into a final report with:

```
Tests: 20 | Passed: 14 | Failed: 4 | Skipped: 2 | Agents: 3 | Pass rate: 70%
```

Do not report "steps used" — browse command counts are implementation plumbing, not a meaningful metric for reviewers.
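Merging sub-agent output into the summary line is a counting pass over the step markers. A sketch over a hard-coded marker stream (a real run concatenates the sub-agents' outputs); pass rate here is passed over total, matching the 14/20 → 70% example:

```shell
markers='STEP_PASS|homepage-cta|button present
STEP_PASS|valid-email|success message shown
STEP_FAIL|double-submit|expected one submission → got two|shot.png
STEP_SKIP|axe-audit|budget reached'

# Count each marker type, then assemble the summary line
pass=$(printf '%s\n' "$markers" | grep -c '^STEP_PASS')
fail=$(printf '%s\n' "$markers" | grep -c '^STEP_FAIL')
skip=$(printf '%s\n' "$markers" | grep -c '^STEP_SKIP')
total=$((pass + fail + skip))
rate=$((100 * pass / total))
echo "Tests: $total | Passed: $pass | Failed: $fail | Skipped: $skip | Pass rate: ${rate}%"
```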
## Testing Philosophy
You are an adversarial tester. Your goal is to find bugs, not prove correctness.
- Try to break every feature you test. Don't just check "does the button exist?" — click it twice rapidly, submit empty forms, paste 500 characters, press Escape mid-flow.
- Test what the developer didn't think about. Empty states, error recovery, keyboard-only navigation, mobile overflow.
- Every assertion must be evidence-based. Compare before/after snapshots. Check specific elements by ref. Never report PASS without concrete evidence from the accessibility tree or a deterministic check.
- Report failures with enough detail to reproduce. Include the exact action, what you expected, what you got, and a suggested fix.
## Assertion Protocol
Every test step MUST produce a structured assertion. Do not write freeform "this looks good."
### Step markers
For each test step, emit exactly one marker:

```
STEP_PASS|<step-id>|<evidence>
```

or

```
STEP_FAIL|<step-id>|<expected> → <actual>|<screenshot-path>
```

- `step-id`: short identifier like `homepage-cta`, `form-validation-error`, `modal-cancel`
- `evidence`: what you observed that proves the step passed (element ref, text content, URL, eval result)
- `expected → actual`: what you expected vs what you got
- `screenshot-path`: path to the saved screenshot (failures only — see Screenshot Capture below)
### Screenshot Capture for Failures
Every STEP_FAIL MUST have an accompanying screenshot so the developer can see what went wrong visually.
When a test step fails:
```bash
# 1. Take a screenshot immediately after observing the failure
browse screenshot --path .context/ui-test-screenshots/<step-id>.png

# If --path is not supported, take the screenshot and save manually:
browse screenshot
# The browse CLI will output the screenshot path — move/copy it:
cp /tmp/browse-screenshot-*.png .context/ui-test-screenshots/<step-id>.png
```

Set up the screenshot directory at the start of any test run:

```bash
mkdir -p .context/ui-test-screenshots
```

Rules:
- File name = step-id (e.g., `double-submit.png`, `axe-audit.png`, `modal-focus-trap.png`)
- Store in `.context/ui-test-screenshots/` — this directory is gitignored and accessible to the developer and other agents
- For parallel runs, include the session name: `<session>-<step-id>.png` (e.g., `signup-double-submit.png`)
- Take the screenshot at the moment of failure — capture the broken state, not after recovery
- For visual/layout bugs, also screenshot the baseline (working state) for comparison: `<step-id>-baseline.png`
### How to verify (in order of rigor)
- Deterministic check (strongest) — returns structured data you can inspect. Examples: axe-core violation count, `browse eval` of `document.title`, form field value, console error array, element count.
- Snapshot element match — a specific element with a specific role and text exists in the accessibility tree. Check by ref: `@0-12 button "Save"`. An element either exists in the tree or it doesn't.
- Before/after comparison — snapshot before action, act, snapshot after. Verify the tree changed in the expected way (element appeared, disappeared, text changed).
- Screenshot + visual judgment (weakest) — only for visual-only properties (color, spacing, layout) that the accessibility tree cannot capture. Always accompany with what specifically you're evaluating.
### Before/after comparison pattern
This is the core verification loop. Use it for every interaction:
```bash
# 1. BEFORE: capture state
browse snapshot
# Record: what elements exist, their text, their refs

# 2. ACT: perform the interaction
browse click @0-12

# 3. AFTER: capture new state
browse snapshot
# Compare: what changed? What appeared? What disappeared?

# 4. ASSERT: emit marker based on comparison
# If dialog appeared: STEP_PASS|modal-open|dialog "Confirm" appeared at @0-20
# If nothing changed:
browse screenshot --path .context/ui-test-screenshots/modal-open.png
# STEP_FAIL|modal-open|expected dialog to appear → snapshot unchanged|.context/ui-test-screenshots/modal-open.png
```
## Setup

```bash
which browse || npm install -g @browserbasehq/browse-cli
```

## Avoid permission fatigue
This skill runs many commands (snapshots, clicks, evals). To avoid approving each one, add `browse` to your allowed commands.

Add both patterns to `.claude/settings.json` (project-level) or `~/.claude/settings.json` (user-level):

```json
{
  "permissions": {
    "allow": [
      "Bash(browse:*)",
      "Bash(BROWSE_SESSION=*)"
    ]
  }
}
```

The first pattern covers plain `browse` commands. The second covers parallel sessions (`BROWSE_SESSION=signup browse open ...`). Both are needed to avoid approval prompts.

## Mode Selection
| Target | Mode | Command | Auth |
|---|---|---|---|
| Local (`localhost`) | Local | `browse env local` | None needed (clean isolated local browser by default) |
| Deployed/staging site | Remote | `browse env remote` | cookie-sync → `--context-id` |

Rule: If the target URL contains `localhost` or `127.0.0.1`, always use `browse env local`.

### Local Mode (default for localhost)
```bash
browse env local
browse open http://localhost:3000
```

Use local-mode variants only when needed:

- `browse env local --auto-connect` — auto-discover existing local Chrome, fallback to isolated. Use this only when the test explicitly needs existing local login/cookies/state.
- `browse env local <port|url>` — attach to a specific CDP target (explicit local browser attach).
### Remote Mode (deployed sites via cookie-sync)
```bash
# Step 1: Sync cookies from local Chrome to Browserbase
node .claude/skills/cookie-sync/scripts/cookie-sync.mjs --domains your-app.com
# Output: Context ID: ctx_abc123

# Step 2: Switch to remote mode
browse env remote
browse open https://staging.your-app.com --context-id ctx_abc123 --persist
browse snapshot
# ... run tests ...
browse stop
```

Cookie-sync flags: `--domains`, `--context`, `--stealth`, `--proxy "City,ST,US"`

## Workflow A: Diff-Driven Testing
### Phase 1: Analyze the diff
```bash
git diff --name-only HEAD~1   # or: git diff --name-only / git diff --name-only main...HEAD
git diff HEAD~1 -- <file>     # read actual changes
```

Categorize changed files:

| File pattern | UI impact | What to test |
|---|---|---|
| | Component | Render, interaction, state, edge cases |
| | Route/page | Navigation, page load, content, 404 handling |
| | Style | Visual appearance (screenshot), responsive |
| | Form | Validation, submission, empty input, long input, special chars |
| | Interactive | Open/close, escape, focus trap, cancel vs confirm |
| | Navigation | Links, active states, routing, keyboard nav |
| Non-UI files only | None | Skip — report "no UI tests needed" |
### Phase 2: Map files to URLs

Detect framework:

```bash
cat package.json | grep -E '"(next|react|vue|nuxt|svelte|@sveltejs|angular|vite)"'
```

| Framework | Default port | File → URL pattern |
|---|---|---|
| Next.js App Router | 3000 | |
| Next.js Pages Router | 3000 | |
| Vite | 5173 | Check router config |
| Nuxt | 3000 | |
| SvelteKit | 5173 | |
| Angular | 4200 | Check routing module |
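For Next.js App Router, the core of the file-to-URL mapping can be sketched with a hypothetical `to_url` helper (dynamic segments and route groups are deliberately out of scope here):

```shell
# Map an App Router page file to its route path
to_url() {
  printf '%s\n' "$1" | sed -e 's|^app||' -e 's|/page\.[a-z]*$||' -e 's|^$|/|'
}

to_url "app/settings/profile/page.tsx"   # → /settings/profile
to_url "app/page.tsx"                    # → /
```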
### Phase 3: Ensure the right code is running

Before testing, verify the dev server is serving the code from the diff — not a stale branch.

If testing a PR or specific branch:

```bash
# Check what branch is currently checked out
git branch --show-current

# If it's not the PR branch, switch to it
git fetch origin <branch> && git checkout <branch>

# Install deps — the lockfile may differ between branches
yarn install   # or npm install / pnpm install
```

If the dev server was already running on a different branch, restart it after checkout.

**Find a running dev server:**

```bash
for port in 3000 3001 5173 4200 8080 8000 5000; do
  s=$(curl -s -o /dev/null -w "%{http_code}" "http://localhost:$port" 2>/dev/null)
  if [ "$s" != "000" ]; then echo "Dev server on port $port (HTTP $s)"; fi
done
```

If nothing found: tell the user to start their dev server.

**Verify it actually renders:**

After `browse open` + `browse snapshot`, check that the accessibility tree contains real page content (navigation, headings, interactive elements) — not just an error overlay or empty body. Next.js dev servers can return HTTP 200 while showing a full-screen build error dialog. If the snapshot is empty or dominated by an error dialog, the server is broken — fix the build before testing.

### Phase 4: Generate test plan
For each changed area, plan both happy path AND adversarial tests:
```
Test Plan (based on git diff)
=============================
Changed: src/components/SignupForm.tsx (added email validation)

1. [happy] Valid email submits successfully
   URL: http://localhost:3000/signup
   Steps: fill valid email → submit → verify success message appears
2. [adversarial] Invalid email shows error
   Steps: fill "not-an-email" → submit → verify error message appears
3. [adversarial] Empty form submission
   Steps: click submit without filling anything → verify error, no crash
4. [adversarial] XSS in email field
   Steps: fill "<script>alert(1)</script>" → submit → verify sanitized/rejected
5. [adversarial] Rapid double-submit
   Steps: click submit twice quickly → verify no duplicate submission
6. [adversarial] Keyboard-only flow
   Steps: Tab to email → type → Tab to submit → Enter → verify success
```

### Phase 5: Execute tests
```bash
browse stop 2>/dev/null
mkdir -p .context/ui-test-screenshots

# localhost/default QA → clean, reproducible local run
browse env local
```

For each test, follow the **before/after pattern**:

```bash
# Navigate
browse open http://localhost:3000/path
browse wait load

# BEFORE snapshot
browse snapshot
# Note the current state: elements, refs, text

# ACT
browse click @0-ref
# or: browse fill "selector" "value"
# or: browse type "text"
# or: browse press Enter

# AFTER snapshot
browse snapshot
# Compare against BEFORE: what changed?

# ASSERT with marker
# STEP_PASS|step-id|evidence OR STEP_FAIL|step-id|expected → actual
```

### Phase 6: Report results
```
UI Test Results

STEP_PASS|valid-email-submit|status "Thanks!" appeared at @0-42 after submit
- URL: http://localhost:3000/signup
- Before: form with email input @0-3, submit button @0-7
- Action: filled "user@test.com", clicked @0-7
- After: form replaced by status element with "Thanks! We'll be in touch."

STEP_FAIL|double-submit|expected single submission → form submitted twice|.context/ui-test-screenshots/double-submit.png
- URL: http://localhost:3000/signup
- Before: form with submit button @0-7
- Action: clicked @0-7 twice rapidly
- After: two success toasts appeared, suggesting duplicate submission
- Screenshot: .context/ui-test-screenshots/double-submit.png
- Suggestion: disable submit button after first click, or debounce the handler

Summary: 4/6 passed, 2 failed
Failed: double-submit, xss-sanitization
```

Screenshots saved to `.context/ui-test-screenshots/` — open any failed step's screenshot to see the broken state.

Always `browse stop` when done.

### Phase 7: Generate HTML report
After producing the text report, generate a standalone HTML report that a reviewer can open in a browser. The report embeds screenshots inline (base64) so it works as a single file — no external dependencies.
Why: Text reports are good for the agent conversation, but reviewers (PMs, designers, other engineers) want a visual artifact they can open, scan, and share. Screenshots inline make failures immediately obvious.
#### How to generate

1. Read the HTML template at references/report-template.html
2. Build the report by replacing the template placeholders with actual test data:

| Placeholder | Value |
|---|---|
| | Report title for |
| | Report title for the visible |
| | One-line context: date, app URL, user, branch |
| | Total STEP_PASS + STEP_FAIL count |
| | Number of sub-agents that ran |
| | Number of STEP_PASS |
| | Number of STEP_FAIL |
| | Integer percentage (e.g., "92") |
| | |
| | HTML for failed test cards (see below) |
| | HTML for passed test cards (see below) |

3. For each test result, generate a `<details>` card. Failed tests should be open by default so reviewers see them immediately:

```html
<!-- Failed test card (open by default) -->
<div class="section">
  <h2>Failures <span class="count">{{FAIL_COUNT}}</span></h2>
  <details class="test-card fail" open>
    <summary>
      <span class="badge fail">FAIL</span>
      <span class="step-id">step-id-here</span>
      <span class="evidence">expected → actual</span>
    </summary>
    <div class="body">
      <dl>
        <dt>URL</dt><dd>http://localhost:3000/path</dd>
        <dt>Action</dt><dd>What was done</dd>
        <dt>Expected</dt><dd>What should have happened</dd>
        <dt>Actual</dt><dd>What happened instead</dd>
      </dl>
      <div class="suggestion">Fix: description of suggested fix</div>
      <div class="screenshot">
        <img src="data:image/png;base64,..." alt="Screenshot of failure">
        <div class="caption">step-id.png — captured at moment of failure</div>
      </div>
    </div>
  </details>
</div>

<!-- Passed test card (collapsed by default) -->
<div class="section">
  <h2>Passed <span class="count">{{PASS_COUNT}}</span></h2>
  <details class="test-card pass">
    <summary>
      <span class="badge pass">PASS</span>
      <span class="step-id">step-id-here</span>
      <span class="evidence">evidence summary</span>
    </summary>
    <div class="body">
      <dl>
        <dt>URL</dt><dd>http://localhost:3000/path</dd>
        <dt>Evidence</dt><dd>What was observed</dd>
      </dl>
    </div>
  </details>
</div>
```

4. Embed screenshots as base64 so the HTML is fully self-contained:
```bash
# Convert screenshot to base64 data URI
base64 -i .context/ui-test-screenshots/step-id.png | tr -d '\n'
# Use as: src="data:image/png;base64,<output>"
```
Read each screenshot file referenced in STEP_FAIL markers, base64-encode it, and embed it as an `<img src="data:image/png;base64,...">` in the corresponding test card. For STEP_PASS, only embed a screenshot if one was explicitly taken (e.g., baseline screenshots).
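The embedding step can be sketched with stand-in bytes in place of a real screenshot (note: GNU coreutils reads a file as `base64 <file>` while BSD/macOS uses `base64 -i <file>`; reading from stdin works on both):

```shell
printf 'PNG' > /tmp/fake-shot.png        # stand-in bytes for a screenshot
b64=$(base64 < /tmp/fake-shot.png | tr -d '\n')
printf '<img src="data:image/png;base64,%s" alt="Screenshot of failure">\n' "$b64"
```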
5. Write the final HTML to `.context/ui-test-report.html`:

```bash
# Write the generated HTML
cat > .context/ui-test-report.html << 'REPORT_EOF'
<!DOCTYPE html>
...generated report...
REPORT_EOF

# Open it for the reviewer
open .context/ui-test-report.html       # macOS
xdg-open .context/ui-test-report.html   # Linux
```
6. Tell the user: `Report saved to .context/ui-test-report.html` and offer to open it.
**Rules:**
- Failures section comes before passes — reviewers care about what's broken first
- Failed cards are `open` by default; passed cards are collapsed
- Every STEP_FAIL card MUST have an embedded screenshot — if the screenshot file is missing, note it in the card
- Include the suggestion/fix in each failure card if one was provided
- The report must work offline — no CDN links, no external assets
- Keep the HTML under 5MB — if screenshots push it over, reduce image quality or skip baseline screenshots for passes
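The 5MB rule can be checked mechanically before handing the report over; a sketch with a stand-in file:

```shell
report=/tmp/ui-test-report.html
head -c 1024 /dev/zero > "$report"       # stand-in for the generated report
size=$(($(wc -c < "$report")))           # arithmetic expansion strips wc's padding
limit=$((5 * 1024 * 1024))
if [ "$size" -gt "$limit" ]; then verdict="over budget"; else verdict="within budget"; fi
echo "$verdict ($size bytes)"
```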
## Adversarial Test Patterns
Apply these to every interactive element you test. Read references/adversarial-patterns.md for the full pattern library (forms, modals, navigation, error states, keyboard accessibility).
## Deterministic Checks
These produce structured data, not judgment calls. Use them as the strongest form of assertion.
| Check | What it catches | Assertion |
|---|---|---|
| axe-core | WCAG violations | |
| Console errors | Runtime exceptions, failed requests | empty error array |
| Broken images | Missing/failed image loads | no images with |
| Form labels | Inputs without accessible labels | every input has |

For the exact recipes (run via `browse eval`), read references/browser-recipes.md.

## Workflow B: Exploratory Testing
No diff, no plan — just open the app and try to break it. Use this when the user says "test my app", "find bugs", or "QA this site."
### Approach
- Discover the app — read to detect the framework, then open the root URL and snapshot to see what's there
package.json - Navigate everything — click through nav links, visit every reachable page, note what exists
- Test what you find — for each page, apply the adversarial patterns below (forms, modals, navigation, keyboard, error states)
- Run deterministic checks — axe-core, console errors, broken images, form labels on every page
- Report findings — use STEP_PASS/STEP_FAIL markers, include reproduction steps for failures
Don't try to be systematic about coverage. Just explore like a user would, but with the intent to break things. The agent is good at this — let it roam.
- 了解应用 — 读取`package.json`检测框架,打开根URL并快照查看内容
- 遍历所有内容 — 点击所有导航链接,访问每个可到达的页面,记录存在的功能
- 测试发现的功能 — 对每个页面应用下方的对抗模式(表单、弹窗、导航、键盘、错误状态)
- 运行确定性检查 — 每个页面都运行axe-core、控制台错误、图片损坏、表单标签检查
- 报告发现 — 使用STEP_PASS/STEP_FAIL标记,失败场景包含复现步骤
不需要追求系统性的覆盖度。像普通用户一样探索,但带着破坏功能的意图。Agent擅长这类工作 — 让它自由探索。
Tips for exploratory runs
探索性运行建议
- Start with the homepage, then follow the navigation naturally
- Try the 404 page (`/does-not-exist`) — is it custom or default?
- Look for empty states (pages with no data)
- Test forms with garbage input before valid input
- Check mobile viewport (375px) on every page — does it overflow?
- If the app has auth, use cookie-sync first
- 从主页开始,自然跟随导航浏览
- 尝试404页面(`/does-not-exist`) — 是自定义还是默认样式?
- 查找空状态(无数据的页面)
- 测试表单时先输入无效内容再输入有效内容
- 每个页面都检查移动端视口(375px) — 是否有内容溢出?
- 如果应用需要鉴权,先运行cookie同步
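The garbage-before-valid tip can be seeded from a small generator. The specific values below are illustrative only — empty, oversized, markup, SQL-ish, and non-ASCII inputs:

```shell
# Sketch: adversarial form inputs to try before the happy path.
# Each printf argument becomes one candidate input (values are illustrative).
printf '%s\n' \
  '' \
  "$(printf 'A%.0s' $(seq 1 200))" \
  '<script>alert(1)</script>' \
  "' OR 1=1 --" \
  '名前 émoji 🚀'
```

Feed each line into the form under test and snapshot the result — a crash, an unescaped render, or a silent truncation is a finding.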
Workflow C: Parallel Testing
工作流C:并行测试
Run independent test groups concurrently using named `browse` sessions (`BROWSE_SESSION=<name>`). Each session gets its own browser. Works with both local and remote mode.
Use when testing multiple pages or categories and you want faster wall-clock time.
Read references/parallel-testing.md for the full workflow: session setup, agent fan-out, cookie-sync for auth, and result merging.
使用命名`browse`会话(`BROWSE_SESSION=<name>`)并发运行独立测试组,每个会话对应独立浏览器,支持本地和远程模式。
适用于测试多个页面或类别,需要缩短运行时间的场景。
完整工作流(会话设置、Agent分发、鉴权cookie同步、结果合并)请查看references/parallel-testing.md。
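The fan-out shape might look like the following sketch. The inner `browse` commands are placeholders (commented out); `echo` stands in to show how `BROWSE_SESSION` scopes each group to its own named session:

```shell
# Sketch of Workflow C fan-out. Group names are hypothetical; the real
# per-group commands would be browse calls (commented below).
for group in group-a group-b; do
  BROWSE_SESSION="$group" sh -c '
    # browse <url>; browse snapshot; ...; browse stop
    echo "session $BROWSE_SESSION ready"
  ' &
done
wait
```

Because the variable is set per command, groups never collide; remember to `browse stop` every named session when done.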
Design Consistency
设计一致性
Check whether changed UI matches the rest of the app visually. Read references/design-consistency.md when doing visual or design checks.
检查变更的UI是否与应用其他部分的视觉表现一致。进行视觉或设计检查时请查看references/design-consistency.md。
Test Categories
测试分类
| Category | How | Assertion type |
|---|---|---|
| Accessibility | axe-core + keyboard nav | Deterministic (violation count) |
| Visual Quality | Screenshot + heuristic evaluation | Visual judgment (weakest — note specifics) |
| Responsive | Viewport sweep + screenshots | Visual + deterministic (overflow check) |
| Console Health | Console capture eval | Deterministic (error count) |
| UX Heuristics | Snapshot + Laws of UX + Nielsen's | Structured judgment (cite specific heuristic) |
| Error States | Navigate to empty/error states | Before/after comparison |
| Data Display | Snapshot on tables/dashboards | Element match (column count, formatting) |
| Design Consistency | Screenshot baseline + changed page comparison | Visual judgment (cite specific property) |
| Exploratory | Free navigation + adversarial testing | Before/after + judgment |
Reference guides (load on demand):
- Adversarial patterns — references/adversarial-patterns.md — load when testing forms, modals, navigation, or keyboard a11y
- Browser recipes — references/browser-recipes.md — load when running deterministic checks (axe-core, console, images, form labels)
- Exploratory testing — references/exploratory-testing.md — load for Workflow B (no diff, open exploration)
- UX heuristics — references/ux-heuristics.md — load when evaluating UX quality or citing specific heuristics
- Design system — references/design-system.example.md — template for users to customize
- Design consistency — references/design-consistency.md — load when doing visual consistency checks
- Parallel testing — references/parallel-testing.md — load for Workflow C (concurrent sessions)
- Report template — references/report-template.html — HTML template for Phase 7 report generation
For worked examples with exact commands, read EXAMPLES.md if you need to see the assertion protocol in action.
| 分类 | 实现方式 | 断言类型 |
|---|---|---|
| 可访问性 | axe-core + 键盘导航 | 确定性(违规数量) |
| 视觉质量 | 截图 + 启发式评估 | 视觉判断(最不严谨 — 需说明具体内容) |
| 响应式 | 视口遍历 + 截图 | 视觉 + 确定性(溢出检查) |
| 控制台健康度 | 控制台捕获eval | 确定性(错误数量) |
| UX启发式 | 快照 + UX法则 + Nielsen规则 | 结构化判断(引用具体启发式规则) |
| 错误状态 | 导航到空/错误状态 | 前后对比 |
| 数据展示 | 表格/仪表盘快照 | 元素匹配(列数、格式) |
| 设计一致性 | 基线快照 + 变更页面对比 | 视觉判断(引用具体属性) |
| 探索性测试 | 自由导航 + 对抗测试 | 前后对比 + 判断 |
参考指南(按需加载):
- 对抗模式 — references/adversarial-patterns.md — 测试表单、弹窗、导航或键盘可访问性时加载
- 浏览器使用手册 — references/browser-recipes.md — 运行确定性检查(axe-core、控制台、图片、表单标签)时加载
- 探索性测试 — references/exploratory-testing.md — 工作流B(无diff、开放式探索)时加载
- UX启发式规则 — references/ux-heuristics.md — 评估UX质量或引用特定启发式规则时加载
- 设计系统 — references/design-system.example.md — 用户自定义模板
- 设计一致性 — references/design-consistency.md — 进行视觉一致性检查时加载
- 并行测试 — references/parallel-testing.md — 工作流C(并发会话)时加载
- 报告模板 — references/report-template.html — 阶段7生成报告的HTML模板
如需查看断言协议的实际运行示例和具体命令,可阅读EXAMPLES.md。
Best Practices
最佳实践
- Be adversarial — try to break things, don't just confirm they work
- Every assertion needs evidence — snapshot ref, eval result, or before/after diff
- Before/after for every interaction — snapshot, act, snapshot, compare
- Screenshot every failure — `browse screenshot` immediately on STEP_FAIL, save to `.context/ui-test-screenshots/<step-id>.png`
- Deterministic checks first — axe-core, console errors, form labels before visual judgment
- For localhost, start with clean local mode — use `browse env local` first for reproducible runs; use `--auto-connect` only when existing local state is required
- Always `browse stop` when done — for parallel runs, stop every named session
- Report failures with reproduction steps — action, expected, actual, screenshot path, suggestion
- Parallelize independent tests — use Workflow C with named sessions when testing multiple pages or categories on a deployed site
- 保持对抗性 — 尝试破坏功能,不要仅验证功能正常
- 每个断言都要有证据 — 快照ref、eval结果或前后对比差异
- 每次交互都做前后对比 — 快照、执行操作、快照、对比
- 每个失败都要截图 — 出现STEP_FAIL时立即执行`browse screenshot`,保存到`.context/ui-test-screenshots/<step-id>.png`
- 优先使用确定性检查 — 先做axe-core、控制台错误、表单标签检查,再做视觉判断
- 测试localhost优先使用干净本地模式 — 先使用`browse env local`保证运行可复现,仅当需要现有本地状态时才使用`--auto-connect`
- 测试完成后始终执行`browse stop` — 并行运行时停止所有命名会话
- 报告失败时提供复现步骤 — 操作、预期结果、实际结果、截图路径、建议
- 独立测试并行化 — 测试已部署站点的多个页面或类别时,使用带命名会话的工作流C
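The before/after discipline can be reduced to a plain `diff` over two snapshots. In this sketch, stub files stand in for real `browse snapshot` output (the snapshot text shown is invented):

```shell
# Sketch: before/after evidence for one interaction. Real runs would diff
# two `browse snapshot` captures; the stub contents here are hypothetical.
tmp=$(mktemp -d)
printf 'button "Save" (enabled)\n'                 > "$tmp/before.txt"
printf 'button "Save" (disabled)\ntoast "Saved"\n' > "$tmp/after.txt"
diff "$tmp/before.txt" "$tmp/after.txt" || true  # non-empty diff = the action had an effect
```

An empty diff after an interaction is itself a finding: the action did nothing observable.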
Troubleshooting
故障排查
- "No active page": , retry. For zombies:
browse stoppkill -f "browse.*daemon" - Dev server not responding: — ask user to start it
curl http://localhost:<port> - with
browse evalfails: Useawaitinstead —.then()doesn't support top-level awaitbrowse eval - Element ref not found: again — refs change on page update
browse snapshot - Blank snapshot: or
browse wait loadbefore snapshottingbrowse wait selector ".expected" - SPA deep links 404: Navigate to first, then click through
/ - Remote auth fails: Re-run cookie-sync with , try
--context <id>--stealth - Parallel session conflicts: Ensure every command uses
browse— without it, commands go to the default sessionBROWSE_SESSION=<name> - Session not stopping: . For zombies:
BROWSE_SESSION=<name> browse stoppkill -f "browse.*<name>.*daemon"
- "No active page":执行后重试,僵尸进程可执行
browse stoppkill -f "browse.*daemon" - 开发服务无响应:执行检查,告知用户启动服务
curl http://localhost:<port> - 带的
await执行失败:改用browse eval—.then()不支持顶层awaitbrowse eval - 未找到元素ref:重新执行— 页面更新后ref会变化
browse snapshot - 空白快照:快照前执行或
browse wait loadbrowse wait selector ".expected" - SPA深层链接404:先导航到,再通过点击跳转
/ - 远程鉴权失败:使用重新运行cookie同步,尝试添加
--context <id>参数--stealth - 并行会话冲突:确保每个命令都使用
browse,未指定的命令会发送到默认会话BROWSE_SESSION=<name> - 会话无法停止:执行,僵尸进程可执行
BROWSE_SESSION=<name> browse stoppkill -f "browse.*<name>.*daemon"