# UI Test — Agentic UI Testing Skill
Test UI changes in a real browser. Your job is to try to break things, not confirm they work.
Three workflows:
- Diff-driven — analyze a git diff, test only what changed
- Exploratory — navigate the app, find bugs the developer didn't think about
- Parallel — fan out independent test groups across multiple Browserbase browsers
## How Testing Works
The main agent coordinates — it plans test strategy, delegates to sub-agents, and merges results. Sub-agents do the actual browser testing.
### Planning: multiple angles, then execute once
You MUST complete all three planning rounds yourself and output them before launching any sub-agents. Planning happens in your own response — it is NOT delegated to sub-agents. Do not skip ahead to execution.
Round 1 — Functional: What are the core user flows? What should work? Write out each test as: action → expected result.
Round 2 — Adversarial: Re-read Round 1. What did you miss? Think about: different user types/roles, error paths, empty states, race conditions, edge inputs (empty, huge, special chars, rapid clicks).
Round 3 — Coverage gaps: Re-read Rounds 1–2. What about: accessibility (axe-core, keyboard-only), mobile viewports, console errors, visual consistency with the rest of the app?
Deduplicate: Merge all three rounds into one numbered list of tests. Remove overlaps. Assign each test to a group (e.g. Group A, Group B).
Then execute once — launch one sub-agent per group. Each sub-agent receives its specific list of tests to run, nothing more. Sub-agents do not explore or plan — they execute assigned tests and report results.
Output the three rounds, the merged plan, and the group assignments in your response before calling any Agent tool.
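The dedupe step is mechanical. A minimal shell sketch, with placeholder test names standing in for the real rounds (collected one test per line):

```shell
# Merge the three rounds (one test per line), drop exact duplicates, number the survivors
merged=$(printf '%s\n' \
  "valid email submits successfully" \
  "empty form shows validation error" \
  "valid email submits successfully" \
  "axe-core audit on /signup" \
  | awk '!seen[$0]++' | nl -w1 -s'. ')
echo "$merged"
```

Each surviving numbered test then gets a group label (Group A, Group B, ...).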
### Principles for splitting work
- Sub-agents run assigned tests, not open exploration. The main agent hands each sub-agent a specific numbered list of tests. Sub-agents do not plan, explore, or decide what to test — they execute the list and stop.
- The bottleneck is the slowest agent — split work so no single agent has a disproportionate share. Many small agents > few large ones.
- Size the effort to the change — a single component fix doesn't need many agents or many steps. A full-page redesign does. Let the scope of the diff drive the plan.
- No early stopping on failures — find as many bugs as possible within the assigned tests.
### Giving sub-agents a step budget
The main agent MUST include an explicit browse step limit in every sub-agent prompt. Sub-agents do not self-limit — they will run until done unless told otherwise.
As a rough heuristic: ~25 steps for a few targeted checks, ~40 for a full page with functional + adversarial + a11y, ~75 for multiple pages or a broad category. Adjust based on what the assigned tests actually require — these are starting points, not rules.
Every sub-agent prompt must include:

```
You have a budget of N browse steps (each `browse` command = 1 step). Count your steps as you go. When you reach N, stop immediately and report:
- STEP_PASS/STEP_FAIL for every test you completed
- STEP_SKIP|<test-id>|budget reached for every test you didn't get to
Do not retry or continue after hitting the budget.

Run only these tests: [numbered list from the merged plan]
Do not explore beyond the assigned tests.
Do NOT generate an HTML report or write any files. Return only step markers and your findings as text.
```

The main agent should NOT run commands itself (except to verify the dev server is up). All testing happens in sub-agents.

When a sub-agent hits its budget, the main agent accepts the partial results as-is. Do not re-run or retry the sub-agent. Include SKIPPED tests in the final report so the developer knows what wasn't covered.
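The budget contract reduces to a counter checked before every step. A sketch of the loop a sub-agent follows, with a no-op standing in for the real `browse` call and hypothetical test ids:

```shell
out=$(
  budget=3          # N from the sub-agent prompt
  steps=0
  for t in load-page fill-form submit-form check-toast; do
    if [ "$steps" -ge "$budget" ]; then
      echo "STEP_SKIP|$t|budget reached"
      continue
    fi
    :               # a real sub-agent runs one `browse` command here
    steps=$((steps + 1))
    echo "STEP_PASS|$t|ok"
  done
)
echo "$out"
```

With a budget of 3 and 4 assigned tests, the last test is reported as skipped rather than retried.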
## Reporting
Every sub-agent reports back with:

```
Tests: 8 | Passed: 5 | Failed: 2 | Skipped: 1 | Pages visited: 2
```

The main agent merges into a final report with:

```
Tests: 20 | Passed: 14 | Failed: 4 | Skipped: 2 | Agents: 3 | Pass rate: 70%
```

Do not report "steps used" — browse command counts are implementation plumbing, not a meaningful metric for reviewers.
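Merging sub-agent output into the summary line is a counting pass over the step markers. A sketch over a hard-coded marker stream (a real run concatenates the sub-agents' outputs); pass rate here is passed over total, matching the 14/20 → 70% example:

```shell
markers='STEP_PASS|homepage-cta|button present
STEP_PASS|valid-email|success message shown
STEP_FAIL|double-submit|expected one submission → got two|shot.png
STEP_SKIP|axe-audit|budget reached'

# Count each marker type, then assemble the summary line
pass=$(printf '%s\n' "$markers" | grep -c '^STEP_PASS')
fail=$(printf '%s\n' "$markers" | grep -c '^STEP_FAIL')
skip=$(printf '%s\n' "$markers" | grep -c '^STEP_SKIP')
total=$((pass + fail + skip))
rate=$((100 * pass / total))
echo "Tests: $total | Passed: $pass | Failed: $fail | Skipped: $skip | Pass rate: ${rate}%"
```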
## Testing Philosophy
You are an adversarial tester. Your goal is to find bugs, not prove correctness.
- Try to break every feature you test. Don't just check "does the button exist?" — click it twice rapidly, submit empty forms, paste 500 characters, press Escape mid-flow.
- Test what the developer didn't think about. Empty states, error recovery, keyboard-only navigation, mobile overflow.
- Every assertion must be evidence-based. Compare before/after snapshots. Check specific elements by ref. Never report PASS without concrete evidence from the accessibility tree or a deterministic check.
- Report failures with enough detail to reproduce. Include the exact action, what you expected, what you got, and a suggested fix.
## Assertion Protocol
Every test step MUST produce a structured assertion. Do not write freeform "this looks good."
### Step markers
For each test step, emit exactly one marker:

```
STEP_PASS|<step-id>|<evidence>
```

or

```
STEP_FAIL|<step-id>|<expected> → <actual>|<screenshot-path>
```

- `step-id`: short identifier like `homepage-cta`, `form-validation-error`, `modal-cancel`
- `evidence`: what you observed that proves the step passed (element ref, text content, URL, eval result)
- `expected → actual`: what you expected vs what you got
- `screenshot-path`: path to the saved screenshot (failures only — see Screenshot Capture below)
### Screenshot Capture for Failures
Every STEP_FAIL MUST have an accompanying screenshot so the developer can see what went wrong visually.
When a test step fails:
```bash
# 1. Take a screenshot immediately after observing the failure
browse screenshot --path .context/ui-test-screenshots/<step-id>.png

# If --path is not supported, take the screenshot and save manually:
browse screenshot
# The browse CLI will output the screenshot path — move/copy it:
cp /tmp/browse-screenshot-*.png .context/ui-test-screenshots/<step-id>.png
```

Set up the screenshot directory at the start of any test run:

```bash
mkdir -p .context/ui-test-screenshots
```

Rules:
- File name = step-id (e.g., `double-submit.png`, `axe-audit.png`, `modal-focus-trap.png`)
- Store in `.context/ui-test-screenshots/` — this directory is gitignored and accessible to the developer and other agents
- For parallel runs, include the session name: `<session>-<step-id>.png` (e.g., `signup-double-submit.png`)
- Take the screenshot at the moment of failure — capture the broken state, not after recovery
- For visual/layout bugs, also screenshot the baseline (working state) for comparison: `<step-id>-baseline.png`
### How to verify (in order of rigor)
- Deterministic check (strongest) — returns structured data you can inspect. Examples: axe-core violation count, `browse eval` of `document.title`, form field value, console error array, element count.
- Snapshot element match — a specific element with a specific role and text exists in the accessibility tree. Check by ref: `@0-12 button "Save"`. An element either exists in the tree or it doesn't.
- Before/after comparison — snapshot before action, act, snapshot after. Verify the tree changed in the expected way (element appeared, disappeared, text changed).
- Screenshot + visual judgment (weakest) — only for visual-only properties (color, spacing, layout) that the accessibility tree cannot capture. Always accompany with what specifically you're evaluating.
### Before/after comparison pattern
This is the core verification loop. Use it for every interaction:
```bash
# 1. BEFORE: capture state
browse snapshot
# Record: what elements exist, their text, their refs

# 2. ACT: perform the interaction
browse click @0-12

# 3. AFTER: capture new state
browse snapshot
# Compare: what changed? What appeared? What disappeared?

# 4. ASSERT: emit marker based on comparison
# If dialog appeared: STEP_PASS|modal-open|dialog "Confirm" appeared at @0-20
# If nothing changed:
browse screenshot --path .context/ui-test-screenshots/modal-open.png
# STEP_FAIL|modal-open|expected dialog to appear → snapshot unchanged|.context/ui-test-screenshots/modal-open.png
```
## Setup

```bash
which browse || npm install -g @browserbasehq/browse-cli
```

## Avoid permission fatigue
This skill runs many commands (snapshots, clicks, evals). To avoid approving each one, add `browse` to your allowed commands.

Add both patterns to `.claude/settings.json` (project-level) or `~/.claude/settings.json` (user-level):

```json
{
  "permissions": {
    "allow": [
      "Bash(browse:*)",
      "Bash(BROWSE_SESSION=*)"
    ]
  }
}
```

The first pattern covers plain `browse` commands. The second covers parallel sessions (`BROWSE_SESSION=signup browse open ...`). Both are needed to avoid approval prompts.

## Mode Selection
| Target | Mode | Command | Auth |
|---|---|---|---|
| Local (`localhost`) | Local | `browse env local` | None needed (clean isolated local browser by default) |
| Deployed/staging site | Remote | `browse env remote` | cookie-sync → `--context-id` |

Rule: If the target URL contains `localhost` or `127.0.0.1`, always use `browse env local`.

### Local Mode (default for localhost)
```bash
browse env local
browse open http://localhost:3000
```

Use local-mode variants only when needed:

- `browse env local --auto-connect` — auto-discover existing local Chrome, fallback to isolated. Use this only when the test explicitly needs existing local login/cookies/state.
- `browse env local <port|url>` — attach to a specific CDP target (explicit local browser attach).
### Remote Mode (deployed sites via cookie-sync)
```bash
# Step 1: Sync cookies from local Chrome to Browserbase
node .claude/skills/cookie-sync/scripts/cookie-sync.mjs --domains your-app.com
# Output: Context ID: ctx_abc123

# Step 2: Switch to remote mode
browse env remote
browse open https://staging.your-app.com --context-id ctx_abc123 --persist
browse snapshot
# ... run tests ...
browse stop
```

Cookie-sync flags: `--domains`, `--context`, `--stealth`, `--proxy "City,ST,US"`

## Workflow A: Diff-Driven Testing
### Phase 1: Analyze the diff
```bash
git diff --name-only HEAD~1   # or: git diff --name-only / git diff --name-only main...HEAD
git diff HEAD~1 -- <file>     # read actual changes
```

Categorize changed files:

| File pattern | UI impact | What to test |
|---|---|---|
| | Component | Render, interaction, state, edge cases |
| | Route/page | Navigation, page load, content, 404 handling |
| | Style | Visual appearance (screenshot), responsive |
| | Form | Validation, submission, empty input, long input, special chars |
| | Interactive | Open/close, escape, focus trap, cancel vs confirm |
| | Navigation | Links, active states, routing, keyboard nav |
| Non-UI files only | None | Skip — report "no UI tests needed" |
### Phase 2: Map files to URLs

Detect framework:

```bash
cat package.json | grep -E '"(next|react|vue|nuxt|svelte|@sveltejs|angular|vite)"'
```

| Framework | Default port | File → URL pattern |
|---|---|---|
| Next.js App Router | 3000 | |
| Next.js Pages Router | 3000 | |
| Vite | 5173 | Check router config |
| Nuxt | 3000 | |
| SvelteKit | 5173 | |
| Angular | 4200 | Check routing module |
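For Next.js App Router, the core of the file-to-URL mapping can be sketched with a hypothetical `to_url` helper (dynamic segments and route groups are deliberately out of scope here):

```shell
# Map an App Router page file to its route path
to_url() {
  printf '%s\n' "$1" | sed -e 's|^app||' -e 's|/page\.[a-z]*$||' -e 's|^$|/|'
}

to_url "app/settings/profile/page.tsx"   # → /settings/profile
to_url "app/page.tsx"                    # → /
```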
### Phase 3: Ensure the right code is running

Before testing, verify the dev server is serving the code from the diff — not a stale branch.

If testing a PR or specific branch:

```bash
# Check what branch is currently checked out
git branch --show-current

# If it's not the PR branch, switch to it
git fetch origin <branch> && git checkout <branch>

# Install deps — the lockfile may differ between branches
yarn install   # or npm install / pnpm install
```

If the dev server was already running on a different branch, restart it after checkout.

**Find a running dev server:**

```bash
for port in 3000 3001 5173 4200 8080 8000 5000; do
  s=$(curl -s -o /dev/null -w "%{http_code}" "http://localhost:$port" 2>/dev/null)
  if [ "$s" != "000" ]; then echo "Dev server on port $port (HTTP $s)"; fi
done
```

If nothing found: tell the user to start their dev server.

**Verify it actually renders:**

After `browse open` + `browse snapshot`, check that the accessibility tree contains real page content (navigation, headings, interactive elements) — not just an error overlay or empty body. Next.js dev servers can return HTTP 200 while showing a full-screen build error dialog. If the snapshot is empty or dominated by an error dialog, the server is broken — fix the build before testing.

### Phase 4: Generate test plan
For each changed area, plan both happy path AND adversarial tests:
```
Test Plan (based on git diff)
=============================
Changed: src/components/SignupForm.tsx (added email validation)

1. [happy] Valid email submits successfully
   URL: http://localhost:3000/signup
   Steps: fill valid email → submit → verify success message appears
2. [adversarial] Invalid email shows error
   Steps: fill "not-an-email" → submit → verify error message appears
3. [adversarial] Empty form submission
   Steps: click submit without filling anything → verify error, no crash
4. [adversarial] XSS in email field
   Steps: fill "<script>alert(1)</script>" → submit → verify sanitized/rejected
5. [adversarial] Rapid double-submit
   Steps: click submit twice quickly → verify no duplicate submission
6. [adversarial] Keyboard-only flow
   Steps: Tab to email → type → Tab to submit → Enter → verify success
```

### Phase 5: Execute tests
```bash
browse stop 2>/dev/null
mkdir -p .context/ui-test-screenshots

# localhost/default QA → clean, reproducible local run
browse env local
```

For each test, follow the **before/after pattern**:

```bash
# Navigate
browse open http://localhost:3000/path
browse wait load

# BEFORE snapshot
browse snapshot
# Note the current state: elements, refs, text

# ACT
browse click @0-ref
# or: browse fill "selector" "value"
# or: browse type "text"
# or: browse press Enter

# AFTER snapshot
browse snapshot
# Compare against BEFORE: what changed?

# ASSERT with marker
# STEP_PASS|step-id|evidence OR STEP_FAIL|step-id|expected → actual
```

### Phase 6: Report results
```
UI Test Results

STEP_PASS|valid-email-submit|status "Thanks!" appeared at @0-42 after submit
- URL: http://localhost:3000/signup
- Before: form with email input @0-3, submit button @0-7
- Action: filled "user@test.com", clicked @0-7
- After: form replaced by status element with "Thanks! We'll be in touch."

STEP_FAIL|double-submit|expected single submission → form submitted twice|.context/ui-test-screenshots/double-submit.png
- URL: http://localhost:3000/signup
- Before: form with submit button @0-7
- Action: clicked @0-7 twice rapidly
- After: two success toasts appeared, suggesting duplicate submission
- Screenshot: .context/ui-test-screenshots/double-submit.png
- Suggestion: disable submit button after first click, or debounce the handler

Summary: 4/6 passed, 2 failed
Failed: double-submit, xss-sanitization
```

Screenshots saved to `.context/ui-test-screenshots/` — open any failed step's screenshot to see the broken state.

Always `browse stop` when done.

### Phase 7: Generate HTML report
After producing the text report, generate a standalone HTML report that a reviewer can open in a browser. The report embeds screenshots inline (base64) so it works as a single file — no external dependencies.
Why: Text reports are good for the agent conversation, but reviewers (PMs, designers, other engineers) want a visual artifact they can open, scan, and share. Screenshots inline make failures immediately obvious.
#### How to generate

1. Read the HTML template at references/report-template.html
2. Build the report by replacing the template placeholders with actual test data:

| Placeholder | Value |
|---|---|
| | Report title for |
| | Report title for the visible |
| | One-line context: date, app URL, user, branch |
| | Total STEP_PASS + STEP_FAIL count |
| | Number of sub-agents that ran |
| | Number of STEP_PASS |
| | Number of STEP_FAIL |
| | Integer percentage (e.g., "92") |
| | |
| | HTML for failed test cards (see below) |
| | HTML for passed test cards (see below) |

3. For each test result, generate a `<details>` card. Failed tests should be open by default so reviewers see them immediately:

```html
<!-- Failed test card (open by default) -->
<div class="section">
  <h2>Failures <span class="count">{{FAIL_COUNT}}</span></h2>
  <details class="test-card fail" open>
    <summary>
      <span class="badge fail">FAIL</span>
      <span class="step-id">step-id-here</span>
      <span class="evidence">expected → actual</span>
    </summary>
    <div class="body">
      <dl>
        <dt>URL</dt><dd>http://localhost:3000/path</dd>
        <dt>Action</dt><dd>What was done</dd>
        <dt>Expected</dt><dd>What should have happened</dd>
        <dt>Actual</dt><dd>What happened instead</dd>
      </dl>
      <div class="suggestion">Fix: description of suggested fix</div>
      <div class="screenshot">
        <img src="data:image/png;base64,..." alt="Screenshot of failure">
        <div class="caption">step-id.png — captured at moment of failure</div>
      </div>
    </div>
  </details>
</div>

<!-- Passed test card (collapsed by default) -->
<div class="section">
  <h2>Passed <span class="count">{{PASS_COUNT}}</span></h2>
  <details class="test-card pass">
    <summary>
      <span class="badge pass">PASS</span>
      <span class="step-id">step-id-here</span>
      <span class="evidence">evidence summary</span>
    </summary>
    <div class="body">
      <dl>
        <dt>URL</dt><dd>http://localhost:3000/path</dd>
        <dt>Evidence</dt><dd>What was observed</dd>
      </dl>
    </div>
  </details>
</div>
```

4. Embed screenshots as base64 so the HTML is fully self-contained:
```bash
# Convert screenshot to base64 data URI
base64 -i .context/ui-test-screenshots/step-id.png | tr -d '\n'
# Use as: src="data:image/png;base64,<output>"
```
Read each screenshot file referenced in STEP_FAIL markers, base64-encode it, and embed it as an `<img src="data:image/png;base64,...">` in the corresponding test card. For STEP_PASS, only embed a screenshot if one was explicitly taken (e.g., baseline screenshots).
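The embedding step can be sketched with stand-in bytes in place of a real screenshot (note: GNU coreutils reads a file as `base64 <file>` while BSD/macOS uses `base64 -i <file>`; reading from stdin works on both):

```shell
printf 'PNG' > /tmp/fake-shot.png        # stand-in bytes for a screenshot
b64=$(base64 < /tmp/fake-shot.png | tr -d '\n')
printf '<img src="data:image/png;base64,%s" alt="Screenshot of failure">\n' "$b64"
```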
5. Write the final HTML to `.context/ui-test-report.html`:

```bash
# Write the generated HTML
cat > .context/ui-test-report.html << 'REPORT_EOF'
<!DOCTYPE html>
...generated report...
REPORT_EOF

# Open it for the reviewer
open .context/ui-test-report.html       # macOS
xdg-open .context/ui-test-report.html   # Linux
```
6. Tell the user: `Report saved to .context/ui-test-report.html` and offer to open it.
**Rules:**
- Failures section comes before passes — reviewers care about what's broken first
- Failed cards are `open` by default; passed cards are collapsed
- Every STEP_FAIL card MUST have an embedded screenshot — if the screenshot file is missing, note it in the card
- Include the suggestion/fix in each failure card if one was provided
- The report must work offline — no CDN links, no external assets
- Keep the HTML under 5MB — if screenshots push it over, reduce image quality or skip baseline screenshots for passes
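The 5MB rule can be checked mechanically before handing the report over; a sketch with a stand-in file:

```shell
report=/tmp/ui-test-report.html
head -c 1024 /dev/zero > "$report"       # stand-in for the generated report
size=$(($(wc -c < "$report")))           # arithmetic expansion strips wc's padding
limit=$((5 * 1024 * 1024))
if [ "$size" -gt "$limit" ]; then verdict="over budget"; else verdict="within budget"; fi
echo "$verdict ($size bytes)"
```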
## Adversarial Test Patterns
Apply these to every interactive element you test. Read references/adversarial-patterns.md for the full pattern library (forms, modals, navigation, error states, keyboard accessibility).
## Deterministic Checks
These produce structured data, not judgment calls. Use them as the strongest form of assertion.
| Check | What it catches | Assertion |
|---|---|---|
| axe-core | WCAG violations | |
| Console errors | Runtime exceptions, failed requests | empty error array |
| Broken images | Missing/failed image loads | no images with |
| Form labels | Inputs without accessible labels | every input has |

For the exact recipes (run via `browse eval`), read references/browser-recipes.md.

## Workflow B: Exploratory Testing
No diff, no plan — just open the app and try to break it. Use this when the user says "test my app", "find bugs", or "QA this site."
### Approach
- Discover the app — read to detect the framework, then open the root URL and snapshot to see what's there
package.json - Navigate everything — click through nav links, visit every reachable page, note what exists
- Test what you find — for each page, apply the adversarial patterns below (forms, modals, navigation, keyboard, error states)
- Run deterministic checks — axe-core, console errors, broken images, form labels on every page
- Report findings — use STEP_PASS/STEP_FAIL markers, include reproduction steps for failures
Don't try to be systematic about coverage. Just explore like a user would, but with the intent to break things. The agent is good at this — let it roam.
- 了解应用 — 读取`package.json`检测框架,打开根URL并快照查看内容
- 遍历所有内容 — 点击所有导航链接,访问每个可到达的页面,记录存在的功能
- 测试发现的功能 — 对每个页面应用下方的对抗模式(表单、弹窗、导航、键盘、错误状态)
- 运行确定性检查 — 每个页面都运行axe-core、控制台错误、图片损坏、表单标签检查
- 报告发现 — 使用STEP_PASS/STEP_FAIL标记,失败场景包含复现步骤
不需要追求系统性的覆盖度。像普通用户一样探索,但带着破坏功能的意图。Agent擅长这类工作 — 让它自由探索。
Tips for exploratory runs
探索性运行建议
- Start with the homepage, then follow the navigation naturally
- Try the 404 page (`/does-not-exist`) — is it custom or default?
- Look for empty states (pages with no data)
- Test forms with garbage input before valid input
- Check mobile viewport (375px) on every page — does it overflow?
- If the app has auth, use cookie-sync first
- 从主页开始,自然跟随导航浏览
- 尝试404页面(`/does-not-exist`) — 是自定义还是默认样式?
- 查找空状态(无数据的页面)
- 测试表单时先输入无效内容再输入有效内容
- 每个页面都检查移动端视口(375px) — 是否有内容溢出?
- 如果应用需要鉴权,先运行cookie同步
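The garbage-before-valid tip can be seeded from a small generator. The specific values below are illustrative only — empty, oversized, markup, SQL-ish, and non-ASCII inputs:

```shell
# Sketch: adversarial form inputs to try before the happy path.
# Each printf argument becomes one candidate input (values are illustrative).
printf '%s\n' \
  '' \
  "$(printf 'A%.0s' $(seq 1 200))" \
  '<script>alert(1)</script>' \
  "' OR 1=1 --" \
  '名前 émoji 🚀'
```

Feed each line into the form under test and snapshot the result — a crash, an unescaped render, or a silent truncation is a finding.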
Workflow C: Parallel Testing
工作流C:并行测试
Run independent test groups concurrently using named `browse` sessions (`BROWSE_SESSION=<name>`). Each session gets its own browser. Works with both local and remote mode.
Use when testing multiple pages or categories and you want faster wall-clock time.
Read references/parallel-testing.md for the full workflow: session setup, agent fan-out, cookie-sync for auth, and result merging.
使用命名`browse`会话(`BROWSE_SESSION=<name>`)并发运行独立测试组,每个会话对应独立浏览器,支持本地和远程模式。
适用于测试多个页面或类别,需要缩短运行时间的场景。
完整工作流(会话设置、Agent分发、鉴权cookie同步、结果合并)请查看references/parallel-testing.md。
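The fan-out shape might look like the following sketch. The inner `browse` commands are placeholders (commented out); `echo` stands in to show how `BROWSE_SESSION` scopes each group to its own named session:

```shell
# Sketch of Workflow C fan-out. Group names are hypothetical; the real
# per-group commands would be browse calls (commented below).
for group in group-a group-b; do
  BROWSE_SESSION="$group" sh -c '
    # browse <url>; browse snapshot; ...; browse stop
    echo "session $BROWSE_SESSION ready"
  ' &
done
wait
```

Because the variable is set per command, groups never collide; remember to `browse stop` every named session when done.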
Design Consistency
设计一致性
Check whether changed UI matches the rest of the app visually. Read references/design-consistency.md when doing visual or design checks.
检查变更的UI是否与应用其他部分的视觉表现一致。进行视觉或设计检查时请查看references/design-consistency.md。
Test Categories
测试分类
| Category | How | Assertion type |
|---|---|---|
| Accessibility | axe-core + keyboard nav | Deterministic (violation count) |
| Visual Quality | Screenshot + heuristic evaluation | Visual judgment (weakest — note specifics) |
| Responsive | Viewport sweep + screenshots | Visual + deterministic (overflow check) |
| Console Health | Console capture eval | Deterministic (error count) |
| UX Heuristics | Snapshot + Laws of UX + Nielsen's | Structured judgment (cite specific heuristic) |
| Error States | Navigate to empty/error states | Before/after comparison |
| Data Display | Snapshot on tables/dashboards | Element match (column count, formatting) |
| Design Consistency | Screenshot baseline + changed page comparison | Visual judgment (cite specific property) |
| Exploratory | Free navigation + adversarial testing | Before/after + judgment |
Reference guides (load on demand):
- Adversarial patterns — references/adversarial-patterns.md — load when testing forms, modals, navigation, or keyboard a11y
- Browser recipes — references/browser-recipes.md — load when running deterministic checks (axe-core, console, images, form labels)
- Exploratory testing — references/exploratory-testing.md — load for Workflow B (no diff, open exploration)
- UX heuristics — references/ux-heuristics.md — load when evaluating UX quality or citing specific heuristics
- Design system — references/design-system.example.md — template for users to customize
- Design consistency — references/design-consistency.md — load when doing visual consistency checks
- Parallel testing — references/parallel-testing.md — load for Workflow C (concurrent sessions)
- Report template — references/report-template.html — HTML template for Phase 7 report generation
For worked examples with exact commands, read EXAMPLES.md if you need to see the assertion protocol in action.
| 分类 | 实现方式 | 断言类型 |
|---|---|---|
| 可访问性 | axe-core + 键盘导航 | 确定性(违规数量) |
| 视觉质量 | 截图 + 启发式评估 | 视觉判断(最不严谨 — 需说明具体内容) |
| 响应式 | 视口遍历 + 截图 | 视觉 + 确定性(溢出检查) |
| 控制台健康度 | 控制台捕获eval | 确定性(错误数量) |
| UX启发式 | 快照 + UX法则 + Nielsen规则 | 结构化判断(引用具体启发式规则) |
| 错误状态 | 导航到空/错误状态 | 前后对比 |
| 数据展示 | 表格/仪表盘快照 | 元素匹配(列数、格式) |
| 设计一致性 | 基线快照 + 变更页面对比 | 视觉判断(引用具体属性) |
| 探索性测试 | 自由导航 + 对抗测试 | 前后对比 + 判断 |
参考指南(按需加载):
- 对抗模式 — references/adversarial-patterns.md — 测试表单、弹窗、导航或键盘可访问性时加载
- 浏览器使用手册 — references/browser-recipes.md — 运行确定性检查(axe-core、控制台、图片、表单标签)时加载
- 探索性测试 — references/exploratory-testing.md — 工作流B(无diff、开放式探索)时加载
- UX启发式规则 — references/ux-heuristics.md — 评估UX质量或引用特定启发式规则时加载
- 设计系统 — references/design-system.example.md — 用户自定义模板
- 设计一致性 — references/design-consistency.md — 进行视觉一致性检查时加载
- 并行测试 — references/parallel-testing.md — 工作流C(并发会话)时加载
- 报告模板 — references/report-template.html — 阶段7生成报告的HTML模板
如需查看断言协议的实际运行示例和具体命令,可阅读EXAMPLES.md。
Best Practices
最佳实践
- Be adversarial — try to break things, don't just confirm they work
- Every assertion needs evidence — snapshot ref, eval result, or before/after diff
- Before/after for every interaction — snapshot, act, snapshot, compare
- Screenshot every failure — `browse screenshot` immediately on STEP_FAIL, save to `.context/ui-test-screenshots/<step-id>.png`
- Deterministic checks first — axe-core, console errors, form labels before visual judgment
- For localhost, start with clean local mode — use `browse env local` first for reproducible runs; use `--auto-connect` only when existing local state is required
- Always `browse stop` when done — for parallel runs, stop every named session
- Report failures with reproduction steps — action, expected, actual, screenshot path, suggestion
- Parallelize independent tests — use Workflow C with named sessions when testing multiple pages or categories on a deployed site
- 保持对抗性 — 尝试破坏功能,不要仅验证功能正常
- 每个断言都要有证据 — 快照ref、eval结果或前后对比差异
- 每次交互都做前后对比 — 快照、执行操作、快照、对比
- 每个失败都要截图 — 出现STEP_FAIL时立即执行`browse screenshot`,保存到`.context/ui-test-screenshots/<step-id>.png`
- 优先使用确定性检查 — 先做axe-core、控制台错误、表单标签检查,再做视觉判断
- 测试localhost优先使用干净本地模式 — 先使用`browse env local`保证运行可复现,仅当需要现有本地状态时才使用`--auto-connect`
- 测试完成后始终执行`browse stop` — 并行运行时停止所有命名会话
- 报告失败时提供复现步骤 — 操作、预期结果、实际结果、截图路径、建议
- 独立测试并行化 — 测试已部署站点的多个页面或类别时,使用带命名会话的工作流C
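The before/after discipline can be reduced to a plain `diff` over two snapshots. In this sketch, stub files stand in for real `browse snapshot` output (the snapshot text shown is invented):

```shell
# Sketch: before/after evidence for one interaction. Real runs would diff
# two `browse snapshot` captures; the stub contents here are hypothetical.
tmp=$(mktemp -d)
printf 'button "Save" (enabled)\n'                 > "$tmp/before.txt"
printf 'button "Save" (disabled)\ntoast "Saved"\n' > "$tmp/after.txt"
diff "$tmp/before.txt" "$tmp/after.txt" || true  # non-empty diff = the action had an effect
```

An empty diff after an interaction is itself a finding: the action did nothing observable.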
Troubleshooting
故障排查
- "No active page": , retry. For zombies:
browse stoppkill -f "browse.*daemon" - Dev server not responding: — ask user to start it
curl http://localhost:<port> - with
browse evalfails: Useawaitinstead —.then()doesn't support top-level awaitbrowse eval - Element ref not found: again — refs change on page update
browse snapshot - Blank snapshot: or
browse wait loadbefore snapshottingbrowse wait selector ".expected" - SPA deep links 404: Navigate to first, then click through
/ - Remote auth fails: Re-run cookie-sync with , try
--context <id>--stealth - Parallel session conflicts: Ensure every command uses
browse— without it, commands go to the default sessionBROWSE_SESSION=<name> - Session not stopping: . For zombies:
BROWSE_SESSION=<name> browse stoppkill -f "browse.*<name>.*daemon"
- "No active page":执行后重试,僵尸进程可执行
browse stoppkill -f "browse.*daemon" - 开发服务无响应:执行检查,告知用户启动服务
curl http://localhost:<port> - 带的
await执行失败:改用browse eval—.then()不支持顶层awaitbrowse eval - 未找到元素ref:重新执行— 页面更新后ref会变化
browse snapshot - 空白快照:快照前执行或
browse wait loadbrowse wait selector ".expected" - SPA深层链接404:先导航到,再通过点击跳转
/ - 远程鉴权失败:使用重新运行cookie同步,尝试添加
--context <id>参数--stealth - 并行会话冲突:确保每个命令都使用
browse,未指定的命令会发送到默认会话BROWSE_SESSION=<name> - 会话无法停止:执行,僵尸进程可执行
BROWSE_SESSION=<name> browse stoppkill -f "browse.*<name>.*daemon"