web-testing

Web Testing Protocol (L2)

Exploratory, AI-driven validation of dashboard UI changes; this is not regression testing. Regression coverage is L1's job (the smoke and e2e suites under `apps/dashboard/tests/smoke/` and `apps/dashboard/tests/e2e/`). L2 catches what L1 misses: layout bugs, mobile regressions, and interaction flows that only fail in a real browser.

Adopting in another repo: the procedure (read diff → map to routes → run agent-browser at two viewports → screenshot → verdict) is repo-agnostic. The route table, test paths, and verdict schema below are examples from `onsager-ai/onsager`. Fork the skill and replace those concrete details with your own dashboard's.

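Forking for another repo mostly means swapping a handful of paths and URLs. The sketch below collects them in one place. The `L2Config` type and its field names are hypothetical. The values are the `onsager-ai/onsager` examples used on this page.

```typescript
// Hypothetical shape for the repo-specific parts of the L2 protocol.
// The procedure itself (diff -> routes -> two viewports -> screenshot -> verdict)
// does not change between repos; only these values do.
interface L2Config {
  baseUrl: string;       // where the app under test listens
  smokeTestDir: string;  // component-level L1 tests
  e2eTestDir: string;    // browser-level L1 tests
  verdictSchema: string; // JSON schema the L2 verdict must match
  triageSchema: string;  // JSON schema for triage-mode output
  screenshotDir: string; // where agent-browser writes screenshots
}

// Example values from onsager-ai/onsager; replace them in your fork.
const onsagerConfig: L2Config = {
  baseUrl: "http://localhost:3000",
  smokeTestDir: "apps/dashboard/tests/smoke/",
  e2eTestDir: "apps/dashboard/tests/e2e/",
  verdictSchema: "tests/l2-verdict-schema.json",
  triageSchema: "tests/l2-triage-schema.json",
  screenshotDir: "/tmp/l2-screenshots",
};
```

Keeping these in one object makes the fork a single-file change. The route table is the only other repo-specific piece.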

When to invoke

- A PR touches `apps/dashboard/**`.
- An L1 e2e run fails and you need to know whether it's a real regression, a flaky test, or an environment problem.
- Someone says "validate the UI" / "dogfood this change".

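The file-based trigger reduces to a prefix check on the changed paths. The sketch below assumes repo-relative paths, such as those printed by `git diff --name-only`. `touchesDashboard` is a hypothetical helper. The other two triggers (a failed L1 e2e run and an explicit request) are events rather than path patterns, so they are not modeled here.

```typescript
// Sketch of the "a PR touches apps/dashboard/**" trigger.
// Assumes repo-relative paths, one per changed file.
function touchesDashboard(changedFiles: string[]): boolean {
  // `apps/dashboard/**` matches any file at any depth under apps/dashboard/.
  // The trailing slash keeps e.g. "apps/dashboard-old/x" from matching.
  return changedFiles.some((path) => path.startsWith("apps/dashboard/"));
}

// Example usage with hypothetical changed paths.
const changed = ["crates/stiglab/src/main.rs", "apps/dashboard/src/pages/SessionsPage.tsx"];
console.log(touchesDashboard(changed)); // true
```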

The app under test

The CI pipeline builds the image defined by `crates/stiglab/deploy/Dockerfile`. This single image bundles the Rust backends (`stiglab`, `synodic`) and the prebuilt dashboard SPA. It listens on `http://localhost:3000`.

Primary routes:

| Route | Page | Expected heading |
|---|---|---|
| `/` | Factory overview | `Factory` |
| `/sessions` | Sessions list | `Sessions` |
| `/sessions/:id` | Session detail | (dynamic) |
| `/nodes` | Nodes list | `Nodes` |
| `/artifacts` | Artifacts list | `Artifacts` |
| `/spine` | Event spine viewer | (dynamic) |
| `/governance` | Governance | `Governance` |
| `/settings` | Settings + credentials | `Settings` |

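If the checks are scripted, the route table can live as data next to them. The sketch below makes that assumption:

- `RouteSpec` and `findRoute` are hypothetical names.
- `heading: null` stands for the "(dynamic)" rows.
- `:id`-style segments are treated as wildcards matching exactly one path segment.

```typescript
// The primary-route table as data. `heading: null` marks pages whose
// heading is dynamic, so there is no fixed title to assert on.
interface RouteSpec {
  pattern: string; // e.g. "/sessions/:id"
  page: string;
  heading: string | null;
}

const ROUTES: RouteSpec[] = [
  { pattern: "/", page: "Factory overview", heading: "Factory" },
  { pattern: "/sessions", page: "Sessions list", heading: "Sessions" },
  { pattern: "/sessions/:id", page: "Session detail", heading: null },
  { pattern: "/nodes", page: "Nodes list", heading: "Nodes" },
  { pattern: "/artifacts", page: "Artifacts list", heading: "Artifacts" },
  { pattern: "/spine", page: "Event spine viewer", heading: null },
  { pattern: "/governance", page: "Governance", heading: "Governance" },
  { pattern: "/settings", page: "Settings + credentials", heading: "Settings" },
];

// Match a concrete path such as "/sessions/abc123" against the table.
// Segments starting with ":" match any single segment.
function findRoute(path: string): RouteSpec | undefined {
  const parts = path.split("/").filter(Boolean);
  return ROUTES.find((route) => {
    const pat = route.pattern.split("/").filter(Boolean);
    return (
      pat.length === parts.length &&
      pat.every((seg, i) => seg.startsWith(":") || seg === parts[i])
    );
  });
}
```

An unknown path returns `undefined`. This fits the guardrail against inventing routes: a route that is not in the table must come from the diff.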

Viewports (always test both)

Desktop:
```
agent-browser set viewport 1280 720
```
Mobile:
```
agent-browser set viewport 375 812
```

Mobile matters: the dashboard ships with a responsive layout (see the `md:` breakpoints throughout). Horizontal overflow and hidden navigation are the two most common regression classes.

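Both viewports always run, so every affected route expands into two runs. The sketch below derives the `agent-browser set viewport` commands shown above and builds the route × viewport matrix. `VIEWPORTS`, `setViewportCommand`, and `testMatrix` are hypothetical names. The command text is the one given in this section.

```typescript
// The two required viewports. The names double as the
// `{desktop|mobile}` suffix in screenshot filenames.
const VIEWPORTS = [
  { name: "desktop", width: 1280, height: 720 },
  { name: "mobile", width: 375, height: 812 },
] as const;

type ViewportName = (typeof VIEWPORTS)[number]["name"];

// Build the agent-browser command that switches to a given viewport.
function setViewportCommand(name: ViewportName): string {
  const vp = VIEWPORTS.find((v) => v.name === name)!;
  return `agent-browser set viewport ${vp.width} ${vp.height}`;
}

// Every (route, viewport) pair that must run: "always test both".
function testMatrix(routes: string[]): Array<[string, ViewportName]> {
  return routes.flatMap((route) =>
    VIEWPORTS.map((vp): [string, ViewportName] => [route, vp.name]),
  );
}
```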

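The horizontal-overflow failure can be stated as a width comparison: the document is wider than its visible area. The sketch below assumes you read two standard DOM properties from the page: `document.documentElement.scrollWidth` and `document.documentElement.clientWidth`. How you evaluate them through agent-browser is not specified here. The 1px tolerance is an assumption meant to absorb rounding, not a project rule.

```typescript
// Values to read in the page (standard DOM properties):
//   document.documentElement.scrollWidth  -> full content width
//   document.documentElement.clientWidth  -> visible viewport width
// Content wider than the visible area means horizontal overflow.
// `tolerancePx` is a hypothetical allowance for rounding differences.
function hasHorizontalOverflow(
  scrollWidth: number,
  clientWidth: number,
  tolerancePx = 1,
): boolean {
  return scrollWidth - clientWidth > tolerancePx;
}
```

At the 375px mobile viewport, for example, a `scrollWidth` of 420 would be reported as overflow and should be recorded as a failure for that route.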

Procedure

1. Read the diff with `git diff $DIFF_RANGE`. CI passes `DIFF_RANGE` in as an environment variable.
2. Map changes to routes. A change in `src/pages/SessionsPage.tsx` ⇒ `/sessions`. A change in `src/components/layout/**` ⇒ every route.
3. For each affected route, at each viewport:
   - Open the route: `agent-browser open http://localhost:3000<route>`.
   - Snapshot the page; verify that the heading and key elements render.
   - Actively exercise interactive elements; don't just check markup:
     - Click buttons, submit forms, open dialogs.
     - Verify the result: did the UI state change, did the dialog close, did new data appear? Markup being present is not proof that it works.
   - Check for layout bugs. On mobile especially, horizontal scroll is a failure, and so is a nav that blocks content.
   - Take a screenshot with `agent-browser screenshot --screenshot-dir /tmp/l2-screenshots`, then rename the output to `{route-slug}-{desktop|mobile}.png`.
4. Crystallize findings. You may validate new behavior, or catch a bug whose fix you can describe. In either case, write a deterministic L1 test under `apps/dashboard/tests/smoke/` (component-level) or `apps/dashboard/tests/e2e/` (browser-level). This is how L2 discoveries become permanent L1 coverage.
5. Emit the verdict. Return JSON matching `tests/l2-verdict-schema.json`:
   - `PASS` if every affected route passes at both viewports.
   - `FAIL` if any route fails; include the specific failure in `viewports[].issues[]`.

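The bookkeeping around the browser work can be sketched in a few functions: one turns the diff into routes, another turns per-viewport results into a verdict. This is a hypothetical sketch:

- Only three things come from the protocol: the two mapping rules in step 2, the `PASS`/`FAIL` values, and `viewports[].issues[]`.
- The other field names are guesses at `tests/l2-verdict-schema.json`.
- `/sessions/:id` is left out of the layout fan-out because it needs a concrete session id.

```typescript
// Routes a layout change fans out to (from the route table, minus /sessions/:id).
const ALL_ROUTES = ["/", "/sessions", "/nodes", "/artifacts", "/spine", "/governance", "/settings"];

// Page file -> route. Only the documented example; extend for your other pages.
const PAGE_ROUTES = new Map<string, string>([["src/pages/SessionsPage.tsx", "/sessions"]]);

function mapChangesToRoutes(changedFiles: string[]): string[] {
  const routes = new Set<string>();
  for (const file of changedFiles) {
    // Accept repo-relative paths by stripping the dashboard prefix if present.
    const rel = file.replace(/^apps\/dashboard\//, "");
    if (rel.startsWith("src/components/layout/")) {
      ALL_ROUTES.forEach((r) => routes.add(r)); // layout change => every route
    } else {
      const route = PAGE_ROUTES.get(rel);
      if (route !== undefined) routes.add(route);
    }
  }
  return Array.from(routes);
}

// One entry per (route, viewport) run; `issues` maps to viewports[].issues[].
interface ViewportResult {
  route: string;
  viewport: "desktop" | "mobile";
  issues: string[];
}

interface L2Verdict {
  verdict: "PASS" | "FAIL";
  viewports: ViewportResult[];
}

// PASS only if every affected route ran at both viewports with no issues.
function buildVerdict(affectedRoutes: string[], results: ViewportResult[]): L2Verdict {
  const ran = (route: string, vp: string) =>
    results.some((r) => r.route === route && r.viewport === vp);
  const allRan = affectedRoutes.every((route) => ran(route, "desktop") && ran(route, "mobile"));
  const noIssues = results.every((r) => r.issues.length === 0);
  return { verdict: allRan && noIssues ? "PASS" : "FAIL", viewports: results };
}
```

Treating a missing route × viewport pair as `FAIL` mirrors the guardrail that a viewport without a screenshot didn't run.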

Triage mode

When invoked after an L1 e2e failure, your job is different:

1. Read the failing test file(s) under `apps/dashboard/tests/e2e/`.
2. Reproduce each failure against `http://localhost:3000` with agent-browser.
3. Classify each failure as regression (real bug), flaky (intermittent / timing), or environment (test harness or CI issue).
4. Return JSON matching `tests/l2-triage-schema.json` with the root cause and a suggested fix.

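To make the three-way classification repeatable, you can reproduce each failure a few times and look at the pattern. The rule below is a hypothetical heuristic, not part of the protocol, and `TriageInput`/`classify` are illustrative names. Treat its output as a starting point that direct evidence from agent-browser can override.

```typescript
type Classification = "regression" | "flaky" | "environment";

interface TriageInput {
  appReachable: boolean;    // did http://localhost:3000 respond at all?
  reproAttempts: boolean[]; // true = the failure reproduced on that attempt
}

// Hypothetical heuristic:
// - app unreachable                   -> environment (CI / harness problem)
// - reproduces on every attempt       -> regression (real bug)
// - reproduces on some attempts only  -> flaky (intermittent / timing)
// - never reproduces in a real browser (or no attempts recorded)
//                                     -> environment (the harness disagrees
//                                        with what the app actually does)
function classify(input: TriageInput): Classification {
  if (!input.appReachable) return "environment";
  const hits = input.reproAttempts.filter(Boolean).length;
  if (hits > 0 && hits === input.reproAttempts.length) return "regression";
  if (hits > 0) return "flaky";
  return "environment";
}
```

Whatever the heuristic says, the root cause and suggested fix in the triage JSON should come from what you actually observed while reproducing.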

Guardrails

- Scope to the diff. Don't re-test the whole app for a one-line change.
- Screenshots are required evidence: no screenshot means the viewport didn't run.
- Don't invent routes. If the diff adds a new route, use that one; otherwise stick to the table above.
- Keep it cheap. One browser session per viewport is plenty.

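The screenshot guardrail is mechanical enough to check in code. The protocol doesn't define `{route-slug}`, so the slug rule below is an assumption:

- `/` becomes `root`.
- A leading `:` on a segment is dropped.
- Segments are joined with `-`.

`missingScreenshots` is a hypothetical helper that takes a directory listing, for example from Node's `fs.readdirSync("/tmp/l2-screenshots")`.

```typescript
// Hypothetical slug rule for `{route-slug}`.
function routeSlug(route: string): string {
  const parts = route
    .split("/")
    .filter(Boolean)
    .map((seg) => seg.replace(/^:/, ""));
  return parts.length === 0 ? "root" : parts.join("-");
}

// "No screenshot, the viewport didn't run": return every expected
// `{route-slug}-{desktop|mobile}.png` absent from the directory listing.
function missingScreenshots(routes: string[], files: string[]): string[] {
  const present = new Set(files);
  const expected = routes.flatMap((route) =>
    ["desktop", "mobile"].map((vp) => `${routeSlug(route)}-${vp}.png`),
  );
  return expected.filter((name) => !present.has(name));
}
```

A non-empty result means at least one route × viewport pair has no evidence and should not be reported as `PASS`.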