web-testing
Web Testing Protocol (L2)
Exploratory, AI-driven validation of dashboard UI changes — not
regression testing. Regression coverage is L1's job (smoke + e2e suites
under the dashboard's test directories). L2 catches things L1 misses:
layout bugs, mobile regressions, interaction flows that only fail in a
real browser.
Adopting in another repo: the procedure (read diff → map to routes → run agent-browser at two viewports → screenshot → verdict) is repo-agnostic. The route table, test paths, and verdict schema below are examples from `onsager-ai/onsager`. Fork the skill and replace those concrete bits for your own dashboard.
When to invoke
- A PR touches `apps/dashboard/**`
- L1 e2e fails and you need to know if it's a real regression, flaky, or env
- Someone says "validate the UI" / "dogfood this change"
The app under test
The CI pipeline builds `crates/stiglab/deploy/Dockerfile`, a single image
bundling the Rust backends (`stiglab` + `synodic`) and the prebuilt dashboard
SPA. It listens on `http://localhost:3000`.
Primary routes:
| Route | Page | Heading |
|---|---|---|
|  | Factory overview |  |
|  | Sessions list |  |
|  | Session detail | — (dynamic) |
|  | Nodes list |  |
|  | Artifacts list |  |
|  | Event spine viewer | — (dynamic) |
|  | Governance |  |
|  | Settings + credentials |  |
Viewports (always test both)
- Desktop: `agent-browser set viewport 1280 720`
- Mobile: `agent-browser set viewport 375 812`

Mobile matters: the dashboard ships with a responsive layout (see the
`md:` breakpoints throughout). Horizontal overflow and hidden nav are the
top two regression classes.
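
As a rough illustration of the two-viewport rule, the sketch below builds the agent-browser invocations for one route at both sizes. Only the viewport dimensions, the `set viewport` and `open` commands, and the base URL come from this protocol; the helper name `viewport_commands` and the idea of returning argument lists are assumptions made for the example.

```python
# Viewport sizes come from the list above; everything else is illustrative.
VIEWPORTS = {"desktop": (1280, 720), "mobile": (375, 812)}


def viewport_commands(base_url: str, route: str) -> list:
    """Return the agent-browser argv lists for one route at both viewports."""
    commands = []
    for width, height in VIEWPORTS.values():
        # Resize first, then load the page, so layout is computed at the target size.
        commands.append(["agent-browser", "set", "viewport", str(width), str(height)])
        commands.append(["agent-browser", "open", f"{base_url}{route}"])
    return commands


if __name__ == "__main__":
    for argv in viewport_commands("http://localhost:3000", "/sessions"):
        print(" ".join(argv))
```

Keeping both sizes in one table makes it harder to skip the mobile pass by accident.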
Procedure
- Read the diff (`git diff $DIFF_RANGE`). You will receive `DIFF_RANGE` as an env var from CI.
- Map changes to routes. A change in `src/pages/SessionsPage.tsx` ⇒ `/sessions`. A change in `src/components/layout/**` ⇒ every route.
- For each affected route, at each viewport:
  - `agent-browser open http://localhost:3000<route>`
  - Snapshot the page; verify the heading + key elements render.
  - Actively exercise interactive elements; don't just check markup:
    - Click buttons, submit forms, open dialogs.
    - Verify the result: did the UI state change, did the dialog close, did new data appear? Presence of markup is not proof of working.
  - Check for layout bugs. On mobile especially: horizontal scroll is a failure; a nav that blocks content is a failure.
  - `agent-browser screenshot --screenshot-dir /tmp/l2-screenshots`, then rename the output to `{route-slug}-{desktop|mobile}.png`.
- Crystallize findings. When you validate new behavior or catch a bug
  whose fix you can describe, write a deterministic L1 test under
  `apps/dashboard/tests/smoke/` (component-level) or
  `apps/dashboard/tests/e2e/` (browser-level). This is how L2 discoveries become permanent L1 coverage.
- Emit the verdict. Return JSON matching `tests/l2-verdict-schema.json`:
  - `PASS` if every affected route passes at both viewports.
  - `FAIL` if any route fails; include the specific failure in `viewports[].issues[]`.
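
The mapping and verdict steps above can be sketched as follows. This is a minimal illustration, not the skill's implementation: the two mapping rules are the ones named in the procedure, and changed paths are assumed to arrive prefixed with `apps/dashboard/`. The real verdict shape is defined by `tests/l2-verdict-schema.json`, which is not reproduced here, so every field name except `viewports` and `issues` (`verdict`, `routes`, `route`, `name`) is hypothetical.

```python
import json
from fnmatch import fnmatch

DASHBOARD_PREFIX = "apps/dashboard/"  # assumed prefix of paths in the diff
PAGE_ROUTES = {"src/pages/SessionsPage.tsx": "/sessions"}  # rule from the procedure
GLOBAL_PATTERNS = ["src/components/layout/**"]  # rule from the procedure


def affected_routes(changed_files, all_routes):
    """Map changed paths to the routes that need an L2 pass."""
    routes = set()
    for path in changed_files:
        rel = path.removeprefix(DASHBOARD_PREFIX)
        if any(fnmatch(rel, pattern) for pattern in GLOBAL_PATTERNS):
            return sorted(all_routes)  # shared layout changed: every route
        if rel in PAGE_ROUTES:
            routes.add(PAGE_ROUTES[rel])
    return sorted(routes)


def build_verdict(results):
    """Build a verdict from {route: {viewport_name: [issue, ...]}}."""
    routes = []
    failed = False
    for route, by_viewport in results.items():
        viewports = [
            {"name": name, "issues": list(issues)}
            for name, issues in by_viewport.items()
        ]
        # Any recorded issue at any viewport fails the whole run.
        failed = failed or any(v["issues"] for v in viewports)
        routes.append({"route": route, "viewports": viewports})
    return {"verdict": "FAIL" if failed else "PASS", "routes": routes}


if __name__ == "__main__":
    changed = ["apps/dashboard/src/pages/SessionsPage.tsx"]
    print(affected_routes(changed, ["/sessions"]))
    results = {"/sessions": {"desktop": [], "mobile": ["horizontal scroll"]}}
    print(json.dumps(build_verdict(results), indent=2))
```

Returning early on a layout change keeps the "every route" rule explicit, and a file that matches no rule contributes no routes, which fits the guardrail of scoping to the diff.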
Triage mode
When invoked after an L1 e2e failure, your job is different:
- Read the failing test file(s) under `apps/dashboard/tests/e2e/`.
- Reproduce against `http://localhost:3000` with agent-browser.
- For each failure, classify: regression (real bug), flaky (intermittent / timing), or environment (test harness or CI issue).
- Return JSON matching `tests/l2-triage-schema.json` with root cause and suggested fix.
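
One way to keep triage output consistent is to model the three classifications as a closed set. The sketch below assumes a report shape, since `tests/l2-triage-schema.json` is not shown here: the field names `failures`, `test_file`, `root_cause`, and `suggested_fix` are hypothetical, as is the example spec file name. Only the three classification values come from the list above.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class Classification(str, Enum):
    REGRESSION = "regression"    # real bug
    FLAKY = "flaky"              # intermittent / timing
    ENVIRONMENT = "environment"  # test harness or CI issue


@dataclass
class TriageFinding:
    test_file: str
    classification: Classification
    root_cause: str
    suggested_fix: str


def to_report(findings) -> str:
    """Serialize findings to JSON; the top-level key is an assumed name."""
    failures = []
    for finding in findings:
        entry = asdict(finding)
        # Emit the plain string value rather than the enum member.
        entry["classification"] = finding.classification.value
        failures.append(entry)
    return json.dumps({"failures": failures}, indent=2)


if __name__ == "__main__":
    finding = TriageFinding(
        test_file="apps/dashboard/tests/e2e/example.spec.ts",  # hypothetical file
        classification=Classification.FLAKY,
        root_cause="Assertion ran before the list finished loading.",
        suggested_fix="Wait for the list to render before asserting.",
    )
    print(to_report([finding]))
```

Using an enum means a typo such as `Classification("flakey")` raises a `ValueError` instead of producing a report with a fourth, undefined category.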
Guardrails
- Scope to the diff. Don't re-test the whole app on a one-line change.
- Screenshots are required evidence — no screenshot, the viewport didn't run.
- Don't invent routes. If a new route was added in the diff, use that one; otherwise stick to the table above.
- Keep it cheap. One browser session per viewport is plenty.
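
The screenshot guardrail lends itself to a mechanical check before a verdict is emitted. The sketch below looks for a `{route-slug}-{desktop|mobile}.png` file per route and viewport. The slug rule is not defined in this protocol, so the one used here (drop slashes, join path segments with `-`, use `root` for `/`) is an assumption, as are the function names.

```python
from pathlib import Path

VIEWPORT_NAMES = ("desktop", "mobile")


def route_slug(route: str) -> str:
    """Assumed slug rule: '/sessions' -> 'sessions', '/' -> 'root'."""
    parts = [segment for segment in route.split("/") if segment]
    return "-".join(parts) or "root"


def missing_screenshots(screenshot_dir, routes):
    """List expected screenshot file names that are absent from the directory."""
    directory = Path(screenshot_dir)
    missing = []
    for route in routes:
        for viewport in VIEWPORT_NAMES:
            name = f"{route_slug(route)}-{viewport}.png"
            if not (directory / name).is_file():
                missing.append(name)
    return missing


if __name__ == "__main__":
    # Any name printed here means that viewport counts as not run.
    print(missing_screenshots("/tmp/l2-screenshots", ["/sessions"]))
```

An empty list means every affected route has evidence at both viewports.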