web-testing

Web Testing Protocol (L2)

Exploratory, AI-driven validation of dashboard UI changes — not regression testing. Regression coverage is L1's job (smoke + e2e suites under the dashboard's test directories). L2 catches things L1 misses: layout bugs, mobile regressions, interaction flows that only fail in a real browser.
Adopting in another repo: the procedure (read diff → map to routes → run agent-browser at two viewports → screenshot → verdict) is repo-agnostic. The route table, test paths, and verdict schema below are examples from onsager-ai/onsager. Fork the skill and replace those concrete bits with your own dashboard's.

When to invoke

  • A PR touches apps/dashboard/**
  • L1 e2e fails and you need to know whether it's a real regression, a flaky test, or an environment problem
  • Someone says "validate the UI" / "dogfood this change"

The app under test

The CI pipeline builds the image defined by crates/stiglab/deploy/Dockerfile: a single image bundling the Rust backends (stiglab + synodic) and the prebuilt dashboard SPA. It listens on http://localhost:3000.
Primary routes:

  Route            Page                     Heading
  /                Factory overview         Factory
  /sessions        Sessions list            Sessions
  /sessions/:id    Session detail           (dynamic)
  /nodes           Nodes list               Nodes
  /artifacts       Artifacts list           Artifacts
  /spine           Event spine viewer       (dynamic)
  /governance      Governance               Governance
  /settings        Settings + credentials   Settings
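If you script any part of this, the route table can live as data next to the agent. A minimal TypeScript sketch follows; the `RouteSpec` shape, the `ROUTES` constant, and the `routeUrl` helper are hypothetical names, not part of the onsager codebase. Dynamic routes need a real id (for example, one taken from the /sessions list) before they can be opened.

```typescript
// Hypothetical encoding of the route table above; names are illustrative only.
interface RouteSpec {
  path: string;           // route pattern; ":name" marks a dynamic segment
  page: string;           // what the page is
  heading: string | null; // expected heading, or null when it is dynamic
}

const ROUTES: RouteSpec[] = [
  { path: "/", page: "Factory overview", heading: "Factory" },
  { path: "/sessions", page: "Sessions list", heading: "Sessions" },
  { path: "/sessions/:id", page: "Session detail", heading: null },
  { path: "/nodes", page: "Nodes list", heading: "Nodes" },
  { path: "/artifacts", page: "Artifacts list", heading: "Artifacts" },
  { path: "/spine", page: "Event spine viewer", heading: null },
  { path: "/governance", page: "Governance", heading: "Governance" },
  { path: "/settings", page: "Settings + credentials", heading: "Settings" },
];

// Build the URL to pass to `agent-browser open`, filling ":name" segments from params.
function routeUrl(
  path: string,
  params: Record<string, string> = {},
  base: string = "http://localhost:3000",
): string {
  const filled = path.replace(/:([A-Za-z]+)/g, (_match: string, name: string) => {
    const value = params[name];
    if (value === undefined) {
      // Refuse to open a dynamic route without a concrete id.
      throw new Error(`no value for :${name} in ${path}`);
    }
    return encodeURIComponent(value);
  });
  return base + filled;
}
```

For example, `routeUrl("/sessions/:id", { id: "abc123" })` returns `http://localhost:3000/sessions/abc123`, while calling it without an id throws instead of opening a broken URL.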

Viewports (always test both)

  • Desktop: agent-browser set viewport 1280 720
  • Mobile: agent-browser set viewport 375 812
Mobile matters: the dashboard ships with a responsive layout (see the md: breakpoints throughout). Horizontal overflow and hidden nav are the top two regression classes.

Procedure

  1. Read the diff (git diff $DIFF_RANGE). CI provides DIFF_RANGE as an environment variable.
  2. Map changes to routes. A change in src/pages/SessionsPage.tsx ⇒ /sessions. A change in src/components/layout/** ⇒ every route.
  3. For each affected route, at each viewport:
    • agent-browser open http://localhost:3000<route>
    • Snapshot the page; verify that the heading and key elements render.
    • Actively exercise interactive elements; don't just check markup:
      • Click buttons, submit forms, open dialogs.
      • Verify the result: did the UI state change, did the dialog close, did new data appear? Presence of markup is not proof that it works.
    • Check for layout bugs. On mobile especially, horizontal scroll is a failure, and so is a nav that blocks content.
    • Run agent-browser screenshot --screenshot-dir /tmp/l2-screenshots, then rename the output to {route-slug}-{desktop|mobile}.png.
  4. Crystallize findings. When you validate new behavior or catch a bug whose fix you can describe, write a deterministic L1 test under apps/dashboard/tests/smoke/ (component-level) or apps/dashboard/tests/e2e/ (browser-level). This is how L2 discoveries become permanent L1 coverage.
  5. Emit the verdict. Return JSON matching tests/l2-verdict-schema.json:
    • PASS if every affected route passes at both viewports.
    • FAIL if any route fails; include the specific failure in viewports[].issues[].
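Steps 2 and 3 can be sketched as pure functions over the changed-file list (for example, the output of `git diff --name-only $DIFF_RANGE`). Only the SessionsPage and layout rules come from this protocol; the `PAGE_ROUTES` map, the `apps/dashboard/` prefix handling, and the `root` slug for `/` are assumptions to adapt. Dashboard files that match no rule still need a judgment call from reading the diff.

```typescript
// Hypothetical sketch of procedure steps 2 and 3 (route mapping and screenshot naming).
const ALL_ROUTES: string[] = [
  "/", "/sessions", "/sessions/:id", "/nodes",
  "/artifacts", "/spine", "/governance", "/settings",
];

// Page file (relative to apps/dashboard/) -> routes it renders. Only the
// SessionsPage entry comes from the protocol; extend it for the other pages.
const PAGE_ROUTES = new Map<string, string[]>([
  ["src/pages/SessionsPage.tsx", ["/sessions"]],
]);

const DASHBOARD_PREFIX = "apps/dashboard/";

// Map repo-relative changed paths to the sorted set of routes L2 should visit.
function affectedRoutes(changedFiles: string[]): string[] {
  const routes = new Set<string>();
  for (const file of changedFiles) {
    if (!file.startsWith(DASHBOARD_PREFIX)) continue; // not a dashboard change
    const rel = file.slice(DASHBOARD_PREFIX.length);
    if (rel.startsWith("src/components/layout/")) {
      ALL_ROUTES.forEach((r) => routes.add(r)); // layout change: every route
    } else {
      const pageRoutes = PAGE_ROUTES.get(rel);
      if (pageRoutes !== undefined) pageRoutes.forEach((r) => routes.add(r));
      // Unmatched dashboard files fall through: read the diff and decide by hand.
    }
  }
  return Array.from(routes).sort();
}

// "/sessions/:id" -> "sessions-id", "/" -> "root".
function routeSlug(route: string): string {
  const slug = route.replace(/[^A-Za-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  return slug === "" ? "root" : slug;
}

// Name for the renamed screenshot: {route-slug}-{desktop|mobile}.png
function screenshotName(route: string, viewport: "desktop" | "mobile"): string {
  return `${routeSlug(route)}-${viewport}.png`;
}
```

With this mapping, a single file under src/components/layout/ yields all eight routes, and `screenshotName("/sessions/:id", "mobile")` gives `sessions-id-mobile.png`.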
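Step 5 and the screenshot guardrail combine into one rule: a route/viewport pair passes only if it ran, produced a screenshot, and recorded no issues. In the sketch below, only `PASS`/`FAIL` and `viewports[].issues[]` come from this protocol; the `verdict`, `route`, `viewport`, and `screenshot` field names and the `buildVerdict` helper are placeholders, so check them against tests/l2-verdict-schema.json before emitting anything.

```typescript
// Hypothetical verdict assembly; field names other than viewports[].issues[] are assumed.
type ViewportName = "desktop" | "mobile";

interface ViewportResult {
  route: string;
  viewport: ViewportName;
  screenshot: string | null; // path of the renamed PNG, or null if none was taken
  issues: string[];          // empty means this route passed at this viewport
}

interface Verdict {
  verdict: "PASS" | "FAIL";
  viewports: ViewportResult[];
}

const VIEWPORT_NAMES: readonly ViewportName[] = ["desktop", "mobile"];

function buildVerdict(affected: string[], results: ViewportResult[]): Verdict {
  // Copy so the caller's issue arrays are not mutated.
  const viewports: ViewportResult[] = results.map((r) => ({ ...r, issues: [...r.issues] }));
  for (const route of affected) {
    for (const viewport of VIEWPORT_NAMES) {
      const hit = viewports.find((r) => r.route === route && r.viewport === viewport);
      if (hit === undefined) {
        // Every affected route must run at both viewports.
        viewports.push({ route, viewport, screenshot: null, issues: ["viewport not run"] });
      } else if (hit.screenshot === null) {
        // Guardrail: no screenshot means the viewport did not run.
        hit.issues.push("no screenshot: viewport counts as not run");
      }
    }
  }
  const failed = viewports.some((r) => r.issues.length > 0);
  return { verdict: failed ? "FAIL" : "PASS", viewports };
}
```

Serializing the result with `JSON.stringify(verdict, null, 2)` gives the JSON to return; a missing mobile run surfaces as an explicit issue rather than a silent pass.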

Triage mode

When invoked after an L1 e2e failure, your job is different:
  1. Read the failing test file(s) under apps/dashboard/tests/e2e/.
  2. Reproduce each failure against http://localhost:3000 with agent-browser.
  3. For each failure, classify it as regression (real bug), flaky (intermittent / timing), or environment (test harness or CI issue).
  4. Return JSON matching tests/l2-triage-schema.json with the root cause and a suggested fix.
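The three triage categories can be read as an ordered decision. The sketch below is one possible heuristic, not a rule this protocol prescribes: the `FailureEvidence` fields (whether the app answered at all, the outcomes of a few L1 reruns, and whether agent-browser reproduced the failure by hand) are hypothetical inputs, and the result still has to be mapped onto tests/l2-triage-schema.json together with a root cause and suggested fix.

```typescript
// Hypothetical triage heuristic; the categories come from the protocol, the inputs do not.
type Classification = "regression" | "flaky" | "environment";

interface FailureEvidence {
  test: string;                 // failing L1 test, e.g. a file under tests/e2e/
  appReachable: boolean;        // did http://localhost:3000 respond at all?
  rerunPassed: boolean[];       // outcome of each L1 rerun (true = passed)
  reproducedInBrowser: boolean; // did the failure reproduce with agent-browser?
}

function classify(e: FailureEvidence): Classification {
  // No app, no test: the harness or CI never brought the server up.
  if (!e.appReachable) return "environment";
  // The same test both failed and passed with no code change: intermittent.
  if (e.rerunPassed.some((passed) => passed)) return "flaky";
  // Fails every time and is visible in a real browser: a real bug.
  if (e.reproducedInBrowser) return "regression";
  // Fails every time in the harness but works by hand: suspect the test setup.
  return "environment";
}
```

The ordering matters: an unreachable app would make every test look like a regression, so environment checks come first.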

Guardrails

  • Scope to the diff. Don't re-test the whole app on a one-line change.
  • Screenshots are required evidence — no screenshot, the viewport didn't run.
  • Don't invent routes. If a new route was added in the diff, use that one; otherwise stick to the table above.
  • Keep it cheap. One browser session per viewport is plenty.
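The "scope to the diff" and "keep it cheap" guardrails suggest planning the run up front: one command batch per viewport, covering only the affected routes. A minimal sketch, assuming the viewport sizes and agent-browser commands given earlier in this protocol; `commandPlan` is a hypothetical helper, and the snapshot, interaction, and rename steps between `open` and `screenshot` are left to the agent.

```typescript
// Hypothetical run planner: one ordered batch (one browser session) per viewport.
type ViewportName = "desktop" | "mobile";

const VIEWPORT_SIZES: Record<ViewportName, [number, number]> = {
  desktop: [1280, 720],
  mobile: [375, 812],
};

function commandPlan(
  affectedRoutes: string[],
  base: string = "http://localhost:3000",
): Record<ViewportName, string[]> {
  const plan: Record<ViewportName, string[]> = { desktop: [], mobile: [] };
  for (const name of ["desktop", "mobile"] as ViewportName[]) {
    const [width, height] = VIEWPORT_SIZES[name];
    // Set the viewport once per session, then visit only the affected routes.
    plan[name].push(`agent-browser set viewport ${width} ${height}`);
    for (const route of affectedRoutes) {
      if (route.includes(":")) {
        // Dynamic routes such as /sessions/:id need a real id first.
        throw new Error(`fill dynamic segment before planning: ${route}`);
      }
      plan[name].push(`agent-browser open ${base}${route}`);
      plan[name].push("agent-browser screenshot --screenshot-dir /tmp/l2-screenshots");
    }
  }
  return plan;
}
```

A one-line change that touches a single page therefore costs two short batches, not a full sweep of the route table.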