playwright-in-sandbox
Playwright In Sandbox
This is the primary Playwright skill for sandbox browser verification and deterministic end-to-end coverage.
Use it in two explicit modes:
- Interactive Sandbox Mode as final browser verification after a task's implementation is in a plausibly correct state.
- Deterministic E2E Mode before finishing a task when the changed flow should be protected by durable regression coverage.
This skill is intentionally generic. It should work for:
- task-level screenshot-driven verification after an agent has implemented UI work and needs browser proof
- formal Playwright E2E authoring or rewrite work in downstream application repos
- task flows where a final browser verification should happen before the task is considered complete
Do not use this skill for backend-only work, one-off page operations that do not justify browser automation, or broad failure storms before you understand the workflow inventory and root causes.
Core Workflow
- Write a brief QA inventory before touching the browser.
- Decide the mode first: Interactive Sandbox Mode or Deterministic E2E Mode.
- Start or confirm the app in a persistent session.
- Implement the change and get the functionality into a plausibly correct state before using Playwright as signoff.
- Run the changed flow interactively and inspect screenshots as evidence, not just DOM state.
- Record the contracts you learned:
  - route-ready signals
  - modal open and close signals
  - action-enabled conditions
  - save-complete signals
  - durable `data-testid` or semantic selectors
- If the flow is a bug fix, a user workflow, regression-critical, or meaningfully changed, graduate it into deterministic E2E coverage.
- If interactive proof shows the product behavior is wrong, fix the product code or the data contract. Do not make a bad behavior look green by weakening the test.
- Before finishing the task, ensure the changed flow has both:
  - successful interactive proof
  - durable E2E coverage or an explicit rationale why it stays interactive-only
Common Rules
These rules apply to both Interactive Sandbox Mode and Deterministic E2E Mode.
- Interactive verification is a post-change signoff step. Do not treat it as random mid-task poking while the implementation is still half-built.
- If the browser proves the product behavior is wrong, fix the functionality or the underlying data contract. Do not invent clever ways to make the test green around a bug.
- Use the repo's canonical E2E database contract strictly. If the repo standard is `e2e.db`, use `e2e.db`. If `e2e.db` is missing and the repo expects one, create or provision `e2e.db` and keep using that contract. Do not silently fall back to the normal application database.
- Prefer querying the canonical E2E database or seeded business data to derive expected values, statuses, assignments, and aggregates. When the repo uses `better-sqlite3`, it is acceptable to inspect the DB directly to confirm the real expected value before asserting the UI.
- Prefer selectors in this order:
  - explicit `data-testid`, `id`, `data-*`, or other owned semantic contracts
  - accessible role plus stable accessible name
  - label/control association
  - stable URL, pathname, or query contract
  - text-only selectors only when the text itself is the product contract
  - CSS, XPath, or DOM-order selectors only for deliberate structure checks
- Remove stale screenshots, traces, and temporary artifacts from failed or superseded runs before signoff.
- If the repo has a maintained full-suite run, nightly QA run, or automated health check, keep visibility on whether it actually ran and whether it stayed green. Targeted checks do not replace suite health forever.
- Some migrated or one-off client applications may temporarily need broader migration-verification coverage than a typical greenfield app. That is allowed, but the quality bar stays the same: deterministic selectors, owned data, real user contracts, and no fake greens.
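The selector order in the rules above can be sketched as a small helper. This is a minimal sketch, not part of the skill itself: `resolveLocator` and the shape of its `contract` argument are hypothetical names introduced for illustration, and `page` is assumed to be a Playwright `Page`. URL contracts and structural CSS/XPath checks are left out because they are not element lookups of the same kind.

```javascript
// Hypothetical helper: pick the strongest selector contract that is available.
// `page` is assumed to be a Playwright Page; `contract` is an illustrative shape.
function resolveLocator(page, contract) {
  // 1. Owned semantic contract such as data-testid.
  if (contract.testId) return page.getByTestId(contract.testId);
  // 2. Accessible role plus a stable accessible name.
  if (contract.role && contract.name) {
    return page.getByRole(contract.role, { name: contract.name });
  }
  // 3. Label/control association.
  if (contract.label) return page.getByLabel(contract.label);
  // 4. Text only when the text itself is the product contract.
  if (contract.text && contract.textIsContract) {
    return page.getByText(contract.text, { exact: true });
  }
  throw new Error("No durable selector contract available; add one to the UI first.");
}
```

Throwing when nothing durable exists is a deliberate choice in this sketch: it pushes the fix toward adding an owned contract to the UI instead of falling back to incidental text or DOM order.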
Mode Selection
Use Interactive Sandbox Mode when
- the implementation is already in a plausibly correct state
- you need final browser proof that the changed flow really works for a user
- you need screenshot evidence to judge whether the UI is actually correct
- you need to learn or confirm readiness gates, modal behavior, or durable selectors before writing or updating automated coverage
Use Deterministic E2E Mode when
- the change fixes a bug
- the task creates or materially changes a user workflow
- the flow is business-critical or likely to regress
- legacy Playwright coverage is being rewritten, consolidated, or retired
- the task should not be considered complete without regression protection
Stay in Interactive Mode only when
- the change is exploratory or temporary
- the flow is not durable enough yet to encode as regression coverage
- the task does not meaningfully change a maintained workflow
- a migrated or one-off app needs a temporary verification pass that is not yet stable enough to convert into durable E2E coverage
If you choose not to graduate to committed E2E coverage, be explicit about why.
Shared Environment Contract
- Prefer `127.0.0.1` over `localhost` unless the repo defines something else explicitly.
- Use the repo's explicit server contract first. If the repo does not define one, `4444` is the common sandbox default.
- In sandbox environments, Playwright browsers may live under `/ms-playwright`; do not assume the default cache path.
- In sandbox environments, launch Chromium explicitly in headless mode: `chromium.launch({ headless: true })`.
- Confirm the Playwright browser path when there is any doubt about the runtime payload:
```bash
echo "$PLAYWRIGHT_BROWSERS_PATH"
ls -al /ms-playwright
```
- Before `page.goto(...)`, verify the target port is actually listening and the app responds.
- For standard runs, use the repo's canonical E2E database contract. If the repo standard is `e2e.db`, always use `e2e.db`.
- If the repo expects `e2e.db` and it is missing, create or provision `e2e.db` before running tests.
- Only use alternate DB names or paths when the repo explicitly supports isolated validation lanes and you are intentionally isolating worker runs.
- Keep interactive artifacts separate from committed regression assets.
  - scratch scripts and screenshots belong in temp or dedicated artifact folders
  - committed regression coverage belongs in `tests/e2e/` or the repo's formal test location
- Remove stale screenshots, traces, and temporary artifacts from failed or superseded runs before signoff.
- When running multiple rewrite or validation lanes in parallel, isolate runtime resources:
  - port
  - database or seed state
  - output folder
  - screenshots and traces
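The "verify the target port is listening and the app responds" step can be sketched as a small polling probe. This is one possible shape under stated assumptions: `waitForApp` and its option names are hypothetical, the global `fetch` assumes Node.js 18 or newer, and treating any non-5xx status as "responding" is a choice of this sketch, not a rule from the repo.

```javascript
// Hypothetical readiness probe: poll the app until it answers or the deadline passes.
// Assumes Node.js 18+ for the global fetch; fetchImpl is injectable for testing.
async function waitForApp(url, { timeoutMs = 15000, intervalMs = 250, fetchImpl = fetch } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      const response = await fetchImpl(url);
      // Any non-5xx answer shows the port is listening and the app responds.
      if (response.status < 500) return true;
    } catch {
      // Connection refused or reset: the server is not up yet, so retry.
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}
```

A verification script could call `await waitForApp(TARGET_URL)` before `page.goto(...)` and stop with a clear error when it returns `false`, so a dead server is not misread as a broken page.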
Interactive Sandbox Mode
Use this mode to prove a changed user flow works right now after the implementation is done enough to verify.
Interactive mode is not permission to poke until something happens to pass once. Use it after implementing the change and after you believe the functionality should work, then use the browser as final visual and functional verification of the real user flow.
QA Inventory
Build the inventory from three sources:
- the user's requested requirements
- the user-visible behavior you implemented or changed
- the claims you expect to make in the final response
Anything that appears in any of those three sources must map to at least one QA check before signoff.
List:
- the user-visible claims you intend to sign off on
- every meaningful control, mode switch, or implemented interactive behavior
- the state changes or view changes each control can cause
- at least two exploratory or off-happy-path probes
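One way to keep the inventory honest is to hold it as data and check it mechanically before signoff. The sketch below is illustrative only: the claim ids, check names, and the `findInventoryGaps` helper are all invented for this example and do not come from any real task.

```javascript
// Illustrative inventory: claims gathered from the three sources, plus planned checks.
const inventory = {
  claims: [
    { id: "req-filter", source: "requirement", text: "Orders can be filtered by status" },
    { id: "impl-empty", source: "implemented", text: "Empty state appears when no orders match" },
    { id: "resp-mobile", source: "final-response", text: "Filter bar fits a 390px viewport" },
  ],
  checks: [
    { name: "filter by status with real clicks", covers: ["req-filter"], kind: "functional" },
    { name: "empty state screenshot", covers: ["impl-empty"], kind: "visual" },
    { name: "rapid filter toggling", covers: [], kind: "exploratory" },
    { name: "filter with zero seeded orders", covers: ["impl-empty"], kind: "exploratory" },
  ],
};

// Returns the gaps that would block signoff: claims with no QA check,
// and how many exploratory probes are still missing (at least two are required).
function findInventoryGaps({ claims, checks }) {
  const covered = new Set(checks.flatMap((check) => check.covers));
  const unmappedClaims = claims
    .filter((claim) => !covered.has(claim.id))
    .map((claim) => claim.id);
  const exploratoryCount = checks.filter((check) => check.kind === "exploratory").length;
  return { unmappedClaims, missingExploratory: Math.max(0, 2 - exploratoryCount) };
}

console.log(findInventoryGaps(inventory));
```

In this sample the `resp-mobile` claim has no check mapped to it, so it is reported as a gap even though the two exploratory probes are present.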
Desktop Verification Script
Set `TARGET_URL` to the app you are debugging. Prefer `127.0.0.1` over `localhost`.
```javascript
// Run as an ES module (for example a .mjs file) so top-level await works.
import { chromium } from "playwright";

const TARGET_URL = "http://127.0.0.1:4444";

// Sandbox runs launch Chromium explicitly in headless mode.
const browser = await chromium.launch({ headless: true });
const context = await browser.newContext({
  viewport: { width: 1600, height: 900 },
});
const page = await context.newPage();

try {
  await page.goto(TARGET_URL, { waitUntil: "domcontentloaded" });
  console.log("Loaded:", await page.title());
  // Add the task-specific interactions and assertions here.
  await page.screenshot({ path: "playwright-desktop.png", type: "png" });
} finally {
  // Always release the context and browser, even when a step above throws.
  await context.close().catch(() => {});
  await browser.close().catch(() => {});
}
```
Mobile Verification Script
Use a separate mobile script when the task affects responsive layout or touch behavior.
```javascript
// Run as an ES module (for example a .mjs file) so top-level await works.
import { chromium } from "playwright";

const TARGET_URL = "http://127.0.0.1:4444";

const browser = await chromium.launch({ headless: true });
// Phone-sized viewport with mobile and touch emulation enabled.
const context = await browser.newContext({
  viewport: { width: 390, height: 844 },
  isMobile: true,
  hasTouch: true,
});
const page = await context.newPage();

try {
  await page.goto(TARGET_URL, { waitUntil: "domcontentloaded" });
  console.log("Loaded mobile:", await page.title());
  // Add the task-specific interactions and assertions here.
  await page.screenshot({ path: "playwright-mobile.png", type: "png" });
} finally {
  // Always release the context and browser, even when a step above throws.
  await context.close().catch(() => {});
  await browser.close().catch(() => {});
}
```
Iteration Model
- Use one standalone Node.js verification script per focused flow.
- After code changes, rerun the verification script from a clean process instead of trying to preserve state across runs.
- Keep each script narrow: one changed flow, its main assertions, and its screenshot artifacts.
- If desktop and mobile both matter, run separate scripts or separate invocations.
- Screenshot review is part of the contract. Do not sign off from DOM assertions alone.
Interactive Checklists
Session Loop
- write the QA inventory
- make the code change
- run the Playwright verification script for the current flow as final browser verification
- rerun functional QA with real user input
- rerun visual QA separately
- capture final artifacts only after the UI is in the state you are actually evaluating
- record the selectors and readiness gates you would trust in formal E2E
Functional QA
- Use real user controls for signoff: keyboard, mouse, click, touch, or equivalent Playwright input APIs.
- Verify at least one end-to-end critical flow.
- Confirm the visible result of that flow, not just internal state.
- Work through the shared QA inventory rather than ad hoc spot checks.
- Cover every obvious visible control at least once before signoff, not only the happy path.
- After the scripted checks pass, do a short exploratory pass using normal input.
- `page.evaluate(...)` may inspect or stage state, but it does not count as signoff input.
Visual QA
- Treat visual QA as separate from functional QA.
- Verify each visible claim explicitly in the state where it matters.
- Inspect the initial viewport before scrolling.
- Inspect the densest realistic state you can reach during QA.
- Look for clipping, overflow, distortion, weak contrast, broken layering, awkward motion, and stale overlays.
- If the UI only "works" because a hidden blocker was not noticed, the flow is not ready for signoff.
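A scripted probe can point the screenshot review at likely overflow problems in the initial viewport. This is a rough sketch and a supplement to looking at the screenshots, not a replacement: `findHorizontalOverflow` is a hypothetical helper, `page` is assumed to be a Playwright `Page`, and `page.evaluate(...)` is used here only to inspect layout, not as signoff input.

```javascript
// Hypothetical inspection helper: list elements whose right edge extends past the viewport.
// `page` is assumed to be a Playwright Page.
async function findHorizontalOverflow(page) {
  return page.evaluate(() => {
    const viewportWidth = document.documentElement.clientWidth;
    return [...document.querySelectorAll("body *")]
      // Allow one pixel of slack for sub-pixel rounding.
      .filter((el) => el.getBoundingClientRect().right > viewportWidth + 1)
      .map((el) => el.tagName.toLowerCase() + (el.id ? `#${el.id}` : ""))
      // Cap the report so one broken container does not flood the output.
      .slice(0, 20);
  });
}
```

An empty result does not prove the layout is correct; clipping, weak contrast, and stale overlays still need a human-style look at the captured screenshot.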
Interactive Signoff
- the functional path passed with normal user input
- the visual pass covered the whole relevant interface
- the viewport-fit checks passed for the intended initial view
- the final screenshots match the claims being signed off on
- the exploratory pass is called out in the final response
- the durable selectors and readiness gates are written down for E2E authoring
- stale screenshots or temporary artifacts from failed iterations are removed
Deterministic E2E Mode
Use this mode when you are authoring or rewriting real Playwright regression coverage.
The job is to protect real functionality. If the app behavior is wrong, fix the app. Do not preserve a bug by adjusting the test around it.
Workflow
- Start from what interactive validation already proved.
- Translate the learned UI contracts into durable automated checks.
- Keep each spec focused on one workflow family or one coherent user journey.
- Make each test own or explicitly receive its setup.
- Prefer canonical seeded data or database-backed expectations for values, statuses, and aggregates instead of magic literals the test does not own.
- Validate the changed flow before deleting or consolidating legacy coverage.
- If rewriting a brittle suite, map every removed scenario to one of:
  - retained
  - consolidated with justification
  - intentionally obsolete with rationale
  - quarantined with explicit explanation
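The scenario mapping in the last step can be checked mechanically before any legacy spec is deleted. The following is a minimal sketch under assumptions: `auditRewriteMapping`, the `disposition` and `reason` field names, and the short disposition labels are hypothetical and only mirror the four outcomes listed above.

```javascript
// The four dispositions named in the workflow above, as short labels.
const DISPOSITIONS = new Set(["retained", "consolidated", "obsolete", "quarantined"]);

// Hypothetical audit: every removed legacy scenario needs a disposition,
// and every disposition except "retained" needs a written reason.
function auditRewriteMapping(removedScenarios, mapping) {
  const problems = [];
  for (const scenario of removedScenarios) {
    const entry = mapping[scenario];
    if (!entry) {
      problems.push(`${scenario}: no mapping`);
    } else if (!DISPOSITIONS.has(entry.disposition)) {
      problems.push(`${scenario}: unknown disposition "${entry.disposition}"`);
    } else if (entry.disposition !== "retained" && !entry.reason) {
      problems.push(`${scenario}: ${entry.disposition} without a reason`);
    }
  }
  return problems;
}
```

An empty result means every removed scenario is accounted for; any entry in the returned list is coverage loss that has not yet been made explicit.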
Selector Hierarchy
Do not build durable tests on incidental text, generic combobox patterns, visual styling, or unstable overlay-sensitive DOM structure if a better contract exists.
Text-only selectors are a last resort unless the text itself is the true product contract.
Waiting and Readiness Rules
- Do not default to `waitForLoadState("networkidle")`.
- For navigation, wait on URL change plus a page-ready sentinel, or response contract plus a ready sentinel.
- For dialogs, wait on explicit open-state before interaction and close-state before the next dependent step.
- For saves, wait on an observable save-complete contract.
- For async controls, assert what makes the control enabled before clicking it.
- For flows with overlays, drawers, or command palettes, prove they are closed before interacting with the next surface.
- If an interaction fails because another surface still owns focus or pointer events, fix the readiness model instead of layering retries forever.
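The navigation and dialog rules above can be sketched as two small helpers. These are assumptions for illustration, not the repo's helpers: `gotoAndWaitReady`, `withDialog`, and their option names are hypothetical, the test ids a caller passes in would have to exist in the app, and `page` is assumed to be a Playwright `Page`.

```javascript
// Hypothetical helpers; `page` is assumed to be a Playwright Page.

// Navigation: wait for the URL contract plus a page-ready sentinel,
// instead of defaulting to "networkidle".
async function gotoAndWaitReady(page, url, { urlPattern, readyTestId }) {
  await page.goto(url, { waitUntil: "domcontentloaded" });
  await page.waitForURL(urlPattern);
  await page.getByTestId(readyTestId).waitFor({ state: "visible" });
}

// Dialogs: prove the open state before interacting and the closed state
// before the next dependent step runs.
async function withDialog(page, { openerTestId, dialogName }, interact) {
  await page.getByTestId(openerTestId).click();
  const dialog = page.getByRole("dialog", { name: dialogName });
  await dialog.waitFor({ state: "visible" });
  await interact(dialog);
  await dialog.waitFor({ state: "hidden" });
}
```

Because `withDialog` only returns once the dialog is hidden, a step that follows it cannot start while the dialog still owns focus or pointer events, which is the failure the last rule describes.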
Navigation and Assertion Rules
- Assert behavior, not implementation.
- Prefer user-visible outcomes over internal implementation details.
- When numeric values, statuses, roster assignments, or aggregates come from seeded business data, derive the expected value from the test-owned setup or canonical DB contract first, then assert the UI matches it.
- Use text assertions only when the text itself is the contract.
- Avoid broad page-level text matching when the real contract lives in a specific card, section, dialog, or row.
- Do not keep multiple redundant assertions for the same workflow outcome just to make a spec feel thorough.
- Do not make a failing test green by broadening selectors, weakening expectations, or accepting the current bug unless that weaker contract is the real intended product behavior.
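Deriving the expected value from the database and asserting it in a scoped region might look like the sketch below. Everything specific here is an assumption: the `orders` table, its columns, and the test ids are invented for illustration, `db` is assumed to be an open `better-sqlite3` database pointing at the repo's E2E database, and `page` is assumed to be a Playwright `Page`.

```javascript
// Hypothetical helper. `db` is assumed to be an open better-sqlite3 Database;
// the `orders` table and its columns are invented for illustration.
function expectedOpenOrderCount(db, customerId) {
  const row = db
    .prepare("SELECT COUNT(*) AS total FROM orders WHERE customer_id = ? AND status = 'open'")
    .get(customerId);
  return row.total;
}

// `page` is assumed to be a Playwright Page. The check is scoped to one owned
// card rather than matched against text anywhere on the page.
async function assertOpenOrderCount(page, db, customerId) {
  const expected = String(expectedOpenOrderCount(db, customerId));
  const actual = await page
    .getByTestId(`customer-card-${customerId}`)
    .getByTestId("open-order-count")
    .innerText();
  if (actual.trim() !== expected) {
    throw new Error(`Open order count: UI shows "${actual}", E2E database says "${expected}"`);
  }
}
```

In a committed spec the manual comparison would more likely be a Playwright `expect(locator).toHaveText(...)` assertion; the plain `throw` keeps this sketch free of test-runner imports.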
Hard Bans
- default `networkidle` as the primary readiness strategy
- `page.evaluate(...)` to assert behavior the UI can expose directly
- giant omnibus specs spanning multiple workflows
- generic selectors like `button[role="combobox"]` as the primary contract
- silently reusing leftover state from prior tests
- deleting or consolidating legacy tests before the workflow mapping is explicit
Failure Convergence Protocol
When many tests fail, do not patch them one by one by default.
- Cluster failures by root cause first.
  - auth or bootstrap
  - stale server or shared port contention
  - modal or overlay state
  - selector contract gaps
  - readiness gaps
  - data or setup nondeterminism
  - obsolete workflow assumptions
- Fix shared contracts before individual tests.
- If the spec shape is wrong, stop patching and rewrite that workflow family.
- If a spec keeps re-breaking because it mixes too many workflows, retire the omnibus and split it.
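The clustering step can be roughed out as a first-pass triage over failure messages. This is only a sketch: `clusterFailures` is a hypothetical helper, the regular expressions are illustrative guesses that would need tuning against a suite's real error output, and causes such as data nondeterminism or obsolete workflow assumptions usually cannot be detected from a message alone, so they land in `unclassified` for manual review.

```javascript
// Illustrative patterns only; tune them to the suite's own error output.
// Order matters: more specific causes come before the generic timeout pattern.
const ROOT_CAUSE_PATTERNS = [
  ["auth or bootstrap", /\b(401|403)\b|unauthorized/i],
  ["stale server or port contention", /ECONNREFUSED|EADDRINUSE/i],
  ["modal or overlay state", /intercepts pointer events/i],
  ["selector contract gaps", /strict mode violation/i],
  ["readiness gaps", /Timeout \d+ms exceeded/i],
];

// Groups failures ({ test, message }) by the first matching root cause.
function clusterFailures(failures) {
  const clusters = new Map();
  for (const failure of failures) {
    const match = ROOT_CAUSE_PATTERNS.find(([, pattern]) => pattern.test(failure.message));
    const cause = match ? match[0] : "unclassified";
    if (!clusters.has(cause)) clusters.set(cause, []);
    clusters.get(cause).push(failure.test);
  }
  // Largest cluster first: fix the shared contract behind it before individual tests.
  return [...clusters.entries()].sort((a, b) => b[1].length - a[1].length);
}
```

Sorting by cluster size is a design choice of this sketch; it puts the shared contract that explains the most failures at the top of the list.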
Rewrite Governance
- Coverage loss must be explicit, never accidental.
- Every removed legacy assertion must have a mapped replacement or a written obsolescence rationale.
- Prefer meaningful consolidation over duplicative green checks, but do not silently reduce user-journey coverage.
- Leave quarantined `fixme` coverage only when:
  - the behavior is genuinely blocked
  - the quarantine is explicit
  - the rest of the family can still move forward safely
- Keep formal E2E assets in committed test directories and exploratory scripts or screenshots out of those directories.
Parallel Rewrite Validation
Parallel rewrite work is allowed, but only when ownership and runtime isolation are real.
- Parallelize by workflow family or disjoint file ownership.
- Do not let multiple workers edit the same omnibus spec or shared helper without explicit ownership.
- If workers validate in parallel, isolate:
  - port
  - database or seed path
  - output folder
  - screenshots, traces, and artifacts
- If runtime isolation is not available, serialize Playwright validation even if rewrite coding stays parallel.
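Runtime isolation is easier to keep real when each lane's resources are derived from a single index. The sketch below is one possible layout, assuming the repo explicitly supports isolated lanes: `laneResources`, the `tmp/e2e-lanes` root, and the directory names are hypothetical, and the base port of 4444 is only the common sandbox default mentioned earlier.

```javascript
// Hypothetical lane layout: the base port, root directory, and folder names
// are assumptions, not a repo contract.
function laneResources(laneIndex, { basePort = 4444, root = "tmp/e2e-lanes" } = {}) {
  if (!Number.isInteger(laneIndex) || laneIndex < 0) {
    throw new Error(`laneIndex must be a non-negative integer, got ${laneIndex}`);
  }
  const laneDir = `${root}/lane-${laneIndex}`;
  const port = basePort + laneIndex;
  return {
    port,
    baseURL: `http://127.0.0.1:${port}`,
    // Each lane gets its own copy of the E2E database so seeds cannot collide.
    databasePath: `${laneDir}/e2e.db`,
    outputDir: `${laneDir}/output`,
    artifactsDir: `${laneDir}/artifacts`,
  };
}
```

Two workers calling `laneResources(0)` and `laneResources(1)` share no port, database file, or output folder, so validation from one lane cannot leak state into the other.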
Full-Suite Freshness
- Keep targeted validation fast, but do not let the maintained full-suite contract rot.
- If the repo or automation stack supports daily, nightly, or pre-release full-suite runs, treat that as part of quality visibility.
- If automated agents are expected to keep QA green, there should be observable evidence that they ran, what they ran, and whether they stayed green.
- A passing targeted spec is not a substitute for maintaining the broader suite over time.
Dev Server
For local web debugging, keep the app running in a persistent TTY session. Do not rely on one-shot background commands from a short-lived shell.
Use the repo's documented startup flow first. If there is no explicit contract, the common sandbox pattern is:
```bash
npm run build
PORT=4444 npm run start
```
Before `page.goto(...)`, verify the target port is listening and the app responds.
After interactive verification is complete, stop the server process you started so the sandbox stays clean for the rest of the task.
Common Failure Modes
- The browser flow passes only because an overlay or modal blocker was never actually closed.
- A test proves the DOM changed but does not prove the user-visible flow works.
- A spec uses the wrong roster, seed data, or identity assumptions and ends up "testing" fake state.
- A test is made green by adapting to the current bug instead of fixing the app or data contract.
- A test hard-codes display text, counts, or values that should have been derived from the test-owned DB or seed state.
- A worker rewrites files in parallel but validation still shares one fixed port and one mutable database.
- A suite looks green because duplicate tests were kept rather than properly consolidated.
- A large omnibus spec keeps hiding unique coverage because nobody mapped which extracted spec now owns each workflow.
Signoff Expectations
- Interactive proof exists for the changed flow.
- If the flow matters for regression, durable E2E coverage exists too.
- The browser evidence matches the claims being made in the final response.
- The selectors, waits, and assertions are tied to real UI contracts.
- Any skipped, quarantined, consolidated, or retired coverage is called out explicitly.