ce-dogfood-beta

Dogfood (Beta)

Act as a QA engineer who dogfoods the active branch end-to-end: understand every change, test every change in a real browser as a user would, and autonomously fix what's broken until the branch is genuinely ready.
This is diff-scoped, not whole-app exploration. You test what this branch introduced or modified versus `main`. (For full-app exploratory QA, use the `dogfood` skill instead.)

Use `agent-browser` Only For Browser Automation

This workflow drives the browser exclusively through the `agent-browser` CLI. Do not use Chrome MCP tools (`mcp__claude-in-chrome__*`), any browser MCP integration, or other built-in browser-control tools. If the platform offers multiple ways to control a browser, always choose `agent-browser`. Use the direct binary, never `npx agent-browser` (the direct binary uses the fast Rust client).

Prerequisites

  • A local dev server you can start (`bin/dev`, `rails server`, `npm run dev`, etc.).
  • `agent-browser` installed. Check:
    bash
    command -v agent-browser >/dev/null 2>&1 && echo "Ready" || echo "NOT INSTALLED"
    If not installed, run the `ce-setup` skill to install dependencies, then resume. Do not continue without it.

Reusing Compound-Engineering Skills

`ce-dogfood-beta` is an orchestrator. Prefer delegating to existing CE skills over re-deriving their behavior:

| When | Skill | Why |
|---|---|---|
| Phase 0 isolation | `ce-worktree` | Run the dogfood in an isolated worktree so the main checkout stays clean. |
| agent-browser missing | `ce-setup` | Installs `agent-browser` and other deps. |
| A failure's root cause is non-obvious | `ce-debug` | Systematic root-cause analysis instead of guess-and-check. |
| Committing each fix | `ce-commit` | Consistent, well-scoped commit messages. |
| A bug reveals a reusable lesson | `ce-compound` | Capture the learning so the team compounds knowledge. |

Reuse `ce-test-browser`'s mechanics for port detection and dev-server startup (see Phase 3) rather than reinventing them.

Workflow

0. Scope        Pick the branch, get onto it (offer worktree), never touch main
1. Analyze      Diff branch vs main, understand every change
2. Map+Matrix   Map user flows as Mermaid flowcharts, then derive the test matrix as a task list
3. Serve        Detect port, start dev server, open agent-browser
4. Execute      Work the matrix one item at a time with agent-browser
5. Fix loop     On failure: fix -> add regression test -> commit -> continue
6. Report       Write durable doc to docs/dogfood-reports/ (flows, matrix, fixes, learnings, verdict)

Phase 0: Scope and Get on the Right Branch

Parse `$ARGUMENTS`: a PR number, a branch name, or blank (use current branch). Strip `--port PORT` if present.
  1. Resolve the target branch:
    • PR number: `gh pr checkout <number>` (probe for an existing worktree first).
    • Branch name: check it out (probe for an existing worktree first).
    • Blank: use the current branch.
  2. Refuse to run on `main`/`master`. If the resolved branch is the trunk, stop and tell the user: there is no diff to dogfood.
  3. Offer isolation. Ask whether to run in a git worktree so the main checkout stays untouched (use the platform's blocking question tool). If yes, hand off to `ce-worktree`; if no, continue in place.
  4. Resume if a prior run exists. Look for an existing report at `docs/dogfood-reports/*-<branch-slug>-dogfood.md`. If one is found with unfinished scenarios, ask whether to resume it or start fresh. To resume, re-hydrate the task list from its matrix (Pass/Fixed/Skipped stay done; Pending/Blocked/in-progress become the remaining work) and continue from there.
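The argument handling in this phase can be sketched in shell. This is a minimal illustration, not the skill's actual implementation: the helper names `strip_port`, `classify_target`, and `is_trunk` are hypothetical, and the sketch assumes at most one non-flag argument.

```shell
# Hypothetical helpers illustrating Phase 0 argument handling.

# strip_port ARGS...: drop "--port PORT" (or "--port=PORT") and print the remaining target.
strip_port() (
  target=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --port)   shift ;;          # drop the flag; its value is dropped by the shift below
      --port=*) ;;                # flag and value in one word
      *)        target="$1" ;;
    esac
    [ "$#" -gt 0 ] && shift
  done
  printf '%s\n' "$target"
)

# classify_target TARGET: print "current" (blank), "pr" (all digits), or "branch".
classify_target() {
  case "$1" in
    "")       echo "current" ;;
    *[!0-9]*) echo "branch" ;;
    *)        echo "pr" ;;
  esac
}

# is_trunk BRANCH: succeed when the resolved branch is main or master.
is_trunk() { [ "$1" = "main" ] || [ "$1" = "master" ]; }

# Example wiring (not executed here):
#   target=$(strip_port $ARGUMENTS)
#   kind=$(classify_target "$target")
#   ...check out the PR or branch, then:
#   branch=$(git rev-parse --abbrev-ref HEAD)
#   is_trunk "$branch" && { echo "Refusing to dogfood $branch: no diff"; exit 1; }
```

Checking `is_trunk` against the branch that is actually checked out, rather than against the raw argument, also covers the blank-argument case where the current branch happens to be the trunk.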

Resumability (stop and return at any point)

This workflow is designed to be interrupted and resumed. Two pieces of state make that safe:
  • The task list (`TaskCreate`/`TaskUpdate`) is the live to-do, with one task per matrix scenario. Mark each `in_progress` when you start it and `completed` only when it genuinely passes.
  • The report doc at `docs/dogfood-reports/<YYYY-MM-DD>-<branch-slug>-dogfood.md` is the durable checkpoint that survives across sessions. Create it as soon as the matrix exists (end of Phase 2) with every scenario listed as `Pending`, and update it incrementally (after each scenario is judged and after each fix is committed), not only at the end.
Because tasks are session-scoped but the report doc is on disk, the report is the source of truth for resuming. Always keep the two in sync so a later run (or a teammate) can pick up exactly where this one stopped.
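As a rough sketch of how the report checkpoint could be located and inspected: the source does not define how `<branch-slug>` is derived or how the matrix is laid out, so the slug rule in `branch_slug` and the assumption that matrix rows are pipe-delimited table lines are both illustrative guesses, and all three helper names are hypothetical.

```shell
# branch_slug BRANCH: assumed slug rule (lowercase, runs of non-alphanumerics become "-").
branch_slug() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# report_path BRANCH: today's report path for the branch.
report_path() {
  printf 'docs/dogfood-reports/%s-%s-dogfood.md\n' "$(date +%Y-%m-%d)" "$(branch_slug "$1")"
}

# find_prior_report BRANCH: print the newest existing report for the branch, if any.
find_prior_report() {
  slug=$(branch_slug "$1")
  # ISO dates sort lexically, so the last entry is the most recent.
  ls docs/dogfood-reports/????-??-??-"$slug"-dogfood.md 2>/dev/null | sort | tail -n 1
}

# has_unfinished REPORT: succeed if any table row still carries an unfinished status.
# Assumes the matrix is a pipe-delimited table; adjust to the real template.
has_unfinished() {
  grep -E '^\|' "$1" | grep -Eq 'Pending|Blocked|in[-_]progress'
}
```

A resume check might then read `prior=$(find_prior_report "$branch")` and, when `prior` is non-empty and `has_unfinished "$prior"` succeeds, ask the user whether to resume or start fresh.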

Phase 1: Analyze Changes

Pull the full diff against `main` and read it carefully; you cannot test what you don't understand.
bash
git diff --name-only main...HEAD     # what changed
git diff main...HEAD                 # how it changed
Build a mental model of every change: new features, modified behavior, new routes/views/components, touched data flows. Note anything that produces user-visible behavior — that is what the matrix must cover.
Ground in the product's personas and vision. Look for persona and vision context so flows can be judged from real users' eyes, not just "does it work." Check, in order: `STRATEGY.md` (its "Who it's for" section names the primary persona and their job-to-be-done), `VISION.md`, and any persona docs (e.g. `docs/personas/`, `PERSONAS.md`). Capture the 1-3 primary personas and what each cares about. If none exist, infer a reasonable primary persona from the product and the diff, and say so in the report.
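The lookup order for persona and vision context could be sketched as below. The helper name `persona_sources` is hypothetical; the candidate paths are the ones named above, checked relative to the repository root.

```shell
# persona_sources: print the persona/vision sources that exist, in priority order.
# Run from the repository root.
persona_sources() {
  for candidate in STRATEGY.md VISION.md docs/personas PERSONAS.md; do
    [ -e "$candidate" ] && printf '%s\n' "$candidate"
  done
  return 0
}

# Example wiring (not executed here):
#   sources=$(persona_sources)
#   [ -z "$sources" ] && echo "No persona docs found: infer a persona and say so in the report"
```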

Phase 2: Map the Flows, Then Build the Matrix

The quality of the whole dogfood depends on this phase. Do not jump straight to a flat list of pages. First understand the user flows the diff touches, then derive the matrix from them. A matrix built without a flow model tests pages in isolation and misses the journey — the email that "sends" but lands in the wrong thread.

2a. Map the user flows (required)

For every user-visible change, trace the complete journey end to end and draw it. Map each flow as a Mermaid `flowchart` so the journey is explicit and reviewable before any testing happens: entry point, each user action, branch points (success / validation error / empty / permission-denied), side effects (emails, jobs, notifications), and the true end state.
Email example: it's not enough that "an email sends." Does it go to the right recipient? When the user clicks through, does the app land on and scroll to the right message? Does the content make sense? Does the whole flow align with the product's vision and UX? The flowchart must carry the click-through and its destination, not stop at "email sent."
mermaid
flowchart TD
    A[User opens /threads] --> B[Clicks 'Reply']
    B --> C{Form valid?}
    C -->|No| D[Inline validation error shown]
    C -->|Yes| E[Reply saved]
    E --> F[Notification email sent to thread participants]
    E --> G[UI scrolls to new reply, focus on it]
    F --> H[Recipient clicks email link]
    H --> I{Lands on correct thread + scrolls to the reply?}
Produce one flowchart per distinct journey. Cover the happy path and the branch points (error, empty, boundary, permission). These diagrams ARE the understanding — they become the spine of the matrix and belong in the final report.

2b. Derive the matrix from the flows

Walk each flowchart and turn every node and branch into one or more test scenarios. Read `references/test-matrix-taxonomy.md` for the full set of dimensions (journeys, functional checks, experiential checks, edge/error/empty states, accessibility, responsiveness). Cover both functional ("does it work?") and experiential ("does it feel right and align with the product?").
Map changed files to concrete routes (views -> their pages, components -> pages rendering them, layouts -> all pages, stylesheets -> visual regression on key pages) and attach those routes to the flows that exercise them.
Load the matrix as a task list (`TaskCreate`), one task per scenario, so progress is tracked and nothing is skipped. Order tasks by flow, following the flowcharts, not by file.
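The file-to-route mapping above can be approximated with a first-pass classifier. This is only a sketch: the path patterns assume a conventional Rails- or JS-style layout (`views/`, `pages/`, `components/`, `layouts/`), which the source does not specify, and `classify_change` is a hypothetical name.

```shell
# classify_change PATH: print the coverage rule that applies to a changed file.
# Path patterns are assumptions about the repo layout; adjust them per project.
classify_change() {
  # Prefix "/" so root-level directories (e.g. "components/x") match "*/components/*".
  case "/$1" in
    */layouts/*)                echo "layout: all pages" ;;
    *.css|*.scss|*.sass|*.less) echo "stylesheet: visual regression on key pages" ;;
    */components/*)             echo "component: pages rendering it" ;;
    */views/*|*/pages/*)        echo "view: its page" ;;
    *)                          echo "other: trace to routes by hand" ;;
  esac
}

# Example wiring (not executed here):
#   git diff --name-only main...HEAD | while IFS= read -r f; do
#     printf '%s\t%s\n' "$f" "$(classify_change "$f")"
#   done
```

Layouts are matched before views because a layout file usually lives under a `views/` directory but affects every page, so the broader rule has to win.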

Phase 3: Detect Port and Start the Dev Server

Determine the port (priority: explicit `--port` > `AGENTS.md`/`CLAUDE.md` > `package.json` dev script > `.env*` `PORT=` > default `3000`). If a server is already listening, reuse it; otherwise start the project's dev command in the background and wait for the port to come up. This is the same mechanism `ce-test-browser` uses; follow its Phase 5–6 logic.
bash
agent-browser open "http://localhost:${PORT}"
agent-browser snapshot -i
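The port priority order can be sketched as a single function. This is an illustration under assumptions, not `ce-test-browser`'s actual logic: `detect_port` is a hypothetical name, and the grep patterns are heuristic guesses about how each file mentions a port (a `localhost:NNNN` URL in the docs, a `--port`/`-p` flag on the `"dev"` script line, a `PORT=` line in an env file).

```shell
# detect_port [EXPLICIT_PORT]: resolve the dev-server port by the priority order above.
# Run from the repository root. All patterns are heuristics.
detect_port() {
  # 1. Explicit --port value wins.
  if [ -n "$1" ]; then echo "$1"; return; fi

  # 2. AGENTS.md / CLAUDE.md: first "localhost:NNNN" mention.
  for doc in AGENTS.md CLAUDE.md; do
    [ -f "$doc" ] || continue
    port=$(grep -Eo 'localhost:[0-9]+' "$doc" | head -n 1 | cut -d: -f2)
    if [ -n "$port" ]; then echo "$port"; return; fi
  done

  # 3. package.json: a "--port NNNN" or "-p NNNN" flag on the "dev" script line.
  if [ -f package.json ]; then
    port=$(grep -E '"dev"[[:space:]]*:' package.json \
      | grep -Eo -e '(--port|-p)[ =][0-9]+' | head -n 1 | grep -Eo '[0-9]+')
    if [ -n "$port" ]; then echo "$port"; return; fi
  fi

  # 4. .env*: first PORT= assignment.
  port=$(cat .env* 2>/dev/null | grep -E '^PORT=[0-9]+' | head -n 1 \
    | cut -d= -f2 | grep -Eo '^[0-9]+')
  if [ -n "$port" ]; then echo "$port"; return; fi

  # 5. Default.
  echo 3000
}
```

With `PORT=$(detect_port "$explicit_port")` resolved, one way to check whether a server is already listening, where `lsof` is available, is `lsof -iTCP:"$PORT" -sTCP:LISTEN`.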

Phase 4: Execute the Matrix

Work the task list one item at a time. For each scenario, mark the task `in_progress`, then:
  1. Document what you're testing (the journey and the expected outcome).
  2. Drive it with agent-browser — navigate, snapshot for interactive refs, click, fill, submit, follow the journey to its real end state:
    bash
    agent-browser open "http://localhost:${PORT}/<route>"
    agent-browser snapshot -i
    agent-browser click @e1
    agent-browser fill @e2 "value"
    agent-browser screenshot <scenario>.png
    agent-browser errors      # check console/page errors
  3. Judge both correctness and experience: right data, right destination, sensible content, no console errors, and does it feel aligned with the product?
  4. Walk it as each persona. Re-run the journey in your head from each primary persona's perspective (from Phase 1) and ask where they'd feel a paper cut: a small friction that wouldn't fail a functional test but degrades the experience, such as a confusing label, an extra click, an unexpected jump, a slow-feeling step, missing feedback, or copy that doesn't match how that persona thinks. A scenario can be functionally `Pass` yet still carry paper cuts. Note each paper cut, which persona feels it, and its severity.
  5. Record pass/fail plus any paper cuts, with specifics. Mark the task `completed` only when it genuinely passes (paper cuts are logged, not blockers; fix the sharp ones in Phase 5, surface the rest in the report).
External-interaction flows (OAuth, real email delivery, payments, SMS) can't be fully driven headlessly — pause and ask the user to verify that leg, then continue.

Phase 5: Fix Loop (Autonomous)

When a scenario fails, fix it and prove it — but first decide whether the fix is yours to make autonomously or a human's to decide.
Judge the size of the fix before touching code. Auto-fix when the change is small, well-understood, and low-risk: a clear bug with an obvious correct fix, contained to a few files, no schema/architecture/product trade-off. Do not auto-fix when the change is large or ambiguous — it requires an architectural or schema decision, changes product behavior or UX intent, spans many files, has plausible competing solutions, or you're not confident the "right" answer is unambiguous. Forcing a big judgment call autonomously is worse than escalating it.
For autonomous fixes:
  1. Investigate the root cause. If it's non-obvious, use `ce-debug`.
  2. Apply the fix in the code.
  3. Add an automated regression test that fails before the fix and passes after, so the bug can't return.
  4. Commit the fix with a clear message (use `ce-commit`). One logical fix per commit.
  5. Re-run the failing scenario in the browser to confirm it now passes; then continue the matrix.
  6. If the bug carried a reusable lesson, capture it with `ce-compound`.
For changes too big to make autonomously: do not implement. Record it in the report's "Decisions for a human" section with: what's broken, why it's not a safe autonomous fix, the options you see (with trade-offs), and your recommendation. Mark the scenario `Blocked (human decision)` in the matrix, then continue with the rest. Never make a large, irreversible, or product-altering change just to clear a matrix item.
Keep iterating until every task is `completed` or explicitly `Blocked (human decision)`. Re-test anything a fix might have affected (watch for regressions in adjacent journeys).
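The mechanical part of the size judgment (file count, schema changes) could be pre-screened with a small gate like the one below. It is a sketch only: `fix_scope`, the `MAX_FILES` default of 3, and the schema path patterns are hypothetical choices, and the gate cannot see ambiguity, competing solutions, or product intent, which still need judgment.

```shell
# fix_scope: read changed file paths on stdin; print "auto" or "escalate".
# MAX_FILES and the schema patterns are illustrative thresholds, not part of the skill.
fix_scope() {
  max_files="${MAX_FILES:-3}"
  count=0
  verdict="auto"
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    count=$((count + 1))
    case "$path" in
      # Schema or migration changes imply a decision a human should make.
      db/migrate/*|db/schema.rb|*/migrations/*|*.sql) verdict="escalate" ;;
    esac
  done
  # Too many files touched is treated as "not a small fix".
  [ "$count" -gt "$max_files" ] && verdict="escalate"
  echo "$verdict"
}

# Example wiring (not executed here), run against the uncommitted fix:
#   git diff --name-only | fix_scope
```

An "auto" verdict here only means the fix passed the mechanical checks; an "escalate" verdict is a strong hint to record the issue under decisions for a human instead of implementing it.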

Phase 6: Write the Report Artifact

The report doc was created at the end of Phase 2 and updated incrementally throughout (see Resumability). When the matrix is green (or every remaining item is explicitly blocked), finalize it at `docs/dogfood-reports/<YYYY-MM-DD>-<branch-slug>-dogfood.md` in the repo under test, then surface a short summary in chat with the file path.
Use `references/dogfood-report-template.md` as the shape, the same way plans and brainstorms are captured from a template. The finalized artifact must include:
  1. Diff Summary: what changed between the branch and `main`.
  2. Personas: the primary personas evaluated against (and their source: STRATEGY.md / VISION.md / inferred).
  3. Flows tested: the Mermaid flowcharts from Phase 2a, so the journeys are preserved.
  4. Test Matrix & Results: every scenario, with what was tested, pass/fail, issue found, fix applied, commit SHA.
  5. What was fixed: each bug, its root cause, the fix, the regression test added, and the commit.
  6. Paper cuts (by persona): experiential friction found, which persona feels each, severity, and whether fixed or deferred.
  7. Decisions for a human: issues too big to fix autonomously, with what's broken, why it was escalated, options with trade-offs, and a recommendation.
  8. Learnings: reusable lessons worth carrying forward (feed substantial ones to `ce-compound`).
  9. Final Status: readiness verdict, plus anything still blocked or needing human verification.
Use repo-relative paths in the doc, never absolute paths, so it stays portable.
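A quick portability check on the finished report might look like the sketch below. The helper name `find_absolute_paths` is hypothetical, and the list of prefixes it treats as absolute (`/Users`, `/home`, `/tmp`, `/var`, `/private`, `/opt`, Windows drive letters) is an assumed, non-exhaustive heuristic.

```shell
# find_absolute_paths FILE: print (with line numbers) lines that look like they
# contain an absolute filesystem path. Succeeds only when at least one is found.
# The leading group keeps repo-relative paths such as "docs/home/x.md" from matching.
find_absolute_paths() {
  grep -nE '(^|[^A-Za-z0-9._~/-])(/(Users|home|tmp|var|private|opt)/|[A-Za-z]:\\)' "$1"
}

# Example wiring (not executed here):
#   if find_absolute_paths "$report"; then
#     echo "Rewrite the paths above as repo-relative before finalizing"
#   fi
```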