qa

<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly --> <!-- Regenerate: bun run gen:skill-docs -->

Preamble (run first)

bash
_UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true)
[ -n "$_UPD" ] && echo "$_UPD" || true
mkdir -p ~/.gstack/sessions
touch ~/.gstack/sessions/"$PPID"
_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ')
find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true
_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true)
_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
echo "BRANCH: $_BRANCH"
If output shows `UPGRADE_AVAILABLE <old> <new>`: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If output shows `JUST_UPGRADED <from> <to>`: tell user "Running gstack v{to} (just updated!)" and continue.

AskUserQuestion Format

ALWAYS follow this structure for every AskUserQuestion call:
  1. Re-ground: State the project, the current branch (use the `_BRANCH` value printed by the preamble, NOT any branch from conversation history or gitStatus), and the current plan/task. (1-2 sentences)
  2. Simplify: Explain the problem in plain English a smart 16-year-old could follow. No raw function names, no internal jargon, no implementation details. Use concrete examples and analogies. Say what it DOES, not what it's called.
  3. Recommend: `RECOMMENDATION: Choose [X] because [one-line reason]`
  4. Options: Lettered options: `A) ... B) ... C) ...`
Assume the user hasn't looked at this window in 20 minutes and doesn't have the code open. If you'd need to read the source to understand your own explanation, it's too complex.
Per-skill instructions may add additional formatting rules on top of this baseline.

Contributor Mode

If `_CONTRIB` is `true`: you are in contributor mode. You're a gstack user who also helps make it better.
At the end of each major workflow step (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown, file a field report. Maybe our contributor will help make us better!
Calibration (this is the bar): For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it; that's the kind of thing worth filing. Ignore anything less consequential than this.
NOT worth filing: user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.
To file: write `~/.gstack/contributor-logs/{slug}.md` with all sections below (do not truncate; include every section through the Date/Version footer):

{Title}

Hey gstack team — ran into this while using /{skill-name}:
What I was trying to do: {what the user/agent was attempting}
What happened instead: {what actually happened}
My rating: {0-10} ({one sentence on why it wasn't a 10})

Steps to reproduce

  1. {step}

Raw output

{paste the actual error or unexpected output here}

What would make this a 10

{one sentence: what gstack should have done differently}
Date: {YYYY-MM-DD} | Version: {gstack version} | Skill: /{skill}

Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
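The slug rule above (lowercase, hyphens, max 60 chars) can be sketched as a small shell helper. The function name `slugify` is hypothetical and not part of gstack, and collapsing every run of non-alphanumeric characters into a single hyphen is an assumption about how titles should map to slugs.

```shell
# Hypothetical helper (not part of gstack): derive a field-report slug from a title.
# Assumption: any run of characters outside a-z0-9 becomes one hyphen.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed 's/[^a-z0-9][^a-z0-9]*/-/g; s/^-//' \
    | cut -c1-60 \
    | sed 's/-$//'
}

slugify "Browse JS: no await"   # prints browse-js-no-await
```

The result would then be used as `{slug}` in `~/.gstack/contributor-logs/{slug}.md`, skipping the write if that file already exists.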

Step 0: Detect base branch

Determine which branch this PR targets. Use the result as "the base branch" in all subsequent steps.
  1. Check if a PR already exists for this branch:
    `gh pr view --json baseRefName -q .baseRefName`
    If this succeeds, use the printed branch name as the base branch.
  2. If no PR exists (command fails), detect the repo's default branch:
    `gh repo view --json defaultBranchRef -q .defaultBranchRef.name`
  3. If both commands fail, fall back to `main`.
Print the detected base branch name. In every subsequent `git diff`, `git log`, `git fetch`, `git merge`, and `gh pr create` command, substitute the detected branch name wherever the instructions say "the base branch."
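The three-step fallback above could be wrapped in one function, roughly as follows. This is a sketch: the function name `detect_base_branch` is an assumption, while the two `gh` invocations are the ones given in the steps.

```shell
# Sketch of the fallback chain: PR base -> repo default branch -> main.
# detect_base_branch is a hypothetical name; the gh commands come from the steps above.
detect_base_branch() {
  base=$(gh pr view --json baseRefName -q .baseRefName 2>/dev/null) || base=""
  if [ -z "$base" ]; then
    base=$(gh repo view --json defaultBranchRef -q .defaultBranchRef.name 2>/dev/null) || base=""
  fi
  # Falls back to main when both lookups fail or print nothing.
  printf '%s\n' "${base:-main}"
}

# Usage: BASE=$(detect_base_branch); echo "Base branch: $BASE"
```

Treating an empty result the same as a failed command is a design choice here, so that a misbehaving `gh` never yields an empty branch name.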

/qa: Test → Fix → Verify

You are a QA engineer AND a bug-fix engineer. Test web applications like a real user — click everything, fill every form, check every state. When you find bugs, fix them in source code with atomic commits, then re-verify. Produce a structured report with before/after evidence.

Setup

Parse the user's request for these parameters:
| Parameter | Default | Override example |
| --- | --- | --- |
| Target URL | (auto-detect or required) | `https://myapp.com`, `http://localhost:3000` |
| Tier | Standard | `--quick`, `--exhaustive` |
| Mode | full | `--regression .gstack/qa-reports/baseline.json` |
| Output dir | `.gstack/qa-reports/` | `Output to /tmp/qa` |
| Scope | Full app (or diff-scoped) | `Focus on the billing page` |
| Auth | None | `Sign in to user@example.com`, `Import cookies from cookies.json` |
Tiers determine which issues get fixed:
  • Quick: Fix critical + high severity only
  • Standard: + medium severity (default)
  • Exhaustive: + low/cosmetic severity
If no URL is given and you're on a feature branch: Automatically enter diff-aware mode (see Modes below). This is the most common case — the user just shipped code on a branch and wants to verify it works.
Require clean working tree before starting:
bash
if [ -n "$(git status --porcelain)" ]; then
  echo "ERROR: Working tree is dirty. Commit or stash changes before running /qa."
  exit 1
fi
Find the browse binary:

SETUP (run this check BEFORE any browse command)

bash
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse"
[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse
if [ -x "$B" ]; then
  echo "READY: $B"
else
  echo "NEEDS_SETUP"
fi
If `NEEDS_SETUP`:
  1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
  2. Run: `cd <SKILL_DIR> && ./setup`
  3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`
Create output directories:
bash
REPORT_DIR=.gstack/qa-reports   # default output dir; later commands reference $REPORT_DIR
mkdir -p "$REPORT_DIR/screenshots"

Test Plan Context

Before falling back to git diff heuristics, check for richer test plan sources:
  1. Project-scoped test plans: Check `~/.gstack/projects/` for recent `*-test-plan-*.md` files for this repo:
    bash
    SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')
    ls -t ~/.gstack/projects/"$SLUG"/*-test-plan-*.md 2>/dev/null | head -1
  2. Conversation context: Check if a prior `/plan-eng-review` or `/plan-ceo-review` produced test plan output in this conversation.
  3. Use whichever source is richer. Fall back to git diff analysis only if neither is available.
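To see what the `SLUG` pipeline above produces, the same `sed` and `tr` steps can be applied to a literal URL instead of `git remote get-url origin`. The function name `slug_from_remote` and the `acme/shop` repository are hypothetical and used only for illustration.

```shell
# Same sed/tr pipeline as above, applied to a URL passed as an argument.
# slug_from_remote and the acme/shop URLs are illustrative assumptions.
slug_from_remote() {
  printf '%s\n' "$1" \
    | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' \
    | tr '/' '-'
}

slug_from_remote "git@github.com:acme/shop.git"   # prints acme-shop
slug_from_remote "https://github.com/acme/shop"   # prints acme-shop
```

Both SSH and HTTPS remotes, with or without the `.git` suffix, reduce to the same `owner-repo` slug, so test plans land in one directory per repository.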

Phases 1-6: QA Baseline

Modes

Diff-aware (automatic when on a feature branch with no URL)

This is the primary mode for developers verifying their work. When the user says `/qa` without a URL and the repo is on a feature branch, automatically:
  1. Analyze the branch diff to understand what changed:
    bash
    # <base-branch> is the branch detected in Step 0 (falls back to main)
    git diff <base-branch>...HEAD --name-only
    git log <base-branch>..HEAD --oneline
  2. Identify affected pages/routes from the changed files:
    • Controller/route files → which URL paths they serve
    • View/template/component files → which pages render them
    • Model/service files → which pages use those models (check controllers that reference them)
    • CSS/style files → which pages include those stylesheets
    • API endpoints → test them directly with `$B js "await fetch('/api/...')"`
    • Static pages (markdown, HTML) → navigate to them directly
  3. Detect the running app — check common local dev ports:
    bash
    # Stop at the first port that responds (a chained && / || list would keep echoing after a match)
    for port in 3000 4000 8080; do
      if $B goto "http://localhost:$port" 2>/dev/null; then echo "Found app on :$port"; break; fi
    done
    If no local app is found, check for a staging/preview URL in the PR or environment. If nothing works, ask the user for the URL.
  4. Test each affected page/route:
    • Navigate to the page
    • Take a screenshot
    • Check console for errors
    • If the change was interactive (forms, buttons, flows), test the interaction end-to-end
    • Use `snapshot -D` before and after actions to verify the change had the expected effect
  5. Cross-reference with commit messages and PR description to understand intent — what should the change do? Verify it actually does that.
  6. Check TODOS.md (if it exists) for known bugs or issues related to the changed files. If a TODO describes a bug that this branch should fix, add it to your test plan. If you find a new bug during QA that isn't in TODOS.md, note it in the report.
  7. Report findings scoped to the branch changes:
    • "Changes tested: N pages/routes affected by this branch"
    • For each: does it work? Screenshot evidence.
    • Any regressions on adjacent pages?
If the user provides a URL with diff-aware mode: Use that URL as the base but still scope testing to the changed files.

Full (default when URL is provided)

Systematic exploration. Visit every reachable page. Document 5-10 well-evidenced issues. Produce health score. Takes 5-15 minutes depending on app size.

Quick (--quick)

30-second smoke test. Visit homepage + top 5 navigation targets. Check: page loads? Console errors? Broken links? Produce health score. No detailed issue documentation.

Regression (--regression <baseline>)

Run full mode, then load `baseline.json` from a previous run. Diff: which issues are fixed? Which are new? What's the score delta? Append regression section to report.

Workflow

Phase 1: Initialize

  1. Find browse binary (see Setup above)
  2. Create output directories
  3. Copy report template from `qa/templates/qa-report-template.md` to output dir
  4. Start timer for duration tracking

Phase 2: Authenticate (if needed)

If the user specified auth credentials:
bash
$B goto <login-url>
$B snapshot -i                    # find the login form
$B fill @e3 "user@example.com"
$B fill @e4 "[REDACTED]"         # NEVER include real passwords in report
$B click @e5                      # submit
$B snapshot -D                    # verify login succeeded
If the user provided a cookie file:
bash
$B cookie-import cookies.json
$B goto <target-url>
If 2FA/OTP is required: Ask the user for the code and wait.
If CAPTCHA blocks you: Tell the user: "Please complete the CAPTCHA in the browser, then tell me to continue."

Phase 3: Orient

Get a map of the application:
bash
$B goto <target-url>
$B snapshot -i -a -o "$REPORT_DIR/screenshots/initial.png"
$B links                          # map navigation structure
$B console --errors               # any errors on landing?
Detect framework (note in report metadata):
  • `__next` in HTML or `_next/data` requests → Next.js
  • `csrf-token` meta tag → Rails
  • `wp-content` in URLs → WordPress
  • Client-side routing with no page reloads → SPA
For SPAs: The `links` command may return few results because navigation is client-side. Use `snapshot -i` to find nav elements (buttons, menu items) instead.
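The marker-based heuristics above could be approximated with `grep` over page text that has already been saved to a file. This is only a sketch: the function name `detect_framework` and the idea of a saved HTML (or request log) file are assumptions, and SPA detection is left out because it depends on observing navigation at runtime.

```shell
# Hypothetical helper: guess the framework from saved page text.
# Assumption: $1 is a file holding the page HTML (or a request log).
detect_framework() {
  html_file=$1
  if grep -q -e '__next' -e '_next/data' "$html_file"; then
    echo "Next.js"
  elif grep -q 'csrf-token' "$html_file"; then
    echo "Rails"
  elif grep -q 'wp-content' "$html_file"; then
    echo "WordPress"
  else
    echo "unknown"   # SPA detection needs runtime observation, not a text match
  fi
}
```

The checks run in the order the markers are listed, so a page matching several markers is reported as the first match.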

Phase 4: Explore

Visit pages systematically. At each page:
bash
$B goto <page-url>
$B snapshot -i -a -o "$REPORT_DIR/screenshots/page-name.png"
$B console --errors
Then follow the per-page exploration checklist (see `qa/references/issue-taxonomy.md`):
  1. Visual scan — Look at the annotated screenshot for layout issues
  2. Interactive elements — Click buttons, links, controls. Do they work?
  3. Forms — Fill and submit. Test empty, invalid, edge cases
  4. Navigation — Check all paths in and out
  5. States — Empty state, loading, error, overflow
  6. Console — Any new JS errors after interactions?
  7. Responsiveness — Check mobile viewport if relevant:
    bash
    $B viewport 375x812
    $B screenshot "$REPORT_DIR/screenshots/page-mobile.png"
    $B viewport 1280x720
Depth judgment: Spend more time on core features (homepage, dashboard, checkout, search) and less on secondary pages (about, terms, privacy).
Quick mode: Only visit homepage + top 5 navigation targets from the Orient phase. Skip the per-page checklist — just check: loads? Console errors? Broken links visible?

Phase 5: Document

Document each issue immediately when found — don't batch them.
Two evidence tiers:
Interactive bugs (broken flows, dead buttons, form failures):
  1. Take a screenshot before the action
  2. Perform the action
  3. Take a screenshot showing the result
  4. Use `snapshot -D` to show what changed
  5. Write repro steps referencing screenshots
bash
$B screenshot "$REPORT_DIR/screenshots/issue-001-step-1.png"
$B click @e5
$B screenshot "$REPORT_DIR/screenshots/issue-001-result.png"
$B snapshot -D
Static bugs (typos, layout issues, missing images):
  1. Take a single annotated screenshot showing the problem
  2. Describe what's wrong
bash
$B snapshot -i -a -o "$REPORT_DIR/screenshots/issue-002.png"
Write each issue to the report immediately using the template format from `qa/templates/qa-report-template.md`.

Phase 6: Wrap Up

  1. Compute health score using the rubric below
  2. Write "Top 3 Things to Fix" — the 3 highest-severity issues
  3. Write console health summary — aggregate all console errors seen across pages
  4. Update severity counts in the summary table
  5. Fill in report metadata — date, duration, pages visited, screenshot count, framework
  6. Save baseline: write `baseline.json` with:
    json
    {
      "date": "YYYY-MM-DD",
      "url": "<target>",
      "healthScore": N,
      "issues": [{ "id": "ISSUE-001", "title": "...", "severity": "...", "category": "..." }],
      "categoryScores": { "console": N, "links": N, ... }
    }
Regression mode: After writing the report, load the baseline file. Compare:
  • Health score delta
  • Issues fixed (in baseline but not current)
  • New issues (in current but not baseline)
  • Append the regression section to the report

Health Score Rubric

Compute each category score (0-100), then take the weighted average.

Console (weight: 15%)

  • 0 errors → 100
  • 1-3 errors → 70
  • 4-10 errors → 40
  • 11+ errors → 10

Links (weight: 10%)

  • 0 broken → 100
  • Each broken link → -15 (minimum 0)
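The console and link rubrics can be written as two small functions. The names `console_score` and `links_score` are hypothetical, and scoring exactly 10 console errors as 40 is an assumption, since the "4-10" and "10+" buckets overlap at that value.

```shell
# Hypothetical helpers mirroring the Console and Links rubrics.
# Assumption: exactly 10 console errors falls in the "4-10" bucket (score 40).
console_score() {
  errors=$1
  if [ "$errors" -eq 0 ]; then echo 100
  elif [ "$errors" -le 3 ]; then echo 70
  elif [ "$errors" -le 10 ]; then echo 40
  else echo 10
  fi
}

links_score() {
  score=$((100 - 15 * $1))        # 15 points off per broken link
  [ "$score" -lt 0 ] && score=0   # floor at 0
  echo "$score"
}

console_score 2   # prints 70
links_score 2     # prints 70
```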

Per-Category Scoring (Visual, Functional, UX, Content, Performance, Accessibility)

Each category starts at 100. Deduct per finding:
  • Critical issue → -25
  • High issue → -15
  • Medium issue → -8
  • Low issue → -3
Minimum 0 per category.
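As a worked example of the per-finding deductions, the sketch below takes the number of critical, high, medium, and low findings in one category and returns that category's score. The function name `category_score` is an assumption for illustration.

```shell
# Hypothetical helper: score one category from its finding counts.
# Arguments: critical high medium low
category_score() {
  score=$((100 - 25 * $1 - 15 * $2 - 8 * $3 - 3 * $4))
  [ "$score" -lt 0 ] && score=0   # minimum 0 per category
  echo "$score"
}

category_score 0 1 2 1   # 100 - 15 - 16 - 3, prints 66
```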

Weights

| Category | Weight |
| --- | --- |
| Console | 15% |
| Links | 10% |
| Visual | 10% |
| Functional | 20% |
| UX | 15% |
| Performance | 10% |
| Content | 5% |
| Accessibility | 15% |

Final Score

score = Σ (category_score × weight)
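A minimal sketch of the weighted sum, using the weights from the table above expressed as integer percentages. The function name `health_score` and the argument order are assumptions, and rounding down to a whole number is a choice made here because shell arithmetic is integer-only; the source does not specify rounding.

```shell
# Hypothetical helper: weighted average of the eight category scores.
# Argument order (assumed): console links visual functional ux performance content accessibility
# Assumption: the result is truncated to an integer.
health_score() {
  echo $(( ($1 * 15 + $2 * 10 + $3 * 10 + $4 * 20 + $5 * 15 + $6 * 10 + $7 * 5 + $8 * 15) / 100 ))
}

health_score 70 100 92 85 100 100 97 100   # weighted sum 9155 / 100, prints 91
```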

Framework-Specific Guidance

Next.js

  • Check console for hydration errors (`Hydration failed`, `Text content did not match`)
  • Monitor `_next/data` requests in network; 404s indicate broken data fetching
  • Test client-side navigation (click links, don't just `goto`) to catch routing issues
  • Check for CLS (Cumulative Layout Shift) on pages with dynamic content

Rails

  • Check for N+1 query warnings in console (if development mode)
  • Verify CSRF token presence in forms
  • Test Turbo/Stimulus integration — do page transitions work smoothly?
  • Check for flash messages appearing and dismissing correctly

WordPress

  • Check for plugin conflicts (JS errors from different plugins)
  • Verify admin bar visibility for logged-in users
  • Test REST API endpoints (`/wp-json/`)
  • Check for mixed content warnings (common with WP)

General SPA (React, Vue, Angular)

  • Use `snapshot -i` for navigation; the `links` command misses client-side routes
  • Check for stale state (navigate away and back — does data refresh?)
  • Test browser back/forward — does the app handle history correctly?
  • Check for memory leaks (monitor console after extended use)

Important Rules

  1. Repro is everything. Every issue needs at least one screenshot. No exceptions.
  2. Verify before documenting. Retry the issue once to confirm it's reproducible, not a fluke.
  3. Never include credentials. Write `[REDACTED]` for passwords in repro steps.
  4. Write incrementally. Append each issue to the report as you find it. Don't batch.
  5. Never read source code. Test as a user, not a developer.
  6. Check console after every interaction. JS errors that don't surface visually are still bugs.
  7. Test like a user. Use realistic data. Walk through complete workflows end-to-end.
  8. Depth over breadth. 5-10 well-documented issues with evidence > 20 vague descriptions.
  9. Never delete output files. Screenshots and reports accumulate — that's intentional.
  10. Use `snapshot -C` for tricky UIs. It finds clickable divs that the accessibility tree misses.
Record baseline health score at end of Phase 6.

Output Structure

.gstack/qa-reports/
├── qa-report-{domain}-{YYYY-MM-DD}.md    # Structured report
├── screenshots/
│   ├── initial.png                        # Landing page annotated screenshot
│   ├── issue-001-step-1.png               # Per-issue evidence
│   ├── issue-001-result.png
│   ├── issue-001-before.png               # Before fix (if fixed)
│   ├── issue-001-after.png                # After fix (if fixed)
│   └── ...
└── baseline.json                          # For regression mode
Report filenames use the domain and date: `qa-report-myapp-com-2026-03-12.md`
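One way to derive that filename from a target URL is sketched below. The function name `report_name` is hypothetical, and replacing a port separator (`:`) with a hyphen is an assumption, since the source only shows an example without a port.

```shell
# Hypothetical helper: build the report filename from a URL and a date.
# Assumption: dots and colons in the host part both become hyphens.
report_name() {
  url=$1
  day=$2
  domain=$(printf '%s\n' "$url" | sed 's|^[a-zA-Z]*://||; s|/.*$||; s|[.:]|-|g')
  echo "qa-report-${domain}-${day}.md"
}

report_name "https://myapp.com" "2026-03-12"   # prints qa-report-myapp-com-2026-03-12.md
```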

Phase 7: Triage

Sort all discovered issues by severity, then decide which to fix based on the selected tier:
  • Quick: Fix critical + high only. Mark medium/low as "deferred."
  • Standard: Fix critical + high + medium. Mark low as "deferred."
  • Exhaustive: Fix all, including cosmetic/low severity.
Mark issues that cannot be fixed from source code (e.g., third-party widget bugs, infrastructure issues) as "deferred" regardless of tier.
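The tier rules above amount to a lookup from tier and severity to a decision, sketched here as a `case` statement. The function name `should_fix` and the lowercase tier and severity labels are assumptions; the rule about issues that cannot be fixed from source code is not modeled and would still override a `fix` result.

```shell
# Hypothetical helper: decide whether an issue is fixed or deferred for a tier.
# Assumed labels: tier in quick|standard|exhaustive, severity in critical|high|medium|low.
should_fix() {
  tier=$1
  severity=$2
  case "$tier:$severity" in
    quick:critical|quick:high) echo fix ;;
    standard:critical|standard:high|standard:medium) echo fix ;;
    exhaustive:*) echo fix ;;
    *) echo deferred ;;
  esac
}

should_fix quick medium      # prints deferred
should_fix standard medium   # prints fix
```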

Phase 8: Fix Loop

For each fixable issue, in severity order:

8a. Locate source

  • Grep for error messages, component names, route definitions
  • Glob for file patterns matching the affected page
  • Find the source file(s) responsible for the bug
  • ONLY modify files directly related to the issue

8b. Fix

  • Read the source code, understand the context
  • Make the minimal fix — smallest change that resolves the issue
  • Do NOT refactor surrounding code, add features, or "improve" unrelated things

8c. Commit

bash
git add <only-changed-files>
git commit -m "fix(qa): ISSUE-NNN — short description"
  • One commit per fix. Never bundle multiple fixes.
  • Message format: `fix(qa): ISSUE-NNN — short description`

8d. Re-test

  • Navigate back to the affected page
  • Take before/after screenshot pair
  • Check console for errors
  • Use `snapshot -D` to verify the change had the expected effect
bash
$B goto <affected-url>
$B screenshot "$REPORT_DIR/screenshots/issue-NNN-after.png"
$B console --errors
$B snapshot -D

8e. Classify

  • verified: re-test confirms the fix works, no new errors introduced
  • best-effort: fix applied but couldn't fully verify (e.g., needs auth state, external service)
  • reverted: regression detected → `git revert HEAD` → mark issue as "deferred"

8f. Self-Regulation (STOP AND EVALUATE)

Every 5 fixes (or after any revert), compute the WTF-likelihood:
WTF-LIKELIHOOD:
  Start at 0%
  Each revert:                +15%
  Each fix touching >3 files: +5%
  After fix 15:               +1% per additional fix
  All remaining Low severity: +10%
  Touching unrelated files:   +20%
If WTF > 20%: STOP immediately. Show the user what you've done so far. Ask whether to continue.
Hard cap: 50 fixes. After 50 fixes, stop regardless of remaining issues.
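The heuristic can be tallied mechanically, as in the sketch below. The function name `wtf_likelihood` and its argument layout are assumptions, and so is the reading of "All remaining Low severity" and "Touching unrelated files" as one-time 0/1 flags rather than per-fix penalties; the source does not say which is intended.

```shell
# Hypothetical helper: tally the WTF-likelihood percentage.
# Arguments: reverts, fixes touching >3 files, total fixes so far,
#            all-remaining-low flag (0/1), touched-unrelated-files flag (0/1)
# Assumption: the last two rules apply once, as flags.
wtf_likelihood() {
  reverts=$1
  wide_fixes=$2
  total_fixes=$3
  all_low=$4
  unrelated=$5
  extra=$((total_fixes - 15))       # +1% per fix beyond the 15th
  [ "$extra" -lt 0 ] && extra=0
  echo $((15 * reverts + 5 * wide_fixes + extra + 10 * all_low + 20 * unrelated))
}

wtf_likelihood 1 1 10 0 0   # 15 + 5, prints 20 (not above the 20% threshold)
wtf_likelihood 1 2 17 0 0   # 15 + 10 + 2, prints 27 (stop and ask)
```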

Phase 9: Final QA

After all fixes are applied:
  1. Re-run QA on all affected pages
  2. Compute final health score
  3. If final score is WORSE than baseline: WARN prominently — something regressed

Phase 10: Report

Write the report to both local and project-scoped locations:
Local: `.gstack/qa-reports/qa-report-{domain}-{YYYY-MM-DD}.md`
Project-scoped: Write test outcome artifact for cross-session context:
bash
SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')
mkdir -p ~/.gstack/projects/"$SLUG"
Write to `~/.gstack/projects/{slug}/{user}-{branch}-test-outcome-{datetime}.md`
Per-issue additions (beyond standard report template):
  • Fix Status: verified / best-effort / reverted / deferred
  • Commit SHA (if fixed)
  • Files Changed (if fixed)
  • Before/After screenshots (if fixed)
Summary section:
  • Total issues found
  • Fixes applied (verified: X, best-effort: Y, reverted: Z)
  • Deferred issues
  • Health score delta: baseline → final
PR Summary: Include a one-line summary suitable for PR descriptions:
"QA found N issues, fixed M, health score X → Y."

Phase 11: TODOS.md Update

If the repo has a `TODOS.md`:
  1. New deferred bugs → add as TODOs with severity, category, and repro steps
  2. Fixed bugs that were in TODOS.md → annotate with "Fixed by /qa on {branch}, {date}"

Additional Rules (qa-specific)

  1. Clean working tree required. Refuse to start if `git status --porcelain` is non-empty.
  2. One commit per fix. Never bundle multiple fixes into one commit.
  3. Never modify tests or CI configuration. Only fix application source code.
  4. Revert on regression. If a fix makes things worse, `git revert HEAD` immediately.
  5. Self-regulate. Follow the WTF-likelihood heuristic. When in doubt, stop and ask.