test-evidence-review
Test Evidence Review
Output: Summary report (in conversation) + optional `production/qa/evidence-review-[date].md`

When to run:
- Before QA hand-off sign-off (`/team-qa` Phase 5)
- On any story where test quality is in question
- As part of milestone review for Logic and Integration story quality audit
1. Parse Arguments
Modes:
- `/test-evidence-review [story-path]`: review a single story's evidence
- `/test-evidence-review sprint`: review all stories in the current sprint
- `/test-evidence-review [system-name]`: review all stories in an epic/system
- No argument: ask which scope to use ("Single story", "Current sprint", "A system")
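The mode selection above can be sketched as a small dispatcher. This is only an illustration: `ReviewScope` and `parse_scope` are hypothetical names, and the rule that anything containing `/` or ending in `.md` is a story path is an assumption, not something the skill specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewScope:
    mode: str               # "story", "sprint", "system", or "ask"
    target: Optional[str]   # story path or system name, when applicable


def parse_scope(argument: Optional[str]) -> ReviewScope:
    """Map the raw skill argument onto one of the four modes."""
    arg = (argument or "").strip()
    if not arg:
        # No argument: the user must be asked which scope to use.
        return ReviewScope("ask", None)
    if arg == "sprint":
        return ReviewScope("sprint", None)
    # Assumption: anything that looks like a file path is a story path.
    if arg.endswith(".md") or "/" in arg:
        return ReviewScope("story", arg)
    # Everything else is treated as a system (epic) name.
    return ReviewScope("system", arg)
```

Checking the literal `sprint` before the path test keeps the two cases from colliding; a system that is itself named "sprint" would need a different convention.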
2. Load Stories in Scope
Based on the argument:

Single story: Read the story file directly. Extract: Story Type, Test Evidence section, story slug, system name.

Sprint: Read the most recently modified file in `production/sprints/`. Extract the list of story file paths from the sprint plan. Read each story file.

System: Glob `production/epics/[system-name]/story-*.md`. Read each.

For each story, collect:
- `Type:` field (Logic / Integration / Visual/Feel / UI / Config/Data)
- `## Test Evidence` section: the stated expected test file path or evidence doc
- Story slug (from file name)
- System name (from directory path)
- Acceptance Criteria list (all checkbox items)
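As a rough sketch of the collection step, the function below pulls those five pieces out of one story file. The exact story layout is an assumption: a `Type:` line, a `## Test Evidence` heading, and `- [ ]` / `- [x]` checkbox items. `parse_story` is a hypothetical helper, and taking the file stem as the slug is a guess at what "from file name" means.

```python
import re
from pathlib import PurePosixPath


def parse_story(path: str, text: str) -> dict:
    """Collect the fields the review needs from one story file."""
    p = PurePosixPath(path)

    # Tolerates light markdown decoration such as "> **Type**: Logic".
    type_match = re.search(r"^\W*Type\W*:\W*(.+?)\s*$", text, re.MULTILINE)

    # Everything under "## Test Evidence" up to the next level-2 heading.
    evidence_match = re.search(
        r"^## Test Evidence\s*\n(.*?)(?=^## |\Z)",
        text,
        re.MULTILINE | re.DOTALL,
    )

    # All checkbox items, ticked or not.
    criteria = re.findall(r"^\s*[-*] \[[ xX]\] (.+)$", text, re.MULTILINE)

    return {
        "type": type_match.group(1) if type_match else None,
        "evidence": evidence_match.group(1).strip() if evidence_match else None,
        "slug": p.stem,            # assumption: slug is the file name without extension
        "system": p.parent.name,   # e.g. production/epics/<system>/story-*.md
        "criteria": criteria,
    }
```

A missing `Type:` field or `## Test Evidence` section comes back as `None`, which the later steps can report as a gap instead of failing.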
3. Locate Evidence Files
For each story, find the evidence:

Logic stories: Glob `tests/unit/[system]/[story-slug]_test.*`
- If not found, also try: Grep in `tests/unit/[system]/` for files containing the story slug

Integration stories: Glob `tests/integration/[system]/[story-slug]_test.*`
- Also check `production/session-logs/` for playtest records mentioning the story

Visual/Feel and UI stories: Glob `production/qa/evidence/[story-slug]-evidence.*`

Config/Data stories: Glob `production/qa/smoke-*.md` (any smoke check report)

Note what was found (path) or not found (gap) for each story.
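The lookup rules above could be expressed roughly as follows. `evidence_globs` and `find_evidence` are hypothetical helpers, the project root is passed in explicitly, and the playtest-log check for Integration stories is left out for brevity.

```python
from pathlib import Path


def evidence_globs(story_type: str, system: str, slug: str) -> list:
    """Return the glob patterns to try for a given story type."""
    if story_type == "Logic":
        return [f"tests/unit/{system}/{slug}_test.*"]
    if story_type == "Integration":
        return [f"tests/integration/{system}/{slug}_test.*"]
    if story_type in ("Visual/Feel", "UI"):
        return [f"production/qa/evidence/{slug}-evidence.*"]
    if story_type == "Config/Data":
        return ["production/qa/smoke-*.md"]
    return []


def find_evidence(root, story_type: str, system: str, slug: str) -> list:
    """Return matching evidence paths; an empty list means a gap."""
    root = Path(root)
    found = []
    for pattern in evidence_globs(story_type, system, slug):
        found.extend(sorted(root.glob(pattern)))

    # Fallback for Logic stories: any unit test file that mentions the slug.
    if not found and story_type == "Logic":
        unit_dir = root / "tests" / "unit" / system
        if unit_dir.is_dir():
            for candidate in sorted(unit_dir.iterdir()):
                if candidate.is_file() and slug in candidate.read_text(errors="ignore"):
                    found.append(candidate)
    return found
```

Returning an empty list instead of raising lets the caller record "not found" per story and carry on with the rest of the scope.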
4. Review Automated Test Quality (Logic / Integration)
For each test file found, read it and evaluate:
Assertion coverage
Count the number of distinct assertions (lines containing assert, expect,
check, verify, or engine-specific assertion patterns). A low assertion count is
a warning sign: a test that makes only one assertion per test function may
not cover the range of expected behaviour.
Thresholds:
- 3+ assertions per test function → normal
- 1-2 assertions per test function → note as potentially thin
- 0 assertions (test exists but no asserts) → flag as BLOCKING — the test passes vacuously and proves nothing
Edge case coverage
For each acceptance criterion in the story that contains a number, threshold,
or "when X happens" conditional: check whether a test function name or
test body references that specific case.
Heuristics:
- Grep test file for "zero", "max", "null", "empty", "min", "invalid", "boundary", "edge" — presence of any is a positive signal
- If the story has a Formulas section with specific bounds: check whether tests exercise at minimum/maximum values
Naming quality
Test function names should describe the scenario and the expected result.
Pattern: `test_[scenario]_[expected_outcome]`

Flag functions named generically (`test_1`, `test_run`, `testBasic`) as
naming issues; they make failures harder to diagnose.
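One way to approximate this check is to require at least two underscore-separated parts after the `test_` prefix. That rule is an assumption: it suits snake_case suites and would wrongly flag descriptive camelCase names. `generic_test_names` is a hypothetical helper.

```python
import re

# test_<scenario>_<expected_outcome>: at least two parts after "test_".
DESCRIPTIVE = re.compile(r"^test_[A-Za-z0-9]+(?:_[A-Za-z0-9]+)+$")


def generic_test_names(names: list) -> list:
    """Return the test names that do not follow the descriptive pattern."""
    return [name for name in names if not DESCRIPTIVE.match(name)]
```

For example, `generic_test_names(["test_1", "test_zero_armor_deals_full_damage"])` flags only `test_1`.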
Formula traceability
For Logic stories where the GDD has a Formulas section: check that the test
file contains at least one test whose name or comment references the formula
name or a formula value. A test that exercises a formula without mentioning
it by name is harder to maintain when the formula changes.
5. Review Manual Evidence Quality (Visual/Feel / UI)
For each evidence document found, read it and evaluate:
Criterion linkage
The evidence doc should reference each acceptance criterion from the story.
Check: does the evidence doc contain each criterion (or a clear rephrasing)?
Missing criteria mean a criterion was never verified.
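Whether a passage is "a clear rephrasing" is a judgment call, but a word-overlap score can shortlist criteria that are probably missing. The sketch below is one such heuristic; the 0.6 threshold and the name `unlinked_criteria` are assumptions, not part of the skill.

```python
import re


def _words(text: str) -> set:
    """Split text into a set of lower-case alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def unlinked_criteria(criteria: list, evidence_text: str, min_overlap: float = 0.6) -> list:
    """Return criteria whose words mostly do not appear in the evidence doc."""
    doc_words = _words(evidence_text)
    missing = []
    for criterion in criteria:
        words = _words(criterion)
        overlap = len(words & doc_words) / len(words) if words else 0.0
        if overlap < min_overlap:
            missing.append(criterion)
    return missing
```

Anything this flags should still be read by a person before it is reported, since a heavily reworded criterion can score low while being fully covered.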
Sign-off completeness
Check for three sign-off lines (or equivalent fields):
- Developer sign-off
- Designer / art-lead sign-off (for Visual/Feel)
- QA lead sign-off
If any are missing or blank: flag as INCOMPLETE — the story cannot be fully
closed without all required sign-offs.
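A sketch of the sign-off check, assuming each sign-off is a line of the form `<role> sign-off: <name or date>`. Evidence docs that use a table or checkboxes would need a different pattern, and `missing_signoffs` is a hypothetical helper.

```python
import re


def missing_signoffs(evidence_text: str, story_type: str) -> list:
    """Return the roles whose sign-off line is absent or left blank."""
    required = ["Developer", "QA lead"]
    if story_type == "Visual/Feel":
        # Designer / art-lead sign-off applies to Visual/Feel stories only.
        required.insert(1, "Designer")

    missing = []
    for role in required:
        # Assumed line format: "<role> ... sign-off: <value>".
        pattern = rf"^\W*{re.escape(role)}[^\n:]*sign-off\W*:[ \t]*(\S.*)?$"
        match = re.search(pattern, evidence_text, re.IGNORECASE | re.MULTILINE)
        if match is None or not match.group(1):
            missing.append(role)
    return missing
```

Any non-empty result maps to the INCOMPLETE flag described above.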
Screenshot / artefact completeness
For Visual/Feel stories: check whether screenshot file paths are referenced
in the evidence doc. If referenced, Glob for them to confirm they exist.
For UI stories: check whether a walkthrough sequence (step-by-step interaction
log) is present.
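For the Visual/Feel case, the referenced screenshot paths can be pulled out with a regex and checked on disk. The extension list and the assumption that paths are relative to the project root are mine; `check_screenshots` is an illustrative name.

```python
import re
from pathlib import Path

# Assumption: screenshots are referenced by relative path with a common image extension.
IMAGE_PATH = re.compile(r"[\w./\-]+\.(?:png|jpe?g|gif|webp)", re.IGNORECASE)


def check_screenshots(evidence_text: str, root: str = ".") -> tuple:
    """Return (referenced paths, referenced paths that do not exist under root)."""
    referenced = sorted(set(IMAGE_PATH.findall(evidence_text)))
    missing = [path for path in referenced if not (Path(root) / path).is_file()]
    return referenced, missing
```

An empty `referenced` list means no screenshots were cited at all, which is a different finding from screenshots that are cited but missing.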
Date coverage
Evidence doc should have a date. If the date is earlier than the story's
last major change (heuristic: compare against sprint start date from the sprint
plan), flag as POTENTIALLY STALE — the evidence may not cover the final
implementation.
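The staleness heuristic reduces to a date comparison. This sketch assumes the evidence doc carries an ISO-style `YYYY-MM-DD` date and that the sprint start date has already been read from the sprint plan; both function names are hypothetical.

```python
import re
from datetime import date
from typing import Optional


def evidence_date(evidence_text: str) -> Optional[date]:
    """Return the first YYYY-MM-DD date found in the evidence doc, if any."""
    match = re.search(r"\b(\d{4})-(\d{2})-(\d{2})\b", evidence_text)
    if match is None:
        return None
    return date(int(match[1]), int(match[2]), int(match[3]))


def freshness(evidence_text: str, sprint_start: date) -> str:
    """Classify evidence as undated, potentially stale, or current."""
    found = evidence_date(evidence_text)
    if found is None:
        return "undated"
    return "potentially stale" if found < sprint_start else "current"
```

Evidence dated on the sprint start date itself counts as current here; whether that boundary should be stricter is a project decision.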
6. Build the Review Report
For each story, assign a verdict:
| Verdict | Meaning |
|---|---|
| ADEQUATE | Test/evidence exists, passes quality checks, all criteria covered |
| INCOMPLETE | Test/evidence exists but has quality gaps (thin assertions, missing sign-offs) |
| MISSING | No test or evidence found for a story type that requires it |
The overall sprint/system verdict is the worst story verdict present.
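The "worst verdict wins" rule can be written as a maximum over a severity ranking. The ordering ADEQUATE < INCOMPLETE < MISSING is inferred from the table above, and `overall_verdict` is a hypothetical helper.

```python
# Assumed severity order, from best to worst.
SEVERITY = {"ADEQUATE": 0, "INCOMPLETE": 1, "MISSING": 2}


def overall_verdict(story_verdicts: list) -> str:
    """Return the worst verdict among the reviewed stories."""
    if not story_verdicts:
        raise ValueError("no stories were reviewed")
    return max(story_verdicts, key=SEVERITY.__getitem__)
```

For instance, a sprint with verdicts `["ADEQUATE", "INCOMPLETE", "ADEQUATE"]` is INCOMPLETE overall.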
Test Evidence Review
Date: [date]
Scope: [single story path | Sprint [N] | [system name]]
Stories reviewed: [N]
Overall verdict: ADEQUATE / INCOMPLETE / MISSING
Story-by-Story Results
[Story Title] — [Type] — [ADEQUATE/INCOMPLETE/MISSING]
Test/evidence path: `[path]` (found) / (not found)

Automated test quality (Logic/Integration only):
- Assertion coverage: [N per function on average] — [adequate / thin / none]
- Edge cases: [covered / partial / not found]
- Naming: [consistent / [N] generic names flagged]
- Formula traceability: [yes / no — formula names not referenced in tests]
Manual evidence quality (Visual/Feel/UI only):
- Criterion linkage: [N/M criteria referenced]
- Sign-offs: [Developer ✓ | Designer ✗ | QA Lead ✗]
- Artefacts: [screenshots present / missing / N/A]
- Freshness: [dated [date] — current / potentially stale]
Issues:
- BLOCKING: [description] (prevents story-done)
- ADVISORY: [description] (should fix before release)
Summary
| Story | Type | Verdict | Issues |
|---|---|---|---|
| [title] | Logic | ADEQUATE | None |
| [title] | Integration | INCOMPLETE | Thin assertions (avg 1.2/function) |
| [title] | Visual/Feel | INCOMPLETE | QA lead sign-off missing |
| [title] | Logic | MISSING | No test file found |
BLOCKING items (must resolve before story can be closed): [N]
ADVISORY items (should address before release): [N]
7. Write Output (Optional)
Present the report in conversation.

Ask: "May I write this test evidence review to `production/qa/evidence-review-[date].md`?"

This is optional; the report is useful standalone. Write only if the user wants a persistent record.

After the report:
- For BLOCKING items: "These must be resolved before `/story-done` can mark the story Complete. Would you like to address any of them now?"
- For thin assertions: "Consider running `/test-helpers [system]` to see scaffolded assertion patterns for common cases."
- For missing sign-offs: "Manual sign-off is required from [role]. Share `[evidence-path]` with them to complete sign-off."

Verdict: COMPLETE (evidence review finished). Use CONCERNS if BLOCKING items were found.
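The closing logic can be sketched as two small functions. The prompt wording is copied from the list above; the function names and the shape of the inputs (a BLOCKING count, a list of systems with thin assertions, and role/path pairs for missing sign-offs) are assumptions.

```python
def closing_verdict(blocking_items: int) -> str:
    """COMPLETE when the review found no BLOCKING items, otherwise CONCERNS."""
    return "CONCERNS" if blocking_items > 0 else "COMPLETE"


def follow_up_prompts(blocking_items: int, thin_assertion_systems: list, missing_signoffs: list) -> list:
    """Build the follow-up messages to show after the report."""
    prompts = []
    if blocking_items:
        prompts.append(
            "These must be resolved before /story-done can mark the story Complete. "
            "Would you like to address any of them now?"
        )
    for system in thin_assertion_systems:
        prompts.append(
            f"Consider running /test-helpers {system} to see scaffolded "
            "assertion patterns for common cases."
        )
    for role, evidence_path in missing_signoffs:
        prompts.append(
            f"Manual sign-off is required from {role}. "
            f"Share {evidence_path} with them to complete sign-off."
        )
    return prompts
```

Neither function writes anything to disk, which matches the rule that the report file is only written after the user confirms.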
Collaborative Protocol
- Report quality issues, do not fix them — this skill reads and evaluates; it does not modify test files or evidence documents
- ADEQUATE means adequate for shipping, not perfect — avoid nitpicking tests that are functioning and comprehensive enough to give confidence
- BLOCKING vs. ADVISORY distinction is important — only flag BLOCKING when the gap leaves a story criterion genuinely unverified
- Ask before writing — the report file is optional; always confirm before writing