test-evidence-review


Test Evidence Review


/smoke-check verifies that test files exist and pass. This skill goes further: it reviews the quality of those tests and evidence documents. A test file that exists and passes may still leave critical behaviour uncovered. A manual evidence doc that exists may lack the sign-offs required for closure.
Output: Summary report (in conversation) + optional production/qa/evidence-review-[date].md
When to run:
  • Before QA hand-off sign-off (/team-qa Phase 5)
  • On any story where test quality is in question
  • As part of a milestone review, to audit the quality of Logic and Integration stories


1. Parse Arguments


Modes:
  • /test-evidence-review [story-path]: review a single story's evidence
  • /test-evidence-review sprint: review all stories in the current sprint
  • /test-evidence-review [system-name]: review all stories in an epic/system
  • No argument: ask which scope to review ("Single story", "Current sprint", or "A system")
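The four modes above can be sketched as a small dispatcher. This is only an illustration: the names Scope and parse_scope are hypothetical, and the rule that a story path ends in .md is an assumption based on the story-*.md glob used later in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scope:
    mode: str                     # "story", "sprint", "system", or "ask"
    target: Optional[str] = None  # story path or system name, if any


def parse_scope(argument: Optional[str]) -> Scope:
    """Map the raw skill argument onto one of the four modes."""
    arg = (argument or "").strip()
    if not arg:
        # No argument: the skill should ask the user which scope to review.
        return Scope("ask")
    if arg == "sprint":
        return Scope("sprint")
    # Assumption: story files are Markdown; anything else is a system name.
    if arg.endswith(".md"):
        return Scope("story", arg)
    return Scope("system", arg)
```

For example, parse_scope("combat") would be treated as a system name, where "combat" is a made-up system used only for illustration.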


2. Load Stories in Scope


Based on the argument:
Single story: Read the story file directly. Extract: Story Type, Test Evidence section, story slug, system name.
Sprint: Read the most recently modified file in production/sprints/. Extract the list of story file paths from the sprint plan. Read each story file.
System: Glob production/epics/[system-name]/story-*.md. Read each.
For each story, collect:
  • The "Type:" field (Logic / Integration / Visual/Feel / UI / Config/Data)
  • The "## Test Evidence" section: the stated expected test file path or evidence doc
  • Story slug (from file name)
  • System name (from directory path)
  • Acceptance Criteria list (all checkbox items)
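A minimal sketch of collecting these fields from one story file follows. It assumes the story is Markdown with a line beginning "Type:", a "## Test Evidence" heading, and "- [ ]" style checkboxes; the real story template may differ. The function name parse_story and the example path are hypothetical.

```python
import re
from pathlib import Path


def parse_story(path: str, text: str) -> dict:
    """Collect the per-story fields listed above from a story file's text."""
    p = Path(path)
    # Assumed layout: a line such as "Type: Logic".
    type_match = re.search(r"^Type:[ \t]*(.+)$", text, re.MULTILINE)
    # Everything under "## Test Evidence" up to the next "## " heading or EOF.
    section = re.search(
        r"^## Test Evidence[ \t]*\n(.*?)(?=^## |\Z)",
        text,
        re.MULTILINE | re.DOTALL,
    )
    # Checkbox items, checked or unchecked.
    criteria = re.findall(r"^\s*[-*] \[[ xX]\] (.+)$", text, re.MULTILINE)
    return {
        "type": type_match.group(1).strip() if type_match else None,
        "evidence": section.group(1).strip() if section else None,
        "slug": p.stem,            # from the file name
        "system": p.parent.name,   # from the directory path
        "criteria": [c.strip() for c in criteria],
    }
```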


3. Locate Evidence Files


For each story, find the evidence:
Logic stories: Glob tests/unit/[system]/[story-slug]_test.*
  • If not found, also try: Grep in tests/unit/[system]/ for files containing the story slug
Integration stories: Glob tests/integration/[system]/[story-slug]_test.*
  • Also check production/session-logs/ for playtest records mentioning the story
Visual/Feel and UI stories: Glob production/qa/evidence/[story-slug]-evidence.*
Config/Data stories: Glob production/qa/smoke-*.md (any smoke check report)
Note what was found (path) or not found (gap) for each story.
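The lookup rules above can be sketched as a table of glob templates keyed by story type. The glob strings are taken from this step; the names PATTERNS, evidence_pattern, and locate_evidence are hypothetical, and the fallback Grep and session-log checks are left out for brevity.

```python
from pathlib import Path
from typing import List

# Glob templates from the step above, keyed by story type.
PATTERNS = {
    "Logic": "tests/unit/{system}/{slug}_test.*",
    "Integration": "tests/integration/{system}/{slug}_test.*",
    "Visual/Feel": "production/qa/evidence/{slug}-evidence.*",
    "UI": "production/qa/evidence/{slug}-evidence.*",
    "Config/Data": "production/qa/smoke-*.md",
}


def evidence_pattern(story_type: str, system: str, slug: str) -> str:
    """Fill in the glob template for one story."""
    return PATTERNS[story_type].format(system=system, slug=slug)


def locate_evidence(root: Path, story_type: str, system: str, slug: str) -> List[str]:
    """Return matching paths relative to root; an empty list is a gap."""
    pattern = evidence_pattern(story_type, system, slug)
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))
```

An empty result corresponds to the "not found (gap)" case that the step asks you to note.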


4. Review Automated Test Quality (Logic / Integration)


For each test file found, read it and evaluate:

Assertion coverage


Count the number of distinct assertions (lines containing assert, expect, check, verify, or engine-specific assertion patterns). Low assertion count is a quality signal — a test that makes only 1 assertion per test function may not cover the range of expected behaviour.
Thresholds:
  • 3+ assertions per test function → normal
  • 1-2 assertions per test function → note as potentially thin
  • 0 assertions (test exists but no asserts) → flag as BLOCKING — the test passes vacuously and proves nothing

Edge case coverage


For each acceptance criterion in the story that contains a number, threshold, or "when X happens" conditional: check whether a test function name or test body references that specific case.
Heuristics:
  • Grep test file for "zero", "max", "null", "empty", "min", "invalid", "boundary", "edge" — presence of any is a positive signal
  • If the story has a Formulas section with specific bounds: check whether tests exercise at minimum/maximum values

Naming quality


Test function names should describe: the scenario + the expected result. Pattern: test_[scenario]_[expected_outcome]
Flag functions named generically (test_1, test_run, testBasic) as naming issues, since they make failures harder to diagnose.
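One way to flag generic names is a small pattern check like the sketch below. The three examples from the text (test_1, test_run, testBasic) are covered; the extra words "main" and "works" are assumptions added for illustration, and is_generic is a hypothetical name.

```python
import re

# "test", an optional underscore, then nothing, digits, or a filler word.
GENERIC_RE = re.compile(r"^test_?(\d+|run|basic|main|works)?$", re.IGNORECASE)


def is_generic(name: str) -> bool:
    """True when a test name says nothing about scenario or outcome."""
    return GENERIC_RE.match(name) is not None
```

A name such as test_damage_zero_returns_zero passes because it carries both a scenario and an expected outcome.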

Formula traceability


For Logic stories where the GDD has a Formulas section: check that the test file contains at least one test whose name or comment references the formula name or a formula value. A test that exercises a formula without mentioning it by name is harder to maintain when the formula changes.


5. Review Manual Evidence Quality (Visual/Feel / UI)


For each evidence document found, read it and evaluate:

Criterion linkage


The evidence doc should reference each acceptance criterion from the story. Check: does the evidence doc contain each criterion (or a clear rephrasing)? Missing criteria mean a criterion was never verified.

Sign-off completeness


Check for three sign-off lines (or equivalent fields):
  • Developer sign-off
  • Designer / art-lead sign-off (for Visual/Feel)
  • QA lead sign-off
If any are missing or blank: flag as INCOMPLETE — the story cannot be fully closed without all required sign-offs.
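The sign-off check could be sketched as follows. The evidence document format is not specified here, so this assumes one line per role of the form "<role> sign-off: <name>"; both that layout and the function name missing_signoffs are hypothetical.

```python
import re

REQUIRED_ROLES = ("Developer", "Designer", "QA lead")


def missing_signoffs(doc: str, roles=REQUIRED_ROLES) -> list:
    """Return the roles whose sign-off line is absent or left blank."""
    missing = []
    for role in roles:
        # Assumed line shape: "<anything> <role> <anything> sign-off: <value>".
        pattern = rf"^.*{re.escape(role)}.*sign-off\s*:[ \t]*(.*)$"
        m = re.search(pattern, doc, re.IGNORECASE | re.MULTILINE)
        if m is None or not m.group(1).strip():
            missing.append(role)
    return missing
```

Any non-empty result would mark the story INCOMPLETE. For story types that do not need a designer sign-off, pass a shorter roles tuple.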

Screenshot / artefact completeness


For Visual/Feel stories: check whether screenshot file paths are referenced in the evidence doc. If referenced, Glob for them to confirm they exist.
For UI stories: check whether a walkthrough sequence (step-by-step interaction log) is present.

Date coverage


Evidence doc should have a date. If the date is earlier than the story's last major change (heuristic: compare against sprint start date from the sprint plan), flag as POTENTIALLY STALE — the evidence may not cover the final implementation.
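The staleness heuristic can be illustrated with the sketch below. It assumes the evidence doc carries an ISO-style date (YYYY-MM-DD) and that the sprint start date has already been read from the sprint plan; the "undated" result is an addition for documents with no recognisable date.

```python
import re
from datetime import date
from typing import Optional


def evidence_date(doc: str) -> Optional[date]:
    """Return the first YYYY-MM-DD date found in the document, if any."""
    m = re.search(r"\b(\d{4})-(\d{2})-(\d{2})\b", doc)
    if m is None:
        return None
    return date(int(m.group(1)), int(m.group(2)), int(m.group(3)))


def freshness(doc: str, sprint_start: date) -> str:
    """Compare the evidence date against the sprint start date."""
    found = evidence_date(doc)
    if found is None:
        return "undated"
    return "potentially stale" if found < sprint_start else "current"
```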


6. Build the Review Report


For each story, assign a verdict:
| Verdict | Meaning |
| --- | --- |
| ADEQUATE | Test/evidence exists, passes quality checks, all criteria covered |
| INCOMPLETE | Test/evidence exists but has quality gaps (thin assertions, missing sign-offs) |
| MISSING | No test or evidence found for a story type that requires it |
The overall sprint/system verdict is the worst story verdict present.
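Taking the worst verdict can be sketched as a maximum over a severity ranking. The ordering ADEQUATE < INCOMPLETE < MISSING is an assumption inferred from the order of the verdict table, and the names SEVERITY and overall_verdict are hypothetical.

```python
# Assumed severity order: higher number means a worse verdict.
SEVERITY = {"ADEQUATE": 0, "INCOMPLETE": 1, "MISSING": 2}


def overall_verdict(story_verdicts: list) -> str:
    """Return the worst verdict among the reviewed stories."""
    if not story_verdicts:
        raise ValueError("no stories were reviewed")
    return max(story_verdicts, key=lambda v: SEVERITY[v])
```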

Test Evidence Review


Date: [date]
Scope: [single story path | Sprint [N] | [system name]]
Stories reviewed: [N]
Overall verdict: ADEQUATE / INCOMPLETE / MISSING


Story-by-Story Results


[Story Title] — [Type] — [ADEQUATE/INCOMPLETE/MISSING]


Test/evidence path: [path] (found) / (not found)
Automated test quality (Logic/Integration only):
  • Assertion coverage: [N per function on average] — [adequate / thin / none]
  • Edge cases: [covered / partial / not found]
  • Naming: [consistent / [N] generic names flagged]
  • Formula traceability: [yes / no — formula names not referenced in tests]
Manual evidence quality (Visual/Feel/UI only):
  • Criterion linkage: [N/M criteria referenced]
  • Sign-offs: [Developer ✓ | Designer ✗ | QA Lead ✗]
  • Artefacts: [screenshots present / missing / N/A]
  • Freshness: [dated [date] — current / potentially stale]
Issues:
  • BLOCKING: [description] (prevents story-done)
  • ADVISORY: [description] (should fix before release)


Summary


| Story | Type | Verdict | Issues |
| --- | --- | --- | --- |
| [title] | Logic | ADEQUATE | None |
| [title] | Integration | INCOMPLETE | Thin assertions (avg 1.2/function) |
| [title] | Visual/Feel | INCOMPLETE | QA lead sign-off missing |
| [title] | Logic | MISSING | No test file found |

BLOCKING items (must resolve before story can be closed): [N]
ADVISORY items (should address before release): [N]

---

7. Write Output (Optional)


Present the report in conversation.
Ask: "May I write this test evidence review to production/qa/evidence-review-[date].md?"
This is optional, since the report is useful standalone. Write only if the user wants a persistent record.
After the report:
  • For BLOCKING items: "These must be resolved before /story-done can mark the story Complete. Would you like to address any of them now?"
  • For thin assertions: "Consider running /test-helpers [system] to see scaffolded assertion patterns for common cases."
  • For missing sign-offs: "Manual sign-off is required from [role]. Share [evidence-path] with them to complete sign-off."
Verdict: COMPLETE (evidence review finished). Use CONCERNS if BLOCKING items were found.


Collaborative Protocol


  • Report quality issues, do not fix them — this skill reads and evaluates; it does not modify test files or evidence documents
  • ADEQUATE means adequate for shipping, not perfect — avoid nitpicking tests that are functioning and comprehensive enough to give confidence
  • BLOCKING vs. ADVISORY distinction is important — only flag BLOCKING when the gap leaves a story criterion genuinely unverified
  • Ask before writing — the report file is optional; always confirm before writing