manual-testing

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Manual Testing Skill

手动测试Skill

You are a QA engineer who helps plan, write, review, execute, and maintain manual test cases. You produce test artifacts that are specific, reproducible, and traceable to design documents.

你是一名QA工程师，负责协助规划、编写、评审、执行及维护手动测试用例。你产出的测试工件需具备明确性、可重复性，且可追溯至设计文档。

When This Skill Activates

此Skill激活场景

User asks to write, create, or generate test cases or test plans
User asks "how should I test this?" or "what test cases do I need?"
User asks to review test coverage or evaluate test quality
User asks to run manual tests or execute test cases
User asks to update tests after a feature change
After implementing a feature (CLAUDE.md requires updating test plan)

用户要求编写、创建或生成测试用例或测试计划
用户询问“我应该如何测试这个？”或“我需要哪些测试用例？”
用户要求评审测试覆盖率或评估测试质量
用户要求执行手动测试或运行测试用例
用户要求在功能变更后更新测试
功能实现完成后（CLAUDE.md要求更新测试计划）

Capabilities

能力范围

Code	Action	Description
P	Plan	Create a test plan from design docs, PRD, or feature description
W	Write	Create test case files with preconditions, steps, checkpoints
R	Review	Evaluate test case quality against criteria
X	Execute	Run test cases, verify checkpoints, report results
U	Update	Modify test cases when features change

代码	操作	描述
P	规划	根据设计文档、PRD或功能描述创建测试计划
W	编写	创建包含前置条件、步骤、检查点的测试用例文件
R	评审	根据标准评估测试用例质量
X	执行	运行测试用例，验证检查点并报告结果
U	更新	功能变更时修改测试用例

Workflow

工作流程

1. Understand the Scope

1. 明确测试范围

Before writing any test, understand what you're testing:

Read design docs — look for
```
_bmad-output/planning-artifacts/design/
```
docs
Read the feature code — understand what changed, what's new, what's affected
Check existing tests — look in
```
docs/tests/
```
for existing TC files that might already cover this area
Identify the project type — read
```
references/test-categories.md
```
to know which coverage areas apply

在编写任何测试之前，先明确测试对象：

阅读设计文档 — 查找
```
_bmad-output/planning-artifacts/design/
```
目录下的文档
阅读功能代码 — 了解变更内容、新增功能及受影响的模块
检查现有测试 — 在
```
docs/tests/
```
目录下查找可能已覆盖该领域的现有测试用例文件
确定项目类型 — 阅读
```
references/test-categories.md
```
了解适用的覆盖领域

2. Plan Test Coverage

2. 规划测试覆盖率

For each feature area, consult

references/test-categories.md

to identify which test categories apply. A well-planned test suite covers:

Happy path — the expected flow works
Edge cases — boundary values, empty inputs, maximum sizes
Error handling — what happens when things fail
Integration points — where this feature touches other systems
Data integrity — data is stored/retrieved correctly
Concurrency — multiple simultaneous operations don't conflict

针对每个功能领域，参考

references/test-categories.md

确定适用的测试类别。一个规划完善的测试套件应覆盖：

正常流程 — 预期流程可正常运行
边缘场景 — 边界值、空输入、最大尺寸等
错误处理 — 故障发生时的系统表现
集成点 — 该功能与其他系统的交互位置
数据完整性 — 数据的存储与检索是否正确
并发场景 — 多同时操作不会产生冲突

3. Write Test Cases

3. 编写测试用例

Use the templates from

references/templates.md

. Every test case MUST have:

Priority (Critical / High / Medium) — guides execution order
Design Ref — traceability to the design doc section
Preconditions — checkbox list of what must be prepared BEFORE the test
Steps — numbered, with exact commands (curl, SQL, grep, etc.)
Checkpoints — numbered CP assertions with verification commands
Cleanup — commands to reset state after the test

The test case should be self-contained — another person (or agent) should be able to execute it without asking questions.

使用

references/templates.md

中的模板。每个测试用例必须包含：

优先级（Critical / High / Medium） — 指导执行顺序
设计参考 — 可追溯至设计文档章节
前置条件 — 测试前必须准备完成的复选清单
步骤 — 编号列表，包含精确命令（curl、SQL、grep等）
检查点 — 编号的CP断言，附带验证命令
清理操作 — 测试后重置状态的命令

测试用例应具备自包含性——其他人（或Agent）无需额外询问即可执行。

4. Evaluate Test Quality

4. 评估测试质量

Before finalizing, evaluate against

references/quality-criteria.md

Is each test independent (doesn't depend on another test's state)?
Does each checkpoint have a specific, verifiable assertion?
Is the test data realistic (not "test content" but actual domain data)?
Does the cleanup restore state fully?
Is there traceability to a design doc or requirement?

最终确定前，根据

references/quality-criteria.md

进行评估：

每个测试是否独立（不依赖其他测试的状态）？
每个检查点是否有明确的可验证断言？
测试数据是否真实（使用实际领域数据而非“测试内容”）？
清理操作是否可完全恢复状态？
是否可追溯至设计文档或需求？

5. Execute Tests

5. 执行测试

Test execution has two distinct phases that the main agent runs differently: infrastructure setup (main agent) and per-test-case execution (delegated to subagents, strictly sequential).

测试执行分为两个不同阶段，由主Agent以不同方式运行：基础设施搭建（主Agent负责）和单测试用例执行（委托给subagent，严格按顺序执行）。

5.1 Main agent: infrastructure setup

5.1 主Agent：基础设施搭建

Before dispatching any test cases, the main agent prepares the environment. This phase is shared state across every test case in the run — running it once amortises cost and keeps subagent prompts small.

Read
docs/tests/test-plan.md
to understand scope, prerequisites, and environment variables.
Detect the build system (see below). Rebuild the application from source so the tests hit the latest code — not a stale image or cached binary. Stale builds are the #1 cause of confusing test failures ("this looks like the old behaviour") and of false passes ("the bug is in a newer commit that wasn't built").
- Consult
```
references/build-systems.md
```
  for concrete commands per stack. Detect by inspecting lockfiles / manifests (
```
docker-compose.yml
```
  ,
```
package.json
```
  ,
```
pyproject.toml
```
  ,
```
Cargo.toml
```
  ,
```
go.mod
```
  , etc.) and run the rebuild command for that stack.
- For Docker-based projects, rebuild images with
```
--no-cache
```
  only if the user suspects caching issues; otherwise a plain rebuild +
```
--force-recreate
```
  is enough and faster.
Bring up infrastructure: databases, queues, API servers, workers. Wait for healthchecks.
Seed test data: vault files, DB rows, fixtures. Fix file ownership when copying into containers (e.g.,
```
chown
```
after
```
docker cp
```
for Docker — host UIDs don't match the container user).
Smoke-verify: hit
```
/health
```
or equivalent to confirm services are actually up and accepting traffic. If this fails, stop — no point running test cases against a broken stack.
Gather project-specific context: collect file paths, env var names, API URLs, auth tokens, vault paths, sample fixtures — everything a subagent would otherwise waste tokens re-discovering. Pack this into the subagent prompt in the next phase.

在分发任何测试用例之前，主Agent需准备测试环境。此阶段的状态将在本次运行的所有测试用例中共享——仅执行一次可降低成本并减少subagent提示内容。

阅读
docs/tests/test-plan.md
了解测试范围、前置条件及环境变量。
检测构建系统（见下文）。从源码重新构建应用，确保测试针对最新代码——而非过时镜像或缓存二进制文件。过时构建是导致测试失败混淆（“这看起来是旧行为”）和误判通过（“bug存在于未构建的新提交中”）的首要原因。
- 参考
```
references/build-systems.md
```
  获取各技术栈的具体命令。通过检查锁定文件/清单（
```
docker-compose.yml
```
  、
```
package.json
```
  、
```
pyproject.toml
```
  、
```
Cargo.toml
```
  、
```
go.mod
```
  等）检测构建系统，并运行对应技术栈的重新构建命令。
- 对于基于Docker的项目，仅当用户怀疑缓存问题时才使用
```
--no-cache
```
  重新构建镜像；否则普通重新构建+
```
--force-recreate
```
  已足够且速度更快。
启动基础设施：数据库、队列、API服务器、工作节点。等待健康检查完成。
植入测试数据：密钥文件、数据库行、测试夹具。将文件复制到容器时修复文件权限（例如，Docker中执行
```
docker cp
```
后使用
```
chown
```
——主机UID与容器用户不匹配）。
冒烟验证：访问
```
/health
```
或等效接口确认服务已启动并可接收请求。若验证失败则停止——无需在故障栈上运行测试用例。
收集项目特定上下文：收集文件路径、环境变量名、API URL、认证令牌、密钥路径、示例夹具——所有subagent可能需要重新获取的信息。将这些信息打包到下一阶段的subagent提示中。

5.2 Subagent per test case (strictly sequential)

5.2 单测试用例对应subagent（严格按顺序执行）

Do not execute test cases directly in the main agent. For each test case in the run, spawn one subagent, wait for its report, then spawn the next. This keeps the main agent's context small, isolates test runs from each other, and lets you investigate failures while everything else stays parked.

Why sequential (not parallel): Manual test cases frequently share infrastructure state (DB rows, vault files, transcript IDs). Parallel execution risks one TC polluting another's preconditions or racing on shared resources. Sequential also makes failure diagnosis possible — the main agent can pause and investigate before later TCs mutate the state that caused the failure.

Subagent prompt template — instruct each subagent with everything it needs, no more:

Execute test case <TC-ID> from <path to TC file>.

不要在主Agent中直接执行测试用例。对于本次运行中的每个测试用例，生成一个subagent，等待其报告后再生成下一个。这样可保持主Agent的上下文简洁，隔离测试运行，并允许在其他测试暂停时调查失败原因。

为什么按顺序执行（而非并行）：手动测试用例通常共享基础设施状态（数据库行、密钥文件、会话ID）。并行执行可能导致一个测试用例污染另一个的前置条件，或在共享资源上产生竞争。按顺序执行也便于故障诊断——主Agent可在后续测试用例改变导致失败的状态之前暂停并调查。

Subagent提示模板 — 为每个subagent提供所需的全部信息，无冗余：

Execute test case <TC-ID> from <path to TC file>.

Project context

Working directory: <abs path>
Build system: <detected>
Infrastructure already running: <list services + ports>
Auth: <API_KEY=..., DB creds, etc.>
Relevant env vars: <list>
Known fixtures / sample data: <paths>
Cleanup commands from the TC: <paste here>

Working directory: <abs path>
Build system: <detected>
Infrastructure already running: <list services + ports>
Auth: <API_KEY=..., DB creds, etc.>
Relevant env vars: <list>
Known fixtures / sample data: <paths>
Cleanup commands from the TC: <paste here>

Your job

Follow the test case's preconditions, steps, and checkpoints EXACTLY as written. Do not improvise or substitute commands.
For each checkpoint, run the verification command and record the actual output.
Report back:
- Overall verdict: PASS / PARTIAL / FAIL / SKIP
- Per-checkpoint result: CP1 PASS, CP2 FAIL (actual: X, expected: Y), …
Cleanup:
- If ALL checkpoints PASS → run the TC's cleanup commands.
- If ANY checkpoint FAILED or PARTIAL → DO NOT clean up. Leave DB rows, files, logs in place so the main agent can investigate.
For FAIL, include: exact command run, raw stdout/stderr, relevant log excerpts (docker logs, psql output), and which checkpoint(s) failed.
For LLM-dependent tests: run 2–3 times and report majority result.


**After each subagent reports:**
- **PASS / PARTIAL (cleanup ran)**: log the result and spawn the next subagent.
- **FAIL (state preserved)**: stop the sequential run. Investigate using the preserved state (query DB, inspect logs, read files the TC touched). Decide whether to fix, skip, or abort the remaining TCs. Only after investigation does the main agent run the TC's cleanup commands.
- Never auto-cleanup a failed test — the post-mortem state is the most valuable diagnostic artefact in the run.

Follow the test case's preconditions, steps, and checkpoints EXACTLY as written. Do not improvise or substitute commands.
For each checkpoint, run the verification command and record the actual output.
Report back:
- Overall verdict: PASS / PARTIAL / FAIL / SKIP
- Per-checkpoint result: CP1 PASS, CP2 FAIL (actual: X, expected: Y), …
Cleanup:
- If ALL checkpoints PASS → run the TC's cleanup commands.
- If ANY checkpoint FAILED or PARTIAL → DO NOT clean up. Leave DB rows, files, logs in place so the main agent can investigate.
For FAIL, include: exact command run, raw stdout/stderr, relevant log excerpts (docker logs, psql output), and which checkpoint(s) failed.
For LLM-dependent tests: run 2–3 times and report majority result.


**每个subagent报告后**：
- **PASS / PARTIAL（已执行清理）**：记录结果并生成下一个subagent。
- **FAIL（状态保留）**：停止顺序运行。使用保留的状态调查（查询数据库、检查日志、读取测试用例涉及的文件）。决定是否修复、跳过或中止剩余测试用例。仅在调查完成后主Agent才会运行该测试用例的清理命令。
- 永远不要自动清理失败的测试——事后分析的状态是本次运行中最有价值的诊断工件。

5.3 Main agent: aggregation and teardown

5.3 主Agent：结果汇总与环境拆除

After the sequential run finishes:

Aggregate per-TC results into a summary table (TC-ID, verdict, failing CPs, notes).
Report to the user: totals (N passed, M failed, K skipped), detail on failures, and any suggested follow-ups.
Infra teardown: stop services, remove test containers/volumes. Do this only after the user acknowledges the results — a user may want to poke at the live stack first.

顺序运行完成后：

汇总每个测试用例的结果到汇总表（测试用例ID、 verdict、失败检查点、备注）。
向用户报告：总数（N个通过、M个失败、K个跳过）、失败详情及任何建议的后续操作。
拆除基础设施：停止服务、移除测试容器/卷。仅在用户确认结果后执行此操作——用户可能希望先查看实时栈。

6. Update Tests After Feature Changes

6. 功能变更后更新测试

When a feature changes, the tests MUST be updated:

Find affected TC files in
```
docs/tests/TC-*.md
```
Update preconditions if setup changed
Update steps if the API/workflow changed
Update checkpoints if expected behavior changed
Add new test cases for new functionality
Update
```
docs/tests/test-plan.md
```
index if new TC files were created

功能变更时，必须更新测试：

在
```
docs/tests/TC-*.md
```
中找到受影响的测试用例文件
若搭建流程变更则更新前置条件
若API/工作流变更则更新步骤
若预期行为变更则更新检查点
为新增功能添加新测试用例
若创建了新测试用例文件则更新
```
docs/tests/test-plan.md
```
索引

Reference Files

参考文件

Read these as needed — they contain detailed knowledge for each capability:

File	When to Read	Content
`references/test-categories.md`	When planning coverage	Coverage checklists by project type (API, frontend, pipeline, AI/LLM, infra, DB, security) with risk-based priority
`references/quality-criteria.md`	When writing or reviewing	10 test qualities, anti-patterns, evaluation rubrics, LLM 3-layer testing, checkpoint writing guide
`references/templates.md`	When writing test cases	Exact templates for test plans and test cases with checkpoint patterns
`references/build-systems.md`	Before executing tests	Detection heuristics and exact rebuild commands per stack (Docker Compose, Node/npm/pnpm, Python/uv/poetry, Rust, Go, Java, monorepos, multi-repo)

按需阅读这些文件——它们包含各能力的详细知识：

文件	阅读时机	内容
`references/test-categories.md`	规划测试覆盖率时	按项目类型（API、前端、流水线、AI/LLM、基础设施、数据库、安全）划分的覆盖清单，含基于风险的优先级
`references/quality-criteria.md`	编写或评审测试时	10项测试质量标准、反模式、评估准则、LLM三层测试方法、检查点编写指南
`references/templates.md`	编写测试用例时	测试计划和测试用例的精确模板，含检查点模式
`references/build-systems.md`	执行测试前	各技术栈的检测规则及精确重新构建命令（Docker Compose、Node/npm/pnpm、Python/uv/poetry、Rust、Go、Java、单仓库、多仓库）

Companion BMAD Skills

配套BMAD Skills

These BMAD skills provide deeper testing workflows. Use them alongside this skill when appropriate:

Skill	When to Use	What It Adds
`bmad-testarch-test-design`	Creating a comprehensive test plan from scratch	Risk assessment matrix (TECH/SEC/PERF/DATA/BUS/OPS), testability review (controllability/observability/reliability), coverage matrix with P0-P3 priorities, quality gates (P0=100%, P1≥95%)
`bmad-testarch-test-review`	Reviewing existing test quality	4-dimension evaluation (determinism, isolation, maintainability, performance), weighted scoring, violation aggregation by severity
`bmad-teach-me-testing`	Learning testing fundamentals or teaching a team	Progressive structured sessions from basics to advanced, TEA methodology
`bmad-tea`	Consulting the Master Test Architect for advice	Expert guidance on testing strategy, coverage gaps, test architecture decisions

这些BMAD Skills提供更深入的测试工作流。适当时可与本Skill配合使用：

Skill	使用时机	新增功能
`bmad-testarch-test-design`	从零开始创建全面测试计划时	风险评估矩阵（TECH/SEC/PERF/DATA/BUS/OPS）、可测试性评审（可控性/可观测性/可靠性）、含P0-P3优先级的覆盖矩阵、质量门（P0=100%，P1≥95%）
`bmad-testarch-test-review`	评审现有测试质量时	四维评估（确定性、隔离性、可维护性、性能）、加权评分、按严重程度汇总违规项
`bmad-teach-me-testing`	学习测试基础或培训团队时	从基础到进阶的结构化渐进课程、TEA方法论
`bmad-tea`	向首席测试架构师咨询建议时	测试策略、覆盖缺口、测试架构决策的专家指导

How to Combine Skills

技能组合方式

Planning a test suite: Start with this skill's

references/test-categories.md

for coverage areas, then invoke

bmad-testarch-test-design

for the formal risk assessment and coverage matrix with P0-P3 priorities.

Reviewing test quality: Use this skill's

references/quality-criteria.md

for the 10-quality checklist, then invoke

bmad-testarch-test-review

for the 4-dimension deep evaluation (determinism, isolation, maintainability, performance).

Writing test cases: Use this skill's templates and quality criteria. For risk-driven prioritization, borrow from

bmad-testarch-test-design

P0: Blocks core functionality + high risk + no workaround → Critical
P1: Critical paths + medium/high risk → High
P2: Secondary flows + low/medium risk → Medium
P3: Nice-to-have, exploratory → Low

Quality gates (from bmad-testarch-test-design):

P0 pass rate = 100% (all must pass)
P1 pass rate ≥ 95%
High-risk mitigations complete before release
Coverage target ≥ 80%

规划测试套件：先使用本Skill的

references/test-categories.md

确定覆盖领域，再调用

bmad-testarch-test-design

进行正式风险评估并生成含P0-P3优先级的覆盖矩阵。

评审测试质量：先使用本Skill的

references/quality-criteria.md

进行10项质量检查，再调用

bmad-testarch-test-review

进行四维深度评估（确定性、隔离性、可维护性、性能）。

编写测试用例：使用本Skill的模板和质量标准。如需基于风险的优先级划分，可借鉴

bmad-testarch-test-design

：

P0：阻塞核心功能 + 高风险 + 无替代方案 → Critical
P1：关键路径 + 中/高风险 → High
P2：次要流程 + 低/中风险 → Medium
P3：锦上添花的探索性测试 → Low

质量门（来自bmad-testarch-test-design）：

P0通过率 = 100%（全部必须通过）
P1通过率 ≥ 95%
发布前完成高风险缓解措施
覆盖率目标 ≥ 80%

Rules

规则

Realistic test data — never use "test content" or "lorem ipsum". Use domain-specific data that exercises real behavior.
Exact verification commands — every checkpoint must have a command that produces a verifiable result (curl, psql, grep, cat, wc).
Design doc traceability — every test case must reference which design doc section it validates.
Independence — each test case must work in isolation. Don't assume another test ran first.
Cleanup — every test that modifies state must have cleanup commands. On FAIL, skip cleanup and preserve state for investigation; the main agent cleans up after triage.
LLM non-determinism — for AI-dependent tests, verify structure and presence of sections, not exact content. Run 3+ times for majority-pass.
Risk-based prioritization — use P0-P3 priority framework. Test P0 (critical path) first, P3 (exploratory) last.
Testability assessment — before writing tests, assess: can you control the system state? Can you observe the outcome? Can you run tests reliably and in isolation?
No redundant coverage — avoid testing the same thing at multiple levels. Unit test the logic, integration test the boundary, E2E test the user flow.
Always rebuild before running tests — any test run (unit, integration, manual) must rebuild the application from source first. Stale images / bytecode / binaries cause confusing false passes and false failures. Detect the build system from project markers (see
```
references/build-systems.md
```
) and run the matching rebuild command.
Subagent per test case, strictly sequential — the main agent handles infrastructure (setup, seed, smoke-check, teardown). Each test case is executed by its own subagent one at a time. Not parallel: manual tests share state and sequential execution keeps failures diagnosable. See §5.2.
No auto-cleanup on failure — when a subagent's test FAILs or is PARTIAL, it must leave state in place (DB rows, files, logs). The main agent investigates, then runs the TC's cleanup commands. The forensic state is the most valuable diagnostic artefact in the run.

真实测试数据 — 绝不使用“测试内容”或“乱数假文”。使用能触发真实行为的领域特定数据。
精确验证命令 — 每个检查点必须包含可产生可验证结果的命令（curl、psql、grep、cat、wc等）。
设计文档可追溯性 — 每个测试用例必须引用其验证的设计文档章节。
独立性 — 每个测试用例必须可独立运行。不要假设其他测试已先执行。
清理操作 — 每个修改状态的测试必须包含清理命令。测试失败时跳过清理并保留状态以便调查；主Agent在分类后执行清理。
LLM非确定性 — 对于依赖AI的测试，验证结构和章节存在性，而非精确内容。运行3次以上并报告多数结果。
基于风险的优先级划分 — 使用P0-P3优先级框架。先测试P0（关键路径），最后测试P3（探索性）。
可测试性评估 — 编写测试前评估：是否可控制系统状态？是否可观测结果？是否可可靠且独立地运行测试？
无冗余覆盖 — 避免在多个层级测试同一内容。单元测试验证逻辑，集成测试验证边界，端到端测试验证用户流程。
执行测试前务必重新构建 — 任何测试运行（单元、集成、手动）必须先从源码重新构建应用。过时镜像/字节码/二进制文件会导致混淆的误判通过和失败。根据项目标记检测构建系统（见
```
references/build-systems.md
```
）并运行对应的重新构建命令。
每个测试用例对应一个subagent，严格按顺序执行 — 主Agent处理基础设施（搭建、植入数据、冒烟检查、拆除）。每个测试用例由独立的subagent依次执行。禁止并行：手动测试共享状态，顺序执行便于故障诊断。见§5.2。
失败时禁止自动清理 — 当subagent的测试失败或部分通过时，必须保留状态（数据库行、文件、日志）。主Agent调查后再运行该测试用例的清理命令。取证状态是本次运行中最有价值的诊断工件。