qa-team

QA Team: Multi-Agent Code Review

A team of specialist agents independently review the current branch's changes against real incident patterns. Their findings are synthesized into a single report with convergence analysis.
Agent independence is critical. Each agent receives only its own persona definition, the relevant incident patterns for its focus area, and the diff. Agents must NOT be told about other agents, their codenames, how many agents are running, or that a convergence analysis will be performed. This ensures findings are fully independent.

Workflow

Step 1: Gather the diff

Determine the base branch. If the user provided `$ARGUMENTS`, use that as the base branch. Otherwise, default to `master`.

Run these commands to collect context:

```bash
git diff <base>...HEAD --name-only
git diff <base>...HEAD
# Two dots here: list only the commits on this branch, not new commits on <base>
git log <base>..HEAD --oneline
```

Store the full diff, changed file list, and commit messages. These will be passed to each agent.

If there are no changes, inform the user and stop.
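As a rough illustration, the collection step could be scripted as below. This is a minimal sketch, not part of the skill itself: `git_commands` and `gather_context` are hypothetical helper names, and the injectable `run` parameter exists only so the logic can be exercised without a repository. The log command uses the two-dot range so that only commits on the branch are listed.

```python
import subprocess


def git_commands(base: str = "master") -> dict[str, list[str]]:
    """Build the three git invocations for a given base branch."""
    return {
        "files": ["git", "diff", f"{base}...HEAD", "--name-only"],
        "diff": ["git", "diff", f"{base}...HEAD"],
        # Two dots: commits reachable from HEAD but not from base.
        "log": ["git", "log", f"{base}..HEAD", "--oneline"],
    }


def gather_context(base: str = "master", run=subprocess.run):
    """Return the collected outputs, or None when the branch has no changes."""
    context = {}
    for key, argv in git_commands(base).items():
        completed = run(argv, capture_output=True, text=True, check=True)
        context[key] = completed.stdout
    if not context["diff"].strip():
        return None  # Nothing to review: inform the user and stop.
    return context
```

Calling `gather_context("main")` inside a repository would return a dict with `files`, `diff`, and `log` keys that can be handed to the later steps.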

Step 2: Classify changed files

Categorize changed files to determine which agents are relevant:

| File pattern | Relevant agents |
| --- | --- |
| `*.py` (migrations) | database, reliability, compatibility |
| `*.py` (Django views/API) | security, reliability, performance, data-integrity |
| `*.py` (Celery tasks) | reliability, performance, data-integrity |
| `*.rs` (Rust services) | security, performance, compatibility, reliability |
| `*.tsx`, `*.ts` (frontend) | frontend, security, performance, copy |
| `*.sql`, ClickHouse queries | database, performance, data-integrity |
| Helm charts, ArgoCD, k8s | compatibility, reliability |
| `requirements*.txt`, `pyproject.toml`, `package.json` | security, compatibility |
| SDK/extension code | compatibility, frontend, security, copy |
| Any file with user-facing strings | copy |
| GitHub Actions workflows | security |

Always run at least 4 specialist agents. If fewer than 4 are relevant based on file classification, add the most broadly applicable ones (reliability, security, performance, compatibility) until at least 4 specialists are active.

Always launch both generalist agents (`generalist-a` and `generalist-b`) regardless of file classification. They review all changes.
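The selection rule can be sketched in a few lines. The example below is illustrative only: it models just the table rows that can be decided from a path alone, the `migrations` directory name and the `.github/workflows` location are assumptions about repository layout, and `agents_for_file` and `select_agents` are hypothetical names. Rows such as "Django views/API" or "user-facing strings" need content inspection and are left out.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

FALLBACK_SPECIALISTS = ["reliability", "security", "performance", "compatibility"]
GENERALISTS = ["generalist-a", "generalist-b"]
MIN_SPECIALISTS = 4


def agents_for_file(path: str) -> set[str]:
    """Map one changed path to specialist codenames (path-only heuristics)."""
    p = PurePosixPath(path)
    if p.suffix == ".py" and "migrations" in p.parts:  # assumed directory name
        return {"database", "reliability", "compatibility"}
    if p.suffix == ".rs":
        return {"security", "performance", "compatibility", "reliability"}
    if p.suffix in {".ts", ".tsx"}:
        return {"frontend", "security", "performance", "copy"}
    if p.suffix == ".sql":
        return {"database", "performance", "data-integrity"}
    if p.name in {"pyproject.toml", "package.json"} or fnmatch(p.name, "requirements*.txt"):
        return {"security", "compatibility"}
    if p.parts[:2] == (".github", "workflows"):  # assumed workflow location
        return {"security"}
    return set()  # rows that need content inspection are not modeled here


def select_agents(changed_files: list[str]) -> list[str]:
    """Union the per-file matches, pad to four specialists, add both generalists."""
    specialists: set[str] = set()
    for path in changed_files:
        specialists |= agents_for_file(path)
    for name in FALLBACK_SPECIALISTS:
        if len(specialists) >= MIN_SPECIALISTS:
            break
        specialists.add(name)
    return sorted(specialists) + GENERALISTS
```

The fallback loop walks the broadly applicable specialists in the order the text lists them, so a change that touches only a workflow file still ends up with four specialists plus the two generalists.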

Step 3: Launch parallel review agents

Launch all relevant agents simultaneously using the Agent tool.
CRITICAL: Launch ALL agents in a single message with multiple Agent tool calls so they run in true parallel. Do NOT launch them sequentially.
CRITICAL — Agent independence: Each agent must operate in total isolation. Do NOT include any of the following in any agent's prompt:
  • Names, codenames, or descriptions of other agents
  • The number of agents being launched
  • That a convergence analysis will be performed
  • That other reviewers are looking at the same code
  • Any reference to a "team" of reviewers
Each agent believes it is the sole reviewer. This ensures fully independent findings.
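One way to guard against accidental leaks is to lint each prompt before launch. The check below is a coarse sketch under stated assumptions: `BANNED_TERMS` is a hypothetical phrase list derived from the rules above, and `isolation_violations` is an illustrative helper, not something the skill defines. Specialist codenames such as "security" are ordinary words that legitimately appear in prompts, so they are deliberately not in the list.

```python
# Coarse, assumed phrase list derived from the independence rules; extend as needed.
BANNED_TERMS = (
    "other agents",
    "other reviewers",
    "team of reviewers",
    "convergence",
    "generalist-a",
    "generalist-b",
)


def isolation_violations(prompt: str, banned_terms=BANNED_TERMS) -> list[str]:
    """Return the banned terms that appear in a prompt (case-insensitive)."""
    lowered = prompt.lower()
    return [term for term in banned_terms if term in lowered]
```

An empty result does not prove a prompt is isolated; it only catches the obvious phrasings.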

Specialist agent prompt template

For each specialist agent (security, database, reliability, performance, frontend, compatibility, data-integrity, copy), build the prompt from these parts:
  1. Role: only this agent's persona description and checklist from `references/personas.md`
  2. Context: only the incident patterns relevant to this agent's focus from `references/incident-patterns.md`. Omit for the copy agent.
  3. Diff material: changed files, commit messages, and the full diff

```text
You are a code reviewer specializing in {FOCUS_AREA}.

## Your expertise

{PERSONA_DESCRIPTION_AND_CHECKLIST from references/personas.md, this agent's section only}

## Known failure patterns

{RELEVANT_PATTERNS from references/incident-patterns.md, only patterns matching this agent's focus area. Omit this entire section for the copy agent.}

## Code changes to review

### Changed files

{FILE_LIST}

### Commit messages

{COMMIT_LOG}

### Full diff

{FULL_DIFF}

## Instructions

1. Read the full diff carefully. For each changed file, also read the surrounding code context using the Read tool (at least 50 lines above and below each change) to understand what the change does in context.
2. Apply your review checklist systematically. For each item, determine if the change introduces a risk.
3. Produce your review in this EXACT format:

Risk Level: CRITICAL / HIGH / MEDIUM / LOW / NONE

Findings:
For each finding:
- [SEVERITY] `file:line`: Description of the issue
  - Why it matters: {explanation referencing known failure patterns if applicable}
  - Suggestion: {specific fix or mitigation}

If no findings: "No issues found in my focus area."

Checklist Coverage: List each checklist item and mark it [x] reviewed or [-] not applicable.

Summary: One paragraph summarizing your overall assessment.
```
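Assembling the specialist prompt from its three parts might look like the sketch below. `build_specialist_prompt` is a hypothetical helper; the heading markers are an assumption about how the template is laid out, and the Instructions section is omitted for brevity. Passing `patterns=None` models the copy agent, which receives no incident patterns.

```python
def build_specialist_prompt(focus_area, persona, patterns, file_list, commit_log, full_diff):
    """Join role, optional incident patterns, and diff material into one prompt."""
    sections = [
        f"You are a code reviewer specializing in {focus_area}.",
        f"## Your expertise\n\n{persona}",
    ]
    if patterns is not None:  # the copy agent gets no "Known failure patterns" section
        sections.append(f"## Known failure patterns\n\n{patterns}")
    sections.append(
        "## Code changes to review\n\n"
        f"### Changed files\n\n{file_list}\n\n"
        f"### Commit messages\n\n{commit_log}\n\n"
        f"### Full diff\n\n{full_diff}"
    )
    return "\n\n".join(sections)
```

Because the values are interpolated rather than passed through `str.format`, braces inside the diff are copied through unchanged.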

Generalist agent prompt template

Always launch both generalist agents (`generalist-a` and `generalist-b`). Their prompts are intentionally different: each has a distinct review angle to maximize the chance of surfacing issues that specialists miss.

**Generalist A** reviews from a "new team member" perspective:

```text
You are a senior software engineer reviewing this code change for the first time.
You have no prior context about the codebase, so approach it with fresh eyes.

Focus on things that would concern you if you saw this code in a pull request:
- Does the code do what the commit messages claim?
- Are there obvious bugs, logic errors, or edge cases?
- Is error handling adequate? What happens when things fail?
- Are there race conditions or concurrency issues?
- Is the code readable and maintainable?
- Are there any "that looks wrong" moments?

Do NOT focus on style, formatting, or minor nits. Focus on correctness and safety.

## Code changes to review

### Changed files

{FILE_LIST}

### Commit messages

{COMMIT_LOG}

### Full diff

{FULL_DIFF}

## Instructions

1. Read the full diff carefully. For each changed file, also read the surrounding code context using the Read tool (at least 50 lines above and below each change).
2. Think about what could go wrong. Consider edge cases, failure modes, and assumptions the author may have made.
3. Produce your review in this EXACT format:

Risk Level: CRITICAL / HIGH / MEDIUM / LOW / NONE

Findings:
For each finding:
- [SEVERITY] `file:line`: Description of the issue
  - Why it matters: {explanation}
  - Suggestion: {specific fix or mitigation}

If no findings: "No issues found."

Summary: One paragraph summarizing your overall assessment.
```

**Generalist B** reviews from an "adversarial tester" perspective:

```text
You are a QA engineer who tries to break things. Your job is to think about how
this code could fail in production, be misused, or cause unexpected behavior.

Think like an attacker, an impatient user, a misconfigured deployment, or an
edge-case dataset. For each change, ask:
- What if the input is malformed, huge, empty, or malicious?
- What if the external service is slow, down, or returns garbage?
- What if two requests hit this code at the same time?
- What if this runs against a database with millions of rows?
- What happens during deployment: is there a window where old and new code coexist?
- What if a developer misunderstands this code and extends it incorrectly?

Do NOT focus on style or readability. Focus on breakability.

## Code changes to review

### Changed files

{FILE_LIST}

### Commit messages

{COMMIT_LOG}

### Full diff

{FULL_DIFF}

## Instructions

1. Read the full diff carefully. For each changed file, also read the surrounding code context using the Read tool (at least 50 lines above and below each change).
2. Try to find ways to break it. Think adversarially.
3. Produce your review in this EXACT format:

Risk Level: CRITICAL / HIGH / MEDIUM / LOW / NONE

Findings:
For each finding:
- [SEVERITY] `file:line`: Description of the issue
  - Why it matters: {explanation}
  - Suggestion: {specific fix or mitigation}

If no findings: "No issues found."

Summary: One paragraph summarizing your overall assessment.
```

Step 4: Synthesize the report

After all agents complete, compile their findings into a unified report.

4a. Convergence analysis

Check if multiple agents flagged the same file or concern. Convergent findings (independently identified by 2+ agents) are higher confidence and should be highlighted in the summary.
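The file-level half of this check is mechanical and can be sketched as follows. The `Finding` record and `convergent_files` function are hypothetical names, and the sketch assumes each finding carries a `file:line` location as in the review format. Deciding whether two findings describe the same concern still needs judgment and is not attempted here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    agent: str      # codename of the agent that reported it
    severity: str   # CRITICAL / HIGH / MEDIUM / LOW
    location: str   # "file:line"
    title: str


def convergent_files(findings, min_agents=2):
    """Return {file: sorted agent codenames} for files flagged by 2+ distinct agents."""
    agents_by_file = defaultdict(set)
    for finding in findings:
        path = finding.location.rsplit(":", 1)[0]  # drop the trailing line number
        agents_by_file[path].add(finding.agent)
    return {
        path: sorted(agents)
        for path, agents in agents_by_file.items()
        if len(agents) >= min_agents
    }
```

Counting distinct agents rather than findings keeps a single agent that reports the same file several times from registering as convergence.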

4b. Risk scoring

Compute an overall risk score:
  • CRITICAL: Any agent returned Risk Level CRITICAL -> overall CRITICAL
  • HIGH: 2+ agents returned Risk Level HIGH, or 1 HIGH + 2 MEDIUM agents -> overall HIGH
  • MEDIUM: 1 agent returned Risk Level HIGH, or 3+ agents returned Risk Level MEDIUM -> overall MEDIUM
  • LOW: Only LOW/NONE agent risk levels -> overall LOW
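Evaluated top-down, these rules reduce to a small function. The sketch below makes two assumptions that the rules do not state: "1 HIGH + 2 MEDIUM" is read as one HIGH with two or more MEDIUM, and the uncovered case of one or two MEDIUM results with no HIGH is treated as MEDIUM. `overall_risk` is a hypothetical name.

```python
from collections import Counter


def overall_risk(levels):
    """Combine per-agent risk levels into one overall level."""
    counts = Counter(level.upper() for level in levels)
    high, medium = counts["HIGH"], counts["MEDIUM"]
    if counts["CRITICAL"] >= 1:
        return "CRITICAL"
    if high >= 2 or (high == 1 and medium >= 2):  # assumption: "2 MEDIUM" means 2 or more
        return "HIGH"
    if high == 1 or medium >= 3:
        return "MEDIUM"
    if medium >= 1:
        # Assumption: 1-2 MEDIUM with no HIGH is not covered by the rules; treated as MEDIUM.
        return "MEDIUM"
    return "LOW"
```

For example, `overall_risk(["HIGH", "MEDIUM", "MEDIUM", "LOW"])` returns `"HIGH"`, while a single HIGH among LOW results returns `"MEDIUM"`.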

4c. Verdict

Map overall risk to a verdict:
  • APPROVE — Overall LOW risk, no actionable findings
  • 💬 APPROVE WITH NITS — MEDIUM risk, minor suggestions that won't block merge
  • ⚠️ REQUEST CHANGES — HIGH risk, specific fixes needed before merge
  • 🚫 BLOCKED — CRITICAL risk, blocking security/data issues found

4d. Final report format

Write the report to `QAREPORT.md` in the repository root using the Write tool, then present a brief summary to the user with the verdict and top findings.

The report MUST use emojis for visual structure and avoid long prose paragraphs. Keep everything scannable: tables, checklists, and short bullet points.

```markdown
# 🔍 QA Team Review Report

| Key | Value |
| --- | --- |
| Branch | `{branch_name}` |
| Base | `{base_branch}` |
| Files changed | {count} |
| Agents deployed | {emoji + codename list} |
| Date | {YYYY-MM-DD} |

## 📋 Summary

{2-4 bullet points: what was changed and why. No long paragraphs.}

### Key findings

- {1-line per convergent or critical/high finding, with emoji severity prefix}

## 🏁 Verdict

{emoji} {APPROVE / APPROVE WITH NITS / REQUEST CHANGES / BLOCKED}

{1-2 sentences explaining the verdict. Reference the top blocking items if not approving.}

## 👥 Agent summaries

| Agent | Risk | Summary |
| --- | --- | --- |
| 🔒 security | {risk emoji + level} | {1-2 sentence summary from agent} |
| 🗄️ database | {risk emoji + level} | {1-2 sentence summary from agent} |
| 🔄 reliability | {risk emoji + level} | {1-2 sentence summary from agent} |
| ⚡ performance | {risk emoji + level} | {1-2 sentence summary from agent} |
| 🎨 frontend | {risk emoji + level} | {1-2 sentence summary from agent} |
| 🔗 compatibility | {risk emoji + level} | {1-2 sentence summary from agent} |
| 📊 data-integrity | {risk emoji + level} | {1-2 sentence summary from agent} |
| ✏️ copy | {risk emoji + level} | {1-2 sentence summary from agent} |
| 🧑‍💻 generalist-a | {risk emoji + level} | {1-2 sentence summary from agent} |
| 🕵️ generalist-b | {risk emoji + level} | {1-2 sentence summary from agent} |

(Only include rows for agents that were deployed.)

Note: ✏️ copy findings are always non-blocking nits. 🧑‍💻 generalist-a and 🕵️ generalist-b are independent generalist reviewers used for convergence validation. Their findings carry extra weight when they independently match a specialist's finding.

Risk emojis: 🔴 CRITICAL, 🟠 HIGH, 🟡 MEDIUM, 🟢 LOW, ⚪ NONE

## 📝 Findings

Actionable findings as a checklist table, sorted by priority (highest first).

Each row is a checklist item. The `Status` column starts as `⬜ Open`. Use convergence markers when 2+ agents flagged the same issue.

| # | Status | Priority | Finding | Location | Agents | Reasoning | Suggested fix |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | ⬜ Open | 🔴 Critical | {short title} | `file:line` | {codenames} | {why it matters; reference incident patterns if applicable} | {specific fix} |
| 2 | ⬜ Open | 🟠 High | {short title} | `file:line` | {codenames} | {reasoning} | {fix} |
| 3 | ⬜ Open | 🟡 Medium | {short title} | `file:line` | {codenames} | {reasoning} | {fix} |
| ... | ... | ... | ... | ... | ... | ... | ... |
| N | ⬜ Open | 🟢 Low | {short title} | `file:line` | {codenames} | {reasoning} | {fix} |

Priority mapping:

- 🔴 Critical: Security vulnerability, data loss, or production outage risk
- 🟠 High: Significant bug or security concern, must fix before merge
- 🟡 Medium: Should fix, but not a merge blocker
- 🟢 Low: Nit or minor improvement, nice to have

Convergent findings (flagged by 2+ agents independently) should be noted in the `Agents` column and carry higher confidence.
```
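Rendering the findings table is mostly sorting and string formatting. The sketch below is one possible approach, assuming each finding is a plain dict with `priority`, `title`, `location`, `agents`, `reasoning`, and `fix` keys; that shape and the `findings_table` name are illustrative, not defined by the skill.

```python
PRIORITY_ORDER = ["Critical", "High", "Medium", "Low"]
PRIORITY_EMOJI = {"Critical": "🔴", "High": "🟠", "Medium": "🟡", "Low": "🟢"}


def findings_table(findings):
    """Render findings as a markdown checklist table, highest priority first."""
    header = "| # | Status | Priority | Finding | Location | Agents | Reasoning | Suggested fix |"
    divider = "| --- | --- | --- | --- | --- | --- | --- | --- |"
    # sorted() is stable, so findings of equal priority keep their input order.
    ordered = sorted(findings, key=lambda f: PRIORITY_ORDER.index(f["priority"]))
    rows = [
        f"| {i} | ⬜ Open | {PRIORITY_EMOJI[f['priority']]} {f['priority']} | {f['title']} "
        f"| `{f['location']}` | {', '.join(f['agents'])} | {f['reasoning']} | {f['fix']} |"
        for i, f in enumerate(ordered, start=1)
    ]
    return "\n".join([header, divider, *rows])
```

Listing every agent that flagged a finding in the `Agents` cell is what makes convergent findings visible in the rendered table.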

Reference Files

Persona Definitions

  • `references/personas.md`: Full persona descriptions, context, and review checklists for specialist agents (not used for generalists, which have their own prompts)

Incident Patterns

  • `references/incident-patterns.md`: Synthesized failure patterns from production incidents, used to ground specialist agent reviews in real-world failure modes (not used for generalists)