ai-generated-ut-code-review

AI UT Code Review

Overview

Review AI-generated unit tests for effectiveness, coverage, assertions, negative cases, determinism, and maintainability. Output a 0-10 score, a risk level, and a must-fix checklist. Overall line coverage must be >= 80%; otherwise risk is at least High.

When to Use

  • Reviewing or evaluating the quality of AI-generated unit tests (UT) or other test code
  • A score, risk level, or must-fix checklist is needed
  • Coverage or assertion validity is in question

Workflow

  1. Confirm tests target the intended business code and key paths.
  2. Check overall line coverage (>= 80% required).
  3. Inspect assertions for behavioral validity; flag missing/ineffective assertions.
  4. Verify negative/edge cases and determinism (no env/time dependency).
  5. Score by rubric, assign risk, list must-fix items with evidence.
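Step 2 can be scripted. The following is a minimal sketch, assuming the coverage data comes from coverage.py's `coverage json` command, whose `totals` object reports `covered_lines` and `num_statements` (with branch measurement off, this ratio is the statement-level "line" coverage coverage.py reports). The function names and the treatment of an empty report are illustrative choices, not part of the review rules.

```python
from __future__ import annotations

# The >= 80% overall line coverage requirement from the review rules.
LINE_COVERAGE_THRESHOLD = 80.0


def line_coverage_from_report(report: dict) -> float:
    """Return overall line coverage (percent) from a coverage.py JSON report dict.

    Assumes a "totals" object with "covered_lines" and "num_statements",
    as written by `coverage json`.
    """
    totals = report["totals"]
    statements = totals["num_statements"]
    if statements == 0:
        # Nothing was measured: treat as uncovered rather than as 100%.
        return 0.0
    return 100.0 * totals["covered_lines"] / statements


def coverage_gate(report: dict, threshold: float = LINE_COVERAGE_THRESHOLD) -> tuple[float, bool]:
    """Return (coverage_percent, passes_gate); failing the gate means risk is at least High."""
    pct = line_coverage_from_report(report)
    return pct, pct >= threshold
```

For example, after `coverage run -m pytest` and `coverage json`, pass `json.load(open("coverage.json"))` to `coverage_gate`. On the command line, `coverage report --fail-under=80` applies a comparable gate.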

Scoring (0-10)

Score each of the five dimensions from 0 to 2 points; the total score (0-10) is their sum.
| Dimension | 0 points | 1 point | 2 points |
|---|---|---|---|
| Coverage | < 80% | >= 80% but shallow | >= 80% and meaningful |
| Assertion Quality | No/invalid assertions | Some weak assertions | Behavior-anchored assertions |
| Negative & Edge | Missing | Partial | Comprehensive |
| Data & Isolation | Flaky/env-dependent | Mixed | Deterministic, isolated |
| Maintainability | Hard to read/modify | Mixed quality | Clear structure & naming |
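The rubric arithmetic is simple enough to encode, which helps catch totals that do not add up or a Coverage score that contradicts the 80% threshold. A minimal sketch with hypothetical class and function names; the 0/1/2 judgments for the non-coverage rows remain the reviewer's call.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RubricScore:
    """One 0-2 score per rubric dimension; the total is their sum (0-10)."""
    coverage: int
    assertion_quality: int
    negative_and_edge: int
    data_and_isolation: int
    maintainability: int

    def __post_init__(self) -> None:
        # Reject any value outside the 0-2 range defined by the rubric.
        for f in fields(self):
            value = getattr(self, f.name)
            if value not in (0, 1, 2):
                raise ValueError(f"{f.name} must be 0, 1 or 2, got {value!r}")

    @property
    def total(self) -> int:
        return sum(getattr(self, f.name) for f in fields(self))


def coverage_points(line_coverage_pct: float, meaningful: bool) -> int:
    """Coverage row: < 80% -> 0, >= 80% but shallow -> 1, >= 80% and meaningful -> 2."""
    if line_coverage_pct < 80.0:
        return 0
    return 2 if meaningful else 1
```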

Risk Levels

  • Blocker: Coverage < 80% AND key paths untested, or tests have no meaningful assertions
  • High: Coverage < 80% OR assertions largely ineffective
  • Medium: Coverage OK but weak edge cases or fragile design
  • Low: Only minor improvements needed
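Because the levels overlap (low coverage appears in both Blocker and High), one way to apply them is to check from most to least severe and take the first match. A minimal sketch; the boolean inputs are reviewer judgments and their names are hypothetical.

```python
from enum import Enum


class Risk(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"
    BLOCKER = "Blocker"


def assign_risk(
    line_coverage_pct: float,
    key_paths_tested: bool,
    has_meaningful_assertions: bool,
    assertions_largely_ineffective: bool,
    weak_edges_or_fragile: bool,
) -> Risk:
    coverage_ok = line_coverage_pct >= 80.0
    # Blocker: low coverage AND untested key paths, or no meaningful assertions at all.
    if (not coverage_ok and not key_paths_tested) or not has_meaningful_assertions:
        return Risk.BLOCKER
    # High: low coverage OR assertions largely ineffective.
    if not coverage_ok or assertions_largely_ineffective:
        return Risk.HIGH
    # Medium: coverage is fine, but edge cases are weak or the design is fragile.
    if weak_edges_or_fragile:
        return Risk.MEDIUM
    return Risk.LOW
```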

Must-Fix Checklist

  • Overall line coverage >= 80%
  • Each test has at least one behavior-relevant assertion
  • Negative/exception cases exist for core logic
  • Tests are deterministic and repeatable
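When writing up the checklist, each unmet item can be turned into an action plus its evidence. The sketch below assumes the reviewer has already collected findings into a hypothetical `Findings` record; the field names and message wording are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Findings:
    """Reviewer observations; every field name here is hypothetical."""
    line_coverage_pct: float
    tests_without_behavior_assertions: list[str] = field(default_factory=list)
    core_functions_without_negative_tests: list[str] = field(default_factory=list)
    nondeterministic_tests: list[str] = field(default_factory=list)


def must_fix_items(f: Findings) -> list[str]:
    """Return one 'action + evidence' line per unmet checklist item."""
    items = []
    if f.line_coverage_pct < 80.0:
        items.append(f"Raise overall line coverage to >= 80% (currently {f.line_coverage_pct:.1f}%)")
    if f.tests_without_behavior_assertions:
        items.append("Add behavior-relevant assertions to: " + ", ".join(f.tests_without_behavior_assertions))
    if f.core_functions_without_negative_tests:
        items.append("Add negative/exception tests for: " + ", ".join(f.core_functions_without_negative_tests))
    if f.nondeterministic_tests:
        items.append("Remove time/network/env dependency from: " + ", ".join(f.nondeterministic_tests))
    return items
```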

AI-Generated Test Pitfalls (Check Explicitly)

  • No assertions, or assertions unrelated to behavior (e.g., only not-null checks)
  • Over-mocking hides real behavior
  • Only happy-path coverage
  • Tests depend on time/network/env
  • Missing verification of side effects
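Two of these pitfalls, time dependency and unverified side effects, are illustrated below with Python's `unittest` and `unittest.mock`. `expire_sessions` is a hypothetical function under test. Only its callback boundary is mocked, so the real filtering logic still runs; mocking the function itself would be the over-mocking pitfall.

```python
import unittest
from datetime import datetime, timedelta
from unittest.mock import Mock


def expire_sessions(sessions: dict, now: datetime, on_expire) -> dict:
    """Hypothetical code under test: drop sessions whose expiry is <= now and
    report each dropped id through the on_expire callback (a side effect)."""
    alive = {}
    for session_id, expires_at in sessions.items():
        if expires_at <= now:
            on_expire(session_id)
        else:
            alive[session_id] = expires_at
    return alive


class ExpireSessionsTest(unittest.TestCase):
    # Calling datetime.now() would make results depend on when the test runs;
    # a fixed, injected "now" keeps the test deterministic.
    NOW = datetime(2024, 1, 1, 12, 0, 0)

    def test_expired_session_is_removed_and_reported(self):
        on_expire = Mock()
        sessions = {
            "old": self.NOW - timedelta(seconds=1),
            "fresh": self.NOW + timedelta(hours=1),
        }
        alive = expire_sessions(sessions, self.NOW, on_expire)
        # A weak check such as assertIsNotNone(alive) would pass even if nothing expired.
        self.assertEqual(alive, {"fresh": self.NOW + timedelta(hours=1)})
        # Verify the side effect, not only the return value.
        on_expire.assert_called_once_with("old")

    def test_expiry_equal_to_now_counts_as_expired(self):
        on_expire = Mock()
        alive = expire_sessions({"edge": self.NOW}, self.NOW, on_expire)
        self.assertEqual(alive, {})
        on_expire.assert_called_once_with("edge")
```

Saved as a test module, this runs with `python -m unittest`.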

Output Format (Required, Semi-fixed)

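  • Score: x/10 (Coverage x, Assertion Quality x, Negative & Edge x, Data & Isolation x, Maintainability x)
  • Risk: Low/Medium/High/Blocker, with a one-line reason
  • Must-fix:
    • [Action + evidence]
    • [Action + evidence]
  • Key Evidence:
    • Cite specific test case names or a coverage report summary (1-2 items)
  • Notes:
    • Minimal fix suggestion or alternative (1-2 lines)
Rules:
  • If coverage is < 80%, risk is at least High and coverage must be listed under Must-fix.
  • Missing or ineffective assertions directly raise the risk level and must be listed under Must-fix.
  • Provide at least 2 pieces of evidence; if evidence is insufficient, say so and lower the score.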
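The Rules can also be checked mechanically before a report is sent. This sketch assumes a hypothetical `Report` record. Matching Must-fix items by keyword (`coverage`, `assert`) is a rough heuristic, and whether missing assertions raised the risk level enough is left to the reviewer because it depends on judgment.

```python
from __future__ import annotations

from dataclasses import dataclass

RISK_ORDER = ["Low", "Medium", "High", "Blocker"]


@dataclass
class Report:
    score: int
    risk: str
    must_fix: list[str]
    evidence: list[str]
    line_coverage_pct: float
    has_invalid_assertions: bool = False


def rule_violations(r: Report) -> list[str]:
    """Return the output-format rules that the report breaks (empty list if none)."""
    problems = []
    if not 0 <= r.score <= 10:
        problems.append("Score must be within 0-10")
    low_coverage = r.line_coverage_pct < 80.0
    if r.risk not in RISK_ORDER:
        problems.append(f"Unknown risk level {r.risk!r}")
    elif low_coverage and RISK_ORDER.index(r.risk) < RISK_ORDER.index("High"):
        problems.append("Coverage < 80% requires risk of at least High")
    # Keyword matching is only a heuristic for "listed under Must-fix".
    if low_coverage and not any("coverage" in item.lower() for item in r.must_fix):
        problems.append("Coverage < 80% must be listed under Must-fix")
    if r.has_invalid_assertions and not any("assert" in item.lower() for item in r.must_fix):
        problems.append("Missing/ineffective assertions must be listed under Must-fix")
    if len(r.evidence) < 2:
        problems.append("At least 2 pieces of evidence are required (or explain and lower the score)")
    return problems
```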

Common Mistakes

  • Reporting coverage only, without evaluating assertion validity
  • Treating log output as assertions
  • Ignoring failure/exception paths

Example (Concise)

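Score: 4/10 (Coverage 0, Assertion Quality 0, Negative & Edge 1, Data & Isolation 2, Maintainability 1)
Risk: High (line coverage 62% < 80%; parseConfig() tests lack behavior assertions)
Must-fix:
  • Raise overall line coverage from 62% to >= 80% (coverage report)
  • Add behavior assertions to the parseConfig() tests, which currently contain only logs
  • Add negative cases for malformed input to parseConfig()
Key Evidence:
  • parseConfig() tests only assert that nothing crashes
  • Coverage report shows 62% line coverage
Notes:
  • Add assertions on outputs and side effects; add invalid-input tests.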
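Following the Notes above, a fixed test suite for this example might look like the sketch below. The original `parseConfig()` is not shown, so `parse_config` here is a hypothetical Python stand-in that parses `key=value` lines; what matters is the shape of the tests: assertions on the full output plus invalid-input cases.

```python
from __future__ import annotations

import unittest


def parse_config(text: str) -> dict[str, str]:
    """Hypothetical stand-in for parseConfig(): parse 'key=value' lines,
    skip blank lines and '#' comments, and reject anything else."""
    config = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"line {lineno}: expected key=value, got {raw!r}")
        config[key.strip()] = value.strip()
    return config


class ParseConfigTest(unittest.TestCase):
    def test_parses_keys_and_values(self):
        # Behavior assertion on the full output, not just "did not crash".
        self.assertEqual(
            parse_config("host = localhost\n# comment\n\nport=8080\n"),
            {"host": "localhost", "port": "8080"},
        )

    def test_rejects_line_without_separator(self):
        with self.assertRaises(ValueError):
            parse_config("host localhost")

    def test_rejects_empty_key(self):
        with self.assertRaises(ValueError):
            parse_config("=8080")
```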