test-smell-detection
Test Smell Detection
Deep formal audit of test code using an academic test smell taxonomy. Detects symptoms of bad design or implementation decisions that make tests harder to understand, more fragile, less effective at catching bugs, or more expensive to maintain. Produces a severity-ranked report with specific locations and actionable fixes.
Why Test Smells Matter
Test smells erode confidence in a test suite and inflate maintenance costs:
| Problem | Consequence |
|---|---|
| Tests with conditional logic | Some paths never execute — hidden testing gaps |
| Tests that depend on external resources | Flaky failures, slow execution, environment coupling |
| Tests that sleep to wait for results | Non-deterministic timing, slow suites, false failures |
| Tests without assertions | False confidence — coverage looks good but nothing is verified |
| Tests that call many production methods | Hard to diagnose failures, unclear what's being tested |
| Tests with magic numbers | Unreadable intent, unclear boundary conditions |
| Tests relying on ToString for comparison | Brittle to formatting changes, obscure failure messages |
| Tests with exception handling logic | Swallowed failures, tests that pass when they shouldn't |
When to Use
- User asks for a comprehensive or formal test smell audit
- User asks "are my tests well-written?" and wants a thorough analysis
- User wants a test quality health check with academic rigor
- User asks for a review of test design or structure using standard smell categories
- User suspects tests are fragile, flaky, or giving false confidence and wants a deep investigation
When Not to Use
- User wants a quick pragmatic test review (use `test-anti-patterns` — faster, covers the most common issues)
- User wants to evaluate assertion diversity specifically (use `assertion-quality`)
- User wants to find duplicated boilerplate across tests (use `exp-test-maintainability`)
- User wants to write new tests from scratch (help them directly)
- User wants to fix a specific failing test (diagnose and fix directly)
Inputs
| Input | Required | Description |
|---|---|---|
| Test code | Yes | One or more test files or a test project directory to analyze |
| Production code | No | The code under test, for context on whether patterns are justified |
Workflow
Step 1: Gather the test code
Read all test files the user provides. If the user points to a directory or project, scan for all test files by looking for test framework markers — see the `dotnet-test-frameworks` skill for .NET-specific markers.
For a thorough audit, also consult the extended smell catalog, which covers 9 additional smell types beyond the core 10 below.
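As a sketch of this scan, here is a minimal Python helper. It assumes the common .NET attribute markers `[Fact]` (xUnit), `[Test]` (NUnit), and `[TestMethod]` (MSTest); the `dotnet-test-frameworks` skill remains the authoritative list.

```python
from pathlib import Path

# Assumed marker strings; consult the dotnet-test-frameworks
# skill for the full, framework-accurate set.
MARKERS = ("[Fact]", "[Test]", "[TestMethod]")

def find_test_files(root):
    """Return every .cs file under root that contains a test marker."""
    hits = []
    for path in sorted(Path(root).rglob("*.cs")):
        text = path.read_text(errors="ignore")
        if any(marker in text for marker in MARKERS):
            hits.append(path)
    return hits
```

A simple substring scan is enough here because false positives only mean a file gets audited unnecessarily.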
Step 2: Scan for test smells
For each test method and class, check for the following smell categories:
Smell 1: Conditional Test Logic
Test methods containing `if`, `else`, `switch`, ternary (`? :`), `for`, `foreach`, or `while` statements. Control flow in tests means some paths may never execute, hiding gaps.
Severity: High
Detection: Any control flow statement inside a test method body.
Exception: `foreach` used solely to assert every item in a known collection is acceptable when the assertion is the loop body.
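The taxonomy is language-agnostic; a minimal Python/unittest sketch (all names hypothetical) shows the smell and one way to remove it:

```python
import unittest

def classify(n):
    # Hypothetical production function used for illustration.
    return "even" if n % 2 == 0 else "odd"

class SmellyTest(unittest.TestCase):
    def test_classify(self):
        # Smell: the branch decides which assertion runs, so one
        # path is silently skipped depending on the input chosen.
        n = 4
        if n % 2 == 0:
            self.assertEqual(classify(n), "even")
        else:
            self.assertEqual(classify(n), "odd")

class RefactoredTest(unittest.TestCase):
    # Fix: one straight-line test per behavior, no control flow.
    def test_classify_even(self):
        self.assertEqual(classify(4), "even")

    def test_classify_odd(self):
        self.assertEqual(classify(7), "odd")
```

Splitting by behavior also gives each path its own name in the failure report.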
Smell 2: Mystery Guest
Tests that depend on external resources — files on disk, databases, network endpoints, environment variables — without making the dependency explicit or using test doubles.
Severity: High
Detection: Test methods that read files, open database connections, make HTTP requests (without a test handler), read environment variables, or use hard-coded file paths.
Exception: In-memory fakes or test-specific handlers are fine.
Smell 3: Sleepy Test
Tests that call sleep or delay functions to wait for a condition. These introduce non-deterministic timing and slow down the suite.
Severity: High
Detection: Calls to sleep/delay functions inside test methods. See the `dotnet-test-frameworks` skill for .NET-specific patterns.
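A Python sketch of the smell and its fix, using a hypothetical `Worker` that signals completion through an event rather than requiring a guessed delay:

```python
import threading
import time
import unittest

class Worker:
    # Hypothetical background worker used for illustration.
    def __init__(self):
        self.done = threading.Event()

    def start(self):
        threading.Thread(target=self.done.set).start()

class SleepyTest(unittest.TestCase):
    def test_with_sleep(self):
        # Smell: 0.2s is a guess; too short flakes, too long drags.
        w = Worker()
        w.start()
        time.sleep(0.2)
        self.assertTrue(w.done.is_set())

class SignalBasedTest(unittest.TestCase):
    # Fix: block on an explicit completion signal with a generous
    # timeout; it returns as soon as the work actually finishes.
    def test_with_wait(self):
        w = Worker()
        w.start()
        self.assertTrue(w.done.wait(timeout=5))
```

The timeout now bounds the worst case instead of defining the normal case.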
Smell 4: Assertion-Free Test (Unknown Test)
Tests that execute code but never assert anything. Test frameworks report these as passing even if the code is completely broken, as long as no exception is thrown.
Severity: High
Detection: A test method with no assertion calls (framework-specific: `Assert.*`, `expect()`, `assert`, `Should*`, etc.) and no expected-exception annotation.
Calibration: A method named `*_DoesNotThrow` or `*_NoException` is implicitly asserting no exception — still flag it but note it may be intentional.
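In Python terms (hypothetical `normalize` function), the contrast is stark:

```python
import unittest

def normalize(s):
    # Hypothetical function under test.
    return s.strip().lower()

class UnknownTest(unittest.TestCase):
    def test_normalize(self):
        # Smell: exercises the code but verifies nothing; this
        # passes even if normalize() returns garbage.
        normalize("  Hello ")

class VerifiedTest(unittest.TestCase):
    # Fix: state what the result must be.
    def test_normalize(self):
        self.assertEqual(normalize("  Hello "), "hello")
```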
Smell 5: Eager Test
A test method that calls many different production methods, making it unclear what behavior is being tested. When it fails, diagnosis is difficult because the failure could stem from any of the calls.
Severity: Medium
Detection: A test method that calls 4+ distinct methods on the production object (excluding setup/construction). Count unique method names, not call count.
Calibration: Integration tests or workflow tests may legitimately call multiple methods — note this as a possible exception for end-to-end scenarios.
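A Python sketch with a hypothetical `Cart` class, showing one eager test split into focused ones:

```python
import unittest

class Cart:
    # Hypothetical production class used for illustration.
    def __init__(self):
        self.items = []

    def add(self, price):
        self.items.append(price)

    def total(self):
        return sum(self.items)

    def count(self):
        return len(self.items)

    def clear(self):
        self.items = []

class EagerTest(unittest.TestCase):
    def test_cart(self):
        # Smell: exercises add, total, count, and clear in one test;
        # a failure here points at four suspects.
        c = Cart()
        c.add(5)
        c.add(7)
        self.assertEqual(c.total(), 12)
        self.assertEqual(c.count(), 2)
        c.clear()
        self.assertEqual(c.count(), 0)

class FocusedTests(unittest.TestCase):
    # Fix: one behavior per test; each name says what broke.
    def test_total_sums_item_prices(self):
        c = Cart()
        c.add(5)
        c.add(7)
        self.assertEqual(c.total(), 12)

    def test_clear_empties_the_cart(self):
        c = Cart()
        c.add(5)
        c.clear()
        self.assertEqual(c.count(), 0)
```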
Smell 6: Magic Number Test
Assertions that contain unexplained numeric literals. The intent of `Assert.AreEqual(42, result)` is unclear without context — what does 42 represent?
Severity: Medium
Detection: Numeric literals (other than 0, 1, -1, and the literal used in the test name) appearing as the `expected` parameter in assertion methods.
Calibration: Small integers in context (like `Assert.AreEqual(3, list.Count)` count checks where 3 items were just added) are acceptable — only flag when the number's meaning is genuinely unclear.
Smell 7: Sensitive Equality
Tests that use `ToString()` for comparison or assertion. If the `ToString()` implementation changes, the test breaks even though the actual behavior is correct.
Severity: Medium
Detection: `Assert.AreEqual(expected, obj.ToString())`, or `.ToString()` appearing inside an assertion parameter.
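The Python analogue compares through `__str__`; a sketch with a hypothetical `Money` value object:

```python
import unittest

class Money:
    # Hypothetical value object used for illustration.
    def __init__(self, amount, currency):
        self.amount, self.currency = amount, currency

    def __str__(self):
        return f"{self.amount} {self.currency}"

class SensitiveEqualityTest(unittest.TestCase):
    def test_via_str(self):
        # Smell: breaks if the string formatting ever changes,
        # even though the value itself is still correct.
        self.assertEqual(str(Money(10, "EUR")), "10 EUR")

class FieldAssertionTest(unittest.TestCase):
    # Fix: assert on the fields (or a real equality method),
    # not on the rendering.
    def test_via_fields(self):
        m = Money(10, "EUR")
        self.assertEqual(m.amount, 10)
        self.assertEqual(m.currency, "EUR")
```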
Smell 8: Exception Handling in Tests
Tests that contain `try`/`catch` blocks or `throw` statements. This typically means the test is manually managing exceptions rather than using the framework's built-in exception assertion facilities.
Severity: Medium
Detection: `try`/`catch` or `throw`/`raise` statements inside a test method.
Exception: `catch` blocks that capture an exception for further assertion are a lesser concern — note but don't flag as high severity.
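A Python sketch with a hypothetical `withdraw` function, contrasting hand-rolled handling with the framework facility:

```python
import unittest

def withdraw(balance, amount):
    # Hypothetical function under test.
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount

class ManualHandlingTest(unittest.TestCase):
    def test_overdraft(self):
        # Smell: hand-rolled try/except; forgetting self.fail()
        # would make this pass even when no exception is raised.
        try:
            withdraw(10, 50)
            self.fail("expected ValueError")
        except ValueError:
            pass

class FrameworkAssertionTest(unittest.TestCase):
    # Fix: let the framework assert the exception.
    def test_overdraft(self):
        with self.assertRaises(ValueError):
            withdraw(10, 50)
```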
trycatchthrowraisecatch包含/块或语句的测试。这通常意味着测试正在手动管理异常,而非使用框架内置的异常断言功能。
trycatchthrow严重程度: 中
检测方式: 测试方法体内存在/或/语句。
例外情况: 捕获异常以进行进一步断言的块问题较小——需注明但不标记为高严重程度。
Smell 9: General Fixture (Over-broad Setup)
The test setup method or constructor initializes fields that are not used by every test method. This means each test pays the cost of setting up objects it doesn't need.
Severity: Low
Detection: Fields initialized in setup that are referenced by fewer than half the test methods in the class.
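A Python sketch (hypothetical fixture fields) of an over-broad setup and a lean alternative:

```python
import unittest

class GeneralFixtureTest(unittest.TestCase):
    # Smell: every test pays for all three fields, but the
    # single test below touches only one of them.
    def setUp(self):
        self.name = "fixture"
        self.numbers = list(range(1000))
        self.lookup = {n: n * n for n in self.numbers}

    def test_name(self):
        self.assertEqual(self.name, "fixture")

class LeanFixtureTest(unittest.TestCase):
    # Fix: shared setup holds only what every test uses; anything
    # test-specific is built inside the test that needs it.
    def setUp(self):
        self.name = "fixture"

    def test_name(self):
        self.assertEqual(self.name, "fixture")

    def test_sum_of_numbers(self):
        numbers = list(range(1000))
        self.assertEqual(sum(numbers), 499500)
```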
Smell 10: Ignored/Disabled Test
Tests marked as skipped or disabled. These add overhead and clutter, and the underlying issue they were disabled for may never be addressed.
Severity: Low
Detection: Skip/ignore annotations or conditional compilation that disables a test. See the `dotnet-test-frameworks` skill for framework-specific skip attributes.
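A Python sketch of the distinction the calibration rules draw between unexplained and reasoned skips (the reason strings are hypothetical):

```python
import unittest

class DisabledTests(unittest.TestCase):
    @unittest.skip("")  # unexplained skip: flag it; why is this off?
    def test_mystery(self):
        self.assertTrue(False)

    @unittest.skip("blocked by a known flaky clock on CI")  # reasoned skip: lower concern
    def test_documented(self):
        self.assertTrue(False)
```

Both clutter the suite, but only the first hides its rationale from future maintainers.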
Step 3: Apply calibration rules
Before reporting, calibrate findings to avoid false positives:
- Integration tests have different norms. A test class clearly marked as integration (by name, annotation, or category) legitimately uses external resources, calls multiple methods, and may use delays for async coordination. Downgrade Mystery Guest, Eager Test, and Sleepy Test severity for integration tests — note them but don't flag as problems.
- Simple loop-assert patterns are fine. Iterating a collection to assert on every item is readable and correct. Only flag loops with complex branching logic.
- Context matters for magic numbers. A count assertion right after adding a known number of items is self-documenting. Only flag numbers whose meaning requires looking at production code to understand.
- Inconclusive/pending markers are not assertion-free. Tests explicitly marked as incomplete should be flagged as Ignored Test, not Assertion-Free.
- Capture-and-assert exception patterns are borderline. Try/catch patterns that capture an exception then assert on its properties are ugly but functional. Note as a smell and suggest the framework's built-in exception assertion instead of calling it broken.
- If the test suite is clean, say so. A report finding few or no smells is perfectly valid.
Step 4: Report findings
Present the analysis in this structure:
- Summary Dashboard — Quick overview:

| Severity | Smell Count | Affected Tests |
|----------|-------------|----------------|
| High | 3 | 7 |
| Medium | 2 | 4 |
| Low | 1 | 2 |
| Total | 6 | 13 |

- Findings by Severity — For each smell found:
  - Smell name and category
  - Severity level with rationale
  - Affected test methods (file and method name)
  - Code snippet showing the smell
  - Concrete fix: show what the code should look like after remediation
  - Risk if left unfixed
- Smell-Free Patterns — If any test methods are well-written, briefly acknowledge this. Highlighting what's good helps the user understand the contrast.
- Prioritized Remediation Plan — Rank fixes by:
  - Impact (high-severity smells affecting many tests first)
  - Effort (quick fixes before refactoring)
  - Risk (fixes that prevent false-passes before cosmetic improvements)
Validation
- Every finding includes the specific test method name and file location
- Every finding includes a code snippet showing the smell in context
- Every finding includes a concrete fix example (not just "fix this")
- Integration tests are not penalized for patterns that are appropriate for their scope
- Simple foreach-assert loops are not flagged as conditional test logic
- Contextually obvious numbers are not flagged as magic numbers
- If the test suite is clean, the report says so upfront
- Severity levels are justified, not arbitrary
Common Pitfalls
| Pitfall | Solution |
|---|---|
| Flagging integration tests for using real resources | Check for integration test markers and adjust severity accordingly |
| Flagging loop-over-collection-assert as conditional logic | Only flag loops with branching or complex logic, not assertion iterations |
| Flagging obvious count assertions after adding N items | Consider the immediate context — self-documenting numbers are fine |
| Missing framework-specific assertion syntax | Consult the `dotnet-test-frameworks` skill |
| Over-flagging try/catch that captures for assertion | Distinguish swallowed exceptions from capture-and-assert patterns |
| Treating skip annotations with reasons same as bare skips | Note that reasoned skips are less concerning than unexplained ones |
| Flagging `*_DoesNotThrow` / `*_NoException` tests as assertion-free | These implicitly assert no exception — note but acknowledge the intent |