dev-tdd

Overview

No production code without a failing test first.
This is a discipline overlay, not a workflow step. It fires during ANY coding task - whether you're working from a plan, fixing a bug, adding a feature, or refactoring. There is no entry gate and no exit gate. If you're writing code, TDD rules are active.
The cycle is simple. The discipline is what matters.

Announcement

When this skill is active, announce:
"I'm using enggenie:dev-tdd to enforce test-driven development."

Hard Rule: No Production Code Without a Failing Test First

If you catch yourself writing production code before a test: stop. Delete it. Write the test first.
No exceptions. No "just this once." No "I'll circle back." The test comes first or the code doesn't get written.

Violating the Letter of TDD Is Violating the Spirit

There is no distinction between the letter and the spirit of TDD. The spirit IS the letter.
Writing a test that you know will pass is not TDD. Writing two tests before implementing is not TDD. Writing production code and then backfilling a test is not TDD. These aren't "close enough." They're a different practice entirely.
The value of TDD comes from the discipline of the cycle. Skip a step and you lose the feedback loop that makes it work.

The Cycle: RED -> GREEN -> REFACTOR

Every piece of new behavior follows this cycle. Every time. No shortcuts.

RED - Write a Failing Test

  1. Write ONE minimal test that describes the next piece of behavior.
  2. Run the test.
  3. It MUST fail (not error - fail). A compilation error or import error is not RED. RED means the test ran and the assertion failed.
  4. Read the failure message. It should describe the missing behavior clearly. If the failure message is confusing, fix the test before proceeding.
Verification: You saw a clear, expected failure message. The test ran. The assertion failed for the right reason.

GREEN - Make It Pass

  1. Write the simplest code that makes the failing test pass. Not the clever code. Not the complete code. The simplest.
  2. Run the test. It MUST pass.
  3. Run ALL tests. They MUST all pass. If something else broke, fix it before moving on.
Verification: The new test passes. All existing tests still pass. You wrote minimal code - nothing beyond what the test demanded.
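As an illustration of "simplest code", assume a hypothetical `slugify` function whose only test so far checks that spaces become hyphens. A sketch of a GREEN step under that assumption:

```python
def slugify(title: str) -> str:
    # Simplest change that satisfies the one existing test: swap spaces for hyphens.
    # Lowercasing or punctuation handling waits until a failing test demands it.
    return title.replace(" ", "-")


def test_slugify_replaces_spaces_with_hyphens():
    assert slugify("hello world") == "hello-world"
```

Nothing here lowercases or strips characters, because no test has asked for that yet.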

REFACTOR - Clean Up

  1. Look at the code you just wrote. Look at the test you just wrote. Is there duplication? Poor naming? Unnecessary complexity?
  2. Clean it up. Extract. Rename. Simplify.
  3. Run ALL tests after every change. They MUST stay green.
  4. No new behavior during refactor. If you want new behavior, start a new RED.
Verification: Tests are still green. Code is cleaner. No new behavior was added.
Then start the cycle again.
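A small sketch of a behavior-preserving refactor. The function names and the discount example are hypothetical; the two versions sit side by side only to show that the same test passes before and after the cleanup:

```python
def total_before(prices: list[float], discount: float) -> float:
    # Works, but the manual loop is noisy and `discount` hides that it is a fraction.
    total = 0.0
    for p in prices:
        total = total + p
    return total - total * discount


def total_after(prices: list[float], discount_rate: float) -> float:
    # Same behavior: clearer parameter name, built-in sum() instead of the loop.
    subtotal = sum(prices)
    return subtotal * (1 - discount_rate)


def test_total_applies_discount_rate():
    # The existing assertion is reused unchanged; no new behavior is tested.
    for total in (total_before, total_after):
        assert total([10.0, 20.0], 0.5) == 15.0
```

In practice only one version exists at a time; the refactor replaces the old body and the suite is rerun after each change.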

Worked Example: Bug Fix with TDD

Scenario: Email validation accepts "user@" as valid.

RED - Write the failing test:

```python
def test_rejects_email_without_domain():
    assert validate_email("user@") == False
```

Run:

```
pytest tests/test_email.py::test_rejects_email_without_domain
FAILED - assert True == False
```

Good. The test fails because the current code does not check for a domain.

GREEN - Write the simplest fix:

```python
def validate_email(email: str) -> bool:
    if "@" not in email:
        return False
    local, domain = email.rsplit("@", 1)
    return len(local) > 0 and len(domain) > 0
```

Run:

```
pytest tests/test_email.py
4 passed
```

All tests pass. Do NOT add more validation yet.

REFACTOR - Clean up while tests stay green: No refactoring needed - the code is already clean.

Good test vs Bad test:
| Aspect | Good | Bad |
| --- | --- | --- |
| Name | test_rejects_email_without_domain | test_email_validation |
| Assertion | assert validate_email("user@") == False | assert result is not None |
| Scope | Tests ONE behavior | Tests multiple behaviors |
| Failure message | "assert True == False" tells you what broke | "AssertionError" tells you nothing |

The Shortcut Tax

Every shortcut has a cost. Here's what you're actually paying.
| Shortcut | What it costs you |
| --- | --- |
| "I'll write tests after" | Tests pass immediately - you've proved nothing. Bugs ship. |
| "Too simple to test" | Simple code breaks. 30 seconds to test. 30 minutes to debug in prod. |
| "Already manually tested" | No record. Can't re-run. You'll re-test every change by hand forever. |
| "TDD slows me down" | TDD is faster than debugging. Systematic beats ad-hoc. |
| "Just this once" | That's what you said last time. Discipline compounds. |
| "Keep the code as reference" | You'll adapt it instead of writing tests first. Delete means delete. |
| "Need to explore first" | Fine. Explore. Then throw it away. Start fresh with TDD. |
| "The test is hard to write" | Hard to test = hard to use = bad design. Simplify the interface. |
| "Tests after achieve same goals" | After answers "what does this do?" First answers "what should it do?" |
| "Existing code has no tests" | You're improving it. Add tests for what you touch. |
| "I already see the problem" | Seeing symptoms does not equal understanding root cause. Write the test. |

Deep Rebuttals -- When the Shortcut Tax Table Is Not Enough

The table above is quick reference. Below are the full arguments for the 5 most dangerous rationalizations. Use these when you (or the user) need the complete reasoning.

"I'll write tests after to verify it works"

Tests written after code pass immediately. Passing immediately proves nothing:
  • You might test the wrong thing (testing what you built, not what was required)
  • You might test implementation details instead of behavior
  • You might miss the edge cases you forgot about during implementation
  • You never saw the test catch the bug -- so you cannot trust it catches anything
Test-first forces you to see the test fail, proving it actually tests something. A test that never failed is a test you cannot trust.

"I already manually tested all the edge cases"

Manual testing is ad-hoc. You think you tested everything but:
  • There is no record of what you tested
  • You cannot re-run the same tests when code changes next week
  • Under pressure, you forget cases
  • "It worked when I tried it" is anecdote, not evidence
Automated tests are systematic. They run the same way every time. They catch regressions you forgot to re-test. They document what was verified.

"Deleting X hours of work is wasteful"

Sunk cost fallacy. The time is already spent. Your choice now:
  • Delete and rewrite with TDD (X more hours, but high confidence the code works)
  • Keep it and add tests after (30 minutes, but low confidence -- likely bugs hiding)
The "waste" is keeping code you cannot trust. Working code without real tests is technical debt that compounds every sprint. The faster path is the disciplined path.

"TDD is dogmatic -- being pragmatic means adapting"

TDD IS pragmatic:
  • Finds bugs before commit (faster than debugging after deployment)
  • Prevents regressions (tests catch breaks immediately, not in QA or production)
  • Documents behavior (tests show how to use the code -- they are living documentation)
  • Enables fearless refactoring (change anything, tests catch breaks)
"Pragmatic" shortcuts lead to debugging in production, which is slower, more expensive, and more stressful. Discipline IS pragmatism.

"Tests after achieve the same goals -- it's the spirit, not the ritual"

No. Tests-after answer "What does this code do?" Tests-first answer "What should this code do?"
Tests-after are biased by your implementation. You test what you built, not what was required. You verify the edge cases you remembered, not the ones you would have discovered by writing the test first.
Tests-first force edge case discovery BEFORE implementation. You think about failure modes while designing the interface, not while verifying the output. 30 minutes of tests-after gives you coverage. TDD gives you proof that the tests actually work.

Gut Check

STOP and start over if any of these are true:
  • You wrote production code before writing a test
  • Your test passed immediately (you never saw RED)
  • You're writing multiple tests at once before implementing
  • You're "just going to quickly add" something without a test
  • You're thinking "this case is different because..."
  • You're keeping deleted code "as reference"
  • You wrote tests after implementation → Delete the code. Write the test. Watch it fail. Rewrite.
  • You can't explain why a test failed → You don't understand the code. Investigate before proceeding.
  • You're thinking "I already spent X hours on this code, deleting is wasteful" → Sunk cost fallacy. Delete it. Rewriting with TDD is faster than debugging untested code.
  • You're thinking "TDD is dogmatic, I'm being pragmatic" → Pragmatic means following processes that work. TDD works. Skipping it is not pragmatic, it's reckless.
If you hit any of these: stop. Delete the production code. Go back to RED.

Exceptions (Require Explicit User Permission)

These are the ONLY acceptable reasons to skip TDD. Each requires the user to explicitly say "skip TDD for this":
  • Throwaway prototypes - Code that will be deleted before merge. Not "might be deleted" - WILL be deleted.
  • Generated code - Auto-generated files (migrations, scaffolds, codegen output). Not hand-written code that "feels generated."
  • Configuration files - Pure config with no logic (JSON, YAML, env files). Not config that contains conditional logic.
If you catch yourself thinking "this is basically an exception" - it's not. Ask the user.

When Stuck

| Problem | Solution |
| --- | --- |
| Don't know how to test | Write the wished-for API. Write the assertion first. Work backwards. |
| Test too complicated | Design too complicated. Simplify the interface. |
| Must mock everything | Code too coupled. Use dependency injection. Reduce dependencies. |
| Test setup huge | Extract helpers. Simplify the design. If setup is painful, the API is painful. |
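For the "must mock everything" row, a minimal sketch of dependency injection. The `greeting` function and its `now` parameter are hypothetical; the idea is that the clock is passed in, so the test supplies a fixed time instead of patching `datetime`:

```python
from datetime import datetime, timezone
from typing import Callable


def greeting(now: Callable[[], datetime] = lambda: datetime.now(timezone.utc)) -> str:
    # The clock is a parameter, so callers and tests can substitute their own.
    return "Good morning" if now().hour < 12 else "Good afternoon"


def test_greeting_says_good_morning_before_noon():
    # A plain function stands in for the clock; no mocking library is needed.
    fixed = lambda: datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
    assert greeting(now=fixed) == "Good morning"
```

Production callers use the default argument, so the extra parameter costs them nothing.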

Good Tests

A good test has three qualities:
Minimal - Tests one thing. One behavior. One assertion where possible. When it fails, you know exactly what broke.
Clear - The test name describes the behavior, not the implementation. test_returns_empty_list_when_no_items tells you more than test_get_items.
Shows intent - The test demonstrates the desired API. Reading the test tells you how the code should be used. It's the first consumer of your design.
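A short sketch of a test with all three qualities. The function `get_items` and its cart argument are assumptions made up for this illustration:

```python
def get_items(cart: dict[str, int]) -> list[str]:
    # Hypothetical function under test: names of items with a positive quantity.
    return [name for name, qty in cart.items() if qty > 0]


def test_returns_empty_list_when_no_items():
    # One behavior, one assertion, and the call shows how the API is meant to be used.
    assert get_items({}) == []
```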

Verification Checklist

Before marking ANY coding task complete, verify:
  • Every new function/method has a test
  • Watched each test fail before implementing (saw RED)
  • Each test failed for the expected reason (correct failure message)
  • Wrote minimal code to pass (no speculative generality)
  • All tests pass (full suite, not just the new test)
  • Tests use real code (mocks only when unavoidable - external services, file systems, network)
  • Edge cases and error paths are covered
If any box is unchecked, you're not done.

Supporting References

  • references/testing-anti-patterns.md
    - Never test mock behavior, never add test-only methods to production code, mock COMPLETE data structures not partial ones
  • references/defense-in-depth.md
    - Validate at every layer, make bugs structurally impossible through types and constraints
These references contain detailed patterns. Read them when you need specifics on test structure or validation strategy.

Relationship to enggenie:dev-implement

enggenie:dev-implement orchestrates implementation work - it manages plans, dispatches subagents, and coordinates multi-step tasks.
enggenie:dev-tdd is a discipline overlay that fires during ANY coding, whether or not dev-implement is active. It is not a step in a workflow. It is the way code gets written.
When dev-implement is active, dev-tdd's rules are enforced through the implementer subagent prompt. The subagent follows the RED-GREEN-REFACTOR cycle for every piece of code it writes.
When dev-implement is NOT active (ad-hoc coding, quick fixes, explorations that turn into real code), dev-tdd still fires. The discipline does not depend on having a plan.

Recommended Model

Primary: sonnet
Why: TDD requires balanced speed and code quality. Sonnet writes good tests and clean implementations without the latency of opus. For complex domain logic, override to opus.
This is a recommendation. Ask the user: "Confirm model selection or override?" Do not proceed until the user responds.

Entry / Exit

Entry: None. This skill fires during any coding task. No trigger required.
Exit: None. This is a discipline overlay, not a workflow step. It remains active as long as code is being written.